Forem: Bhavesh Kukreja

A Deep Dive Into Python Dictionaries

Bhavesh Kukreja — Fri, 03 Oct 2025 15:32:57 +0000

my_dict = {'name': 'Guido', 'year': 1991}
print(my_dict['name'])

What you're seeing is my personal favorite and one of the most fascinating data structure in Python. It’s fast, flexible, and probably the most important data structure in Python. Dictionaries are everywhere. We use them for everything, from simple data storage to making your leetcode submission faster.

But ever wondered how they do it? How can my_dict['name'] be almost instantaneous, even if the dictionary has millions of keys? How does it remember the order you added items in? What happens when two different keys try to claim the same spot? That's what we're going to explore today!

Just like with lists, the Python we run is an executable program called CPython, written in C. Every time you create a dictionary or access a key, you're telling CPython to run specific C functions. To find the ground truth, we'll be looking at the dictobject.c file from CPython 3.11.

Note: A lot of the code has several layers to it, I will simplify and abstract away the functions to explain them, without spending too much time on the details of actual implementation to avoid information overload.
Feel free to check out the source code through the hyperlinks!

Hashing and Objects

Before we dive into the dictionary struct itself, we need to understand its principle: the hash table.

A hash table is like a filing cabinet. Instead of searching through every file one by one, you use a special function, a hash function that takes your key (like the word 'name') and instantly tells you which drawer it's in. This is why lookups are so fast. The main challenge for any hash table is what to do when the hash function tells two different keys to go into the same drawer. This is called a collision, and how a dictionary handles collisions is essential to its performance.

Of course, a dictionary is still a Python object. Just like lists, every object in Python starts with the same C struct, the PyObject.

typedef struct _object {
    Py_ssize_t ob_refcnt; // The reference counter for memory management
    PyTypeObject *ob_type; // The pointer that says "I am a dictionary"
} PyObject;

The Data Structure: PyDictObject

Now, what's fascinating about Python dictionaries (since Python 3.6) is that they separate the hash table from the data storage. This separation is what allows them to be fast, ordered, and memory-efficient all at once.

The structure is split into two main parts: the PyDictObject and the PyDictKeysObject.

// The dictionary object
typedef struct {
    PyObject_HEAD
    Py_ssize_t ma_used;        // Number of items in dictionary
    PyDictKeysObject *ma_keys; // Pointer to the main thing
    PyObject **ma_values;      // Usually NULL for a special case explained below
} PyDictObject;

The PyDictObject itself is just a small header. The magic is in the PyDictKeysObject it points to. But before we get to that, let's talk about ma_values. Why is it usually NULL? This points to an optimization for class instances, which the source code calls a "split table".

/*
The DictObject can be in one of two forms.

Either:
    A combined table:
        ma_values == NULL, dk_refcnt == 1.
        Values are stored in the me_value field of the PyDictKeysObject.
Or:
    A split table:
        ma_values != NULL, dk_refcnt >= 1
        Values are stored in the ma_values array.
        Only string (unicode) keys are allowed.
*/

This means most dictionaries use a "combined table," where values are stored right alongside their keys. For memory efficiency, however, many objects of the same class can use a "split table". Think that you have a student class with name and grade attributes, and you have 1000 students. Python knows the keys are gonna be the same, so why bother giving extra memory for it?

They all share one common structure for their keys (like .name and .grade), while each instance stores its unique values in a separate, tiny array. This approach saves a huge amount of memory by not duplicating the key names for every single object.

Now, let's look at the code that drives both types of dictionaries.

// The struct that holds all the important stuff
typedef struct {
    // ... metadata like refcount and size ...
    int8_t *dk_indices;  // The actual hash table!
    PyDictKeyEntry *dk_entries; // The array where data is stored
} PyDictKeysObject;

Attribute Breakdown

dk_indices

What it is: This is the real hash table. But it doesn't store your keys or values. It stores integers.

What it does: Each integer is an index that tells you where to find the actual data in the other array, dk_entries. Think of it as a card catalog in a library, it doesn't hold the book, it just tells you which shelf the book is on.

Why it matters: Keeping the hash table compact with just integers (which can be small, like 1 or 2 bytes each) saves a ton of memory and makes lookups very fast.
dk_entries

What it is: This is a simple array that stores the actual key, value, and the key's hash value together in a struct.

What it does: When you insert a new item into the dictionary, it's simply appended to the end of this array.

Why it matters: This is how dictionaries preserve insertion order. By always adding new items to the end of this list, the dictionary naturally remembers the order they were added in. Iterating over the dictionary is as simple as iterating through this dk_entries array from start to finish.

It can be understood that dk_indices is optimized for fast, unordered lookups, while dk_entries is optimized for ordered storage.

The Dictionary in Action

Now that we understand the two-part data structure, let's connect it to the Python dictionary methods we use every day. We’re going to see how CPython manipulates the dk_indices and dk_entries arrays to do its work.

Creation of a Dictionary - PyDict_New

This is the code you write to create a dictionary, very simple right? It's underlying code is also like it.

d = {}

Above code triggers a call to PyDict_New

// Public function to create a new dictionary.
PyObject *
PyDict_New(void)
{
    // Grab a reference to the global, shared, empty keys object.
    dictkeys_incref(Py_EMPTY_KEYS);
    // Create the dict object, pointing it to the empty keys.
    return new_dict(Py_EMPTY_KEYS, NULL, 0, 0);
}

Snippet Breakdown

Py_EMPTY_KEYS: Instead of allocating a new PyDictKeysObject (with its indices and entries arrays) every time you create an empty dict, CPython maintains a single, global, immutable empty keys object.
new_dict(...): This function just allocates the shell PyDictObject and sets its ma_keys pointer to that shared Py_EMPTY_KEYS object. ma_used is set to 0.

The Takeaway

The creation of an empty dictionary is an incredibly cheap O(1) operation. Python bets that you'll create many empty dictionaries that might not even be used, so it just avoids allocating any real memory for the hash table or entries until you insert the very first item.

Accessing an Item - _Py_dict_lookup

To get a value from a dictionary, you use its key inside square brackets.

value = my_dict['name']

To access an item, _Py_dict_lookup is called.

Py_ssize_t
_Py_dict_lookup(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject **value_addr)
{
    PyDictKeysObject *dk = mp->ma_keys;
    size_t mask = DK_MASK(dk);
    size_t i = (size_t)hash & mask; // The initial index calculation.
    size_t perturb = hash;

    // The main probing loop.
    for (;;) {
        Py_ssize_t ix = dictkeys_get_index(dk, i); // Read from dk_indices.

        // If the slot is empty, the key is not here.
        if (ix == DKIX_EMPTY) {
            *value_addr = NULL;
            return DKIX_EMPTY;
        }
        // If the slot is active (not empty or dummy)...
        if (ix >= 0) {
            PyDictKeyEntry *ep = &DK_ENTRIES(dk)[ix]; // Go to the entry.
            // Check if the keys are actually the same.
            if (ep->me_key == key || PyObject_RichCompareBool(ep->me_key, key, Py_EQ)) {
                *value_addr = ep->me_value; // Success! Return the value.
                return ix;
            }
        }
        // Collision! Calculate the next index to probe. 
        // The logic is explained in following paragraph. 
        perturb >>= PERTURB_SHIFT;
        i = (i * 5 + perturb + 1) & mask;
    }
}

The most interesting part of this function is the last two lines inside the loop. This is the collision resolution algorithm. A lot of times, hash calculation for your key points to a slot that's already occupied by a different key. We need to solve this collision problem!

The CPython developers left a fantastic, detailed comment explaining its design.

Source Code comments:

/* Major subtleties ahead:  Most hash schemes depend on having a "good" hash
   function, in the sense of simulating randomness.  Python doesn't:  its most
   important hash functions (for ints) are very regular...
   ...
   The other half of the strategy is to get the other bits of the hash code
   into play.  This is done by initializing a (unsigned) vrbl "perturb" to the
   full hash code, and changing the recurrence to:
    perturb >>= PERTURB_SHIFT;
   j = (5*j) + 1 + perturb;
   use j % 2**i as the next table index;
*/

What this means is that CPython is designed to be robust even with "bad" hash functions. The 5*j + 1 part ensures the probing jumps around the table in a scrambled, pseudo-random way that will eventually visit every slot. The perturb variable is the secret sauce, it's initialized with the full hash value, and with each collision, its higher-order bits are mixed into the calculation. This ensures that every bit of the hash value contributes to finding the next slot, effectively breaking up clusters and making the probe sequence unique for each hash value.

The Takeaway

Item access is O(1) on average because, most of the time, the first or second probe lands on the correct key. It's a direct calculation, not a search. The probing loop is a safety net that handles the inevitable cases where multiple keys hash to the same initial spot.

Inserting an Item - insertdict

Adding a new key-value pair or updating an existing one is a fundamental operation

my_dict['year'] = 1991

Now the insertdict function takes over. It's essentially a lookup followed by a placement.

static int
insertdict(PyDictObject *mp, PyObject *key, Py_hash_t hash, PyObject *value)
{
    PyObject *old_value;
    // First, do a lookup to see if the key already exists.
    Py_ssize_t ix = _Py_dict_lookup(mp, key, hash, &old_value);

    // If the key was not found...
    if (ix == DKIX_EMPTY) {
        // ...check if we need to make the table bigger.
        if (mp->ma_keys->dk_usable <= 0) {
            // This is the expensive O(n) call to resize the whole dict.
            if (insertion_resize(mp, 1) < 0)
                goto Fail;
        }

        // Find an empty or dummy slot in the hash table.
        Py_ssize_t hashpos = find_empty_slot(mp->ma_keys, hash);
        Py_ssize_t n = mp->ma_keys->dk_nentries;

        // Append the new data to the end of the entries array.
        PyDictKeyEntry *ep = &DK_ENTRIES(mp->ma_keys)[n];
        ep->me_key = key;
        ep->me_hash = hash;
        ep->me_value = value;

        // Store the new entry's index (n) in the hash table slot.
        dictkeys_set_index(mp->ma_keys, hashpos, n);

        // Update all the counters.
        mp->ma_used++;
        mp->ma_keys->dk_usable--;
        mp->ma_keys->dk_nentries++;
        return 0;
    }

    // If the key already existed, just update the value.
    if (old_value != value) {
        DK_ENTRIES(mp->ma_keys)[ix].me_value = value;
        Py_XDECREF(old_value);
    }
    return 0;
}

The Takeaway

Like lists, dictionaries have a fast path and a slow path for adding items. Most insertions are O(1) because there's already space. Occasionally, an expensive O(n) dictresize occurs where the entire table is rebuilt. The growth strategy ensures this happens infrequently, so the average or "amortized" cost of insertion remains O(1).

Deleting an Item - delitem_common

To remove an item from a dictionary you use del keyword.

del my_dict['name']     # You can also use my_dict.pop('name')
                        # pop will also return what was deleted

Deleting an item can't simply erase the entry, as that could break the collision search chain. The delitem_common function reveals the solution.

static int
delitem_common(PyDictObject *mp, Py_hash_t hash, Py_ssize_t ix, PyObject *old_value)
{
    PyObject *old_key;
    // Find the item's position in the hash table (dk_indices).
    Py_ssize_t hashpos = lookdict_index(mp->ma_keys, hash, ix);

    // This is the importnat, mark the hash table slot as a DUMMY.
    dictkeys_set_index(mp->ma_keys, hashpos, DKIX_DUMMY);

    // Get the entry from the data array (dk_entries).
    PyDictKeyEntry *ep = &DK_ENTRIES(mp->ma_keys)[ix];
    old_key = ep->me_key;

    // Null out the key and value in the data array.
    ep->me_key = NULL;
    ep->me_value = NULL;

    // Free the old key and value and update the used count.
    Py_DECREF(old_key);
    Py_DECREF(old_value);
    mp->ma_used--;
    return 0;
}

The Takeaway

Deletion is an O(1) operation on average. It's just a lookup and a couple of memory writes. It doesn't require shifting any other elements. The cost is the creation of DUMMY slots, which add a tiny bit of overhead to future lookups until a resize cleans them up.

Bonus Fact: Why d['missing'] Crashes but d.get('missing') Doesn't

You've probably noticed that accessing a missing key with square brackets raises a KeyError, while using the .get() method simply returns None.

This difference in behavior is a cool historical artifact. The .get() method is based on an older, internal C function called PyDict_GetItem, which was built for speed and simplicity in an era when exceptions were not expected for this operation. It was designed to just return NULL (which becomes None) on failure without making a fuss.

Source

/* Note that, for historical reasons, PyDict_GetItem() suppresses all errors
 * that may occur (originally dicts supported only string keys, and exceptions
 * weren't possible).  So, while the original intent was that a NULL return
 * meant the key wasn't present, in reality it can mean that, or that an error
 * (suppressed) occurred while computing the key's hash, or that some error
 * (suppressed) occurred when comparing keys in the dict's internal probe
 * sequence.  A nasty example of the latter is when a Python-coded comparison
 * function hits a stack-depth error, which can cause this to return NULL
 * even if the key is present.
 */

The square bracket access (d['key']), however, is handled by a more modern, user-facing C function. It uses the same fast lookup logic but adds an extra step: it explicitly checks for that NULL result and takes on the responsibility of raising the KeyError we're all familiar with.

The Final Takeaway: What a Dictionary Really Is

The Python dictionary is not just one simple hash table. It's a hybrid data structure. It combines an integer-only hash table for fast lookups (dk_indices) with a compact array for ordered data storage (dk_entries).

This design is its power. It gives us the O(1) average-case speed we expect from a hash table while also providing the guaranteed insertion order we've come to rely on since Python 3.7. It's always so fascinating to learn how things work under the hood!

Thank you for reading!

What Python Lists Really Are

Bhavesh Kukreja — Sat, 13 Sep 2025 04:30:00 +0000

1. How Do Lists Really Work?

myList = [1, 2, 3, 4]

for i in myList:
    print(i)

Pretty simple and intuitive right? All of us who have learned python have used lists at some point. Almost all tutorials, videos, boot camps and blogs talk about how to use them and what cool methods they provide. But have you ever wondered how do lists really work? How do they resize themselves automatically unlike the static arrays in C? What is Python hiding from us? How does it actually add or remove elements?

That's exactly what this article is about. We're going to answer all those questions by looking at the one place that holds the ground truth, the source code. Our mission is to go behind the curtain, peek at the C code that powers Python, and understand what a list really is. No magic, just plain logic.

So, why are we suddenly talking about C? When you type python3 in your terminal, you're running a program. That program, the most popular implementation of the Python language, is called CPython, and it's written almost entirely in C.

Think of it as the software that actually runs your Python script. Every time you create a list or call .append(), you're telling CPython to run a specific C function under the hood. To truly understand how lists work, we have to look at the C code that defines the list.

2. How does Python have Objects?

The first surprising fact is that Python, a famously object-oriented language, is built on top of C, a language that doesn't have classes or objects in the way we think of them. How does this even make any sense!?

The fundamental building blocks of object-oriented programming are really just structs (to group data), pointers (to create references), function pointers (to represent methods), and indirection (a way to decide which function to run at runtime).

These four building blocks let CPython support the key features of object-oriented programming: keeping data and methods together (Encapsulation), letting one class borrow from another (Inheritance), and letting objects behave differently depending on their type (Polymorphism).

Now we'll take a look at how those Objects are implemented using C!

Note: For simplicity's sake we will abstract some of the code, you can still check them out by clicking the hyper links! Since the main branch constantly keeps changing, we stick to 3.11 version.

2.1: The Universal Object - PyObject

This is the root of all Python objects. Every other object struct contains a PyObject within it, making it the absolute base of everything. Below is the definition of PyObject using structs and pointers.

typedef struct _object {
    Py_ssize_t ob_refcnt;
    PyTypeObject *ob_type;
} PyObject;

Attribute Breakdown

1. ob_refcnt

Full Name: Object Reference Count.
What it is: A simple integer counter.
What it does: It keeps a running score of how many variables or other objects are currently referencing this one.
Why it matters: This is the main part of Python's main memory management system. When this count hits zero, Python knows nobody cares about this object anymore and frees its memory.

2. ob_type

Full Name: Object Type.
What it is: A pointer to another big, important struct called PyTypeObject.
What it does: This is the object's ID card. It tells CPython, "Hey, I'm an integer," or "Yo, I'm a list." This PyTypeObject is like a giant lookup table that holds all the "methods" for that type.
Why it matters: This is the magic that makes polymorphism work. When you call len(my_list), CPython looks at my_list's ob_type to find the correct C function to run for calculating the length of a list. This helps len() to run not just for lists but tuples, dictionaries and others too.

2.2: The Variable-Sized Foundation - PyVarObject

Some objects, like the number 7, have a fixed size. Others, like lists and strings, are containers that can hold a variable number of things. PyVarObject is the common foundation for all these variable-sized containers.

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;
} PyVarObject;

That PyObject_HEAD is just a macro, which is a fancy C way of copy-pasting the two fields from PyObject right at the top of this struct. This is basically Inheritance! PyVarObject "inherits" ob_refcnt and ob_type, it then adds one new field which is ob_size.

Attribute Breakdown

ob_size

Full Name: Object Size.
What it is: Another integer counter.
What it does: It stores how many items are currently inside the container.
Why it matters: This is literally the number that len() returns. It's a pre-calculated value, which is why calling len() on a list or tuple is instantaneous, O(1), no matter how big it is. No counting required!

2.3: The List Object - PyListObject

Okay, brace yourself for the big secret, the one that got me into this whole mess. Internally, a dynamic, flexible Python list is basically just a static C array💀. Yeah, I had the same Pikachu face you're having right now.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

The PyObject_VAR_HEAD macro copy-pastes the fields from PyVarObject (ob_refcnt, ob_type, and ob_size). PyListObject then adds two list-specific fields.

Attribute Breakdown

1. ob_item

Full Name: Object Items.
What it is: The star of the show. It's a PyObject **, which means it's a pointer to a pointer to a Python object. You can think of it as an "Array of pointers". And since the pointers are of type PyObject, they can point to any possible Python object.

What it does: It holds the memory address of a separate, contiguous C array. That array, in turn, holds the memory addresses of all the actual Python objects that are inside the list. It doesn't hold your ints and strs directly, it holds their addresses.
Why it matters: This is the "static C array". Because it's a contiguous block of memory, getting my_list[500] is a simple calculation, which is why item access is a super-fast O(1) operation.

2. allocated

Full Name: Allocated Slots.
What it is: An integer counter and the trusty sidekick to ob_size.
What it does: It tracks the total number of slots that the ob_item C array has reserved in memory.
Why it matters: This is the key to how lists seem "dynamic." Python often allocates more space than it needs right now. This means allocated can be greater than ob_size. The extra, unused slots are waiting patiently for your next .append() call, making it incredibly fast most of the time.

3. A List in Action - The C Functions Behind the Methods

Alright, now that the basics are over, we can finally move to the real functions which power the list methods we all know and love! But first, pat yourself for making this far, you've already learned a couple of new things :D

In this section, we’re going to connect the Python methods we use every day to the C functions that do the real heavy lifting. We'll see how ob_size and allocated change, and witness the famous C array (ob_item) being manipulated directly. First up, the very beginning of a list's life which is its creation.

3.1: Creation of a List (`[] or list()`) - PyList_New

Every list has a beginning. Its life starts the moment you type my_list = []. This simple command triggers a call to a C function, PyList_New, which acts as the creator for all new list objects.

PyObject *
PyList_New(Py_ssize_t size)
{
    PyListObject *op;

    // A quick safety check, because a list with -5 items is just silly.
    if (size < 0) {
        // ... internal error handling ...
    }

// This block only gets compiled if the freelist optimization is enabled.
#if PyList_MAXFREELIST > 0
    // Check if a recycled list object is available.
    if (get_list_state()->numfree > 0) {
        // ... C code to grab an object from the freelist recycling bin ...
    }
    else // The recycling bin is empty.
#endif
    {
        // Create a brand new list object from scratch.
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL) {
            return NULL; // Failed
        }
    }

    // If the list is empty, don't allocate the array at all!
    if (size <= 0) {
        op->ob_item = NULL;
    }
    else {
        // If a size is given, allocate the C array and zero it out.
        op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
        if (op->ob_item == NULL) {
            // ... out of memory error handling ...
        }
    }

    // Finalize the list's properties.
    Py_SET_SIZE(op, size);
    op->allocated = size;
    _PyObject_GC_TRACK(op); // Tell the Garbage Collector to watch this object.
    return (PyObject *) op;
}

Snippet Breakdown

The Freelist Logic : The first half of the function is a performance hack. CPython keeps a "recycling bin" (the freelist) for recently deleted list objects. Before creating a new list, the code first checks this bin. If it finds an old list object it can reuse, it saves the cost of asking the operating system for new memory, which is a surprisingly slow operation.
PyObject_GC_New: If the recycling bin is empty, this is the fallback. It creates a brand new PyListObject struct from scratch and makes sure the Garbage Collector knows about it.
The Empty List Optimization: The if (size <= 0) block is a clever memory-saving trick. If you create an empty list like [], Python doesn't even bother allocating the main C array (ob_item). It just sets the pointer to NULL (C's version of None), saving memory until you actually add the first item.
PyMem_Calloc: If you create a pre-sized list (e.g., [None] * 10), this function is used to allocate the ob_item C array and, importantly, initialize all its slots to NULL.
Finalization: The last few lines set the ob_size and allocated fields (which are the same at birth) and officially register the new object with the garbage collector.

The Takeaway

The creation of a Python list is a highly optimized process. Python's creators knew that creating many small, empty lists is an extremely common pattern. They implemented the freelist to make creating the list object fast and the empty list optimization to make it memory-efficient by delaying the allocation of the main data array.

3.2: Accessing an Item (`my_list[index]`) - PyList_GetItem

Now, let's look at the C code behind the simplest operation: accessing an item by its index, like my_list[i]. This is handled by the PyList_GetItem function.

PyObject *
PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    // Is the object actually a list?
    if (!PyList_Check(op)) {
        // ... internal error handling ...
    }
    // Is the index 'i' in bounds?
    if (!valid_index(i, Py_SIZE(op))) {
        PyErr_SetString(PyExc_IndexError, "list index out of range");
        return NULL;
    }
    // It's a simple access from the "static" C array we know :)
    return ((PyListObject *)op)->ob_item[i];
}

Snippet Breakdown

PyObject * PyList_GetItem(...): Defines a function that takes a generic object and an index i, and returns a pointer to the item found.
if (!valid_index(i, Py_SIZE(op))): This is the bounds check. It makes sure the index i is within the valid range (from 0 to ob_size - 1) before trying to access it. If not, it sets the error string as IndexError.
return ((PyListObject *)op)->ob_item[i]: This is the main action. It casts the generic object op to a specific PyListObject, then directly accesses the C array ob_item at the given index i to get the pointer.

The Takeaway

The reason item access is O(1) (constant time) is because this operation is just direct memory math, also known as pointer arithmetic. The computer can calculate the exact memory address of any item using the formula start_address + (index * pointer_size). It jumps directly to the data, which takes the same amount of time whether the list has 10 items or 10 million.

3.3: What makes the list dynamic? - list_resize

Now for the big one. How does a list magically grow when you append an item? The secret lies in the list_resize function. It's not a method you can call directly from Python, it's the function that methods like .append() and .insert() call upon when they need to change the list's capacity.

static int
list_resize(PyListObject *self, Py_ssize_t newsize)
{
    Py_ssize_t allocated = self->allocated;
    size_t new_allocated;
    PyObject **items;

    // The Fast Path: If there's already room, just update the size.
    if (allocated >= newsize && newsize >= (allocated >> 1)) {
        Py_SET_SIZE(self, newsize);
        return 0;
    }

    // The Slow Path: We need more memory.
    // Calculate the new, larger capacity.
    new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;

    // Ask the OS for a bigger C array and copy the old items over.
    items = (PyObject **)PyMem_Realloc(self->ob_item, new_allocated * sizeof(PyObject *));
    if (items == NULL) {
        // ... out of memory error handling ...
    }
    // Update the list to use the new array and new capacity.
    self->ob_item = items;
    Py_SET_SIZE(self, newsize);
    self->allocated = new_allocated;
    return 0;
}

Snippet Breakdown

if (allocated >= newsize ...): The Fast Path. When a list operation needs to change the size, this first checks if there's already enough allocated space (for growing) or if the list isn't too empty (for shrinking). If the new size fits comfortably within the current capacity, it just updates the ob_size and exits.
new_allocated = ...: The Growth Formula. If the fast path fails, this line calculates a new, larger capacity. It doesn't just add one slot; it adds a proportional amount (roughly 1/8th of the new size) to give the list room to grow.
items = PyMem_Realloc(...): The Slow Path. This is the expensive part. PyMem_Realloc asks the operating system for a bigger chunk of memory and has to copy all of the old item pointers from the old array over to the new, bigger one.
self->ob_item = items; ...: Finally, the function updates the list's internal ob_item pointer and allocated counter to reflect the new, larger C array.

The Takeaway

The efficiency of list growth, especially for .append(), is thanks to this two-path system. Most calls are O(1) because they take the fast path. Occasionally, an expensive O(n) copy happens on the slow path. Python uses a geometric growth formula to ensure these expensive copies happen less and less frequently as the list gets bigger. This spreads the cost of resizing over many appends, making the average cost constant.

3.4 Appending an Item (`my_list.append(item)`) - list_append

Now that we've seen the working of list_resize, let's look at how the simple .append() method actually uses it. The journey starts at the C function list_append.

When you call .append(), the first C function to run is list_append. It's a simple wrapper that handles some paperwork and immediately calls its more interesting helper function, _PyList_AppendTakeRef.

static PyObject *
list_append(PyList_Object *self, PyObject *object)
{
    // Py_NewRef increases the object's ref count before passing it on.
    if (_PyList_AppendTakeRef(self, Py_NewRef(object)) < 0) {
        return NULL;
    }
    Py_RETURN_NONE; // This is why append never returns anything.
}

// The real worker function
static inline int
_PyList_AppendTakeRef(PyListObject *self, PyObject *newitem)
{
    Py_ssize_t len = Py_SIZE(self);
    Py_ssize_t allocated = self->allocated;

    // The Fast Path: Is there an empty slot at the end?
    if (allocated > len) {
        // Yes! Just drop the item in and update the size.
        PyList_SET_ITEM(self, len, newitem);
        Py_SET_SIZE(self, len + 1);
        return 0; // Success!
    }
    // The Slow Path: No room! Call for backup.
    return _PyList_AppendTakeRefListResize(self, newitem);
}

The Takeaway

The .append() method is designed for one thing: speed. It always tries to do the cheapest possible O(1) operation first. It calls the list_resize function to expand only in the case when it's absolutely necessary. This "check-first" logic is what makes appending to a list so efficient in the vast majority of cases.

3.5: Inserting an Item (`my_list.insert(index, item)`) - ins1

Appending is fast because we're just adding to empty space at the end of the list. Inserting an item at the beginning or in the middle of a list can be slow. The C function ins1 shows us the reason.

static int
ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
{
    Py_ssize_t i, n = Py_SIZE(self);
    PyObject **items;

    // First, make sure there's at least one empty slot.
    if (list_resize(self, n + 1) < 0)
        return -1;

    // The Memory Shuffle! Pick the last element and move it one step forward
    // keep doing it till our required index has an empty slot. 
    items = self->ob_item;
    for (i = n; --i >= where; )
        items[i+1] = items[i];

    // Place the new item in the now-empty slot.
    Py_INCREF(v);
    items[where] = v;
    return 0;
}

The Takeaway

Insertion is an O(n) operation because of that for loop. To insert an item at the beginning (where=0), the loop has to touch and move every single element in the list to make room. The amount of work is directly proportional to the size of the list, hence O(n).

3.6: Deleting an Item (`del`) - list_ass_slice

Deleting an item is the reverse problem: instead of making space, we have to "close the gap." This logic is found inside the list_ass_slice function, which uses a C standard library function called memmove.

Note: The actual C function that handles this, list_ass_slice, is quite complex because it's built to handle all kinds of slice operations. The delete_one_item function shown here is a simplified version I've created that focuses on the core logic for deleting a single item.

static int
delete_one_item(PyListObject *self, Py_ssize_t i)
{
    Py_ssize_t n = Py_SIZE(self);
    PyObject **items = self->ob_item;
    Py_ssize_t items_to_move = n - i - 1;

    // ... Decrement reference count of the object being deleted ...

    // The Main Event: close the gap.
    if (items_to_move > 0) {
        memmove(&items[i], &items[i+1], items_to_move * sizeof(PyObject *));
    }

    // Finally, shrink the list's size by 1.
    return list_resize(self, n - 1);
}

Snippet Breakdown

items_to_move = n - i - 1;: Calculates how many items are to the right of the item we're deleting. These are the items that need to be shifted.
memmove(&items[i], &items[i+1], ...): This is the Main Event. memmove is a highly optimized C function that physically moves a whole block of memory from one location to another.
- &items[i]: The destination address (the slot we're closing up).
- &items[i+1]: The source address (the slot right after the deleted one).
- This command effectively takes all items to the right of the deleted one and slides them one spot to the left.

The Takeaway

Deletion is an O(n) operation because of memmove. To "close the gap" left by a deleted item, CPython must physically move all the elements that came after it. If you delete the first item (del my_list[0]), it has to move the entire rest of the array (n-1 items). The work is directly proportional to how many items you have to move.

4. The Final Takeaway: What a List Really Is

So, what was Python hiding from us? After peeling back the layers and staring into the C source code, we found the answer, and it isn't magic. The dynamic, ever-growing list we use every day is built on top of a simple, static C array. That's the secret.

Knowing this changes how you see your own code. That fast access with my_list[99999]? That’s the raw power of a C array doing a simple math problem to find a memory address. But that power has a price. The reason my_list.insert(0, 'x') can feel so slow is that you're witnessing the brute-force reality of that same C array, as it physically shuffles every single element one by one just to make room at the front.

In the end, the list is just a beautiful piece of engineering built on a clever trade-off. It bets that you'll append more often than you'll insert at the beginning, so it optimizes for that case with its over-allocation strategy. And now, you're in on the secret.

Congratulations on making it to the end! 🥳🎉

If you have any questions, suggestions or other topics to recommend, feel free to reach out!

Thank you so much for reading, it means a lot to me! :)

Forem: Bhavesh Kukreja

A Deep Dive Into Python Dictionaries

Hashing and Objects

The Data Structure: PyDictObject

The Dictionary in Action

Creation of a Dictionary - PyDict_New

Snippet Breakdown

The Takeaway

Accessing an Item - _Py_dict_lookup

The Takeaway

Inserting an Item - insertdict

The Takeaway

Deleting an Item - delitem_common

The Takeaway

Bonus Fact: Why d['missing'] Crashes but d.get('missing') Doesn't

The Final Takeaway: What a Dictionary Really Is

What Python Lists Really Are

1. How Do Lists Really Work?

2. How does Python have Objects?

2.1: The Universal Object - PyObject

Attribute Breakdown

1. ob_refcnt

2. ob_type

2.2: The Variable-Sized Foundation - PyVarObject

Attribute Breakdown

ob_size

2.3: The List Object - PyListObject

Attribute Breakdown

1. ob_item

2. allocated

3. A List in Action - The C Functions Behind the Methods

3.1: Creation of a List ([] or list()) - PyList_New

Snippet Breakdown

The Takeaway

3.2: Accessing an Item (my_list[index]) - PyList_GetItem

Snippet Breakdown

The Takeaway

3.3: What makes the list dynamic? - list_resize

Snippet Breakdown

The Takeaway

3.4 Appending an Item (my_list.append(item)) - list_append

The Takeaway

3.5: Inserting an Item (my_list.insert(index, item)) - ins1

The Takeaway

3.6: Deleting an Item (del) - list_ass_slice

Snippet Breakdown

The Takeaway

4. The Final Takeaway: What a List Really Is

3.1: Creation of a List (`[] or list()`) - PyList_New

3.2: Accessing an Item (`my_list[index]`) - PyList_GetItem

3.4 Appending an Item (`my_list.append(item)`) - list_append

3.5: Inserting an Item (`my_list.insert(index, item)`) - ins1

3.6: Deleting an Item (`del`) - list_ass_slice