Why DynamoDB Doesn’t Let You Write A Bad Query

#dynamodb #database #aws #query

You’ve probably read this somewhere.

“DynamoDB doesn’t let you write a bad query”.

But what does it mean and how does DynamoDB accomplish this?

To understand this we have to get into how DynamoDB works under the hood.

DynamoDB Data Structure

DynamoDB is a distributed and fully managed NoSQL database that lets you easily scale.

But how does it let you, the user, scale your database?

Under the hood, DynamoDB stores your data in multiple servers called partitions.

Your data is distributed across partitions, with the item’s partition key dictating the partition (using a hash function) where that item will be stored.

With this architecture, a query you make to a partition (using a partition key) will be processed in O(1) time complexity.

So no matter how you query with a partition key, the query will always be efficient (can’t make a bad query).

That’s fine for partition key queries, but what about composite key queries; when you provide a sort key?

Well, inside each partition, DynamoDB stores your items as a B-tree data structure. Think of this as an upside down tree where the root is the partition and all items inside it are leafs.

As you add items to that partition, they are stored in an alphanumerically sorted order.

For example, “ha” will be at the top of the structure, followed by “har”, then “hard”, “harm” and “harsh”. After all “ha…” items have been stored, items that start with “he…” will be stored (refer to screenshot above).

Here’s what this means in terms of efficiency:

Writing items to this partition will be done in worst case O(log n) — pretty fast.

Reading data from this partition will be done in the same time complexity (O(log n)).
From a high level overview, this is how data is stored in DynamoDB.

How DynamoDB doesn’t let you write a bad query

So why does DynamoDB prevent you from writing bad or inefficient queries?

It does this based on the data structure and some “clever limitations” it places around querying and writing.

First, any single item write or read you make will remain efficient due to the data structure, as these operations would take O(log n) no matter how many items there are in the partition.

Second, a query has a limit result set of 1mb. This limitation is designed so that the read latency stays low.

While other NoSQL databases have a much higher limit, DynamoDB places a relatively low limit to keep queries fast.

Additionally, for multiple item reads and writes, the partition key will greatly limit the search zone of your query.

Similarly, the sort key is as well limited to a few query methods only, again to ensure efficiency and low latency.

These query methods are:

equality operators (<, < =, >, > = and =): this lets you query for number or string values that are greater alphabetically or numerically than another value. (e.g. 5 > 7 or “ha” greater than “he”).
begins_with() method: this lets you say “get me all items whose sort key begins with the substring x”. (e.g. begins_with(“ha”) will return “hard”, “harm”, “harsh”, etc.
BETWEEN method: This lets you say “get me all items whose sort keys are between x and y”. (e.g. sort key BETWEEN “ha” and “hi”, will return “hard”, “harm”, “harsh”, “hello”, “hey”, until “hi”.

Now here’s why these query methods remain efficient even if your database contains billions of items.

Notice how these methods respect the B-tree data structure.

Think of it as a dictionary. If you want to get all words that begin with the word “ha”, you could do it easily (and efficiently).

You’d identify the “h” section (the partition) and find “ha”. From there you can sequentially trace through every word until you reach “he…”. (that’s the begins_with() method)

You can also simulate the BETWEEN method with the dictionary approach. You can trace through every word that starts with “ha…” and ends with “hi”.

However, in DynamoDB you can make a query in any other way such as saying “get me all items that contain a particular string like ‘er’”, just like with a dictionary you wouldn’t be able to find all words that contain the letters “er”.

Or you might be able to but that would take you a really long time and require a full dictionary search.

This is the strategy DynamoDB uses to make your queries scalable, fast and efficient.

Now lets say you tried to design a bad, inefficient database structure and attempted to make a bad query, here’s what would happen.

The query would still require a single partition (efficient).
You can choose to specify a sort key but can only run it on sequential data, which is fast.

If all items are in one partition and its a large partition, the query result would be limited to 1mb (wouldn't take too long to fetch).

So no matter how you query your “badly designed” table, your query would remain relatively fast.

Conclusion

DynamoDB’s architecture and query limitations are designed to enforce efficiency, and make sure that even a poorly designed schema won’t result in an inefficient query.

Through the use of partitions, B-tree structures, and a limited set of query methods, DynamoDB guarantees predictable performance, preventing you from writing a “bad” query.

ACI.dev: Best Open-Source Composio Alternative (AI Agent Tooling)

100% open-source tool-use platform (backend, dev portal, integration library, SDK/MCP) that connects your AI agents to 600+ tools with multi-tenant auth, granular permissions, and access through direct function calling or a unified MCP server.

Star our GitHub!