Dhanush B

Posted on Apr 13

📍The Art of Choosing the Right Database

#programming #database #design #architecture

"Should I use SQL or NoSQL? B-Trees or LSM Trees?" If you've ever felt overwhelmed choosing the right database for your app, you're not alone. Underneath every database is a rich ecosystem of storage engines and transaction protocols — and choosing right can mean the difference between blazing-fast performance or painful bottlenecks.

In this blog, we step into the world of database internals — through the lens of story — and explore how systems like MySQL, MongoDB, Cassandra, and PostgreSQL really work under the hood.

Let me tell you a story.

It all began when I was designing a backend system for a fast-growing e-commerce app. It needed to handle thousands of concurrent users, real-time product inventory updates, personalized recommendations, and a dashboard that updated faster than you could say “out of stock.” And like many developers before me, I hit the question:

“Which database should I use?”

What followed was a deep dive down the rabbit hole of database internals — a world where storage engines clash, transactions dance in delicate sync, and choices aren’t always black and white.

This post is that journey — and maybe, it’ll help you find your own answer.

📁 Behind the Scenes of a Database

At first glance, databases look simple. You insert data. You query it. Maybe update or delete a few rows.

But under the hood, it’s a wild machine — made up of layers like:

Transport: How your queries travel to the server
Query Parser & Optimizer: What your SQL actually becomes
Execution Engine: Where it all gets done
Storage Engine: The core, the vault, the thing that makes it all possible

That’s where our journey truly begins.

🍊 Storage Engine Showdown: B-Tree vs LSM Tree

🎓 The Classic Hero: B-Trees

Imagine a grand old library with neatly sorted sections and labeled drawers. That’s a B-Tree.

Efficient, organized, and time-tested. Every insert knows where to go. Every query finds what it needs quickly.

It works like this:

Your data is stored in sorted blocks
Every read is fast (like O(log n) fast)
Updates happen in-place — meaning some random disk I/O, but that's fine for OLTP systems

Databases like MySQL (InnoDB) and PostgreSQL love B-Trees. They're rock solid when you need strong consistency, fast lookups, and ACID transactions.

But…

🔥 The Young Disruptor: LSM Trees

Then you meet LSM Trees — Log-Structured Merge Trees.

This one doesn’t bother with in-place updates. It writes everything to memory first, and flushes it to disk in sorted chunks called SSTables. Every now and then, it cleans up by merging — a process called compaction.

It’s like writing notes on sticky pads, then compiling a clean notebook later.

The result? Blazing-fast write performance. Perfect for logs, metrics, IoT streams, and other write-heavy systems.

LSM Trees power Cassandra, RocksDB, HBase, and even parts of MongoDB.

⚖️ When You Have to Choose

It felt like I was in a Western showdown:

B-Tree Wins If…	LSM Tree Wins If…
Reads are frequent	Writes are frequent
You need ACID compliance	Eventual consistency is okay
OLTP-style transactions	Streaming or time-series data

But that wasn’t the end. A good database isn’t just about reading or writing.

🔐 The Transaction Tapestry

Remember that moment in every heist movie where timing is everything?

That’s what transactions are like. You need your operations to be Atomic, Consistent, Isolated, and Durable — aka ACID.

✪️ SQL Databases (Relational)

In systems like MySQL or PostgreSQL, this is handled with:

Undo logs
WAL (Write-Ahead Logs)
MVCC (Multi-Version Concurrency Control)

Everything is locked, tracked, and reversible.

🌐 NoSQL Databases

In contrast, systems like Cassandra and DynamoDB favor eventual consistency — they go for BASE (Basically Available, Soft state, Eventual consistency).

They work like eventually synced notebooks:

Updates hit one node, and others catch up in the background. Fast, distributed, but less strict.

🧵 A Thread on Concurrency

Concurrency is where things got trickier.

With B-Trees, concurrency is controlled through careful locking:

Shared locks, exclusive locks, even update locks
B-Link trees (a clever enhancement) let reads flow even during writes

With LSM Trees, it’s much more lock-free:

MemTables take writes concurrently
SSTables are immutable
Compaction is background work

It was like comparing a bank vault with a revolving door system.

🧬 The Hybrid Age

In the real world, no one-size-fits-all database exists.

Some systems started combining the best of both:

MySQL has plugins for RocksDB
MongoDB switched to an LSM-like engine (WiredTiger)
Aurora blends SQL compatibility with NoSQL performance

🧠 My Takeaway

Choosing the right database isn’t about trends — it’s about trade-offs.

Ask yourself:

Is your workload read-heavy or write-heavy?
Do you need strict transactions, or is speed more important?
Are you handling structured business data, or millions of streaming events?

Once you answer these, your storage engine almost picks itself.

✍️ Final Thoughts

That little "insert into users" line in your code? It kicks off a cascade of logic and engineering brilliance that spans decades.

Understanding database internals made me a better backend engineer — and hopefully, now it’ll do the same for you.

Build gen AI apps that run anywhere with MongoDB Atlas

MongoDB Atlas bundles vector search and a flexible document model so developers can build, scale, and run gen AI apps without juggling multiple databases. From LLM to semantic search, Atlas streamlines AI architecture. Start free today.

Start Free

Top comments (3)

Maciej Wakuła • Apr 14

I am happy you are learning that. However you have not even touched the tip of the iceberg.
One important correction: NOSQL which stands for not only SQL.
Learn about ACID and CAP. Once you learn fundamentals of some RDBML (ex. MySQL, postgres, Oracle) try programming language inside the DB and triggers (but only try - in most cases this is the wrong way). Read about change journal (log). Try messaging and event sourcing. Then try the other area (mongo, redis). Learn why you might wish to sacrifice part of ACID and focus on CAP. Finally go to NewSQL. Good luck