"Should I use SQL or NoSQL? B-Trees or LSM Trees?" If you've ever felt overwhelmed choosing the right database for your app, you're not alone. Underneath every database is a rich ecosystem of storage engines and transaction protocols — and choosing right can mean the difference between blazing-fast performance or painful bottlenecks.
In this blog, we step into the world of database internals — through the lens of story — and explore how systems like MySQL, MongoDB, Cassandra, and PostgreSQL really work under the hood.
Let me tell you a story.
It all began when I was designing a backend system for a fast-growing e-commerce app. It needed to handle thousands of concurrent users, real-time product inventory updates, personalized recommendations, and a dashboard that updated faster than you could say “out of stock.” And like many developers before me, I hit the question:
“Which database should I use?”
What followed was a deep dive down the rabbit hole of database internals — a world where storage engines clash, transactions dance in delicate sync, and choices aren’t always black and white.
This post is that journey — and maybe, it’ll help you find your own answer.
📁 Behind the Scenes of a Database
At first glance, databases look simple. You insert data. You query it. Maybe update or delete a few rows.
But under the hood, it’s a wild machine — made up of layers like:
- Transport: How your queries travel to the server
- Query Parser & Optimizer: What your SQL actually becomes
- Execution Engine: Where it all gets done
- Storage Engine: The core, the vault, the thing that makes it all possible
That’s where our journey truly begins.
🍊 Storage Engine Showdown: B-Tree vs LSM Tree
🎓 The Classic Hero: B-Trees
Imagine a grand old library with neatly sorted sections and labeled drawers. That’s a B-Tree.
Efficient, organized, and time-tested. Every insert knows where to go. Every query finds what it needs quickly.
It works like this:
- Your data is stored in sorted blocks
- Every read is fast (like
O(log n)
fast) - Updates happen in-place — meaning some random disk I/O, but that's fine for OLTP systems
Databases like MySQL (InnoDB) and PostgreSQL love B-Trees. They're rock solid when you need strong consistency, fast lookups, and ACID transactions.
But…
🔥 The Young Disruptor: LSM Trees
Then you meet LSM Trees — Log-Structured Merge Trees.
This one doesn’t bother with in-place updates. It writes everything to memory first, and flushes it to disk in sorted chunks called SSTables. Every now and then, it cleans up by merging — a process called compaction.
It’s like writing notes on sticky pads, then compiling a clean notebook later.
The result? Blazing-fast write performance. Perfect for logs, metrics, IoT streams, and other write-heavy systems.
LSM Trees power Cassandra, RocksDB, HBase, and even parts of MongoDB.
⚖️ When You Have to Choose
It felt like I was in a Western showdown:
B-Tree Wins If… | LSM Tree Wins If… |
---|---|
Reads are frequent | Writes are frequent |
You need ACID compliance | Eventual consistency is okay |
OLTP-style transactions | Streaming or time-series data |
But that wasn’t the end. A good database isn’t just about reading or writing.
🔐 The Transaction Tapestry
Remember that moment in every heist movie where timing is everything?
That’s what transactions are like. You need your operations to be Atomic, Consistent, Isolated, and Durable — aka ACID.
✪️ SQL Databases (Relational)
In systems like MySQL or PostgreSQL, this is handled with:
- Undo logs
- WAL (Write-Ahead Logs)
- MVCC (Multi-Version Concurrency Control)
Everything is locked, tracked, and reversible.
🌐 NoSQL Databases
In contrast, systems like Cassandra and DynamoDB favor eventual consistency — they go for BASE (Basically Available, Soft state, Eventual consistency).
They work like eventually synced notebooks:
Updates hit one node, and others catch up in the background. Fast, distributed, but less strict.
🧵 A Thread on Concurrency
Concurrency is where things got trickier.
With B-Trees, concurrency is controlled through careful locking:
- Shared locks, exclusive locks, even update locks
- B-Link trees (a clever enhancement) let reads flow even during writes
With LSM Trees, it’s much more lock-free:
- MemTables take writes concurrently
- SSTables are immutable
- Compaction is background work
It was like comparing a bank vault with a revolving door system.
🧬 The Hybrid Age
In the real world, no one-size-fits-all database exists.
Some systems started combining the best of both:
- MySQL has plugins for RocksDB
- MongoDB switched to an LSM-like engine (WiredTiger)
- Aurora blends SQL compatibility with NoSQL performance
🧠 My Takeaway
Choosing the right database isn’t about trends — it’s about trade-offs.
Ask yourself:
- Is your workload read-heavy or write-heavy?
- Do you need strict transactions, or is speed more important?
- Are you handling structured business data, or millions of streaming events?
Once you answer these, your storage engine almost picks itself.
✍️ Final Thoughts
That little "insert into users" line in your code? It kicks off a cascade of logic and engineering brilliance that spans decades.
Understanding database internals made me a better backend engineer — and hopefully, now it’ll do the same for you.
Top comments (3)
I am happy you are learning that. However you have not even touched the tip of the iceberg.
One important correction: NOSQL which stands for not only SQL.
Learn about ACID and CAP. Once you learn fundamentals of some RDBML (ex. MySQL, postgres, Oracle) try programming language inside the DB and triggers (but only try - in most cases this is the wrong way). Read about change journal (log). Try messaging and event sourcing. Then try the other area (mongo, redis). Learn why you might wish to sacrifice part of ACID and focus on CAP. Finally go to NewSQL. Good luck
Love the story! Thank you for sharing!
Nice
Beginner friendly explanation, loved it