Forem: Farhan Syah

How a Database Really Works Underneath

Farhan Syah — Sun, 05 Apr 2026 04:16:12 +0000

One question keeps coming up when people start going deeper into databases:

How does a database actually work underneath?

Not the SQL part. Not the API. Not the dashboard.

The real part:

Where is the data actually stored?
Is it kept in memory, on disk, or both?
How are rows and columns laid out?
Why does a query return quickly instead of scanning everything forever?

This article is a general answer to that question.

Different databases make different tradeoffs, but most serious databases are built from the same small set of ideas.

The short answer

A database is usually not one magical structure.

It is several layers working together:

A Storage Format For Data On Disk
An In-Memory Layer For Hot Data
One Or More Index Structures
A Query Engine
A Transaction / Recovery Layer

When you run a query, the database does not start from scratch.

It uses those layers together:

find the right pages or files
use indexes if they help
load needed data into memory
execute the query
return only the rows, columns, or records you asked for

That is the basic picture.

Where the data is really stored

Most databases store durable data on disk.

That can mean:

regular files on SSD or HDD
memory-mapped files
append-only log files
page files
SSTables in LSM-based systems
custom binary file formats

The exact layout changes from one database to another, but the key point is simple:

Memory is fast, but temporary. Disk is slower, but durable.

So the normal design is:

disk is the long-term source of truth
memory is the fast working area

That is why databases usually use both.

What lives in memory

A database does not usually load the whole dataset into RAM.

Instead, it keeps the parts it needs most often in memory.

This memory layer is often called a:

buffer pool
page cache
block cache
memory table

depending on the database design.

The job is the same:

keep hot data close
avoid repeated disk reads
batch writes when possible
make queries much faster than raw disk access

So when people ask, "Is the data stored in memory or in files?" the honest answer is usually:

Both.

The durable copy is on disk. The actively used working set is often in memory.

How the data is laid out on disk

This depends on the database model.

Row-oriented storage

Traditional relational databases usually think in rows.

If you have a table like:

users(id, name, email, created_at)

a row-oriented system tends to store one full row together:

[id, name, email, created_at]

then the next row:

[id, name, email, created_at]

and so on.

This is good for:

OLTP workloads
reading one record at a time
inserts and updates on complete records
transactional business data

That is why row storage is common in systems like PostgreSQL and MySQL.

Column-oriented storage

Analytical databases often store data by column instead:

all id values together
all name values together
all email values together
all created_at values together

That is useful when queries read a few columns across many rows, such as:

SELECT created_at, count(*) ...

instead of loading every field of every row.

This is good for:

analytics
scans
aggregations
compression

because values in the same column are often similar and compress well.

That is why columnar engines show up in analytics, time-series, and warehouse-style systems.

Document storage

Document databases usually store self-contained records, often in binary JSON-like formats.

One record may contain:

{
  "id": 1,
  "name": "Farhan",
  "tags": ["db", "rust"],
  "profile": {
    "city": "KL"
  }
}

Instead of fixed columns, the document keeps structure inside the record itself.

This is useful when:

fields vary across records
nested data matters
schema flexibility matters

Key-value storage

A key-value database stores data as:

key -> value

This is the simplest mental model.

It is often very fast for direct lookup, but weaker when you want rich relational or analytical behavior unless more layers are added.

What a page is

Most databases do not read and write one row at a time directly from disk.

They usually organize storage into fixed-size blocks called pages.

A page might be:

4 KB
8 KB
16 KB
or some other fixed size

Why pages?

Because disks and operating systems work better with chunked reads and writes than with tiny scattered operations.

So instead of asking the disk for "row 187" directly, the database often asks for:

"load the page that contains row 187."

That page is then decoded in memory, and the database finds the exact row inside it.

This is one of the most important ideas in database internals.

Rows, documents, or index entries are usually inside pages, not floating around individually on disk.

How indexes make lookups fast

Without an index, a database often has to scan everything.

If you ask:

SELECT * FROM users WHERE email = 'a@example.com'

and there is no index on email, the database may need to inspect row after row until it finds matches.

That is called a full scan.

Indexes exist to avoid that.

An index is another structure built beside the main data so the database can find locations quickly.

The two most common families are:

B-tree indexes

This is the classic index structure in many relational databases.

A B-tree keeps values in sorted order and makes lookup efficient.

That helps with:

equality lookup
range queries
ordered scans

Example:

If an index exists on email, the database can search the index first, find the location of the matching row, then jump to the real data.

Instead of scanning 10 million rows, it may touch only a small number of pages.

Hash indexes

Hash indexes are optimized for direct equality lookup.

They are good for:

id = 123
exact key lookup

They are usually weaker for range queries like:

id > 1000

LSM-tree style indexes

Many modern write-heavy systems use log-structured merge trees.

Instead of updating one sorted tree in place all the time, they:

write sequentially
buffer changes in memory
flush sorted files to disk
merge those files later

This often improves write throughput, but it changes the read path and compaction behavior.

Systems like RocksDB and Cassandra use LSM-style ideas.

How a database finds a row quickly

Suppose a table has an index on email.

When you search for one email, the general path looks like this:

the query parser understands what you asked for
the planner checks available indexes
the engine chooses the email index
the index search finds the row location
the relevant data page is loaded into memory if needed
the row is read from that page
the result is returned

That is why search can feel fast.

The database is not "being smart" in a mystical way. It is using pre-built structures to avoid unnecessary work.

Why some queries are still slow

Indexes help, but they do not solve everything.

Queries can still be slow when:

there is no useful index
the query needs too many rows anyway
the planner chooses badly
the data is fragmented
too much data must be read from disk
joins, sorts, or aggregations become expensive

So performance is not only about "having an index."

It is about:

storage layout
memory pressure
access pattern
query plan
workload shape

What happens on writes

Reads are only half the story.

When a database writes data, it usually does not just overwrite the file carelessly.

Serious databases care about crashes, partial writes, and recovery.

So many of them use a write-ahead log, often called a WAL.

The general idea is:

record the change in a durable log first
acknowledge the write
apply or flush the data pages after that
recover from the log if the system crashes mid-way

This is one reason databases can survive crashes better than a naive file format.

The log is not only about speed. It is also about correctness and recovery.

Why databases do not just use normal files directly

People sometimes ask:

Why not just save JSON files, CSV files, or objects directly?

You can, for simple systems.

But a real database gives you things normal files do not handle well on their own:

indexing
concurrency control
transactions
recovery
query planning
caching
data integrity

A file can store data.

A database is the machinery that makes that data searchable, writable, recoverable, and safe under load.

The simple mental model

If you want the simplest mental model, think of a database like this:

data is stored durably on disk
hot parts are kept in memory
storage is organized into pages or files
indexes point to where data lives
a planner decides the cheapest way to answer a query
logs protect writes and recovery

That is not the whole story, but it is enough to start reading database internals without feeling lost.

Final thought

A database is not just "a place to keep data."

Underneath, it is a storage engine, memory system, indexing system, execution engine, and recovery system working together.

The exact implementation changes from NodeDB to PostgreSQL to SQLite to RocksDB to ClickHouse to MongoDB, but the deeper ideas repeat.

If you care about how databases really work, follow NodeDB. I will keep writing more about storage engines, query planning, indexing, execution, and why database internals matter more than most application developers realize.

How NodeDB Handles Multi-Model Differently

Farhan Syah — Sun, 05 Apr 2026 00:09:47 +0000

In the previous article, I explained where NodeDB stands in the current multi-model landscape.

This post is about the next question: how does it actually work?

The short answer is:

one engine cannot do everything

NodeDB splits the problem into core engine families, then builds specialized models on top of the right base, while keeping everything inside one database boundary.

That is the core idea:

Some models get their own real engine
Some models grow from another strong native engine
The planner knows the difference
The user does not have to split the stack into more databases just because the workload widened

Start from the core, not from the label

When people say "multi-model database", the phrase usually hides one of several shortcuts.

It can means:

One strong database with extensions.
One document system with a broader marketing surface.
Several services under one product name.
One generalized storage shape being stretched across too many models.

That is not the route I wanted.

The starting question for NodeDB was simpler and harder at the same time:

What are the real engine families needed if this database is supposed to handle broad workloads without turning every new requirement into more stitching?

The current answer is:

Document
Strict
Graph
Vector
Columnar
Key-Value
Full-Text Search

Then there are models that should be native, but do not need a fully separate engine family if they already have the right base:

Time-Series On Columnar
Spatial On Columnar

That split matters because it avoids two bad extremes:

Pretending every model is the same under the hood
Creating a separate mini-database for every capability

The core engine families

The concrete part is what these engines actually are.

Document is the flexible record engine. In NodeDB that means schemaless records, MessagePack storage, and CRDT-oriented sync paths for local-first use cases.
Strict is the structured record engine. It is closer to row-oriented access, uses a different storage shape, and is meant for workloads that care about predictable fields and faster direct access, not just schema validation on top of documents.
Graph is not document plus links. It has its own adjacency structures, traversal path, and graph-native operations.
Vector is not "just add embeddings somewhere." It has its own ANN path, quantization path, and distance behavior.
Columnar is the analytical base: compression, scan-heavy reads, predicate pushdown, and the kind of layout you want when the workload is closer to analytics than OLTP.
Key-Value exists for direct lookup workloads that should be simple and cheap instead of routed through a heavier model.
Full-Text Search has its own ranking and tokenization behavior. BM25, stemming, stop-word handling, fuzzy matching, and hybrid retrieval should not be faked with normal filtering.

This is the part many multi-model products blur. The names are easy. The backing engine choices are the hard part.

Why strict matters so much

Strict is especially important here, because this is where a lot of multi-model systems get vague.

From the outside, Strict can look like document plus schema rules. That is not enough.

In NodeDB, Strict exists because structured workloads need a different path:

Fixed field expectations
Different storage behavior
Different access patterns
Different planner assumptions
Different performance goals

In the repo direction today, strict is treated as a row-like storage mode with direct field access, not just a document collection that rejects invalid writes.

That distinction matters. If a multi-model database cannot give structured data a serious path of its own, then the "multi-model" story is already weak for a large class of real applications.

Specialized models should grow from the right foundation

This is where Time-Series and Spatial fit.

I do not think every model deserves its own isolated engine family just for the sake of appearances. Sometimes the right design is to start from a strong native base and specialize from there.

In NodeDB, that base is Columnar.

That is why time-series sits there:

Columnar layout fits scan-heavy reads
Compression matters
Aggregation matters
Retention and rollups matter more than point-update behavior

The current repo direction already reflects this. Time-series is described as a columnar profile with ILP ingest, continuous aggregation, PromQL support, and time-oriented SQL functions.

Spatial follows the same idea. It belongs near columnar because analytical and geospatial workloads often overlap, but it still has its own native behavior: spatial indexes, geohash/H3-style locality tools, and geometry predicates.

So the pattern is:

Separate engine families where the workload is genuinely different
Specialized native profiles where another engine already gives the right foundation

How NodeDB keeps coherence

This is the part that decides whether the whole design works or falls apart.

If you only collect engines under one brand, you still have not solved multi-model. You have just moved the boxes closer together.

The harder problem is coherence:

How does a mixed workload stay in one system?
How does the planner choose the right path for each model?
How do you avoid weakening every model just to make the interface look uniform?

For NodeDB, the answer is not one generic query path for everything. That would flatten the models.

The answer is dedicated query and planner treatment where model semantics really differ, while still keeping them in one database boundary.

That is especially important for Strict, because strict should not be planned like schemaless document. It should give the planner stronger assumptions.

It is also important for Vector, Graph, and Full-Text Search, because each of them has search and ranking behavior that should not be reduced to ordinary filtering.

So coherence here means:

One database boundary
Shared system-level coordination
Model-specific execution paths where needed
Fewer cross-system hops when workloads mix

That is a more useful definition than simply saying "one query language" or "one product."

Why this is different from pseudo multi-model

This is the distinction I care about most.

Pseudo multi-model usually fails in one of two ways:

Everything is pushed through one generic core, so the models exist but feel weak
Every serious model becomes another extension, service, or external database, so the models stay stronger but the user absorbs the integration cost

NodeDB is trying to stay between those two failures.

That is why the design is split the way it is:

Real engine families where the workload needs real specialization
Native profiles where a strong base already exists
Planner-level respect for the differences
One database boundary for mixed workloads

What this means in practice

If NodeDB works the way I want it to work, the practical result should be different in a few concrete ways.

Graph queries should follow graph-native structures and algorithms instead of pretending edges are just another document field.
Vector search should use a real ANN path with its own indexing and quantization choices, not a thin side feature.
Strict collections should behave like a serious structured model with stronger planner assumptions and faster direct field access.
Time-series should inherit the strengths of columnar execution instead of being simulated on top of an unsuitable engine.
Spatial should get native spatial behavior without forcing the user into another database.
A workload that mixes records, vectors, search, and graph should stay inside one system instead of turning into a chain of remote calls.

Those tests are better than asking whether the product can list many model names.

The risk

This approach is harder for obvious reasons.

It is easier to start with fewer engines.
It is easier to flatten more behavior into one generic layer.
It is easier to delay the planner work.
It is easier to turn more models into later add-ons.

The risk is whether NodeDB can keep this depth and coherence once real workloads start pushing every corner of the design.

Why I still prefer this route

Even with that risk, I still think this is the more honest way to build a true multi-model database.

The point is not only to support many models.

The point is to support them with the right engine shape, the right planner behavior, and the right database boundary.

If that holds up, then NodeDB becomes interesting for a real reason, not just because it can claim a long feature list.

If you want to follow how NodeDB works at the engine level, where this design holds up, and where it still needs to prove itself, follow NodeDB. I will keep sharing the architecture, tradeoffs, and deeper implementation decisions as the database evolves.

Why NodeDB Might Be a Better Multi-Model Database

Farhan Syah — Sat, 04 Apr 2026 22:34:22 +0000

After comparing the current multi-model databases, I think the category has a real opportunity.

The opportunity is obvious:

Modern applications keep pushing different data models into the same product.

Document, graph, vector, time-series, search, spatial, sync, analytics pressure, and relational workloads do not stay in neat separate boxes for very long.

The problem:

Most current solutions require a bad compromise.

Some systems cover many models, but do not make them equally worth trusting. Some reduce the number of boxes in the architecture diagram, but still leave the user carrying too much integration work. Some look fine early and then start pushing the real cost into the next stage of the product.

That is the gap NodeDB is trying to enter. I am not claiming it has already solved it. I am saying it is aimed at the right problem.

Where NodeDB stands

If PostgreSQL represents seriousness, and ArcadeDB represents engine ambition, those are two of the things I want NodeDB to preserve.

I do not want a database that wins by appearance. I do not want a product that looks broad only because it can list many model names on a landing page. And I do not want a system that becomes "multi-model" only after the user starts adding more external databases, more services, and more coordination outside the database itself.

So where does NodeDB stand in this landscape?

Somewhere between respect for what already works and frustration with what still feels unfinished.

PostgreSQL is still the_ serious baseline for engineering quality, operational trust, and technical discipline_. I take that seriously. If NodeDB cannot compete with that level of seriousness, then there is no point.

ArcadeDB matters for a different reason. It shows that a broader native multi-model engine can still care about engine quality. It is one of the few systems in this space that feels like it actually wants to own the hard parts instead of only smoothing over them at the product layer.

That combination matters to me:

Seriousness
Engine ambition
Broad native model support

That is the bar.

What NodeDB is trying to avoid

There are THREE patterns I want NodeDB to avoid.

The first is broad claims without serious implementations. This is the easiest trap in the category. A database says it supports document, graph, vector, search, and time-series, but once you look closer, too many of those models do not feel strong enough to carry real workloads. I do not want NodeDB to win by vocabulary.

The second is one database on the surface, but too much stitching outside the database. This is the problem with extension sprawl, service sprawl, and platform sprawl. Even if each extra piece is defensible on its own, the user is still the one carrying the integration burden. I want NodeDB to reduce that burden, not rename it.

The third is a design that works early but forces re-architecture later. This is the long-term failure mode I care about most. A lot of systems feel good at the first stage, then requirements widen, and suddenly the user is back to splitting the stack again.

What NodeDB is trying to do differently

The goal is not just to support more models. The goal is to support them in a way that still feels coherent in one database.

That means keeping more capability inside one real database boundary instead of pushing the user toward more external systems. It means making wider support and stronger implementations grow together. And it means treating future architecture churn as a design problem now, not as something to dump on the user later.

That is also why I keep all the major models in view.

If NodeDB is going to be worth building, it cannot stop at only one or two native models and call that enough. It has to think seriously about document, graph, vector, time-series, spatial, search, and relational-style workloads as part of one coherent database direction.

Not because every project needs every model on day one.

But because the problem does not get better if every database starts narrow and then spends years teaching users how to bolt the rest around it.

Why that makes NodeDB interesting

NodeDB is still the new kid in the block. That is a disadvantage in obvious ways: less market trust, less ecosystem gravity, less history, less proof.

It is also an advantage. It does not have to inherit every compromise that older systems normalized. It can start from a harder question: what would a multi-model database look like if it took both width and depth seriously from the beginning?

That is what makes it interesting to me. Not because I think the answer is easy, but because too many current systems still make me choose between:

Narrow but dependable
Broad but not convincing enough
Practical, but still held together by too many moving pieces

NodeDB is my attempt to push against that tradeoff. Whether it succeeds is a different question. But that is the direction.

So how would I score NodeDB right now?

If I use the same seven dimensions from the previous article, this is where I would place NodeDB today.

These are not victory scores.

They are directional scores from someone building the system and using it, while also being honest about what still needs time, proof, and pressure.

Category	Score	Why
Native multi-model support	9/10	NodeDB is being built around broad native model support inside one real database boundary rather than around extensions, outside services, or separate databases.
Multi-model depth	9/10	The whole point of NodeDB is to make the models strong enough to stand on their own instead of existing only as surface-level support.
Performance potential	9/10	The design is aiming for serious engine-level control rather than convenience-first layering, which gives it very strong upside if the implementation keeps holding up.
Developer experience	6/10	I am trying to make it friendly, but that still needs to be proven in wider use, not just intended.
Operational simplicity	6/10	Keeping more capability inside one database boundary should help, but operational simplicity still has to be proven over time.
Ecosystem / maturity	3/10	This is where NodeDB is obviously weak compared with established systems. It is still young.
Production confidence	5/10	It is already useful for my own work, but it has not yet been tested across enough domains and use cases to justify a higher score.

That is the honest picture.

NodeDB scores high where I think the category still needs a stronger answer: native breadth, model strength, and the attempt to keep them inside one real database boundary instead of spreading them across too many external pieces.

It scores lower where new systems are supposed to score lower: ecosystem, maturity, and broad production trust.

If a new database starts with weak depth and weak direction, then low maturity just makes it easier to dismiss. But if a new database starts with a stronger technical direction, then the real question becomes whether it can survive long enough to earn the rest.

Who should try it now?

At this stage, I would not advise established production teams to switch databases just because NodeDB looks interesting.

That would be irresponsible.

If you already have a production system running on something stable, the migration cost, operational risk, and unknowns are still too high.

But I do think NodeDB is worth considering for:

Startups
New applications
Greenfield systems
Teams that know they will need broader native model support early
Builders who want to try a different multi-model direction before they get locked into extension sprawl or service sprawl

Not "everyone should move now."

More like: if you are early, if your requirements are broad, and if you want to bet on a database trying to solve this problem differently, then NodeDB is worth serious attention.

In the next post, I will go deeper into how NodeDB handles multi-model differently and why I think that design matters more than just claiming support for more models

Repo: NodeDB

Comparing Today's Multi-Model Databases

Farhan Syah — Sat, 04 Apr 2026 02:41:42 +0000

Multi-model databases sound simple on paper.

One database, many data models, less architecture sprawl.

That is the pitch.

The reality is less neat.

Once you look closely, "multi-model" can mean very different things. Sometimes it means one engine with several models that actually work together. Sometimes it means one database plus a growing pile of attached capabilities. Sometimes it means multiple APIs over one cloud platform. Sometimes it means a strong idea that still has rough execution.

So if you are evaluating the current landscape, the label alone is not enough.

You need to ask harder questions.

How native is the multi-model support?
Does the database feel coherent, or just broad?
Does it stay pleasant once the workload becomes real?
Are you buying a database, or a stack of compromises with a friendly landing page?

This post is my attempt to compare some of the most prominent multi-model options people talk about today:

PostgreSQL + extensions
SurrealDB
Couchbase
ArangoDB
OrientDB
ArcadeDB

For context, this is not a benchmark post.

The ratings here are closer to review scores than benchmark numbers. They are editorial judgments about architecture, feature depth, workflow quality, operational shape, and how convincing each database feels as a serious multi-model system today.

The real fault lines

Most comparison posts flatten all of these databases into one category and then act surprised when the tradeoffs look inconsistent.

They are inconsistent because these systems are solving different versions of the problem.

There are at least five distinct approaches hiding under the same "multi-model" label:

Extension-driven breadth: PostgreSQL + extensions
Native engine multi-model: ArangoDB, OrientDB, ArcadeDB
Document-first platform expansion: Couchbase
Modern unified developer story: SurrealDB
API / platform interpretation: Cosmos DB and similar systems

I do not think all five belong in the same category in the strict database sense, even if vendors market them that way.

If you do not separate those approaches, the category stops making sense.

That is also why direct one-line comparisons are often misleading. A database can score lower on native support and still be the safer production choice. Another can score higher on native support and still be the riskier operational bet.

How I am judging them

Each database gets rated on the same seven dimensions:

Native multi-model support
Multi-model depth
Performance potential
Developer experience
Operational simplicity
Ecosystem / maturity
Production confidence

The first category matters most.

I am not asking whether a database can be stretched into several roles. Many can. I am asking whether multiple models are part of the actual design, or whether the user is doing the integration work by hand.

Native multi-model support and multi-model depth are not the same thing.

A database can be very native and still be shallow. That usually means the models are built into the product and query language, but the actual implementations still feel lighter than specialized systems.

A database can also be less native and still be deep. PostgreSQL is the clearest example: many of its strongest non-relational capabilities come from extensions, but some of those extensions are still serious pieces of engineering.

For this article, a higher nativeness score means the user pays less integration penalty when new models show up.

If you need to add a separate external database just to get one more model, that should hurt the score a lot.

If you can stay inside the same database but need an extension, that is still a penalty, but a smaller one.

If the model is already part of the same core engine or the same native database story, that should score much better.

So nativeness here is not just about marketing language. It is about how much stack stitching the user still has to do when requirements expand.

The other categories matter because the category has a bad habit of hiding tradeoffs behind a neat label. A database can look impressive in a product diagram and still feel second-class once real workloads arrive.

PostgreSQL + extensions

PostgreSQL is not a native multi-model database in the strict sense.

Supported models:

Model	Support	Notes
Relational	Native	Core identity and strongest model.
Document / JSON	Native	Via `json` / `jsonb`, but still within PostgreSQL’s relational engine.
Spatial	Extension	Usually via `PostGIS`; powerful, but not part of the core engine identity.
Time-series	Extension	Usually via `TimescaleDB`; serious in practice, but not core-native PostgreSQL.
Vector	Extension	Usually via `pgvector`; practical, but still extension-led.
Graph	Extension / Application-level	Usually via `Apache AGE` or application-side modeling, not core-native graph semantics.

It is still the practical benchmark that every serious multi-model system ends up being measured against.

Why? Because PostgreSQL keeps absorbing more territory.

Relational data is its home ground. JSON and jsonb made document-style usage normal. PostGIS covers geospatial workloads. TimescaleDB pushed hard into time-series. pgvector made vector search part of the conversation for mainstream teams. There are also graph-style options like Apache AGE if you want to keep pushing further.

That is a remarkable amount of coverage.

The problem is that coverage is not the same as coherence.

PostgreSQL + extensions is powerful, but the integration burden shifts toward the user. You decide which extension owns which problem. You learn the quirks, upgrade risks, operational differences, and uneven ergonomics between those pieces. The core database is stable. The workload-specific stack becomes your responsibility.

That is the PostgreSQL tradeoff in one sentence: Best-in-class practical confidence, weaker multi-model integrity.

Category	Score	Why
Native multi-model support	4/10	It is not truly native multi-model. Even when extensions keep you inside the same database, the user still pays a large stitching and compatibility penalty.
Multi-model depth	7/10	The feature depth can be excellent, but it is uneven and fragmented by extension boundaries.
Performance potential	8/10	PostgreSQL is a serious engine, and some extensions are very strong, but cross-model optimization is not one coherent story.
Developer experience	8/10	SQL, tools, docs, and ecosystem are excellent, though the experience becomes rougher as extensions pile up.
Operational simplicity	4/10	Running PostgreSQL is easy by industry standards. Running a stitched stack of extensions across many projects is not.
Ecosystem / maturity	10/10	Nobody in this list beats PostgreSQL here.
Production confidence	10/10	If you want the safest broad platform, this is still the default answer.

Trust review:

extremely high, but only if you accept that you are trusting a platform plus an integration strategy, not one native multi-model engine.

Query model:

PostgreSQL’s center is still SQL. JSON support is strong, jsonb has serious operator support, and GIN indexing makes document-style querying practical. But every added model brings its own syntax, extension behavior, and planning assumptions. There is no single multi-model query language tying the whole thing together.

Engine / storage shape:

This is still one of the strongest storage and execution engines in the industry, but the important part is where the extra models live. jsonb stays inside PostgreSQL. PostGIS, TimescaleDB, pgvector, and graph-oriented options live in different layers around that core. The engine is first-class. The multi-model story is distributed across extensions.

How native is the multi-model claim?

Only partially native. PostgreSQL can absolutely play multi-model in real production systems, but it does that through a strong core plus specialized additions, not through one database designed from day one around equally native relational, graph, vector, and time-series semantics.

Where it feels first-class:

Relational workloads
Operational SQL
Mature production tooling
Extension-backed specialization when the team knows exactly what it is doing

Where it feels second-class:

Cross-model coherence
Unified multi-model query experience
Workloads that force several extensions to behave like one native system

Best fit:

Teams that want the safest broad platform
Companies with strong PostgreSQL expertise
Workloads where extension sprawl is still manageable

Poor fit:

Teams that want one native multi-model system instead of a strong core plus many additions

Short verdict:

The safest powerful option, and still the practical benchmark, but the user pays for multi-model breadth with growing integration debt.

SurrealDB

SurrealDB has one of the strongest product visions in this space.

Supported models:

Model	Support	Notes
Document	Native but shallow	SurrealDB itself describes the system as being, at its core, document-oriented.
Graph / relations	Native but shallow	Relations and traversal are real, but this still does not look like a deep graph-tooling database.
Key-value style access	Backed by KV storage layer	The engine runs over KV backends such as RocksDB, SurrealKV, TiKV, and browser IndexedDB. This is more a storage foundation than a separate first-class model.
Vector	Native but shallow	Present in the product, but not obviously a deep specialized vector engine.
Time-series	Native but shallow	Present in the product, but their own docs admit specialized TSDBs currently do retention and compression better.
Full-text / search	Native direction	Present, but not a deep search-first system.

It understands the modern developer instinct very well: fewer hops, one system, SQL-like querying, document and graph ideas in one place, and a cleaner story than bolting together half a dozen tools. That is why it gets attention so quickly.

The problem is not coverage. SurrealDB covers a lot.

The problem is depth.

Too many of its models still feel like lighter versions of the real thing. Graph is present, but not as a rich graph-tooling environment. Time-series is present, but not like a serious analytics-grade TSDB. Vector is present, but not in a way that makes it look like a best-in-class vector engine.

The architectural reason matters here. Much of the multi-model story is being pushed through one generalized KV-backed foundation. And because that foundation can itself sit on top of other KV engines, many models end up feeling like interpretations over the same storage pattern rather than deeply specialized implementations. The product surface is broad. The engine-level realization is still shallow.

Category	Score	Why
Native multi-model support	9/10	Multi-model is deeply part of the core product and engine identity, not just an add-on.
Multi-model depth	3/10	Broad and ambitious, but too many models feel much lighter than the real specialized versions they resemble.
Performance potential	6/10	Good enough to be interesting, but I still see more promise than proof.
Developer experience	8/10	Strong product storytelling, approachable model, modern appeal.
Operational simplicity	6/10	Simpler than managing a stitched stack, but simplicity in concept is not the same as maturity in production.
Ecosystem / maturity	5/10	Still young compared with the more established systems here.
Production confidence	4/10	I would watch it closely, but I would still treat it as a bet rather than a settled answer.

Trust review:

Medium. Easy to understand, easy to want, still much harder to trust deeply than the product story suggests.

Query model:

SurrealQL is one of SurrealDB’s real strengths. It gives the database a strong unified surface and makes the system feel coherent much earlier than many other multi-model products.

Engine / storage shape:

SurrealDB presents itself as one native multi-model database rather than a database plus extensions. That part is fair. But once you look at the architecture docs, the system is clearly layered over a key-value backend story. That does not make it fake. It does explain the shallowness. The “many models” claim is being realized through one broader document/KV-oriented engine shape, sometimes itself sitting on top of another KV engine, rather than through several equally deep model-specific engines.

How native is the multi-model claim?

More native than PostgreSQL + extensions or Couchbase’s service-style expansion, but still not equally convincing across every claimed model. The claim is structurally native at the product and query layer. At the engine layer, it still feels more like one broader KV-backed system interpreting many models than several deeply realized model-specific implementations.

Where it feels first-class:

Product vision
Modern developer appeal
Unified story around documents, relations, and fewer system hops

Where it feels second-class:

Hard production confidence
Deep capability maturity across every advertised area
Trust under heavier or less forgiving workloads
Several models feeling lighter than their specialized counterparts

Best fit:

Teams willing to bet on direction
Builders who care a lot about developer-facing coherence
Projects where the vision aligns closely with the workload

Poor fit:

Conservative production environments that need long-proven operational confidence

Short verdict:

One of the strongest visions in the category, but still too broad and too light in too many places to count as one of the most convincing finished implementations.

Couchbase

Couchbase sits in an interesting middle ground.

Supported models:

Model	Support	Notes
Document	Native core	Main center of gravity.
Key-value	Native core	Part of the core operational model.
Query over JSON documents	Native query layer	Via `SQL++`; not a separate model so much as a strong query surface over document data.
Full-text search	Platform service	Official part of the platform, but not an equal core data model.
Analytics	Platform service	Official part of the platform, but not an equal core data model.
Eventing	Platform service	Official part of the platform, but not an equal core data model.
Mobile sync / edge sync	Platform service	Official part of the platform, but not an equal core data model.
Graph	Not core native	Not a real native graph database identity.

It is not usually the first database people name in multi-model debates, but it has quietly built a broader platform than many people realize. Document data is still the center, but key-value access, SQL++ querying, full-text search, analytics, eventing, and mobile sync make it much more than "just a document database."

That broader platform story is real.

So is its production credibility.

Couchbase has been around, it has enterprise weight, and it knows how to solve operationally serious problems. If you care about distributed document-heavy workloads with search, analytics, eventing, and mobile sync, it has more depth than a lot of trendier names.

But it still does not feel like the cleanest native multi-model database in the stricter sense.

It feels more like a strong document-first platform that expanded intelligently around its center of gravity, with multiple surrounding services rather than multiple equal native models.

That is why I would place it closer to Cosmos DB than to ArangoDB in category shape, even though it is still stronger than Cosmos DB at the database-core level. Couchbase has a real native database center. The broader multi-model feeling comes from the platform built around that center, not from one engine where every model is equally native.

Category	Score	Why
Native multi-model support	4/10	Broader than a simple document store, but the surrounding services still mean the user is assembling a platform rather than getting one truly native equal-model engine.
Multi-model depth	5/10	Real depth in document-heavy environments plus strong surrounding services, but still not equal native depth across several first-class models.
Performance potential	8/10	Strong distributed performance profile in the workloads it targets.
Developer experience	7/10	Solid, especially if you buy into SQL++ and the surrounding platform.
Operational simplicity	7/10	Much cleaner than stitching many systems yourself, though still a substantial platform to run well.
Ecosystem / maturity	9/10	Mature, enterprise-known, operationally serious.
Production confidence	8/10	Strong for teams whose needs align with its shape, even if it is not the cleanest true multi-model engine.

Trust review:

High in its zone. Less compelling as a general answer to every multi-model ambition.

Query model:

Couchbase has a clearer center than many people expect. SQL++ gives it a SQL-like way to query JSON documents, and that matters. But the broader platform also depends on distinct services like Search, Analytics, and Eventing. That means the user experience is broader than a plain document store, while still not feeling like one single multi-model language across equal models.

Engine / storage shape:

Couchbase feels like a distributed document database that grew outward into a platform. That platform is substantial: query, search, analytics, eventing, and mobile sync are all real capabilities. But they are also separate services with their own architecture, which is why the system feels platform-first more than engine-pure.

How native is the multi-model claim?

Moderately native in the sense that the capabilities are part of the official platform, not random side tools. But it is still not native in the stricter sense of one engine equally expressing multiple models as first-class peers. It is closer to a document-first database plus first-party services than to a truly native multi-model engine.

Where it feels first-class:

Document-heavy applications
Mobile sync and edge-aware scenarios
Distributed operational deployments with a strong platform wrapper

Where it feels second-class:

Native multi-model purity
Graph-first or deeply unified model semantics
"one engine, many truly equal models" expectations

Best fit:

Teams centered on document workloads that also need search, analytics, events, and sync
Organizations that want a mature platform more than a category-pure database design

Poor fit:

Teams specifically looking for a graph-document-key-value engine designed as one native core from the beginning

Short verdict:

More substantial than hype-driven newcomers, especially for document-heavy and sync-aware systems, but closer to a document-first platform than to a truly native multi-model engine.

ArangoDB

ArangoDB is one of the clearest examples of a native multi-model database.

Supported models:

Model	Support	Notes
Document	Native	First-class part of the core design.
Graph	Native	First-class part of the core design.
Key-value	Native	Expressed through the same overall engine story.
Full-text search	Native	Via `ArangoSearch`, natively integrated into the database.
Vector	Native, newer	Officially supported, but still newer than the older document/graph/key-value core.
Time-series	Not core native model	Not a real native time-series identity in the way ArcadeDB, NodeDB, or SurrealDB claim it.

Its core story has been consistent for a long time: document, graph, and key-value in one engine, with one query language and a relatively coherent mental model.

That coherence is its real strength.

ArangoDB does not feel like a database that accidentally became multi-model. It feels designed around that identity. And unlike some younger systems, it has had enough time in the market to turn that identity into something operationally credible.

Its biggest advantage is discipline.

AQL gives it a strong center. The system is not pretending to be every data product at once. That restraint makes it feel more trustworthy than several flashier databases in this category.

Category	Score	Why
Native multi-model support	7/10	Clearly native and coherent, but with a narrower native model spread than the strongest broader true multi-model systems.
Multi-model depth	6/10	Document, graph, key-value, search, and newer vector support give it real strength, but not the same broader all-model depth as the strongest true multi-model engines.
Performance potential	8/10	Strong enough to take seriously, without feeling like a toy unifier.
Developer experience	8/10	Coherent query model, coherent mental model, less fragmentation than extension-led systems.
Operational simplicity	6/10	Reasonable, though not trivial.
Ecosystem / maturity	8/10	Mature enough to be credible without having PostgreSQL-scale gravity.
Production confidence	8/10	One of the stronger native multi-model bets.

Trust review:

High. Not because it promises the most, but because it promises a coherent set of things and mostly behaves like it means it.

Query model:

ArangoDB benefits enormously from AQL. That one choice gives it much of its credibility. AQL is not just a convenience layer. It is the reason document, graph, key-value, search, and newer vector support still feel closer to one database than to a bundle of unrelated capabilities.

Engine / storage shape:

ArangoDB is explicit about being a native multi-model database, and its storage engine story supports that claim better than many competitors. Its current storage engine is based on RocksDB, but that does not reduce it to an API wrapper over someone else’s database. The important point is that the engine and query model were designed around document, graph, key-value, search, and now vector support working together as one product.

How native is the multi-model claim?

Very strong. This is one of the clearest cases where the label matches the design, not just the marketing.

Where it feels first-class:

Native document + graph + key-value integration
Coherent query experience
Teams that value discipline over hype

Where it feels second-class:

Newer categories where broader market buzz has moved faster
Teams that want the biggest ecosystem gravity or default-enterprise familiarity

Best fit:

Engineers who want a real native multi-model database
Teams that care about coherence more than trendiness
Workloads where graph and document both matter

Poor fit:

Buyers who mainly optimize for market dominance, vendor comfort, or the broadest default talent pool

Short verdict:

One of the cleanest narrower native multi-model databases on the market, but not the broadest expression of what a true multi-model system could become.

OrientDB

OrientDB deserves more than a polite historical footnote.

Supported models:

Model	Support	Notes
Document	Native	Part of its original identity.
Graph	Native	Part of its original identity.
Key-value / object-style access	Native	Part of the broader platform identity.
Search	Supported	Present, but not a central reason people choose it.
Vector	Not original native identity	Not part of the original core story.
Time-series	Not original native identity	Not part of the original core story.

It was early, and in many ways it helped make the category legible.

Its blend of document and graph ideas was ahead of what many teams were ready to appreciate at the time, and the database still has a clear point of view: relationships should be first-class, and the graph model should not be bolted awkwardly onto a system that only really wants to be a document store.

More importantly, OrientDB matters because it is part of the lineage that leads directly to ArcadeDB.

Luca Garulli created OrientDB, and after SAP acquired OrientDB and the project lost momentum, he started ArcadeDB from scratch. That history matters because ArcadeDB is not just "another graph/document database." It is also a reaction to the limits of an earlier generation.

That said, being early is not the same as staying strong.

OrientDB still has a real multi-model identity, and its SQL-extended graph approach is more approachable than some people expect. But it does not feel like the strongest current answer anymore. Momentum matters. Ecosystem energy matters. Confidence compounds around the systems that keep attracting builders, operators, and production stories.

This is where OrientDB now feels split between historical importance and current competitiveness. Historically, it helped define the category. Today, it feels more like an older expression of the idea than the place where the idea is moving fastest.

Category	Score	Why
Native multi-model support	7/10	Clearly part of its design identity, but narrower and more dated than the strongest current native multi-model systems.
Multi-model depth	4/10	Real graph-document roots are there, but the current implementation now feels much shallower than the stronger systems in this category.
Performance potential	6/10	Respectable, though the category has moved on.
Developer experience	6/10	More approachable than some graph systems, but not especially modern-feeling now.
Operational simplicity	6/10	Usable, but not where it shines.
Ecosystem / maturity	5/10	Historically important, currently less energetic.
Production confidence	5/10	Still capable, but not where I would place the strongest trust today.

*Trust review: *

Medium-low. Historically important, still usable, but no longer the place I would look first for the strongest current answer.

Query model:

OrientDB took a pragmatic route that still deserves respect: use SQL, then extend it for graph behavior. That made the system more approachable than graph databases that demanded a whole new way of thinking from the user.

Engine / storage shape:

OrientDB’s identity was always more engine-native than extension-driven. Graph relationships as physical links were part of its pitch and part of why it felt different from databases that only simulated graph behavior through foreign keys or application logic.

How native is the multi-model claim?

Strong historically. This is not a fake multi-model database. The real problem is not the category claim. The problem is that the implementation no longer feels like the sharpest current form of the category.

Where it feels first-class:

Historical graph-document design thinking
SQL-extended graph accessibility
Category significance

Where it feels second-class:

Current momentum
Ecosystem energy
Feeling like the sharpest modern implementation

Best fit:

Readers trying to understand the category’s evolution
Teams already invested in OrientDB
Cases where its model still fits and organizational trust already exists

Poor fit:

Greenfield teams looking for the strongest current multi-model bet

Short verdict:

Important historically, still usable, but now much more of a middle-to-shallow legacy native system than a leading current expression of the idea.

ArcadeDB

ArcadeDB makes the most sense when you see it as a second attempt at part of the same problem.

Supported models:

Model	Support	Notes
Document	Native	First-class part of the core design.
Graph	Native	First-class part of the core design.
Key-value	Native	First-class part of the core design.
Search	Native engine capability	Part of the broader native engine story.
Vector	Native claim	Part of the broader native engine story, but still less validated than older core models.
Time-series	Native claim	Part of the broader native engine story, but still less validated than older core models.
Relational-style querying	SQL surface	Available through SQL, but not a classic relational engine identity.

That is why I would not compare it to OrientDB as if they were unrelated products from unrelated teams. ArcadeDB is what happens when the original creator of OrientDB comes back and says, in effect:

the category idea was right, but the implementation needs to be rebuilt on a stronger foundation.

That is the interesting part.

ArcadeDB is not just carrying over the old multi-model ambition. It is trying to fix engine-level problems, performance limits, and architectural constraints that older systems ran into. That is why the "from scratch" part matters. If you believe multi-model databases need tighter control over storage, execution, and concurrency to compete seriously, then ArcadeDB is one of the more credible efforts in the space.

Its native multi-model story is strong: graph, document,** key-value*, **search, **vector, and **time-series* are part of the same broader engine story, not just a loose product slogan. It also supports multiple query styles, which makes it unusually flexible for mixed workloads.

More importantly, the system looks like it cares about engine quality.

That matters because many multi-model databases are conceptually attractive but mechanically thin. ArcadeDB feels much closer to a database that actually wants to own the hard parts instead of only smoothing over them at the API layer.

Category	Score	Why
Native multi-model support	9/10	One of the strongest native stories in the category, with a broader native model claim than ArangoDB.
Multi-model depth	8/10	Serious scope and strong implementation ambition, with broader engine-level depth than most of the field even if some newer model claims still need more long-term validation.
Performance potential	8/10	The engine ambition is real, not just marketing.
Developer experience	7/10	Good enough, though not the smoothest or most mainstream experience.
Operational simplicity	6/10	Reasonable, but still not a default-operational choice for most teams.
Ecosystem / maturity	5/10	Under-recognized and still relatively small in ecosystem gravity.
Production confidence	8/10	The implementation quality and technical direction inspire real confidence, even if the market signal is weaker than larger better-funded names.

Trust review:

High on implementation quality, medium on market-proven confidence.

Query model:

ArcadeDB is unusually broad here. It supports SQL, Cypher, Gremlin, GraphQL, a Java API, and MongoDB query compatibility. Normally I would see that as a warning sign. Too many query surfaces often mean a weak center. In ArcadeDB’s case, the more interesting question is whether the engine underneath is good enough to justify that breadth.

Engine / storage shape:

This is where ArcadeDB gets serious. Its official positioning is not just "we support many models." It is "we natively store graphs, documents, key-value, search, vectors, and time-series in a single engine." It also emphasizes ACID transactions, a persistent journal/WAL, embedded mode, server mode, and Raft-based clustering. Whether every claim is equally mature is one question. But the architectural ambition is much closer to a real database-engine argument than to pure developer-experience packaging.

How native is the multi-model claim?

Very strong in intent and design. If you take the official architecture story seriously, ArcadeDB is one of the most native multi-model claims in the whole category. The remaining issue is not whether it is native enough. The issue is whether the rest of the market has validated that depth at scale.

Where it feels first-class:

Native multi-model ambition
Engine seriousness
Graph-document-key-value style workloads that benefit from tighter core control

Where it feels second-class:

Ecosystem gravity
Market awareness
Broad mainstream production trust

Best fit:

Engineers who care about engine quality
Teams open to a smaller ecosystem in exchange for stronger design conviction
Builders who see value in the OrientDB lineage but want a more modern rebuild

Poor fit:

Organizations that optimize primarily for ecosystem size, hiring familiarity, or mainstream adoption signals

Short verdict:

One of the strongest real multi-model designs available today, and better than its current market visibility suggests.

So which ones feel the most real?

If I strip away brand power, cloud packaging, and category marketing, the databases that feel most convincing as actual multi-model databases are:

ArcadeDB
ArangoDB - but with a narrower native model spread
SurrealDB - but with much more caution because breadth is not the same as depth
OrientDB - mostly for historical importance rather than current strength

If I include practical production reality, the picture changes:

PostgreSQL + extensions remains the most dependable broad platform
ArcadeDB is more promising than its market visibility suggests, especially if you care about engine quality more than market noise
ArangoDB is one of the cleanest native multi-model bets
Couchbase and Cosmos DB are strong if your priorities align with their platform shape
SurrealDB is still more of a bet on direction

That is the pattern I keep coming back to.

And it is also why I do not use a very loose definition of multi-model here.

For me, a true multi-model database is not just a system that does two or three models reasonably well. A lot of databases can do that.

The harder question is whether the database can natively support a broader set of serious data models, and whether those models still feel like they belong to one coherent engine instead of a stitched story.

That is why ArangoDB still earns respect from me, but it also explains why I do not treat it as the final expression of a broader true multi-model design. It is strong, but narrower.

The category is real. The need is real. But the implementations are not interchangeable.

Some systems are broad but not very native. Some are native but not mature enough. Some are production-safe but conceptually patchy. Some are elegant but still proving themselves.

Bottom-line summary table

Database	Native multi-model support	Multi-model depth	Performance potential	Developer experience	Operational simplicity	Ecosystem / maturity	Production confidence
PostgreSQL + extensions	4/10	7/10	8/10	8/10	4/10	10/10	10/10
SurrealDB	9/10	3/10	6/10	8/10	6/10	5/10	4/10
Couchbase	4/10	5/10	8/10	7/10	7/10	9/10	8/10
ArangoDB	7/10	6/10	8/10	8/10	6/10	8/10	8/10
OrientDB	7/10	4/10	6/10	6/10	6/10	5/10	5/10
ArcadeDB	9/10	8/10	8/10	7/10	6/10	5/10	8/10

Or if you prefer visual summary:

Why Azure Cosmos DB is not in the table

Azure Cosmos DB is often marketed as a multi-model database, but I do not think it belongs in the same bucket as ArangoDB, OrientDB, ArcadeDB, SurrealDB, or even PostgreSQL + extensions.

Cosmos DB is best understood as a managed distributed platform with multiple APIs and compatibility layers. That can be commercially strong. It can be operationally useful. It can be the correct product choice for many Azure-heavy teams.

But that is not the same thing as being a native multi-model database.

If a platform exposes different APIs over its storage engine, that does not automatically make it equivalent to a database that natively models graph, document, key-value, vector, or time-series semantics inside one coherent engine and query design.

In that looser definition, almost any serious platform can start claiming "multi-model" once enough compatibility layers, services, or access paths are added.

That is the category confusion I want to avoid here.

So I left Cosmos DB out of the actual review table.

Its real strengths are:

Managed global distribution
Enterprise cloud operations
API-level flexibility
Platform packaging

So where does NodeDB stand in all of this?

It is the new kid in the block, and that is exactly what makes it interesting to me.

It is trying to compete in a space full of established names, but it is not trying to copy their tradeoffs. The whole point is to chase both width and depth at the same time: broader native model support than the narrower systems, but with more serious engine-level realization than the systems that only look broad at the product layer. That is an ambitious target, and it is obviously much easier to say than to deliver. But that is also why it is worth talking about.

In the next post, I will shift from industry comparison back toward NodeDB and explain what I want to preserve, avoid, and do differently after looking at the current landscape.

What Is a Multi-Model Database and Why It Matters

Farhan Syah — Fri, 03 Apr 2026 22:13:27 +0000

If you work on modern applications long enough, you will eventually run into the term "multi-model database".

At first, it sounds simple. A database that supports more than one data model.

That is true, but it is still too vague to be useful.

A multi-model database is a database that lets you work with different kinds of data in one system instead of forcing you to split them across several databases from the start.

That usually means some combination of:

Relational data
Document data
Key-value access
Graph relationships
Time-series data
Vector embeddings

Not every multi-model database supports all of these. Some support only two or three. Some support more. The point is not "everything at once." The point is that one database tries to cover more than one kind of workload in a meaningful way.

Real applications rarely stay inside one clean data shape for long.

An application may start with relational data like users, teams, billing records, and permissions. Then it needs flexible profile data, so document storage becomes useful. Then it needs recommendation or fraud-style relationship analysis, so graph-like queries start to matter. Then search becomes important. Then event streams pile up. Then AI features arrive and now embeddings and vector similarity are part of the stack too.

At that point, the classic answer is simple: use multiple databases.

And sometimes that is still the right answer.

But it comes with a cost

You now have to move data between systems, learn different query models, maintain different operational habits, watch consistency boundaries more carefully, and decide which database is the source of truth for which feature. That can be fine for large teams with strong platform discipline. It is much less pleasant when the application is growing quickly, the team is small, or the architecture keeps changing.

That is why multi-model databases keep attracting attention.

They promise a simpler center of gravity. Instead of saying "use Postgres for this, Redis for that, Elasticsearch for this, Neo4j for that, and a vector store for something else," the multi-model pitch is: maybe more of this should live together.

What "multi-model" actually means

The easiest way to understand multi-model is to compare it with single-model thinking.

A classic relational database is optimized around tables, rows, columns, joins, and transactions. A document database is centered on JSON-like documents. A graph database is centered on nodes, edges, and traversals. A key-value store is centered on fast lookup by key.

Each model is good at something.

The problem is that applications do not care about database categories. Applications care about requirements.

If one product needs structured business records, flexible metadata, relationship-heavy queries, and semantic search, the team does not experience those as four separate academic categories. They experience them as one product that now has four kinds of data problems.

A multi-model database tries to reduce that split.

In the best case, it gives you one place to store and query different kinds of data without making you bolt together a whole mini data platform too early.

In the weaker version, it only gives you the label.

I will save that distinction for the next article.

Why it is needed

There are three common reasons people care about multi-model systems.

1. Application Complexity.

Modern software is not just CRUD over tables anymore. Even fairly normal products now mix operational records, user-generated content, events, relationships, search, recommendations, and AI features. The more a product grows, the harder it becomes to pretend one narrow data model will always be enough.

2. Business Evolution.

A product rarely knows all of its future data needs up front. A team may start with a clean transactional core, then later need analytics-like queries, richer content, recommendations, sync, graph-style relations, or vector search. If every new requirement forces a new database choice, the architecture keeps getting pulled apart over time.

3. Operational Cost.

Every extra database adds more than a feature. It adds deployment decisions, backup strategy, monitoring, scaling behavior, failure modes, access control, migration work, and another thing the team has to understand well enough to trust in production.

That is why multi-model keeps coming back as an idea.

It is not only about convenience. It is also about reducing unnecessary system sprawl.

When people started looking at it

Interest in multi-model databases did not suddenly appear in the 2020s.

The idea has been around for a long time, but the reasons for caring about it changed over time.

1. Early 2000s

You can already see early forms of this thinking in systems that tried to support multiple representations of data in one platform. MarkLogic is one of the best-known examples from that era. It started from document-oriented needs, especially XML, and later expanded to support JSON, RDF, search, and more within one system.

The label "multi-model database" was not as common yet, but the underlying idea was already there: one database, more than one useful model.

2. Late 2000s

Then NoSQL took off.

This period pushed developers to think beyond the relational model much more openly. Key-value stores, document databases, column-family systems, and graph databases all became more visible. Systems like Redis, MongoDB, Cassandra, and Neo4j helped define that shift. That was useful, but it also created the opposite problem: polyglot persistence started becoming normal.

In other words,

instead of one database doing more, teams often ended up with many databases doing different things.

That gave people more flexibility, but it also made architecture and operations messier.

3. Early to mid 2010s

This is when "multi-model database" became a much clearer category.

More systems started presenting themselves explicitly that way. Databases such as OrientDB and ArangoDB are common examples from that period. The message was straightforward: why force teams to choose between document, graph, key-value, and other models if one system can cover several of them?

This was a response to a real pain. Teams liked specialized databases, but they did not always like running so many of them.

4. Late 2010s

Cloud platforms helped make the idea more visible to a wider audience.

Azure Cosmos DB **is one of the clearest examples from this period. It pushed the idea of a globally distributed database platform that could expose multiple APIs and support different access patterns. **Couchbase also moved further in the direction of a broader platform story, mixing document workloads with search, analytics, events, and mobile synchronization.

By this point, multi-model was not just a niche database argument anymore. It had become part of mainstream architecture discussions.

2020+

The pressure increased again.

Applications did not become simpler. They became more mixed.

By the 2020s, many teams were dealing with some combination of transactional records, flexible content, search, event streams, sync requirements, graph-like data, and later vector embeddings. Even when they did not call all of that "multi-model," they were still feeling the problem that multi-model databases were trying to address.

You can see that in the kinds of systems getting attention during this period. SurrealDB gained interest by promising a unified model across documents, relations, and SQL-style querying. Couchbase kept pushing a broader platform story across document data, search, analytics, events, and mobile sync. Azure Cosmos DB remained a prominent managed multi-model option. And once vector search became part of normal product design, many teams started asking whether embeddings should live closer to the rest of their application data too.

AI workloads made it even more obvious.

Once vector search and embeddings started showing up in normal product design, many teams had to ask a new version of the same old question: do we add one more specialized system, or should more of this live together with the rest of the application data?

That is one reason the topic feels more relevant again today.

Does every application need a multi-model database?

No.

A lot of applications are still better served by a single strong database plus a few carefully chosen supporting tools. In many cases, that is the most sensible option.

And sometimes a specialized database really is the right tool. If graph traversal is the center of the product, or time-series scale is extreme, or vector retrieval dominates the workload, a focused system may still win.

So the point is not that multi-model is always better.

The point is that the demand is real because the underlying needs are real.

Multi-model databases exist because many teams are tired of solving one growing product with a growing pile of disconnected data systems.

That is the appeal.

Whether a specific database actually delivers on that promise is a separate question.

In the next post, I will compare some of the current multi-model databases and where they differ in practice.

What Kind of Database I Want NodeDB to Be

Farhan Syah — Fri, 03 Apr 2026 08:19:35 +0000

When I think about NodeDB, I am not thinking about the longest feature list or the flashiest demo.

I am thinking about a database I can trust before and after an application grows.

In the long run, I want NodeDB to be easy to use, reliable in different scenarios, and secure enough that I do not have to keep second-guessing it. I want it to be something I can start with early, keep using later, and not feel forced to replace once the project becomes more serious.

I should not have to rethink the whole stack every time product requirements change. I should not have to move data somewhere else just because a new use case shows up. I should not have to accept that one part of the database is “real” while another important part is just a workaround. If the business grows, the database should still feel like a stable base, not the next reason to re-architect.

But that is far in the future. The current reality is simpler: I am still building toward it.

Right now, my main concern is not polish. It is not making NodeDB look finished before it is finished. It is the foundation.

I want to build enough core capability early, and build it deeply enough, that I do not spend the next few years patching around missing pieces.

Many databases grow by accumulation. A feature becomes important, so it gets added. Another workload appears, so another layer gets introduced. Then another extension, another plugin, another wrapper, another sidecar. Over time, the system may cover more ground, but it does not always become more coherent.

From the user side, that has a cost. Query behavior becomes uneven. Operational expectations stop being consistent. One feature feels mature, another feels awkward, another works only if you accept a few strange rules. At that point, you are not really using one clean system anymore. You are managing the boundaries between several pieces that happen to live near each other.

That is one of the reasons PostgreSQL started feeling heavy for me across multiple projects.

PostgreSQL is good. Its ecosystem is strong. I am not arguing otherwise. But extensions do not magically become one deeply integrated system just because they run around the same database core. In practice, the burden shifts to the user. You are the one stitching capabilities together, working around different limitations, and dealing with the gaps between them.

I have seen a similar pattern in databases that try to unify more from the start.

SurrealDB has a vision I understand. But my concern is the same: I do not want a database to keep piling things on top if the foundation was not designed to carry them well. Systems should evolve, of course. That is normal. But there is still a difference between growing a system and collecting features.

That difference shows up in the user experience very quickly. Some capabilities exist, but they still feel second-class. The ergonomics are weaker. The query model is thinner. Performance is less predictable. Operations feel awkward. The feature works in a demo, but once it becomes central to a real workload, you start seeing the limits.

That is exactly what I want to avoid with NodeDB.

I want NodeDB to reduce re-architecture later instead of causing it. I do not want to reach the next stage of a product and realize that an important capability was treated as an afterthought, so now the stack has to be rearranged. I do not want core requirements to arrive later and collide with a design that was never meant to support them properly.

That is why I care so much about feature depth early.

Not because users need everything on day one. And not because I think I can build everything perfectly from the start. I cannot.

What I do believe is this: if an important capability is likely to matter sooner or later, I would rather think hard about how it belongs in the system early.

I am not interested in a product page that lists many features. I care about whether the database actually behaves like one cohesive system. I care about whether the features feel like they belong together. I care about whether it stays usable across different scenarios without pushing the user into constant redesign or workaround.

If a database claims to do everything, but half the capabilities feel weak, awkward, or fragile, that is not real completeness. I would rather build something deeper but longer than wider and shallower.

So the database needs to be dependable. It has to hold up when requirements expand. It has to help the user avoid unnecessary stack changes later.

Maybe this approach is wrong in some places. It is still my opinion, my bias, and my way of thinking through the problem.

But if NodeDB works, I want it to work in a way that still makes sense years later, not just in the first exciting demo.

In the next post, I will go deeper into the design direction behind that idea and why so many multi-model databases still feel wrong to me.

Repo: NodeDB

Why I'm Building NodeDB

Farhan Syah — Thu, 02 Apr 2026 21:26:16 +0000

For the last few years, PostgreSQL has been my default database.

Before that, I worked with MySQL, MariaDB, and MongoDB. But once I spent enough time with PostgreSQL, it became very hard to justify anything else for most projects. It gave me the relational model I wanted, plus JSON support that was good enough to remove a lot of my reasons for using MongoDB. When I needed spatial support, I could add PostGIS. When I needed time series and partitioning, I could use TimescaleDB. For a long time, that worked very well.

Then the workload started changing.

Over the last two years, AI and ML stopped being side concerns and started becoming part of real application requirements. That meant vector search became relevant. PostgreSQL still looked like the right answer because pgvector existed and, at first, it was good enough. But once I started using it across more serious workloads, I kept running into the same friction: scaling and performance concerns, filtering limitations, and dimension and storage constraints that mattered for my use cases at the time.

And vector was only one part of the problem.

Then came graph needs. At that point, the pattern became very familiar. I could keep stretching PostgreSQL. I could handle graph logic manually at the application level. I could try more extensions. I could wire more tools together. And yes, any one of those decisions can be justified if you are working on one project and you are willing to absorb the complexity.

But I am not working on one project.

I work on multiple projects every year, often with different requirements. That changes the economics completely. What looks reasonable in isolation turns into repeated operational and mental overhead when you keep doing it again and again. A couple of extensions are fine. Then you need another one. Then another workaround. Then another set of limitations, quirks, and edge cases to remember. Then offline-first and sync requirements enter the picture and now you are adding even more surrounding tools just to make the whole thing usable.

That was the real breaking point for me.

The problem was not that PostgreSQL stopped being good. The problem was that PostgreSQL plus extensions plus surrounding infrastructure started becoming a stack I had to keep rebuilding across projects. It worked, but the repetition was exhausting.

I started looking around.

Like many people in this space, I first looked at what already existed. If someone had already built the thing I wanted, I would rather use it than build a database from scratch.

I found SurrealDB. I liked the vision. I still think the direction is compelling: fewer hops, better developer experience, a more unified model. But when I looked deeper, especially at the implementation and tradeoffs, I was not convinced. From my perspective, it felt more like a patchwork than a database designed deeply from the ground up. Even in graph support, I did not find the level of capability I expected. The idea was attractive. The execution did not give me enough confidence.

Then I looked at ArcadeDB. In many ways, I thought it was stronger. Better coding quality, better performance characteristics, more substance. But it is JVM-based, and I wanted something smaller, tighter, and better suited to the kinds of embedded, mobile, offline-first, and mixed deployment scenarios I care about.

At that point, my realistic options looked like this:

Stick with PostgreSQL and keep stacking extensions. Work around another database that did not fully fit. Or accept a polyglot architecture and keep paying the integration cost.

None of those felt right to me.

So I chose a fourth option: build my own database.

That is how NodeDB started in 2025.

It started as a side project, and honestly, I did not have high expectations. If it worked, it worked. If it failed, it failed. That attitude was useful because this is not the kind of project you begin with false confidence.

I have already scrapped the project twice.

This current version is the third serious attempt, and I only started building it earlier this year. The first two failures were important. They forced me to understand what I was doing wrong, what I was hand-waving, and what needed to be designed properly from the beginning instead of patched later. I do not think I would have reached this version without those failures.

One thing I should mention briefly: I use AI heavily in the implementation.

The code is mostly written by AI, not by me typing everything manually. That is simply the practical reality. It writes faster than I do, and often better than I do at raw throughput. But I am still the one directing, reviewing, rejecting, and understanding it. That part matters to me. If I am going to build a database seriously and support it in the future, I need to understand it all the way down.

NodeDB exists because I wanted something I could actually use across real projects without rebuilding the same database stack every time.

I built it first to solve my own use cases, because that part is non-negotiable. If it does not solve my real problems, there is no point. But I also do not want to build a shallow personal tool that only works for me. I want to go deeper than that. I want something that can support broader use cases properly, with serious performance, serious design, and serious technical depth.

Right now, NodeDB is working for my use cases, but it is still evolving.

I have already tested it in pilot projects, and for the kinds of problems I built it to solve, it is starting to prove itself. That does not mean the journey is done. Far from it. A database only becomes real when the design holds under pressure, when the tradeoffs are honest, and when the implementation can stand up over time.

That is the challenge I have chosen.

Will I make it? Time will tell.

But this is the journey I am on, and I am going to share it openly: the design decisions, the mistakes, the database ideas, the tradeoffs, and the lessons I learn along the way.

If you care about database engineering, multi-model systems, offline-first architecture, or the hard tradeoffs behind building a database from scratch, follow this journey.

I will be sharing what works, what fails, what I have to redesign, and what I learn from trying to make NodeDB real.

If that sounds interesting, follow me here on dev.to and keep an eye on the next posts. I am just getting started.

Repo: NodeDB

numr 0.5.0: The Rust numerical computing library that doesn't make you choose

Farhan Syah — Sat, 14 Mar 2026 20:15:39 +0000

Last year, I started building numr because I was frustrated.

ml-rust / numr

A high-performance numerical computing library for Rust with GPU acceleration, inspired by Numpy

numr

Foundational numerical computing for Rust

numr provides n-dimensional tensors, linear algebra, FFT, statistics, and automatic differentiation—with native GPU acceleration across CPU, CUDA, and WebGPU backends.

numr is like Numpy in Rust but built with gradients, GPUs, and modern dtypes built-in from day one.

What numr Is

A foundation library - Mathematical building blocks for higher-level libraries and applications.

numr IS	numr is NOT
Tensor library (like NumPy's ndarray)	A deep learning framework
Linear algebra (decompositions, solvers)	A high-level ML API
FFT, statistics, random distributions	Domain-specific
Native GPU (CUDA + WebGPU) + autograd

For SciPy-equivalent functionality (optimization, ODE, interpolation, signal), see solvr.

Why numr?

vs NumPy

Capability	NumPy	numr
N-dimensional tensors	✓	✓
Linear algebra, FFT, stats	✓	✓
Automatic differentiation	✗ Need JAX/PyTorch	✓ Built-in `numr::autograd`
GPU acceleration	✗ Need CuPy/JAX	✓ Native CUDA + WebGPU
Non-NVIDIA GPUs	✗ None	✓ AMD, Intel, Apple via WebGPU
FP8 /

…

View on GitHub

I wanted to do numerical computing in Rust — tensors, linear algebra, FFT, gradients — on GPUs. Not just NVIDIA GPUs. Any GPU. And I didn't want to glue together five incompatible crates to do it.

Python didn't plan for this either. NumPy emerged organically, and it took years of bolting on CuPy, JAX, and PyTorch before Python had GPU compute and autograd — scattered across incompatible libraries.

Some people say fragmentation is fine. Separate crates for separate concerns — that's the Unix philosophy. And I'd agree, if they shared conventions, types, and backends. But they don't. ndarray gives you tensors but no GPU. nalgebra gives you linear algebra but no autograd. rustfft gives you FFT but nothing else. Different types, different idioms, none of them compose.

So the burden falls on you — the application developer. You're the one writing adapter layers between crates. You're the one figuring out why this tensor type doesn't work with that decomposition. And when you need GPU support or a missing operation? You're filing issues and PRs upstream, waiting for maintainers, before you can get back to building your actual application.

numr takes that burden off you. One library, one tensor type, one API — tensors, linalg, FFT, statistics, autograd, GPU. numr will handle the hard part. You can just focus on building your application.

one library, one API, every backend. Write your code once. Run it on CPU with AVX-512. Run it on NVIDIA with native CUDA kernels. Run it on AMD, Intel, or Apple silicon through WebGPU. Same code. Same results.

Today, numr 0.5.0 ships. And it's the release where it stopped being a "promising project" and became something you can actually build on.

What changed

Fused kernels — because memory bandwidth is the real bottleneck

The single biggest performance win in GPU computing isn't faster math. It's reading memory fewer times.

A naive softmax reads your tensor five times: max, subtract, exp, sum, divide. A fused softmax reads it once. For large tensors, that's not a 5x difference (the math is cheap), but it's easily 2-3x.

0.5.0 adds fused kernels for the operations that matter most:

GEMM epilogue: matmul + bias + activation in one kernel launch. This is the inner loop of every neural network. Forward and backward.
Activation-mul: for gated architectures like SwiGLU that power modern LLMs. One read instead of three.
Add-norm: residual connection + normalization fused together. The other operation you hit every single transformer layer.

All of these work on CPU, CUDA, and WebGPU. All of them have backward passes for autograd.

FP8 and quantized compute — because not everything needs 32 bits

FP8 isn't just "smaller numbers." It's the difference between fitting a model in VRAM or not. Between one GPU and two.

numr now does FP8 matrix multiplication natively — E4M3 and E5M2 formats, across all backends. No external libraries. No NVIDIA-only restrictions.

We also added i8×i8→i32 quantized matmul on CPU. This is what powers efficient quantized inference when you don't have a GPU.

2:4 structured sparsity — because half your weights are probably zero

NVIDIA's Ampere architecture introduced hardware support for 2:4 sparsity: for every group of 4 weights, exactly 2 are zero. The hardware skips them, doubling throughput for free.

numr 0.5.0 supports 2:4 structured sparsity across all backends. On CUDA, it hits the hardware fast path. On CPU and WebGPU, it uses optimized sparse kernels.

Autograd that actually covers what you need

Previous releases had autograd for basic operations. 0.5.0 makes it comprehensive:

conv1d, conv2d, softmax, rms_norm, layer_norm, SiLU, softplus, SwiGLU, dropout, the fused GEMM epilogue, fused add-norm, dtype cast, narrow, cat, gather — all differentiable, all with correct backward passes, all supporting second-order derivatives.

Activation checkpointing lets you trade compute for memory. Backward hooks let you trigger distributed gradient sync during backprop.

This isn't an ML framework. It's the autograd engine that ML frameworks build on.

A CUDA backend that acts like it belongs there

The CUDA story got serious in 0.5.0:

Caching allocator. CUDA memory allocation is expensive. The old approach (stream-ordered allocation) worked but left performance on the table. The new Rust-side caching allocator reuses memory blocks, cutting allocation overhead dramatically.

Graph capture. Record a sequence of kernel launches once, replay it with zero overhead. Essential for inference serving where you run the same computation thousands of times.

GEMV fast paths. When one matrix dimension is small (which happens constantly during inference — batch size 1), you don't want full tiled GEMM. Specialized GEMV kernels for transposed weight matrices avoid unnecessary work.

Pipelined D2H copy. Overlap GPU computation with data transfer back to the host. The GPU doesn't wait for the CPU, the CPU doesn't wait for the GPU.

Why 0.5.0 matters

This is where numr crosses the threshold from "interesting foundation" to "you can build real things on this." And we know because we did.

0.5.0 has been validated against real downstream consumers. solvr — a scientific computing library with optimization, ODE solvers, and interpolation — builds and runs on numr 0.5.0. boostr — an ML framework with attention, MoE, and Mamba blocks — builds and runs on it too. LLM inference and embedding generation work end-to-end.

This isn't a library that passes unit tests in isolation. It's a library that other libraries are built on, and those libraries work.

The fused kernels mean you're not leaving performance on the table. The autograd coverage means you can differentiate through realistic computation graphs. The CUDA infrastructure means GPU workloads actually perform. And all of it works the same across CPU, CUDA, and WebGPU.

What's next

0.5.0 unblocks new releases of solvr (scientific computing — optimization, ODE solvers, interpolation) and boostr (ML framework) which both build on numr.

For numr itself, 0.6.0 focuses on hardening: cleaning up error handling, API stability audit, and preparing for an eventual 1.0.

ROCm (native AMD GPU) is on the roadmap for 0.7.0+.

Try it

[dependencies]
numr = "0.5.0"

# With GPU support
numr = { version = "0.5.0", features = ["cuda"] }
numr = { version = "0.5.0", features = ["wgpu"] }

GitHub: github.com/ml-rust/numr
Crates.io: crates.io/crates/numr

numr is Apache-2.0 licensed. Contributions welcome.

[Boost]

Farhan Syah — Wed, 11 Mar 2026 20:32:38 +0000

Why I Built RDX: Bringing Modern "Docs-as-Code" to the Rust Ecosystem

Farhan Syah ・ Mar 11

#ai #webdev #programming #rust

Why I Built RDX: Bringing Modern "Docs-as-Code" to the Rust Ecosystem

Farhan Syah — Wed, 11 Mar 2026 20:22:11 +0000

For more than 10 years, I lived and breathed the Node.js ecosystem. I built applications using Node, Bun, and especially Svelte. I loved it. I still do—I’ve never been of the opinion that JavaScript or Node.js is "bad." Tools like Astro, MDX, and SvelteKit are genuinely phenomenal.

But a while ago, my work shifted. I needed more control at a lower level, which led me to Rust. I’ve been using Rust full-time for a while now, and honestly? I don’t plan on going back.

However, moving to a new ecosystem always exposes what’s missing. In Rust, one of the biggest glaring holes is public-facing documentation tooling.

Don't get me wrong: rustdoc and docs.rs are incredible. They are arguably the cleanest, best ways to document source code in the industry. But API documentation isn't product documentation. When you need to build public-facing docs—with rich tutorials, interactive API playgrounds, custom callouts, and interactive tabs—Rust falls short.

You usually end up using mdBook, which is great but visually basic. If you want a modern, interactive documentation site that rivals Stripe or Vercel, you are forced to leave Rust and go back to Python (MkDocs) or the Node.js ecosystem (Docusaurus, Mintlify, Nextra).

I wanted to keep my stack 100% Rust. I didn't want to maintain a package.json just to write my documentation.

I decided to build my own Static Site Generator (SSG) in Rust that runs on WebAssembly (WASM) to fully utilize Rust in the browser. But right out of the gate, I hit a massive blocker: the format.

The MDX and Markdoc Dilemma

Standard Markdown (.md) is too limited. You can't build rich, interactive UI components with it.

The industry standard is MDX. But MDX is tightly coupled to JavaScript. It is inherently imperative—it executes code. Trying to force a Rust backend to safely parse, execute, and render React-based MDX is a nightmare.

Then there is Markdoc (by Stripe). Markdoc gets the philosophy exactly right: documents shouldn't execute code; they should be declarative data. But Markdoc is written entirely in TypeScript/JavaScript. Writing a Rust wrapper around a JS library, or trying to port a massive, moving TS codebase to Rust, felt counter-productive.

I needed a native, high-performance implementation written in Rust.

Introducing RDX: Reactive Document eXpressions

I realized that before I could build the generator, I needed the language. So, I designed RDX.

RDX has everything standard Markdown has, but it supports strict, declarative component rendering. It uses the familiar HTML/JSX-like syntax (<Notice type="warning">) that authors are used to, but it fundamentally treats documents as pure data. No import statements, no JavaScript execution. Just a clean, strictly typed Abstract Syntax Tree (AST).

I didn't just want to build a Rust crate, though. I started by writing a proper, formal specification. I did this so that while I was building the official Rust implementation, anyone else could read the spec and build an RDX parser in Go, Python, or Zig tomorrow.

After finalizing the spec, I built the official tools. Today, I'm thrilled to release them:

rdx-parser: The core parsing engine built on top of pulldown-cmark.
rdx-ast: The strictly typed data structures.
rdx-schema: A validation engine that guarantees your authors don't use fake props or components.
I've even included a CLI tool to help people convert their existing MDX files to RDX, verify schemas, and more.

Everything is open-sourced, and an be viewed here:

rdx-lang · GitHub

rdx-lang has 4 repositories available. Follow their code on GitHub.

github.com

You can start writing RDX today. In fact, I've already built and published a VS Code/VSCodium extension for syntax highlighting to make authoring a breeze.

The Missing Piece: Rendering (And a Sneak Peek)

Right now, we have the parser, the AST, and the editor support. The only thing missing is the rendering software to turn these .rdx files into a beautiful website.

Don't worry, I'm building that right now.

I am currently developing a next-generation SSG. It will consume your RDX files and generate a documentation site that rivals Docusaurus, Mintlify, and MkDocs. The best part? It uses Rust and WASM to deliver high speed build times and interactive components without ever touching npm.

Built for the AI Era

There is one final reason I believe RDX is the future of "Docs as Code."

RDX is incredibly AI-friendly. If you ask an LLM to write MDX, it frequently hallucinates JavaScript imports or breaks the build with syntax errors. If you ask an LLM to write Markdoc, it struggles with the custom Liquid-style tags.

But LLMs excel at writing standard HTML tags with typed attributes. Because RDX isolates components as pure data and pairs them with rdx-schema validation, you can autonomously generate documentation via AI and validate it instantly at build time. An RDX-powered AI documentation pipeline will beat an MDX or Markdoc pipeline in stability every single time.

But, it will only be like that if everything is done correctly, and I get all the support from the community.

I hope RDX can become the new standard for documentation. We finally have a way to write rich, interactive content without sacrificing the safety, speed, and tooling of the Rust ecosystem.

Check out the repo, read the spec, and stay tuned. The renderer is coming next.