Forem: Mohamed Hussain S

Why Too Many Parts Hurt ClickHouse Performance

Mohamed Hussain S — Mon, 25 May 2026 14:00:25 +0000

A lot of people initially think ClickHouse performance problems come from:

large queries
bad joins
massive datasets
missing indexes

And honestly, those things can matter.

But one of the most common operational problems in ClickHouse often starts much earlier:

too many tiny parts.

This is one of those issues that usually stays invisible at first.

Then suddenly:

merges fall behind
queries slow down
memory usage increases
inserts become unstable

And the cluster starts behaving strangely.

Every Insert Creates Parts

This is the first thing that’s important to understand.

In MergeTree-based engines, ClickHouse stores data as immutable parts.

Something as simple as:

INSERT INTO events VALUES (...);

creates at least one new part on disk (one for each partition the insert touches).

And this is completely normal.

ClickHouse is designed around this storage model.

So:

parts themselves are not the problem.

The real issue starts when parts accumulate faster than background merges can consolidate them.

Why Tiny Inserts Become Dangerous

At smaller scale, tiny inserts may seem harmless.

For example:

inserting row-by-row
extremely frequent micro-batches
tiny streaming flush intervals

Initially:

everything still works.

But over time, the number of parts starts growing aggressively.

Now ClickHouse has to manage:

more metadata
more merges
more scheduling
more file operations

This creates operational overhead.

Meaning:

the system spends a growing share of its resources managing fragmentation instead of serving inserts and queries.
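
One way to avoid row-by-row inserts is to buffer rows on the client and send them in larger batches. The sketch below is a minimal illustration in Python, not a ClickHouse API: InsertBatcher, flush_fn, and the thresholds (max_rows, max_wait_s) are all assumptions made up for this example, and flush_fn stands in for whatever code would run one multi-row INSERT.

```python
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple  # one row of values; the shape is up to the caller


class InsertBatcher:
    """Collects rows and hands them to `flush_fn` in large batches.

    `flush_fn` is a hypothetical callback that would perform one
    multi-row INSERT; it is not part of any ClickHouse client API.
    """

    def __init__(self, flush_fn: Callable[[List[Row]], None],
                 max_rows: int = 10_000, max_wait_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush_fn = flush_fn
        self._max_rows = max_rows      # illustrative threshold, not a recommendation
        self._max_wait_s = max_wait_s  # illustrative threshold, not a recommendation
        self._clock = clock
        self._rows: List[Row] = []
        self._first_row_at: Optional[float] = None

    def add(self, row: Row) -> None:
        if not self._rows:
            self._first_row_at = self._clock()
        self._rows.append(row)
        # Flush when the batch is big enough or has been waiting too long.
        if (len(self._rows) >= self._max_rows
                or self._clock() - self._first_row_at >= self._max_wait_s):
            self.flush()

    def flush(self) -> None:
        if self._rows:
            batch, self._rows = self._rows, []
            self._first_row_at = None
            self._flush_fn(batch)


flushed: List[int] = []  # records the size of every batch that was "inserted"
batcher = InsertBatcher(lambda batch: flushed.append(len(batch)),
                        max_rows=1000, max_wait_s=3600)
for i in range(2500):
    batcher.add((i, "click"))
batcher.flush()  # push the remainder, for example on shutdown
```

Here 2,500 single-row writes turn into three flushes, so three inserts instead of 2,500. This toy version only checks the time limit when a new row arrives; a real pipeline would also need a timer.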

Why Merges Matter So Much

ClickHouse relies heavily on background merges.

These merges:

combine smaller parts
reduce fragmentation
improve compression
optimize query performance

Under healthy ingestion patterns, merges naturally keep the system stable over time.

That is the ideal state.

But problems start when:

parts created per second > parts merged per second

Now fragmented parts begin accumulating faster than ClickHouse can compact them.

And this is usually where instability slowly starts building.
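
The inequality above can be turned into a toy model to see how quietly the backlog grows. This is only a sketch under simple assumptions: constant rates, and removed_per_s meaning the net number of parts that merges eliminate each second. The function name and the numbers are invented for illustration.

```python
from typing import List


def simulate_part_backlog(created_per_s: float, removed_per_s: float,
                          seconds: int, start_parts: float = 0.0) -> List[float]:
    """Return the part backlog after each second (toy model, constant rates)."""
    counts: List[float] = []
    parts = start_parts
    for _ in range(seconds):
        # The backlog can never drop below zero.
        parts = max(0.0, parts + created_per_s - removed_per_s)
        counts.append(parts)
    return counts


# Merges keep up: the backlog stays flat.
healthy = simulate_part_backlog(created_per_s=2, removed_per_s=3, seconds=60)
# Inserts outpace merges by 2 parts per second: the backlog grows without bound.
overloaded = simulate_part_backlog(created_per_s=5, removed_per_s=3, seconds=60)
```

In the second case the backlog reaches 120 parts after one minute and keeps climbing at the same pace, even though nothing looks broken at any single moment.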

The Dangerous Part Is That It Builds Slowly

This is what makes the issue tricky operationally.

You usually do not notice the problem immediately.

The cluster may look perfectly healthy initially.

Then gradually:

insert latency increases
merges lag behind
CPU usage becomes unstable
queries become heavier
replication slows down

And eventually ClickHouse may start throwing errors like:

Too many parts

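ClickHouse raises this once the number of active parts in a single partition exceeds the parts_to_throw_insert setting, so by the time you see it, the merge system is already under serious pressure.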

Queries Also Become More Expensive

A lot of people think parts only affect inserts.

But queries suffer too.

Because queries now need to:

open more parts
scan more metadata
coordinate more files

Even when the actual dataset itself is not massive.

So sometimes:

performance degradation comes more from fragmentation than raw data volume.

That is a very important operational insight.

FINAL Does Not Really Solve This

One thing that’s important to understand:

FINAL is not really a solution for too many parts.

For example:

SELECT *
FROM events FINAL;

FINAL applies merge logic during query execution.

But the fragmented parts still physically exist underneath.

So if the system already has excessive fragmentation:

queries still scan many parts
merge pressure still exists
query execution can become heavier

Which means:

FINAL can actually become more expensive when fragmentation becomes unhealthy.

The real fix is usually improving ingestion and merge behavior itself.

Over-Partitioning Can Quietly Make This Worse

Another thing that often accelerates part explosion is overly granular partitioning.

For example:

PARTITION BY toStartOfHour(timestamp)

instead of something broader like:

PARTITION BY toYYYYMM(timestamp)

Now even small inserts may create parts across many partitions simultaneously.

Which means:

a single insert can end up creating multiple fragmented parts underneath.

And over time, merge pressure increases much faster than expected.
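
To see why the partition key matters, here is a small sketch that counts how many parts a single insert would create, assuming one insert block produces one part per distinct partition value. The helper names and the sample data are hypothetical, and the two key functions only imitate hourly and monthly partitioning in Python.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable


def parts_created_by_insert(timestamps: Iterable[datetime],
                            partition_key: Callable[[datetime], str]) -> int:
    """Count the parts one insert block would create: one per distinct partition."""
    return len({partition_key(ts) for ts in timestamps})


def hourly(ts: datetime) -> str:
    # Imitates an hour-level partition key.
    return ts.strftime("%Y%m%d%H")


def monthly(ts: datetime) -> str:
    # Imitates a month-level partition key such as toYYYYMM(timestamp).
    return ts.strftime("%Y%m")


# One batch of 96 events spread over 48 hours (for example, late-arriving data).
start = datetime(2026, 5, 1, 0, 0)
batch = [start + timedelta(minutes=30 * i) for i in range(96)]

hourly_parts = parts_created_by_insert(batch, hourly)
monthly_parts = parts_created_by_insert(batch, monthly)
```

The same 96 rows would land in 48 parts with hourly partitioning and in a single part with monthly partitioning.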

ClickHouse Also Has Ways to Help

Modern ClickHouse versions also support asynchronous inserts (the async_insert setting) to help reduce excessive tiny-part creation.

Instead of immediately flushing every small insert into separate parts, ClickHouse can buffer inserts internally before writing larger parts to disk.

This helps reduce fragmentation and merge pressure in workloads that naturally produce smaller inserts.

But async inserts are not a replacement for healthy ingestion patterns themselves.

Stable batching still matters a lot.
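
As a rough illustration, async inserts can be requested per query through the async_insert and wait_for_async_insert settings. The Python sketch below only builds the statement text; the table and column names are placeholders, and insert_prefix is a hypothetical helper, not part of any client library.

```python
from typing import Dict, Sequence

ASYNC_INSERT_SETTINGS: Dict[str, int] = {
    "async_insert": 1,           # let the server buffer small inserts
    "wait_for_async_insert": 1,  # acknowledge only after the buffer is flushed
}


def insert_prefix(table: str, columns: Sequence[str],
                  settings: Dict[str, int]) -> str:
    """Build the leading part of an INSERT that carries per-query settings."""
    column_list = ", ".join(columns)
    settings_clause = ", ".join(f"{name} = {value}"
                                for name, value in settings.items())
    return f"INSERT INTO {table} ({column_list}) SETTINGS {settings_clause} VALUES"


# "events", "timestamp" and "user_id" are placeholder names for this example.
statement = insert_prefix("events", ["timestamp", "user_id"], ASYNC_INSERT_SETTINGS)
```

The row values would follow the VALUES keyword, ideally supplied through the client library's parameter handling rather than string formatting.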

Why Batch Size Matters So Much

ClickHouse generally performs much better with:

larger batches
fewer inserts
healthier merge behavior

Because fewer parts means:

fewer merges
lower metadata overhead
better compression
more efficient scans

This is one of the reasons ClickHouse ingestion patterns often look very different from traditional OLTP systems.

Too Many Parts Also Affect Startup and Recovery

Another thing people often discover late:

Large numbers of parts also affect:

startup time
replication recovery
metadata loading
server restarts

Because ClickHouse now has to:

scan part metadata
validate parts
rebuild internal state

before the server becomes fully operational again.

So the issue is not just query performance.

It becomes an overall operational stability problem.

The Important Lesson

One thing I’ve noticed with ClickHouse is that many performance problems are actually merge-management problems underneath.

And too many parts is one of the clearest examples of that.

Because the issue usually is not:

“ClickHouse cannot handle large data.”

The issue is more often:

fragmentation and merge pressure slowly became unhealthy.

That is a very different operational problem.

Final Thought

ClickHouse is extremely good at handling massive analytical workloads.

But it performs best when the storage engine is allowed to merge parts efficiently.

And sometimes the biggest performance problem is not the query itself.

It is the thousands of tiny fragmented parts quietly building underneath the system over time.

Why Real-Time Analytics Eventually Changes Your Database Architecture

Mohamed Hussain S — Tue, 19 May 2026 16:36:19 +0000

A lot of systems begin with a single database.

Usually PostgreSQL.

And honestly, in the beginning, that works perfectly fine.

The application stores:

users
payments
inventory
authentication
operational state

Dashboards query the same database.

Analytics queries also run directly on PostgreSQL.

Everything feels simple.

The Problem Usually Starts Slowly

At first, analytical queries are small.

Maybe:

daily reports
lightweight aggregations
small dashboards

Nothing too serious.

But over time, systems start generating:

more events
more metrics
more logs
more historical records
more observability data

And analytical workloads start behaving very differently from transactional workloads.

For example:

SELECT
    service_name,
    avg(response_time_ms)
FROM metrics
WHERE timestamp >= now() - INTERVAL 30 DAY
GROUP BY service_name;

This is a very different kind of workload from:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;

One is trying to preserve operational correctness.

The other is trying to analyze huge amounts of historical data.

And eventually those workloads start colliding.

PostgreSQL Slowly Becomes Responsible for Everything

This is where things usually start getting interesting.

A lot of systems unintentionally turn PostgreSQL into:

the transactional database
the reporting database
the analytics database
the observability database

all at the same time.

And honestly, modern PostgreSQL is capable enough that this can work surprisingly well for a while.

Until:

dashboards become heavier
retention windows grow
analytical scans become larger
observability traffic increases
aggregations become expensive

Now suddenly the same database handling:

payments
authentication
users
inventory

is also handling large analytical workloads.

And this is usually where architectural pressure starts building.

The Real Problem Is Workload Isolation

This is honestly the biggest lesson.

The issue is usually not:

“PostgreSQL is slow.”

The issue is:

transactional workloads and analytical workloads optimize for completely different things.

Transactional systems care heavily about:

consistency
operational latency
updates
row-level modifications
business correctness

Analytical systems care heavily about:

large scans
aggregations
compression
historical analytics
query throughput

Those are fundamentally different workload patterns.

And eventually trying to optimize one database perfectly for both becomes painful.

Why Observability Changes Everything So Quickly

One thing I find interesting is how fast observability workloads expose architectural limitations.

Because observability systems continuously generate:

logs
metrics
traces
events

And these workloads grow aggressively over time.

Now imagine running:

large aggregations
historical scans
high-cardinality queries
real-time dashboards

on the same database handling:

authentication
inventory
operational business logic
transactional traffic

At smaller scale this may still work.

At larger scale:

query contention increases
operational latency becomes sensitive
workload isolation becomes harder

And eventually systems start evolving toward separation.

This Is Usually When Analytical Databases Start Appearing

At some point, many systems evolve toward something like this:

Application
    ↓
PostgreSQL
    ↓
CDC / Kafka / Airbyte
    ↓
ClickHouse / OLAP DB
    ↓
Analytics / Dashboards / Observability

This pattern has become extremely common in modern analytical systems.

And honestly, the reason is pretty simple:

PostgreSQL remains responsible for operational correctness.

ClickHouse becomes responsible for analytical scale.

Each system handles the workload it was actually designed for.

Not All Analytical Data Needs PostgreSQL First

One important thing though:

Not all analytical data even originates from PostgreSQL.

A lot of observability workloads:

logs
metrics
traces
telemetry events

often flow directly into ClickHouse/OLAP DB through streaming pipelines.

Something like:

Applications / Services
        ↓
Kafka / Streaming Pipelines
        ↓
ClickHouse / OLAP DB

In many systems, PostgreSQL stores the business data while ClickHouse directly handles logs, metrics, events, and analytical workloads.

And honestly, this makes a lot of sense.

Because analytical systems are usually optimized for:

append-heavy ingestion
historical querying
event-style workloads

not transactional business operations.

Why Not Just Use ClickHouse for Everything?

This is another common misunderstanding.

ClickHouse is incredible for analytical workloads.

But transactional systems still require things like:

frequent updates
operational consistency
transactional guarantees
row-level modifications
business-critical correctness

Those are not the primary design goals of analytical databases.

You generally do not want your:

authentication system
payment workflows
inventory updates
operational application state

depending entirely on analytical database behavior.

Why CDC Pipelines Become So Important

One reason this architecture became so practical is CDC (Change Data Capture).

Instead of repeatedly exporting data manually, systems continuously stream changes from PostgreSQL into analytical systems using:

Kafka
Debezium
Airbyte
streaming pipelines

That means:

operational systems continue working normally
analytical systems receive near real-time data
workloads stay separated cleanly

And analytical queries no longer compete directly against transactional traffic.
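
To make the idea concrete, here is a minimal sketch of how a change event could be turned into an append-only row for an analytical table. The event shape (op, before, after, ts_ms) is loosely modelled on Debezium-style change events, and the _version and _deleted column names are assumptions, not something prescribed by any of the tools above.

```python
from typing import Any, Dict, List, Optional


def change_event_to_row(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Turn one change event into an append-only row.

    Assumed op codes: 'c' create, 'u' update, 'r' snapshot read, 'd' delete.
    """
    op = event["op"]
    if op in ("c", "u", "r"):
        image, deleted = event["after"], 0
    elif op == "d":
        # A delete carries no new image, so keep the last known one and flag it.
        image, deleted = event["before"], 1
    else:
        return None  # unknown event type: skip it
    # _version and _deleted are hypothetical bookkeeping columns.
    return {**image, "_version": event["ts_ms"], "_deleted": deleted}


events: List[Dict[str, Any]] = [
    {"op": "c", "before": None,
     "after": {"product_id": 101, "stock": 5}, "ts_ms": 1000},
    {"op": "u", "before": {"product_id": 101, "stock": 5},
     "after": {"product_id": 101, "stock": 4}, "ts_ms": 2000},
    {"op": "d", "before": {"product_id": 101, "stock": 4},
     "after": None, "ts_ms": 3000},
]

rows = [row for row in map(change_event_to_row, events) if row is not None]
```

Updates and deletes become new rows rather than in-place modifications, which suits append-heavy analytical storage. Using the event timestamp as the version is a simplification; a log position may be a safer ordering key.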

Don’t Rush Into Multi-Database Architectures

One important thing though:

Most systems do not need Kafka + ClickHouse pipelines on Day 1.

Honestly, many applications can scale surprisingly far with PostgreSQL alone using:

proper indexing
query optimization
read replicas
partitioning
extensions like Citus

The goal is not to introduce more infrastructure as early as possible.

The real signal usually appears when analytical workloads start affecting operational user experience.

That is often when workload separation starts becoming worth the additional architectural complexity.

Because systems like:

CDC pipelines
Kafka
analytical databases

also introduce operational overhead of their own.

And good architecture is usually about introducing complexity only when the workload actually demands it.

The Bigger Engineering Lesson

Most systems do not start with multiple databases.

They evolve into them as workloads grow.

Transactional workloads and analytical workloads behave very differently at scale.

And eventually systems start separating:

operational correctness
analytical querying
observability workloads
historical analytics

into infrastructure optimized for each workload.

Final Thought

A lot of modern systems do not start with multiple databases.

They evolve into them.

Because transactional workloads and analytical workloads eventually want very different things from the same infrastructure.

And real-time analytics is often the thing that forces that architectural separation to happen.

FINAL in ClickHouse Isn’t as Expensive as It Used to Be

Mohamed Hussain S — Thu, 14 May 2026 16:04:00 +0000

For a long time, the advice around FINAL in ClickHouse was pretty straightforward:

Avoid it whenever possible.

And honestly, that advice existed for good reasons.

Older versions of ClickHouse could make FINAL extremely expensive depending on:

table size
partitioning
number of parts
merge state
query patterns

So people started treating FINAL almost like a red flag.

But modern ClickHouse has changed a lot.

And I think the conversation around FINAL deserves a bit more nuance now.

Why FINAL Existed in the First Place

To understand why FINAL was historically considered expensive, you first need to understand what it actually does.

In engines like:

ReplacingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree

ClickHouse does not immediately rewrite rows in place.

Instead:

inserts create new parts
background merges reconcile rows later
deduplication happens asynchronously

That means queries can temporarily see:

duplicate versions
old versions
intermediate states

Example:

SELECT *
FROM users
FINAL;

FINAL forces ClickHouse to apply merge logic during query execution itself.

That means the query may:

read more data
perform additional deduplication work
consume more CPU and memory

This is why older advice strongly discouraged using it everywhere.
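
A toy model makes the difference visible. The sketch below is plain Python, not ClickHouse internals: each inner list stands for one unmerged part, and select_final imitates ReplacingMergeTree-style deduplication by keeping the highest version per id at read time. The row shape and the function names are assumptions for illustration.

```python
from typing import Dict, List, NamedTuple


class UserRow(NamedTuple):
    id: int
    status: str
    version: int


# Each inner list stands for one part that background merges have not combined yet.
parts: List[List[UserRow]] = [
    [UserRow(1, "pending", 1), UserRow(2, "active", 1)],
    [UserRow(1, "active", 2)],
    [UserRow(2, "blocked", 3), UserRow(3, "pending", 1)],
]


def select_plain(parts: List[List[UserRow]]) -> List[UserRow]:
    """Without FINAL: every stored row version is returned."""
    return [row for part in parts for row in part]


def select_final(parts: List[List[UserRow]]) -> List[UserRow]:
    """With FINAL (toy model): keep the highest version per id at read time."""
    latest: Dict[int, UserRow] = {}
    for part in parts:
        for row in part:
            current = latest.get(row.id)
            # On equal versions the later row wins.
            if current is None or row.version >= current.version:
                latest[row.id] = row
    return sorted(latest.values())


plain_rows = select_plain(parts)
final_rows = select_final(parts)
```

The plain read returns five rows, including stale versions of users 1 and 2. The FINAL-style read returns three, but it has to look at every part and compare rows to get there, which is where the extra work comes from.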

The Old FINAL Problem

Historically, FINAL could become painful on large datasets.

Especially when:

partitions were large
too many parts existed
merges lagged behind
queries scanned massive ranges

People would add:

FINAL

to "fix" duplicate rows without understanding why duplicates existed in the first place.

The result was often:

slower queries
higher memory usage
unnecessary query overhead

So the community advice became:

Design your schema properly and avoid FINAL whenever possible.

And honestly?

That advice still matters.

But the implementation of FINAL itself has improved significantly over time.

Modern ClickHouse Has Improved FINAL a Lot

Recent ClickHouse versions introduced multiple improvements around FINAL.

Things like:

parallel execution
partition-aware optimizations (for example, the do_not_merge_across_partitions_select_final setting)
improved memory behavior
smarter merge execution
reduced unnecessary reads

Which means:

FINAL is no longer the monster it used to be.

And this is important because newer ClickHouse guidance has also become more practical about using it when necessary.

Even in some recent discussions and office hours from the ClickHouse ecosystem, using FINAL for latest-state queries is no longer treated as automatically wrong.

That would have sounded controversial a few years ago.

FINAL vs argMax Isn’t Always a Simple Comparison

For a long time, many ClickHouse users avoided FINAL by using patterns like:

SELECT
    id,
    argMax(status, version)
FROM users
GROUP BY id;

And honestly, for older ClickHouse versions and large workloads, that often made sense.

But modern ClickHouse has improved FINAL significantly enough that the tradeoff is no longer as one-sided as it used to be.

In some latest-state query scenarios, using FINAL can now be:

simpler
easier to maintain
and completely reasonable

depending on:

table size
partitioning
query filters
merge behavior

The important part is understanding the workload instead of blindly following older rules.

So… Is FINAL Safe to Use Now?

This is where nuance matters.

The answer is not:

"FINAL bad"

and also not:

"FINAL free now"

The real answer is:

FINAL is much more practical in modern ClickHouse, but workload design still matters.

That distinction is important.

Where FINAL Makes Sense

There are legitimate cases where FINAL is completely reasonable now.

For example:

latest-state queries
smaller partitions
low-latency analytical workloads
deduplicated views over mutable datasets
operational analytics

Especially when using:

proper partitioning
controlled part counts
optimized schemas

In these cases, modern ClickHouse handles FINAL much better than older versions did.

Where FINAL Can Still Hurt

Even with improvements, FINAL is not magically free.

It can still become expensive when:

scanning huge datasets
querying many partitions
merges are heavily delayed
part counts explode
schema design is poor

For example:

SELECT *
FROM massive_events_table
FINAL
WHERE timestamp >= now() - INTERVAL 30 DAY;

On very large analytical datasets, this can still force substantial extra work.

So blindly adding FINAL everywhere is still not a great idea.

SELECT ... FINAL vs OPTIMIZE TABLE ... FINAL

One important distinction:

SELECT * FROM users FINAL;

and

OPTIMIZE TABLE users FINAL;

are completely different operations.

SELECT ... FINAL applies merge logic during query execution.

OPTIMIZE TABLE ... FINAL forces an unscheduled merge that physically rewrites the table's parts on disk.

The first is a query-time behavior.

The second is a storage-level operation that can become extremely expensive on large datasets.

People often mix these two together when discussing FINAL performance, but they solve very different problems.

The Bigger Lesson Is Understanding Why You Need FINAL

This is honestly the most important part.

A lot of people use FINAL reactively.

They see:

duplicate rows
outdated versions
inconsistent query results

and immediately add:

FINAL

without understanding:

merge behavior
part lifecycle
asynchronous deduplication
storage engine behavior

That usually creates larger problems later.

The better approach is:

Understand why the table requires FINAL in the first place.

Because sometimes:

the schema can improve
partitioning can improve
merges can stabilize naturally
query design can change

And sometimes:

using FINAL is actually perfectly acceptable.

ClickHouse Advice Evolves Too

One thing I find interesting about ClickHouse is how quickly operational advice evolves as the engine improves.

Advice that was absolutely correct for older versions can become incomplete later.

And I think FINAL is one of the best examples of that.

Older guidance:

avoid FINAL aggressively

Modern reality:

understand FINAL properly before deciding whether to avoid it

That is a much more useful mental model now.

Final Thought

I still would not recommend blindly adding FINAL everywhere.

But I also do not think modern ClickHouse users should automatically treat it like a disaster anymore.

The real question is not:

"Is FINAL bad?"

The real question is:

"Why does this query need FINAL, and is that tradeoff acceptable for this workload?"

That mindset leads to much better ClickHouse designs than simply following old rules blindly.

Why PostgreSQL and ClickHouse Work So Well Together

Mohamed Hussain S — Mon, 11 May 2026 09:26:10 +0000

A lot of people compare PostgreSQL and ClickHouse like they are competing databases.

They really are not.

In fact, modern data systems often use both together.

And once you understand what each database is optimized for, the reason becomes pretty obvious.

PostgreSQL and ClickHouse Solve Different Problems

The biggest mistake people make is expecting both databases to behave similarly.

They are built for entirely different workloads.

PostgreSQL is primarily an OLTP database.

ClickHouse is primarily an OLAP database.

That single difference changes almost everything about how they think internally.

PostgreSQL Thinks About Transactions First

PostgreSQL is extremely good at handling transactional workloads.

Things like:

user data
payments
inventory
banking records
order systems
application state

These are systems where:

consistency matters
updates happen frequently
rows are modified constantly
transactions must be reliable

For example:

UPDATE inventory
SET stock = stock - 1
WHERE product_id = 101;

This kind of workload is where PostgreSQL shines.

You want:

ACID guarantees
reliable transactions
row-level updates
strong consistency

PostgreSQL is designed around exactly that.

ClickHouse Thinks About Analytics First

ClickHouse approaches data very differently.

Instead of optimizing for frequent row updates, it optimizes for analytical queries across massive datasets.

Things like:

metrics
observability
logs
event streams
analytical dashboards
time-series workloads

For example:

SELECT
    service_name,
    avg(response_time_ms)
FROM metrics
WHERE timestamp >= now() - INTERVAL 1 HOUR
GROUP BY service_name;

This is a completely different style of workload.

Instead of:

modifying small numbers of rows

ClickHouse is optimized for:

scanning huge amounts of data efficiently
aggregating billions of records
compressing analytical datasets
fast columnar reads

PostgreSQL Stores the Business. ClickHouse Explains It.

This is honestly the simplest way I think about it now.

PostgreSQL usually stores:

current application state
transactional business data
operational records

ClickHouse usually stores:

analytical history
events
metrics
large-scale queryable telemetry

One powers the application.

The other explains what the application is doing.

Why They Commonly Exist Together

This is where things get interesting.

In many modern architectures, PostgreSQL becomes the operational source of truth.

Then data flows into ClickHouse for analytics.

Something like this:

Application
    ↓
PostgreSQL
    ↓
CDC / Airbyte / Kafka
    ↓
ClickHouse
    ↓
Dashboards / Analytics / Observability

This pattern is far more common than many people realize.

Because each database is doing what it is best at.

Why Not Just Use PostgreSQL for Analytics?

PostgreSQL can do analytical queries.

But analytical workloads behave very differently from transactional workloads.

For example:

scanning billions of rows
large aggregations
observability queries
real-time analytics
historical trend analysis

These workloads stress databases differently.

ClickHouse is optimized around:

columnar storage
vectorized execution
aggressive compression
analytical query execution

That is why queries over huge datasets often feel dramatically faster in ClickHouse.
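
A small sketch shows why the columnar layout helps this kind of query. It is plain Python with made-up data, not how ClickHouse is implemented: the point is only that the aggregation touches two columns and never reads the third.

```python
from collections import defaultdict
from typing import Dict, List

# Row-oriented layout: each record carries every field, including ones
# the query does not need ("trace" is a placeholder for a wide column).
rows = [
    {"service_name": "api", "response_time_ms": 120, "trace": "..."},
    {"service_name": "api", "response_time_ms": 80, "trace": "..."},
    {"service_name": "auth", "response_time_ms": 40, "trace": "..."},
]

# Column-oriented layout: one list per column.
columns: Dict[str, list] = {name: [row[name] for row in rows] for name in rows[0]}


def avg_response_time(service_names: List[str],
                      response_times: List[int]) -> Dict[str, float]:
    """Average response time per service, reading only the two columns it needs."""
    totals = defaultdict(lambda: [0, 0])  # service -> [sum, count]
    for service, ms in zip(service_names, response_times):
        totals[service][0] += ms
        totals[service][1] += 1
    return {service: total / count for service, (total, count) in totals.items()}


averages = avg_response_time(columns["service_name"], columns["response_time_ms"])
```

With a row layout, answering the same question means walking through every record in full, wide columns included.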

Why Not Just Use ClickHouse for Everything?

This is another common misunderstanding.

ClickHouse is incredible for analytics.

But transactional systems require things like:

frequent updates
transactional consistency
row-level modifications
operational application state

That is not the primary design goal of ClickHouse.

You generally do not want your:

user authentication system
banking transactions
inventory updates
operational business logic

to depend entirely on analytical database behavior.

The Interesting Part Is the Separation of Responsibilities

What I personally find interesting is how these systems complement each other instead of replacing each other.

PostgreSQL handles:

operational correctness

ClickHouse handles:

analytical scale

That separation creates much cleaner architectures.

Instead of forcing one database to solve every problem, each system handles the workload it was designed for.

CDC Is What Connects Them

One thing that makes this architecture powerful is CDC (Change Data Capture).

Instead of manually exporting data repeatedly, systems can stream changes from PostgreSQL into ClickHouse continuously.

Tools like:

Debezium
Airbyte
Kafka pipelines

make this pattern extremely practical now.

The operational system continues running normally while analytical systems receive data almost in real time.

They Even Think Differently Internally

The differences go deeper than just "transactions vs analytics".

PostgreSQL thinks heavily about:

rows
transactional consistency
updates
locking
relational integrity

ClickHouse thinks heavily about:

columns
compression
merges
partitions
analytical scans
aggregation efficiency

Even their storage engines reflect completely different priorities.

This Is Why Modern Data Stacks Often Use Both

Once you stop viewing databases as competitors and instead view them as workload-specific systems, the architecture starts making much more sense.

PostgreSQL handles the operational side.

ClickHouse handles the analytical side.

Together, they create systems that can:

process transactions reliably
scale analytical workloads efficiently
support observability
power dashboards
retain huge historical datasets

without forcing a single database to do everything.

Final Thought

The more I learn about databases, the more I realize that most modern architectures are really about separation of responsibilities.

PostgreSQL and ClickHouse work well together because they optimize for fundamentally different problems.

One is built to preserve business state reliably.

The other is built to analyze massive amounts of history efficiently.

And when combined properly, they complement each other extremely well.

PostgreSQL Restore Failures: It Wasn’t pgBackRest, It Was My Recovery Logic

Mohamed Hussain S — Wed, 06 May 2026 12:24:28 +0000

I was building and testing a PostgreSQL backup and restore workflow using pgBackRest.

The idea was simple:

take backups
restore them automatically
validate the database
make recovery predictable

Instead, I ended up repeatedly breaking PostgreSQL recovery itself.

At one point, PostgreSQL refused to start entirely, the application depending on it failed to start, and I started seeing errors like:

invalid checkpoint record
could not locate a valid checkpoint record at 0/DEAD

Later, I also hit timeline mismatch errors like:

ERROR: [058]: target timeline 3 forked from backup timeline 2

At first, I thought:

pgBackRest restores were corrupting PostgreSQL.

That assumption turned out to be completely wrong.

The real problem was the way I was handling recovery.

What I Was Building

I was testing a PostgreSQL backup/restore flow locally after repeated restore failures elsewhere.

To isolate the issue properly, I moved PostgreSQL onto my local machine and started testing the restore logic independently through API-triggered workflows.

The restore flow looked roughly like this:

Download backup repo
Stop PostgreSQL
Restore backup
Start PostgreSQL
Validate database

Sounds straightforward.

It wasn't.

The First Major Failure

After a restore attempt, PostgreSQL refused to start.

The logs looked like this:

LOG: database system was interrupted
LOG: invalid checkpoint record
PANIC: could not locate a valid checkpoint record at 0/DEAD

At that point:

PostgreSQL was down
the application couldn't start
authentication-related functionality stopped working
and repeated restore attempts made things even worse

What confused me initially was this:

The restore itself appeared to complete.

But PostgreSQL would immediately enter recovery problems afterward.

My Wrong Assumption

This was the real issue.

Every time recovery failed, I kept seeing files like:

backup_label
recovery.signal
standby.signal

So I assumed they were leftover artifacts from failed restores.

My restore automation started aggressively cleaning them up.

Something like this:

rm -f recovery.signal standby.signal backup_label

I genuinely believed this was helping PostgreSQL start cleanly.

In reality:

I was deleting the exact recovery metadata PostgreSQL needed.

That misunderstanding caused almost every major issue afterward.

What PostgreSQL Was Actually Trying To Do

This was the turning point.

pgBackRest wasn't randomly writing junk files into the data directory.

Those files exist for a reason.

During restore:

backup_label tells PostgreSQL where recovery should begin
recovery.signal tells PostgreSQL to enter recovery mode
WAL replay reconstructs a consistent database state

PostgreSQL was actually trying to perform a valid recovery process.

My automation kept interrupting or invalidating it.

Once I understood that, the entire problem started making sense.

The Recovery Loop Problem

Because my cleanup logic removed recovery metadata prematurely, PostgreSQL ended up in inconsistent states repeatedly.

Sometimes it would:

enter recovery mode
fail WAL replay
lose checkpoint continuity
refuse startup entirely

Other times it would partially start, but remain stuck in recovery mode.

That led to additional logic being added just to stabilize startup behavior.

For example:

SELECT pg_is_in_recovery();

and when required:

SELECT pg_promote();

The goal wasn't to "force PostgreSQL to work".

The goal was:

let PostgreSQL finish recovery properly, then promote only when necessary.

That distinction mattered a lot.
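
That rule can be sketched as a small polling loop. In this example is_in_recovery and promote are injected callables, assumed to wrap SELECT pg_is_in_recovery() and SELECT pg_promote() through whatever database driver you use; the function name, retry count, and wait time are illustrative, not the values from the real workflow.

```python
import time
from typing import Callable, List


def finish_recovery(is_in_recovery: Callable[[], bool],
                    promote: Callable[[], None],
                    max_checks: int = 30,
                    wait_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Give recovery time to finish on its own; promote only as a fallback."""
    for _ in range(max_checks):
        if not is_in_recovery():
            return "recovered"  # PostgreSQL left recovery by itself
        sleep(wait_s)
    promote()  # recovery mode persisted, so end it explicitly
    return "promoted"


# Case 1: recovery finishes on the third check, so no promotion is issued.
answers = iter([True, True, False])
promote_calls: List[str] = []
outcome = finish_recovery(lambda: next(answers),
                          lambda: promote_calls.append("promote"),
                          sleep=lambda s: None)

# Case 2: the server stays in recovery, so promotion is issued exactly once.
stuck_calls: List[str] = []
stuck_outcome = finish_recovery(lambda: True,
                                lambda: stuck_calls.append("promote"),
                                max_checks=3, sleep=lambda s: None)
```

Passing the checks in as callables keeps the decision logic testable without a running PostgreSQL instance.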

The Timeline Mismatch Error

At one stage, I also hit this:

ERROR: [058]: target timeline 3 forked from backup timeline 2

This one was especially confusing at first.

The issue was not just corrupted startup state anymore.

Now the restore was being rejected over WAL timeline history itself (the [058] prefix is a pgBackRest error code, not a PostgreSQL log line).

This happened because earlier restore attempts had already created inconsistent recovery timelines.

I had essentially created multiple broken recovery histories while repeatedly testing and modifying the restore process.

That was another important lesson:

PostgreSQL backups are not just data files.
They are tightly connected to WAL history and recovery timelines.

At this point, I realized I was no longer debugging a simple restore failure. I was debugging recovery history itself.

The Real Problem In My Restore Flow

Initially, my restore logic tried to "fix" PostgreSQL after restore.

That approach was fundamentally flawed.

The older flow looked roughly like this:

Old Approach	Problem
Delta restore	Mixed old/new recovery state
Delete `backup_label`	Broke recovery metadata
Delete `recovery.signal`	Interrupted recovery
Force archive changes	Caused WAL continuity issues
Hope PostgreSQL starts	No validation or recovery awareness

I was treating recovery artifacts like corruption.

They weren't corruption.

They were part of PostgreSQL recovery itself.

The Change That Finally Fixed It

The biggest realization was this:

Stop fighting PostgreSQL recovery.

Instead of trying to manually "clean up" PostgreSQL after restore, I changed the restore flow completely.

The corrected restore flow became:

Stop PostgreSQL cleanly
Completely empty the data directory
Run pgBackRest restore properly
Let PostgreSQL recover normally
Wait for readiness
Promote only if recovery mode persists
Validate using pgBackRest check

The critical change was this:

self._run_pgbackrest("restore", "--type=immediate")

And equally important:

self._empty_directory(self.pg_data_dir)

Instead of attempting partial or delta-style recovery cleanup, the restore process now starts from a completely clean data directory.

That eliminated a huge amount of inconsistent state.
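
The helpers themselves are not shown here, so the following is only a guess at what they might look like. empty_directory and build_restore_command are hypothetical stand-ins for _empty_directory and _run_pgbackrest, and the stanza name demo is a placeholder. The sketch builds the pgBackRest command line without executing it.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def empty_directory(directory: Path) -> None:
    """Delete everything inside `directory` but keep the directory itself."""
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()


def build_restore_command(stanza: str, pg_data_dir: Path) -> List[str]:
    """Assemble the restore command; running it is left to the caller."""
    return [
        "pgbackrest",
        f"--stanza={stanza}",         # placeholder stanza name in this example
        f"--pg1-path={pg_data_dir}",
        "--type=immediate",
        "restore",
    ]


# Demonstrate on a throwaway directory instead of a real data directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "backup_label").write_text("stale")
    (data_dir / "pg_wal").mkdir()
    (data_dir / "pg_wal" / "000000010000000000000001").write_text("")

    empty_directory(data_dir)
    remaining = list(data_dir.iterdir())
    command = build_restore_command("demo", data_dir)
```

Clearing the contents rather than deleting and recreating the directory keeps its ownership and permissions unchanged, which PostgreSQL is strict about.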

Why `--type=immediate` Helped

This turned out to be extremely important.

--type=immediate tells pgBackRest:

end recovery as soon as the restored backup reaches a consistent state, instead of replaying all archived WAL up to the latest point.

That meant:

PostgreSQL could perform proper WAL-based recovery
recovery metadata stayed intact
WAL replay remained valid
timeline handling became predictable

Most importantly:

PostgreSQL itself was finally allowed to control recovery correctly.

The Mistake That Increased the Blast Radius

One thing I learned the hard way:

Never test restore automation against a database actively used by an application.

Even though this was a testing workflow, the PostgreSQL instance was still tied to application startup behavior.

So whenever PostgreSQL failed:

application startup failed too
user-related functionality broke
debugging became much harder under pressure

After repeated failures, I moved the restore testing flow entirely onto my local machine and isolated PostgreSQL from the rest of the application stack.

That made debugging significantly easier.

Another Subtle Issue: Backup Failures After Restore

I also ran into another confusing problem after some restore attempts.

In certain cases, subsequent backups started failing unexpectedly after a restore.

Part of the issue came from mixing:

restore operations
delta-style restore assumptions
and archive/WAL state inconsistencies

At one stage, I was also toggling archive-related behavior incorrectly during recovery experiments, which further complicated WAL continuity.

This reinforced another important realization:

PostgreSQL backups are tightly coupled with WAL history and recovery timelines.

Even when the database appears to start correctly, inconsistent recovery state can break future backup behavior in subtle ways.

What I Learned From This

This experience completely changed how I think about PostgreSQL recovery.

Some major lessons:

backup_label and recovery.signal are not garbage files
PostgreSQL recovery is heavily WAL-dependent
Timelines matter more than most people realize
Partial cleanup creates inconsistent recovery states
A clean restore is often safer than trying to "repair" recovery manually
pgBackRest already knows how to orchestrate PostgreSQL recovery properly
Restore validation matters as much as backup creation
Backup testing should happen in isolated environments

Most importantly:

PostgreSQL recovery is not something you should "fight".

Once I stopped trying to override recovery behavior manually and instead allowed PostgreSQL + pgBackRest to handle recovery the way they were designed to, the restore flow finally became stable.

The Final Restore Flow That Actually Worked

After multiple failed recovery attempts, timeline mismatches, and broken startup states, I stopped trying to manually "fix" PostgreSQL recovery and instead simplified the restore process completely.

The final stable flow looked roughly like this:

# simplified restore flow

stop_postgres()

empty_data_directory()

pgbackrest_restore("--type=immediate")

start_postgres()

wait_for_connection()

if postgres_is_in_recovery():
    promote_postgres()

pgbackrest_check()

The important part here is not the code itself.

It's the recovery philosophy behind it.

The earlier versions of my restore logic tried to:

partially clean recovery state
remove recovery metadata
force PostgreSQL out of recovery
preserve old data directory state

That approach kept creating inconsistent recovery conditions.

The corrected flow instead does three important things:

starts from a completely clean data directory
lets pgBackRest manage recovery metadata properly
allows PostgreSQL to perform WAL recovery the way it was designed to

The biggest change was no longer treating files like backup_label or recovery.signal as corruption artifacts.

They were part of the recovery process itself.
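The simplified flow can be sketched a little more concretely. This is a hedged illustration, not the original code: the stanza name, the data directory path, and the injected `run`, `query`, and `empty_dir` callables are all assumptions made so the sequence can be read (and tested) without a live PostgreSQL instance. The commands themselves use the standard `pg_ctl` and `pgbackrest` CLIs.

```python
from typing import Callable, List

Command = List[str]


def restore(stanza: str, data_dir: str,
            run: Callable[[Command], None],
            query: Callable[[str], str],
            empty_dir: Callable[[str], None]) -> None:
    """Run the restore sequence; `run` executes a command, `query` runs SQL and returns text."""
    # Stop cleanly first (pg_ctl reports an error if the server is not running).
    run(["pg_ctl", "stop", "-D", data_dir, "-m", "fast"])
    # Start from an empty data directory so no old recovery state survives.
    empty_dir(data_dir)
    # Let pgBackRest write the recovery settings itself.
    run(["pgbackrest", f"--stanza={stanza}", "--type=immediate", "restore"])
    # -w makes pg_ctl wait until startup has completed.
    run(["pg_ctl", "start", "-D", data_dir, "-w"])
    # Promote only if PostgreSQL is still in recovery after reaching consistency.
    if query("SELECT pg_is_in_recovery()").strip() == "t":
        run(["pg_ctl", "promote", "-D", data_dir])
    # Validate the archive and backup configuration after the restore.
    run(["pgbackrest", f"--stanza={stanza}", "check"])
```

In real use, `run` could be something like `lambda cmd: subprocess.run(cmd, check=True)`, and `query` a thin wrapper around `psql -At`. Passing them in keeps the ordering logic separate from process execution.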

Final Thought

At the beginning, I thought PostgreSQL restores were failing because the database was corrupted.

In reality, the corruption was coming from my own recovery assumptions.

The system wasn't broken.

My mental model of PostgreSQL recovery was.

arrayJoin in ClickHouse: Why Your Rows Are Duplicating (and How to Control It)

Mohamed Hussain S — Tue, 28 Apr 2026 10:20:11 +0000

When working with arrays in ClickHouse, arrayJoin feels straightforward.

Until your query suddenly returns far more rows than expected.

The Use Case

Let’s say you have a table like this:

CREATE TABLE events (
    user_id UInt32,
    actions Array(String)
) ENGINE = MergeTree
ORDER BY user_id;

Example row:

user_id: 1
actions: ['click', 'scroll', 'purchase']

Now you want each action as a separate row.

The Tool: `arrayJoin`

SELECT user_id, arrayJoin(actions) AS action
FROM events;

Output:

1   click
1   scroll
1   purchase

So far, everything looks correct.

Where Things Go Wrong

Now let’s say you write:

SELECT user_id,
       arrayJoin(actions) AS action,
       arrayJoin(actions) AS action2
FROM events;

You might expect:

3 rows

But you actually get:

9 rows

Why This Happens

arrayJoin doesn’t just flatten arrays.

It expands rows.

Each element in the array creates a new row.

So when you use it multiple times:

First arrayJoin → expands rows
Second arrayJoin → expands again

Result:

3 elements → 3 × 3 = 9 rows

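This is effectively a cartesian multiplication of rows. One caveat: when two arrayJoin calls are textually identical, as in the query above, ClickHouse may evaluate the expression only once (common subexpression elimination) and return 3 rows. The multiplication shows up reliably when the calls take different expressions, for example two different array columns, or arrayJoin(actions) alongside arrayJoin(arrayConcat(actions, [])).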
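To make the arithmetic concrete, here is a small Python model of two independent expansions over a single row. It is not ClickHouse code, only an illustration of why the row count is the product of the array lengths; the function name is made up for this sketch.

```python
from itertools import product

row = {"user_id": 1, "actions": ["click", "scroll", "purchase"]}


def expand_independently(user_id, left, right):
    """Model two independent expansions: each element of `left` meets each element of `right`."""
    return [(user_id, a, b) for a, b in product(left, right)]


rows = expand_independently(row["user_id"], row["actions"], row["actions"])
print(len(rows))  # 9
```

With arrays of 100 and 50 elements, the same logic yields 5,000 rows per input row, which is how result sizes get out of hand.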

The Hidden Impact

This becomes a real problem when:

Arrays are large
Multiple arrayJoins are used
You don’t expect row multiplication

Result:

Incorrect output
Sudden increase in row count
Slower queries

The Better Approach

1. Use a single `arrayJoin` when possible

SELECT user_id,
       arrayJoin(actions) AS action
FROM events;

2. Use `ARRAY JOIN` syntax (cleaner and explicit)

SELECT user_id, action
FROM events
ARRAY JOIN actions AS action;

3. Use `arrayZip` to avoid unintended multiplication

If you’re working with multiple arrays:

SELECT user_id,
       arrayJoin(arrayZip(actions, actions)) AS zipped
FROM events;

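This ensures elements are paired by position instead of multiplied. The example zips actions with itself only for illustration; in practice you would pass two different arrays of the same length.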
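The pairing behaviour can be modelled in a few lines of Python. This is a sketch of the semantics, not ClickHouse code; the second array (`durations`) is a hypothetical column added for illustration, and the length check mirrors the requirement that zipped arrays be of equal size.

```python
def zip_expand(user_id, left, right):
    """Pair elements by position, one output row per pair."""
    if len(left) != len(right):
        raise ValueError("arrays must have the same length")
    return [(user_id, pair) for pair in zip(left, right)]


actions = ["click", "scroll", "purchase"]
durations = [10, 20, 30]  # hypothetical second array
rows = zip_expand(1, actions, durations)
print(len(rows))  # 3
```

Three elements in, three rows out: the row count follows the array length rather than its square.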

Why This Matters

arrayJoin is powerful but easy to misuse.

If used without understanding:

Row count can explode
Queries become expensive
Results can be misleading

Real-World Use Cases

Event tracking pipelines
Flattening nested JSON
Working with semi-structured logs
Exploding arrays into rows for analysis

One Important Gotcha

Every arrayJoin multiplies rows.

If your result size looks unexpectedly large, this is one of the first things to check.

Final Thoughts

arrayJoin is one of the most useful tools in ClickHouse.

But its behavior is not always intuitive.

In many cases, the issue is not the data itself but how the query expands it.

Understanding this early can save a lot of debugging time.

greatCircleDistance in ClickHouse: Avoiding Full Table Scans

Mohamed Hussain S — Mon, 20 Apr 2026 16:18:23 +0000

When working with location data, one problem shows up almost immediately:

“How do I calculate the distance between two coordinates stored in my database?”

At first, it seems like something you’d have to handle outside the database.

But if you're using ClickHouse, there’s a built-in function for this.

The Right Tool: `greatCircleDistance`

greatCircleDistance(lon1Deg, lat1Deg, lon2Deg, lat2Deg)

It calculates the great-circle (shortest surface) distance between two points on Earth, in meters. Note that longitude comes first for each point.

Example

SELECT greatCircleDistance(80.2707, 13.0827, 77.5946, 12.9716) AS distance_meters;

This gives you the distance between Chennai and Bangalore - in meters.
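For intuition about what the function computes, here is a haversine sketch in Python. It mirrors the longitude-first argument order that ClickHouse documents for greatCircleDistance. The Earth radius constant is an assumption (a commonly used mean radius), so the result may differ slightly from what ClickHouse returns.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumption)


def great_circle_distance(lon1, lat1, lon2, lat2):
    """Haversine distance in meters; all inputs are in degrees, longitude first."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Chennai (13.0827 N, 80.2707 E) to Bangalore (12.9716 N, 77.5946 E)
chennai_to_bangalore = great_circle_distance(80.2707, 13.0827, 77.5946, 12.9716)
```

The same function is handy for sanity-checking query results from application code.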

Looks Simple… But There’s a Catch

Now let’s say you write a query like this:

SELECT city
FROM locations
WHERE greatCircleDistance(lon, lat, 80.2707, 13.0827) < 5000;

At first glance, this looks perfectly fine.

But this can quietly turn into a full table scan - especially on large datasets.

Why This Happens

In ClickHouse, indexes don’t work like traditional B-tree indexes.

They are:

Sparse
Designed for range pruning

They work well for queries like:

WHERE lat BETWEEN x AND y

But not for:

WHERE greatCircleDistance(lon, lat, x, y) < 5000

Because:

The predicate is on a function of both columns rather than on a range of a key column, so ClickHouse cannot use the primary index to skip granules and has to evaluate the function for every row.

The Better Approach (What You Should Actually Do)

Instead of directly applying the function, reduce the dataset first.

Bounding Box Filter

SELECT city
FROM locations
WHERE lat BETWEEN (13.0827 - 0.05) AND (13.0827 + 0.05)
  AND lon BETWEEN (80.2707 - 0.05) AND (80.2707 + 0.05)
  AND greatCircleDistance(lon, lat, 80.2707, 13.0827) < 5000;

(The bounding box is an approximation to reduce the search space before exact filtering.)
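The 0.05 degree margin in the query is hard-coded. One way to derive it from the radius is sketched below in Python. The helper name and the Earth radius constant are assumptions, and the box is an approximation that does not handle the poles or the antimeridian.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (assumption)


def bounding_box(lat, lon, radius_m):
    """Return (lat_min, lat_max, lon_min, lon_max) in degrees, approximately enclosing the circle."""
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # A degree of longitude shrinks with latitude, so widen the box by 1 / cos(lat).
    dlon = dlat / math.cos(math.radians(lat))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


lat_min, lat_max, lon_min, lon_max = bounding_box(13.0827, 80.2707, 5000)
```

For a 5 km radius around Chennai both half-widths come out a little under 0.05 degrees, so the hard-coded box in the query is a reasonable fit. At higher latitudes the longitude margin needs to grow.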

Why This Works

lat BETWEEN → can use the primary index when lat is a leading column of the table's ORDER BY key
lon BETWEEN → reduces rows further
greatCircleDistance → applied only to rows that pass the bounding box

So instead of scanning the entire table, you narrow it down first, then compute the exact distance.

Real-World Use Cases

This pattern is useful in:

Delivery radius filtering
Finding nearby users
Geo-based analytics
Ride-sharing systems

One Important Gotcha

Make sure:

Coordinates are in degrees (not radians)
Each point is passed as (longitude, latitude), which is the order ClickHouse documents for this function

Swapping them will give incorrect results.

Final Thoughts

greatCircleDistance is powerful - but if used blindly, it can hurt performance.

In ClickHouse, performance often depends more on how you query than what you query.

Sometimes, the right approach isn’t just using a function - but knowing when and how to use it efficiently.

Why My S3 Backup Setup Broke: Buckets, “Folders”, and Scheduling Misconceptions

Mohamed Hussain S — Thu, 16 Apr 2026 07:04:20 +0000

Another lesson in building reliable systems - not just configuring them.

I thought I had everything set up correctly.

Backups configured
S3-compatible storage connected
Backup triggered via cron jobs during testing

And yet nothing showed up where I expected.

What looked like a simple configuration issue turned out to be a wrong mental model of how S3 actually works.

This post breaks down what went wrong and what fixed it.

The Setup

I was working with:

An S3-compatible object storage (not AWS directly)
A system that allows:
- Configuring a bucket
- Setting a backup path
- Defining backup frequency

Everything seemed straightforward.

But the problem started with one assumption:

Buckets can behave like folders.

The First Mistake: Treating Buckets Like Folders

In a traditional file system, you think like this:

backups/
  app1/
    db.sql

So it felt natural to assume:

Create a “folder” in object storage
Then create buckets inside it for different use cases

In my case, I had something like a folder already created in the object storage UI, and I assumed:

That is my base, and I can create buckets under it

So I tried:

Connecting to that “folder” as a bucket
Then creating another bucket inside it (for vector DB backups)

This kept failing.

At first, I thought:

Maybe it is a permission issue
Maybe my user does not have enough access

But that was not the real problem.

What Was Actually Going Wrong

I was effectively trying to:

Treat a bucket like a parent directory
And create another bucket inside it

That is not how S3 works.

In S3:

Buckets are top-level containers
You cannot nest buckets inside other buckets

So when I tried to:

Connect to an existing bucket
And then create another bucket under it

It failed because the concept itself is invalid.

The Correct Mental Model

This is how S3 actually works:

bucket: backups
object key: app1/2026-04-15/db.sql

There are only two things:

Bucket (top-level)
Object key (full path as a string)

There is no real folder hierarchy.

Organizing Data the Right Way

The fix was not about creating folders.

It was about changing how I name objects.

Instead of trying to structure things at the bucket level, I moved that structure into the object key.

For example:

object_name = f"qdrant/{collection_name}/{snapshot_name}"

This gives a structure like:

bucket: backups

qdrant/
  collection_1/
    snapshot_001
  collection_2/
    snapshot_002

Even though S3 is flat internally, most UIs render this as a folder-like structure.

This is the correct way to organize data.
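To see why a flat key space still looks like folders, here is a small Python sketch that derives one "directory level" from a list of keys, roughly the way a storage UI would. `list_level` is a hypothetical helper written for this illustration; it does not call any S3 API.

```python
def object_key(collection_name: str, snapshot_name: str) -> str:
    """Build the object key used for a snapshot."""
    return f"qdrant/{collection_name}/{snapshot_name}"


def list_level(keys, prefix=""):
    """Group flat keys by the next '/' after `prefix`, emulating a folder listing."""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        entries.add(head + sep)  # a trailing '/' marks a pseudo-folder
    return sorted(entries)


keys = [
    object_key("collection_1", "snapshot_001"),
    object_key("collection_2", "snapshot_002"),
]
print(list_level(keys, "qdrant/"))  # ['collection_1/', 'collection_2/']
```

Nothing named `qdrant/` or `collection_1/` is ever created; those levels exist only because the keys share prefixes.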

The Second Mistake: Mixing Bucket and Path

Another issue was passing paths as part of the bucket name.

For example:

bucket = backups/qdrant

This is invalid.

Correct approach:

bucket = backups
object key = qdrant/collection_name/snapshot

S3 APIs expect a valid bucket name, not a path.
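A small guard in the calling code can catch this mistake early. The sketch below is a hypothetical helper, not part of any S3 SDK: it splits a combined location into a bucket and a key so the two are never passed together.

```python
def split_location(location: str):
    """Split 'bucket/some/key' into (bucket, key); only the first segment is the bucket."""
    bucket, _, key = location.strip("/").partition("/")
    if not bucket:
        raise ValueError("bucket name is empty")
    return bucket, key


bucket, key = split_location("backups/qdrant/collection_1/snapshot_001")
print(bucket)  # backups
print(key)     # qdrant/collection_1/snapshot_001
```

Normalising once at the edge of the system means the rest of the backup code can assume the bucket never contains a slash.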

What Finally Clicked

The breakthrough was realizing:

I was not dealing with folders at all
I was dealing with string prefixes inside object keys

Once I stopped trying to create hierarchy at the bucket level and moved everything into object naming, the entire setup started working as expected.

Putting It All Together

Correct configuration:

Bucket:

  backups

Object naming:

  f"qdrant/{collection_name}/{snapshot_name}"

This alone was enough to:

Organize backups cleanly
Avoid bucket-related errors
Make the storage layout intuitive in the UI

Key Takeaways

Buckets are not folders
You cannot create a bucket inside another bucket
S3 is a flat object store
Folder-like structures come from object key prefixes
Always keep bucket and path separate

Final Thought

The issue was not with permissions or configuration.

It was a mismatch between how I expected storage to behave and how it actually works.

Once the mental model changed, the implementation became simple.

If something feels unnecessarily complicated in S3, it is often a sign that the model being used is incorrect.

Debugging a Broken Metrics Pipeline: What Actually Went Wrong

Mohamed Hussain S — Thu, 16 Apr 2026 03:29:50 +0000

Part 4 of a series on building a metrics pipeline into ClickHouse
Read Part 3: Understanding Vector Transforms

When Things Still Don’t Work

At this point, the pipeline looked correct.

Sources were defined
Transforms were working
Data structure matched expectations

And yet, something was still off.

Data wasn’t behaving the way it should.

This is where debugging became the main task.

The Only Way Forward: Logs

When dealing with ingestion issues in ClickHouse, logs become your best source of truth.

I started monitoring the error logs directly:

sudo tail -f /var/log/clickhouse-server/clickhouse-server.err.log

This immediately surfaced issues that were not visible from the pipeline configuration.

An Error That Didn’t Make Sense

At one point, I started seeing this error repeatedly:

There exists no table monitoring.cpu in database monitoring

This was confusing.

I hadn’t created a table named cpu
It wasn’t part of my current setup
My Vector configuration didn’t reference it

So where was it coming from?

What Was Actually Happening

After digging deeper, the issue had nothing to do with my current pipeline.

It turned out that a previously used Telegraf process was still running in the background.

Even though I had:

Removed configurations
Switched tools
Rebuilt the pipeline

The old process was still active and sending data using an outdated setup.

That’s why ClickHouse was reporting errors for a table I never intended to use.

The Real Problem

This wasn’t a configuration issue.

It was a runtime issue.

The system I was debugging was not the only system running.

That realization changed how I approached debugging.

Fixing It

The solution was simple - but easy to miss.

First, I checked for any running Telegraf processes:

ps aux | grep telegraf

Then stopped them explicitly:

sudo systemctl stop telegraf

Once the old process was stopped, the errors disappeared.
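The same check can be scripted so it runs before every pipeline change. This is a rough Python sketch that parses `ps aux` style output. The sample text is made up for illustration, and the parsing assumes the usual 11-column layout with the command in the last column.

```python
def find_processes(ps_output: str, name: str):
    """Return (pid, command) for `ps aux` lines whose command mentions `name`."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)  # 11 columns; the last one is the full command
        if len(fields) < 11:
            continue
        pid, command = int(fields[1]), fields[10]
        # Ignore the grep process itself if the output came from a `ps | grep` pipe.
        if name in command and not command.startswith("grep "):
            matches.append((pid, command))
    return matches


SAMPLE = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "telegraf 812 0.3 0.5 123456 7890 ? Ssl Apr15 1:02 /usr/bin/telegraf -config /etc/telegraf/telegraf.conf\n"
    "root 4242 0.0 0.0 6432 720 pts/0 S+ 03:29 0:00 grep telegraf\n"
)
leftovers = find_processes(SAMPLE, "telegraf")
```

In real use the input would come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout` rather than a sample string.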

What This Teaches

This led to an important lesson:

Always validate the runtime environment - not just the configuration.

When working with pipelines:

Old processes may still be running
Multiple agents may write to the same destination
Previous setups can interfere with new ones

If you don’t account for this, you may end up debugging the wrong problem.

The Debugging Loop

Most of the pipeline development ended up looking like this:

Write → Run → Fail → Check logs → Fix → Repeat

Each iteration helped refine:

Transform logic
Data structure
Schema alignment

This loop is where real progress happens.

What Finally Worked

Once:

Transforms were correct
Timestamps were fixed
Old processes were stopped

The pipeline stabilized.

Data started flowing consistently into ClickHouse, and queries returned expected results.

Series Recap

This series covered:

Part 1: Why the Telegraf approach didn’t work
Part 2: Understanding Vector pipelines
Part 3: Writing transforms and handling data
Part 4: Debugging and making the pipeline reliable (this post)

Final Thought

Building data pipelines is rarely about getting things right on the first try.

It’s about:

Observing how the system behaves
Identifying where it breaks
Iterating until it stabilizes

Debugging is not a side task - it is the process.

From Pipelines to Transforms: Making Vector Work with ClickHouse

Mohamed Hussain S — Thu, 16 Apr 2026 01:13:59 +0000

Part 3 of a series on building a metrics pipeline into ClickHouse
Read Part 2: Understanding Vector Pipelines

Where Things Got Real

By this point, the pipeline structure made sense.

I understood:

Sources
Transforms
Sinks

But the pipeline still wasn’t working reliably.

That’s when it became clear:

The hardest part wasn’t collecting data.
It was transforming it correctly.

Why Transforms Matter

Raw metrics are rarely usable as-is.

When sending data into ClickHouse, even small inconsistencies can break ingestion.

Some common issues encountered were:

Wrong data types
Unexpected field structures
Missing values
Incorrect timestamp formats

Even if everything else is correct, these issues cause inserts to fail.

Enter VRL

In Vector, transformations are written using Vector Remap Language.

At first, VRL feels simple.

But in practice, it’s strict.

Types must be explicit
Fields must be handled carefully
Errors are not ignored

That strictness is what makes pipelines reliable - but also harder to get right.

The Timestamp Problem

One of the biggest issues I faced was timestamp handling.

ClickHouse expects timestamps in a specific format.

The raw data didn’t match that format.

Even when everything else was correct, inserts would fail silently because of this.

The fix looked like this:

.timestamp = to_unix_timestamp!(parse_timestamp!(.timestamp, "%+"))

This line did three things:

Parsed the incoming timestamp
Converted it into a Unix format
Made it compatible with ClickHouse

It seems simple - but this was a major blocker.
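For readers more familiar with Python than VRL, the same parse-then-convert step looks roughly like this. It is an analogue for illustration, not part of the Vector configuration, and treating timestamps without an offset as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone


def to_unix_seconds(ts: str) -> int:
    """Parse an ISO 8601 / RFC 3339 timestamp string and return Unix seconds."""
    # Older Python versions reject a trailing 'Z', so normalise it to an explicit offset.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return int(parsed.timestamp())


print(to_unix_seconds("1970-01-01T00:00:10Z"))  # 10
```

The point is the same in either language: the value that reaches ClickHouse is a plain number, not a string whose format has to be guessed.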

Normalizing Metrics

Another challenge was aligning the data structure with what ClickHouse expects.

For both host and GPU metrics, this required:

Converting values into numeric types
Standardizing field names
Adding metadata like host and source
Ensuring consistent structure across all metrics

Without this step, ingestion would fail even if the pipeline looked correct.

From Raw Data to Queryable Format

One important transformation was changing how metrics were structured.

Instead of storing multiple values in a single record:

cpu, memory, disk

The data was reshaped into a row-based format:

metric_name = "cpu", value = ...
metric_name = "memory", value = ...

This made it easier to:

Query data in ClickHouse
Aggregate metrics
Maintain a consistent schema
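The reshaping step can be sketched in Python as follows. The field names (`host`, `timestamp`) and the choice to skip missing values are assumptions made for this illustration. The actual transform in the pipeline was written in VRL.

```python
def to_rows(record: dict, metric_fields=("cpu", "memory", "disk")):
    """Turn one wide record into one row per metric, casting values to float."""
    # Everything that is not a metric (host, timestamp, ...) is copied onto each row.
    base = {k: v for k, v in record.items() if k not in metric_fields}
    return [
        {**base, "metric_name": name, "value": float(record[name])}
        for name in metric_fields
        if record.get(name) is not None  # skip metrics that are missing
    ]


wide = {"host": "node-1", "timestamp": 1776302039, "cpu": "12.5", "memory": 63, "disk": None}
rows = to_rows(wide)
```

Each resulting row has the same four fields, so a single ClickHouse table can hold every metric and new metrics do not require schema changes.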

Why This Was Hard

Most of the time spent on this pipeline wasn’t on setup.

It was here:

Write transform → Run → Fail → Fix → Repeat

Each iteration revealed:

A type mismatch
A missing field
A formatting issue

This is where the pipeline actually gets built.

What Changed After This

Once the transforms were correct:

Data started flowing reliably
Inserts into ClickHouse succeeded
Queries started returning meaningful results

At that point, the pipeline finally felt stable.

What’s Next

Even after fixing transformations, one major challenge remained:

Debugging unexpected failures.

In the next part, I’ll walk through:

How I debugged pipeline issues
What ClickHouse logs revealed
And a mistake that cost me time

Series Overview

Part 1: Why the Telegraf approach didn’t work
Part 2: Understanding Vector Pipelines
Part 3: Writing transforms and handling data correctly (this post)
Part 4: Debugging and making the pipeline reliable

Final Thought

Transforms are where pipelines either succeed or fail.

Understanding how data needs to be shaped is more important than the tool itself.

Once that becomes clear, everything else starts to fall into place.

Understanding Vector Pipelines: From Config Files to Data Flow

Mohamed Hussain S — Thu, 09 Apr 2026 19:29:55 +0000

Part 2 of a series on building a metrics pipeline into ClickHouse
Read Part 1: Why my metrics pipeline with Telegraf didn’t work

Picking Up Where Things Broke

In the previous part, I talked about trying to build a metrics pipeline using Telegraf - and why that approach didn’t work for my use case.

The biggest issue wasn’t just tooling.

It was this:

I didn’t have enough control over how data moved through the system.

That’s what led me to explore a different approach.

Why Vector

I came across Vector while looking for something more flexible.

At a glance, it felt different.

Instead of thinking in terms of plugins and configs, Vector is built around a pipeline model.

And that changes everything.

The Core Idea: Pipelines

At the center of Vector is a simple concept:

Sources → Transforms → Sinks

That’s it.

But this model makes the flow of data explicit.

Sources → where data comes from
Transforms → how data is modified
Sinks → where data is sent
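The three-stage model is small enough to sketch in plain Python. This is a conceptual illustration only. It does not use or imitate Vector's actual API, and all names are made up.

```python
def run_pipeline(source, transforms, sink):
    """Pull events from `source`, apply each transform in order, then hand the result to `sink`."""
    for event in source:
        for transform in transforms:
            event = transform(event)
            if event is None:  # a transform may drop an event
                break
        else:
            # Only reached when no transform dropped the event.
            sink(event)


collected = []
run_pipeline(
    source=[{"cpu": "1.5"}, {"cpu": None}],
    transforms=[
        lambda e: e if e["cpu"] is not None else None,  # drop incomplete events
        lambda e: {"cpu": float(e["cpu"])},             # cast to a numeric type
    ],
    sink=collected.append,
)
```

Because every event passes through the same ordered stages, a problem can be located by asking which stage last saw the data in the expected shape.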

Compared to my earlier approach, this immediately felt clearer.

What This Actually Means

Instead of writing a config and hoping everything connects correctly, you define:

What data you are collecting
How that data should be shaped
Where that data should go

That shift sounds small - but it changes how you think about the system.

From Config Files to Data Flow

With Telegraf, my thinking looked like this:

Write config → Run → Debug errors

With Vector, it started becoming:

Collect → Transform → Route → Store

The focus moved from:

“What config do I write?”

to:

“How does data move through each stage?”

The New Learning Curve

Of course, switching tools didn’t magically solve everything.

There were new challenges.

I wrote my Vector configuration in YAML, which was different from the TOML I was used to with Telegraf (Vector also accepts TOML and JSON).

And more importantly:

The pipeline only works if every stage is defined correctly.

Some of the early issues I ran into:

Incorrect source definitions
Misconfigured sinks
Data not flowing as expected
Silent failures when something didn’t connect properly

At times, it felt like nothing was happening, even though everything looked “correct.”

First Realization: Everything Is Connected

One important thing I learned quickly:

If one stage breaks, the entire pipeline breaks.

Unlike simpler setups, you can’t treat components independently.

A bad transform can stop data entirely
A misconfigured sink can drop everything silently
A source that doesn’t emit correctly makes debugging harder

This forced me to start thinking in terms of end-to-end flow, not individual pieces.

What Improved Immediately

Despite the challenges, a few things became better compared to before:

Clear visibility into how data moves
Better control over transformations
More flexibility in shaping data before sending it to ClickHouse

Even though things weren’t fully working yet, I finally felt like I was closer to solving the actual problem.

What Was Still Missing

At this stage, the pipeline structure made sense.

But one part was still unclear, and it turned out to be the hardest:

How to correctly transform the data so that ClickHouse would accept it.

This is where most of the complexity showed up.

What’s Next

In the next part, I’ll dive into the most challenging part of this setup:

Writing transforms using Vector Remap Language (VRL)
Handling strict data types
Fixing timestamp issues
And shaping metrics into a format that ClickHouse can actually ingest

Series Overview

This post is part of a series:

Part 1: Why the Telegraf approach didn’t work
Part 2: Understanding Vector pipelines (this post)
Part 3: Writing transforms and handling data correctly
Part 4: Debugging and making the pipeline reliable

Final Thought

Switching tools didn’t solve the problem immediately.

But it did something more important:

It made the system visible.

Once I could see how data moved through each stage, debugging stopped being guesswork and started becoming structured.

Why My Metrics Pipeline with Telegraf Didn’t Work (and What I Learned)

Mohamed Hussain S — Tue, 07 Apr 2026 10:18:51 +0000

Part 1 of a series on building a metrics pipeline into ClickHouse

Collecting metrics is easy.

Shipping them to an analytical database without losing your mind is the hard part.

The Goal

At one point, the task seemed straightforward:

Collect system metrics (CPU, memory, GPU) and store them in ClickHouse for analysis.

This is a common observability use case.
You collect metrics, send them somewhere, and run queries on top.

Simple enough.

But in practice, it didn’t go as planned.

The Initial Approach: Telegraf

I started with Telegraf.

It’s widely used for collecting system metrics and has a plugin-based architecture, which makes it a natural first choice.

This was also where I first came across TOML.

At first, it felt like I just needed to “write a config and run it.”
But very quickly, I realized:

Configuration isn’t just syntax; it defines how your system behaves.

What I Was Trying to Build

The idea was simple:

Collect host-level metrics (CPU, memory, etc.)
Collect GPU metrics
Push everything into ClickHouse
Run analytical queries on top

Essentially, a basic observability pipeline.

Where Things Started Breaking

On paper, Telegraf looked like it should work.

In reality, I ran into a few issues:

No straightforward way to push data into ClickHouse
Lack of a native ClickHouse output plugin
Debugging wasn’t very intuitive
Configurations became rigid as complexity increased

At some point, I was spending more time trying to make the tool fit the use case than actually solving the problem.

A Shift in Perspective

This is where something important clicked.

Up until this point, I was thinking in terms of:

Write config → Run tool → Expect output

But that approach wasn’t working.

What I needed instead was a clearer understanding of how data actually flows:

Data source → Transformation → Destination

The problem wasn’t just the tool; it was the lack of control over how data moved through the system.

Why I Decided to Move Away

At this stage, it became clear that I needed:

More control over data transformations
Better visibility into how data flows
A system that is easier to debug

Telegraf, while powerful, didn’t give me that level of flexibility for this use case.

What’s Next

That’s when I decided to try a different approach using Vector.

Instead of treating configuration as static setup, Vector treats it as a pipeline.

In the next part, I’ll walk through:

How Vector pipelines work
Why the sources → transforms → sinks model made a difference
And what changed when I adopted that approach

Series Overview

This post is part of a series:

Part 1: Why the Telegraf approach didn’t work (this post)
Part 2: Understanding Vector pipelines
Part 3: Writing transforms and handling data
Part 4: Debugging and making the pipeline reliable

Final Thought

What started as a simple setup turned into a deeper lesson:

Tools don’t solve problems; understanding systems does.

Once that became clear, the direction forward was much easier.
