<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Onkar</title>
    <description>The latest articles on Forem by Onkar (@rakno).</description>
    <link>https://forem.com/rakno</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3903579%2F2780481a-5e16-40ac-8840-126059da6cde.png</url>
      <title>Forem: Onkar</title>
      <link>https://forem.com/rakno</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rakno"/>
    <language>en</language>
    <item>
      <title>I Built a CRM AI Assistant in Go From Scratch — No LangChain, No Shortcuts</title>
      <dc:creator>Onkar</dc:creator>
      <pubDate>Tue, 05 May 2026 06:28:12 +0000</pubDate>
      <link>https://forem.com/rakno/i-built-a-crm-ai-assistant-in-go-from-scratch-no-langchain-no-shortcuts-4jh5</link>
      <guid>https://forem.com/rakno/i-built-a-crm-ai-assistant-in-go-from-scratch-no-langchain-no-shortcuts-4jh5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I built Aria — a production-grade AI assistant for CRM agents. Agents log in with Google, type natural language questions about their leads, tasks, and bookings, and get real answers streamed from a live database. No hallucinations. No canned responses. Just SQL executed against real data, formatted by GPT-4o, and streamed token by token. Built entirely in Go, with a Python schema intelligence pipeline, pgvector for semantic search, Redis for caching and sessions, and a Next.js frontend. Here's everything I learned.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I work as a backend engineer at a PropTech company. Every day I watch pre-sales agents open five different tabs to answer one question — leads dashboard, task list, booking tracker, payment sheet, activity log. The data exists. It's all in PostgreSQL. But to answer "which of my leads haven't been contacted in 3 days?", an agent has to mentally join four tables.&lt;/p&gt;

&lt;p&gt;I wanted to build something that let an agent just ask that question — in plain English — and get a real answer from the live database. Not a chatbot that makes things up. An AI that runs actual SQL and tells you what it found.&lt;/p&gt;

&lt;p&gt;That's Aria.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Aria actually does
&lt;/h2&gt;

&lt;p&gt;Before I get into the technical decisions, let me show you what it looks like in practice.&lt;/p&gt;

&lt;p&gt;An agent logs in with Google. They see a chat interface. They type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"How many leads are assigned to me?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Aria responds:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You have 5 leads assigned to you."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;With a "SQL executed" toggle that reveals the actual query that ran. Then they ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"What tasks are due today?"&lt;/p&gt;

&lt;p&gt;"There is one task due today. It is a high-priority call titled 'Welcome call — Priya,' which involves an intro call and confirming the budget range. The task is currently pending and has not been completed yet."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Then:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Recent activity on my leads"&lt;/p&gt;

&lt;p&gt;"Recent activities on your leads include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Email Activity&lt;/strong&gt; — Type: Outbound Email, Subject: Sent brochure, Body: PDF with Manchester options, Occurred: May 4, 2026 at 12:43 PM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call Activity&lt;/strong&gt; — Type: Inbound Call, Subject: Inbound budget check, Body: Student asked about bills-inclusive options, Occurred: May 3, 2026 at 4:43 PM, Outcome: Connected"&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Real data. Live database. The agent never wrote a query.&lt;/p&gt;




&lt;h2&gt;
  
  
  The core technical problem — and why RAG isn't the answer
&lt;/h2&gt;

&lt;p&gt;My first instinct was RAG (Retrieval Augmented Generation) — the pattern everyone uses for document question-answering. Embed all your CRM data as vectors, find the most similar documents when someone asks a question, feed them to GPT-4.&lt;/p&gt;

&lt;p&gt;This is completely wrong for structured CRM data.&lt;/p&gt;

&lt;p&gt;RAG is for &lt;strong&gt;unstructured data&lt;/strong&gt; — PDFs, notes, articles. The answer to "how many leads are assigned to me?" isn't found by semantic similarity. It requires an exact COUNT query with a WHERE clause. Semantic search on a vector of "John Smith, high priority, interested state, assigned to agent_42" doesn't give you a count. It gives you a similar-sounding document.&lt;/p&gt;

&lt;p&gt;The right approach is &lt;strong&gt;Text-to-SQL&lt;/strong&gt; — translate the natural language question into an actual SQL query, execute it, get real rows, format the result.&lt;/p&gt;

&lt;p&gt;But raw Text-to-SQL (just dumping your schema into a prompt and asking GPT-4 to write SQL) breaks down on complex schemas with business-specific terminology. "High priority leads" only maps to &lt;code&gt;priority = 'HIGH'&lt;/code&gt; if the LLM knows your schema well enough. "Stale leads" needs to know that means &lt;code&gt;last_activity_at &amp;lt; NOW() - INTERVAL '3 days'&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The solution I landed on: &lt;strong&gt;RAG over the schema, not over the data rows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I build a semantic layer — plain-English descriptions of every table, every column, every enum value, and what they mean in business terms. That semantic layer gets embedded as vectors. When an agent asks a question, I retrieve the most relevant schema descriptions first, then ask GPT-4o to generate SQL using that focused context. The LLM sees exactly what it needs, not the entire 15-table schema.&lt;/p&gt;
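
&lt;p&gt;In the Go API, that retrieval step boils down to one pgvector query. A minimal sketch, assuming the question has already been embedded (the helper and the &lt;code&gt;description&lt;/code&gt; column name are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Illustrative sketch: fetch the five schema docs closest to the question.
// Uses pgx plus the pgvector-go type; qVec is the question's embedding.
func retrieveSchemaDocs(ctx context.Context, pool *pgxpool.Pool, qVec []float32) ([]string, error) {
    rows, err := pool.Query(ctx,
        `SELECT description
           FROM schema_embeddings
          ORDER BY embedding &amp;lt;=&amp;gt; $1  -- pgvector cosine distance
          LIMIT 5`,
        pgvector.NewVector(qVec))
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var docs []string
    for rows.Next() {
        var d string
        if err := rows.Scan(&amp;amp;d); err != nil {
            return nil, err
        }
        docs = append(docs, d)
    }
    return docs, rows.Err()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;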




&lt;h2&gt;
  
  
  Why Go
&lt;/h2&gt;

&lt;p&gt;I'd been using Ruby on Rails professionally for over a year. I wanted to learn Go properly — not from tutorials, but by building something real with real constraints.&lt;/p&gt;

&lt;p&gt;Go turned out to be the right call for this specific project for three reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Goroutines for SSE streaming.&lt;/strong&gt; LLM responses stream token by token. In Go, you open a channel, pump tokens from the OpenAI stream into it, and flush each one as an SSE event. The pattern is clean and idiomatic. The &lt;code&gt;http.Flusher&lt;/code&gt; interface handles the rest. Zero boilerplate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Type safety for LLM tool calling.&lt;/strong&gt; OpenAI's function calling returns JSON that maps to your defined tool schema. In Go, you define a struct, and the type system ensures you're handling the response correctly. No runtime surprises from dynamic typing.&lt;/p&gt;
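
&lt;p&gt;For example, the arguments for the &lt;code&gt;query_crm_database&lt;/code&gt; tool deserialise into a plain struct. A sketch, where the &lt;code&gt;toolCall.Function.Arguments&lt;/code&gt; path follows the shape Go OpenAI clients expose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Mirrors the tool schema: query_crm_database(sql, explanation).
type QueryCRMArgs struct {
    SQL         string `json:"sql"`
    Explanation string `json:"explanation"`
}

// The model returns arguments as a JSON string; decode it or fail loudly.
var args QueryCRMArgs
if err := json.Unmarshal([]byte(toolCall.Function.Arguments), &amp;amp;args); err != nil {
    return fmt.Errorf("malformed tool arguments: %w", err)
}
// From here on, args.SQL is a typed value, not a map[string]interface{} dig.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;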

&lt;p&gt;&lt;strong&gt;Concurrency for parallel queries.&lt;/strong&gt; Some questions require multiple SQL queries — "show me my leads and today's bookings." Go's &lt;code&gt;sync.WaitGroup&lt;/code&gt; + goroutines make parallel execution natural. Run both queries simultaneously, merge results, format the combined answer. Python could do this, but Go makes it feel like the default way to think.&lt;/p&gt;
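
&lt;p&gt;Here's a sketch of that fan-out. I'm showing &lt;code&gt;errgroup&lt;/code&gt; rather than a bare &lt;code&gt;WaitGroup&lt;/code&gt; because it bundles the error propagation and cancellation I talk about later (&lt;code&gt;runQuery&lt;/code&gt; and the SQL strings are stand-ins):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Run two independent CRM queries concurrently; the first error
// cancels the sibling through ctx.
g, ctx := errgroup.WithContext(ctx)

var leads, bookings []map[string]any
g.Go(func() error {
    var err error
    leads, err = runQuery(ctx, leadsSQL)
    return err
})
g.Go(func() error {
    var err error
    bookings, err = runQuery(ctx, bookingsSQL)
    return err
})
if err := g.Wait(); err != nil {
    return err
}
// Merge both result sets, then hand them to GPT-4o for formatting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;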

&lt;p&gt;The one place I kept Python: the schema intelligence pipeline that reads the DB schema and generates semantic documentation via GPT-4o. The Python AI ecosystem (psycopg2, pgvector library, openai SDK) is simply more mature for that one-time build step.&lt;/p&gt;




&lt;h2&gt;
  
  
  The architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser (Next.js)
      │
      │ Google OAuth → JWT → httpOnly refresh cookie
      │ POST /chat → SSE stream
      │
Go API (Chi router)
      │
      ├── JWT middleware (validates + blacklists in Redis)
      ├── Rate limiter (30 req/min per agent, Redis)
      │
      ├── Schema retriever
      │   → embed question → pgvector cosine search → top-5 relevant schema docs
      │
      ├── Intent example retriever
      │   → cosine search on pre-seeded Q→SQL examples
      │
      ├── GPT-4o tool calling
      │   → system prompt: agent context + retrieved schema + examples + history
      │   → tool: query_crm_database(sql, explanation)
      │
      ├── SQL validator
      │   → must be SELECT only
      │   → inject WHERE assigned_agent_id = $1 (agent isolation)
      │
      ├── pgx executor (read-only role, 5s timeout)
      │
      ├── Redis query cache (5 min TTL, invalidated on thumbs-down)
      ├── Redis session history (last 10 messages, 2h TTL)
      │
      └── SSE streamer → tokens flow to browser

PostgreSQL
  - 15 tables: leads, tasks, bookings, payments, users, partners, properties, activities...
  - pgvector extension: schema_embeddings, intent_examples
  - Read-only role: aria_readonly (SELECT only at DB level)

Python schema pipeline (runs once on setup)
  - Introspects PostgreSQL schema
  - GPT-4o generates plain-English column descriptions
  - Embeds + stores in schema_embeddings
  - Auto-generates 30 example Q→SQL pairs → intent_examples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The decision that matters most: agent data isolation
&lt;/h2&gt;

&lt;p&gt;This is not a technical trick. It's a security requirement.&lt;/p&gt;

&lt;p&gt;When agent A asks "show me my leads," they must only see their leads. Not agent B's. Not all agents' combined. This sounds obvious but there are three places where it could fail:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The LLM layer.&lt;/strong&gt; GPT-4o might forget to include a WHERE clause. The LLM is non-deterministic — you cannot trust it to always include the agent filter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The application layer.&lt;/strong&gt; This is where I enforce it. After the LLM generates SQL, before it ever hits the database, I inject &lt;code&gt;AND assigned_agent_id = $1&lt;/code&gt; and pass the authenticated agent's ID as a parameterised argument. The agent ID comes from the JWT — which is cryptographically signed. This runs regardless of what the LLM produced.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The database layer.&lt;/strong&gt; The application connects as &lt;code&gt;aria_readonly&lt;/code&gt; — a PostgreSQL role with only SELECT permissions. Even if somehow a non-SELECT query got through, it would fail at the database level.&lt;/p&gt;

&lt;p&gt;Three independent layers. Any one of them catches a failure in the others.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="c"&gt;// After SQL is generated by LLM, before execution&lt;/span&gt;
&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;injectAgentFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agentID&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c"&gt;// Replace :agent_id placeholder if LLM included it&lt;/span&gt;
    &lt;span class="c"&gt;// If not, inject the filter regardless&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ToUpper&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s"&gt;"ASSIGNED_AGENT_ID"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c"&gt;// Wrap the query: SELECT * FROM (...) WHERE assigned_agent_id = $1&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="s"&gt;"SELECT * FROM (%s) AS q WHERE q.assigned_agent_id = $1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ReplaceAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;":agent_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"$1"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The schema intelligence pipeline
&lt;/h2&gt;

&lt;p&gt;This is the piece I'm most proud of, because it solves a real problem that existing tools don't.&lt;/p&gt;

&lt;p&gt;The problem: Text-to-SQL tools expect you to manually write documentation about your schema — what each column means, what each enum value represents. For a 15-table CRM with columns like &lt;code&gt;ai_ineligible_reason&lt;/code&gt;, &lt;code&gt;source_details&lt;/code&gt;, &lt;code&gt;meta_ai_call_status&lt;/code&gt;, manually writing that documentation is hours of work. And every time you add a column, you have to update the docs.&lt;/p&gt;

&lt;p&gt;My solution: a Python script that introspects the PostgreSQL schema and sends each table's DDL to GPT-4o, which generates the documentation automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_table_doc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ddl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;foreign_keys&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sample_rows&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are documenting a CRM database for a student accommodation company.

    Here is the DDL for the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;table_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; table:
    &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ddl&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Foreign keys: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;foreign_keys&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
    Sample rows (3): &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sample_rows&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

    Generate a JSON object with:
    - table purpose (1-2 sentences)
    - for each column: plain English description, possible values if enum
    - common business questions this table answers
    - relationships to other tables
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Returns structured JSON
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated documentation for the &lt;code&gt;leads&lt;/code&gt; table includes things like: "state = 'interested' means the student has shown interest but has not yet booked. pre_sales_agent_id is the agent currently responsible for converting this lead." These descriptions get embedded as vectors and stored in &lt;code&gt;schema_embeddings&lt;/code&gt;, ready to be retrieved as context at SQL-generation time.&lt;/p&gt;

&lt;p&gt;The pipeline is incremental — it hashes each table's DDL. On re-run, only tables whose schema has changed get re-documented. Add a column to &lt;code&gt;leads&lt;/code&gt;, only &lt;code&gt;leads&lt;/code&gt; gets reprocessed. The rest is instant.&lt;/p&gt;
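
&lt;p&gt;The check itself is tiny. A sketch of the idea, with fingerprint helpers that are my shorthand rather than the pipeline's actual function names:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib

def ddl_fingerprint(ddl: str) -&amp;gt; str:
    # Stable hash of the table's DDL; equal hashes mean an unchanged schema.
    return hashlib.sha256(ddl.encode()).hexdigest()

for table_name, ddl in introspect_schema(conn):  # helper names illustrative
    fp = ddl_fingerprint(ddl)
    if fp == load_fingerprint(conn, table_name):
        continue  # unchanged: skip the GPT-4o call entirely
    doc = generate_table_doc(client, table_name, ddl, foreign_keys, sample_rows)
    embed_and_store(conn, table_name, doc)
    save_fingerprint(conn, table_name, fp)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;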




&lt;h2&gt;
  
  
  The dual-write problem and why I store conversations in both Redis and PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Conversation history lives in two places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis&lt;/strong&gt; (&lt;code&gt;session:{user_id}:{session_id}&lt;/code&gt;, TTL 2 hours) — the last 10 messages, for fast lookup on every API call. The LLM needs this context to answer follow-up questions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PostgreSQL&lt;/strong&gt; (&lt;code&gt;conversations&lt;/code&gt; and &lt;code&gt;messages&lt;/code&gt; tables) — permanent storage for audit, feedback training, and conversation history display.&lt;/p&gt;

&lt;p&gt;Why both? Speed vs durability. Redis is fast — loading session history from Redis takes ~1ms. PostgreSQL with a JOIN takes ~10ms. For something that happens on every LLM call, that 9ms difference matters. But Redis is ephemeral — TTL expires, memory pressure evicts keys. The PostgreSQL record is permanent and queryable.&lt;/p&gt;
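
&lt;p&gt;The Redis write path is a three-command pipeline (a sketch against go-redis v9; the key shape matches the one above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Append the new message, keep only the last 10, refresh the 2h TTL.
key := fmt.Sprintf("session:%s:%s", userID, sessionID)

pipe := rdb.TxPipeline()
pipe.RPush(ctx, key, messageJSON)
pipe.LTrim(ctx, key, -10, -1)
pipe.Expire(ctx, key, 2*time.Hour)
if _, err := pipe.Exec(ctx); err != nil {
    return err
}
// The permanent PostgreSQL insert happens separately, off this hot path.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;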

&lt;p&gt;This is the same pattern I use at my day job for a different problem. The tradeoff is always: what needs to be fast, and what needs to be permanent?&lt;/p&gt;




&lt;h2&gt;
  
  
  Streaming with SSE in Go
&lt;/h2&gt;

&lt;p&gt;Server-Sent Events is the right choice for streaming LLM responses. It's one-directional (server to client), works over standard HTTP, and auto-reconnects on disconnect. No WebSocket handshake overhead.&lt;/p&gt;

&lt;p&gt;The Go implementation is clean:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;streamResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ResponseWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openaiStream&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ChatCompletionStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Content-Type"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"text/event-stream"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cache-Control"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no-cache"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Header&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"X-Accel-Buffering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"no"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;flusher&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flusher&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;openaiStream&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Recv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;EOF&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data: [DONE]&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;flusher&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Delta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Content&lt;/span&gt;
        &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`{"type":"token","text":%q}`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"data: %s&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;flusher&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Flush&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The frontend uses the &lt;code&gt;EventSource&lt;/code&gt; API with a custom hook that parses event types (&lt;code&gt;sql&lt;/code&gt;, &lt;code&gt;token&lt;/code&gt;, &lt;code&gt;done&lt;/code&gt;, &lt;code&gt;error&lt;/code&gt;) and updates React state accordingly. The agent sees tokens appearing one by one — exactly like ChatGPT, but with real CRM data behind it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The feedback loop
&lt;/h2&gt;

&lt;p&gt;Every AI response has thumbs-up / thumbs-down buttons. This isn't just UX polish — it's the learning mechanism.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thumbs up:&lt;/strong&gt; The Q→SQL pair gets stored in &lt;code&gt;query_feedback&lt;/code&gt; with &lt;code&gt;is_helpful = true&lt;/code&gt;. After 3 upvotes on similar question patterns, the pair gets auto-promoted to &lt;code&gt;intent_examples&lt;/code&gt; — it becomes part of the example set that future queries retrieve during the intent-example retrieval step. The system gets smarter from usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thumbs down:&lt;/strong&gt; The agent can optionally add a correction note. The bad Q→SQL pair is flagged. The Redis cache entry for that question is immediately invalidated — the next time someone asks the same question, it goes to the LLM fresh instead of serving the cached wrong answer.&lt;/p&gt;
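
&lt;p&gt;The invalidation is a single Redis delete keyed on the question. A sketch, with the cache-key shape being my assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// On thumbs-down: drop the cached answer so the next identical question
// goes back through the full retrieval + LLM path.
func invalidateCachedAnswer(ctx context.Context, rdb *redis.Client, question string) error {
    sum := sha256.Sum256([]byte(question))
    cacheKey := fmt.Sprintf("qcache:%x", sum) // key shape assumed
    return rdb.Del(ctx, cacheKey).Err()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;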

&lt;p&gt;A nightly job clusters negative feedback to identify recurring failure patterns — questions the system consistently gets wrong — and flags them to &lt;code&gt;intent_gaps&lt;/code&gt; for manual review.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Add OpenTelemetry from the start.&lt;/strong&gt; I added structured logging but not distributed tracing. Being able to see a full trace from HTTP request → schema retrieval → LLM call → SQL execution → SSE stream with timing at each step would have made debugging much faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Add a "confidence score" to responses.&lt;/strong&gt; Right now, Aria answers with equal confidence whether it's 100% sure or guessing. A mechanism to say "I'm not sure this is right — here's what I think you're asking" would make the product more trustworthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Separate the schema pipeline into its own service.&lt;/strong&gt; It's a Python CLI script today. Making it a proper FastAPI service that the Go API can call on-demand (when a new table is detected) would make the incremental update flow smoother.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this project taught me about Go
&lt;/h2&gt;

&lt;p&gt;Four things I didn't expect before building this:&lt;/p&gt;

&lt;p&gt;Go's error handling forces you to think about every failure mode. There's no &lt;code&gt;try/catch&lt;/code&gt; that lets you defer the "what if this fails?" question. Every database call, every API call, every Redis operation has an explicit error path. The code is more verbose but the failure modes are all visible.&lt;/p&gt;

&lt;p&gt;Interfaces in Go make dependency injection natural. My &lt;code&gt;AIService&lt;/code&gt; depends on a &lt;code&gt;SchemaRetriever&lt;/code&gt; interface, not a concrete type. Swapping implementations for tests, or switching from pgvector to a different vector store, requires changing one line.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;context&lt;/code&gt; package is everywhere and it's the right abstraction. Timeouts, cancellation, request-scoped values (like the authenticated user ID) — all flow through &lt;code&gt;context.Context&lt;/code&gt;. Once you understand why, you stop fighting it.&lt;/p&gt;

&lt;p&gt;Goroutines are cheap but coordination is hard. Running parallel SQL queries with &lt;code&gt;sync.WaitGroup&lt;/code&gt; is easy. Making sure errors from each goroutine are correctly propagated and the context cancels properly when one fails — that took more thought.&lt;/p&gt;




&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;15 tables&lt;/strong&gt; with relational integrity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;500 seeded leads&lt;/strong&gt;, 300 tasks, 200 bookings, 150 payments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;30 pre-seeded intent examples&lt;/strong&gt; from the schema pipeline&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~90 schema embeddings&lt;/strong&gt; covering every table and column&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sub-100ms&lt;/strong&gt; response time on cache hits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~1.5–2s&lt;/strong&gt; to first token on cache misses (LLM latency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 second&lt;/strong&gt; SQL execution timeout, hard limit at DB level&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read-only&lt;/strong&gt; PostgreSQL role — no write access possible&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Backend:&lt;/strong&gt; Go 1.22, Chi router, pgx v5, go-redis v9, official openai-go SDK, golang-jwt&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI layer:&lt;/strong&gt; GPT-4o for SQL generation and response formatting, text-embedding-3-small for schema embeddings, pgvector (HNSW index) for similarity search&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Schema pipeline:&lt;/strong&gt; Python 3.12, psycopg2, pgvector Python client, openai Python SDK&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 14, TypeScript, Tailwind CSS, Zustand for auth state, React Query&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker Compose, PostgreSQL 16, Redis 7&lt;/p&gt;




&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;

&lt;p&gt;Everything is on GitHub: &lt;strong&gt;github.com/Deonkar/Aria&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs with &lt;code&gt;docker compose up&lt;/code&gt;. Run &lt;code&gt;make seed&lt;/code&gt; to populate the database. Run &lt;code&gt;make schema-pipeline&lt;/code&gt; to generate the semantic documentation and embeddings. Then ask it anything about your leads.&lt;/p&gt;

&lt;p&gt;The thing to try first: ask it the same question twice. Watch the first response stream token by token. Watch the second response come back instantly from cache with the "SQL executed" badge still visible. That's the difference between a demo and a system.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The natural next step is the RAG layer — adding support for unstructured data like call transcripts, email threads, and agent notes. "What did we discuss with this lead last week?" can't be answered by SQL alone. That question needs semantic search over unstructured text — which is where pgvector plus proper chunking comes in. The infrastructure is already there. The schema pipeline already uses pgvector for embeddings. Extending it to cover activity bodies and notes is the obvious v2.&lt;/p&gt;

&lt;p&gt;The other direction is multi-agent context — letting admins ask questions across all agents ("which agent has the most stale leads this week?") while keeping agent-level queries scoped. The agent-isolation layer already carries the information it needs via the &lt;code&gt;role&lt;/code&gt; claim in the JWT. It's a configuration change, not an architecture change.&lt;/p&gt;
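
&lt;p&gt;Concretely, it's one branch in the filter-injection step (a sketch; the claim names are assumptions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;// Admins skip the per-agent filter; everyone else gets it injected.
func scopeSQL(claims jwt.MapClaims, sql string) string {
    if role, _ := claims["role"].(string); role == "admin" {
        return sql // cross-agent queries allowed
    }
    agentID, _ := claims["sub"].(string)
    return injectAgentFilter(sql, agentID)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;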




&lt;p&gt;&lt;em&gt;If you're building something similar or have thoughts on the architecture decisions — particularly around the schema intelligence pipeline or the agent isolation approach — I'd like to hear from you. Drop a comment or find me on LinkedIn.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>go</category>
      <category>ai</category>
      <category>systemdesign</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why I Stopped Using SQS and Built a Kafka System From Scratch</title>
      <dc:creator>Onkar</dc:creator>
      <pubDate>Fri, 01 May 2026 13:10:56 +0000</pubDate>
      <link>https://forem.com/rakno/why-i-stopped-using-sqs-and-built-a-kafka-system-from-scratch-43m5</link>
      <guid>https://forem.com/rakno/why-i-stopped-using-sqs-and-built-a-kafka-system-from-scratch-43m5</guid>
      <description>&lt;h1&gt;
  
  
  Why I Stopped Using SQS and Built a Kafka System From Scratch
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; I'd been using SQS and Sidekiq at work for a year. They worked fine — until I needed five services to independently react to the same event without knowing each other existed. This is the story of what broke, what I built, and what Kafka taught me that queues fundamentally cannot.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Setup: What I Already Knew
&lt;/h2&gt;

&lt;p&gt;I'm a backend engineer at a PropTech startup. Over the past year I've shipped a lot of infrastructure — event-driven pipelines using AWS SQS, background job queues with Sidekiq and Resque, dead-letter queues, circuit breakers, the works.&lt;/p&gt;

&lt;p&gt;I thought I understood event-driven architecture.&lt;/p&gt;

&lt;p&gt;I didn't. I understood &lt;strong&gt;queues&lt;/strong&gt;. Those are different things.&lt;/p&gt;

&lt;p&gt;The difference didn't click until I tried to build a system where one payment event needed to trigger five completely independent downstream effects — and I realised that with SQS, the architecture I had in my head was simply not possible.&lt;/p&gt;

&lt;p&gt;That realisation sent me down a two-week rabbit hole building &lt;strong&gt;TxFlow&lt;/strong&gt; — a payment event orchestrator built entirely from scratch with Kafka (specifically Redpanda), FastAPI, PostgreSQL, Redis, and a Next.js observability dashboard.&lt;/p&gt;

&lt;p&gt;This post is what I learned.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: One Event, Five Reactions
&lt;/h2&gt;

&lt;p&gt;Here's the scenario that broke my mental model.&lt;/p&gt;

&lt;p&gt;A payment comes in. You need five things to happen:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fraud scoring&lt;/strong&gt; — check if the amount is suspicious&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wallet update&lt;/strong&gt; — debit the sender's balance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notification&lt;/strong&gt; — email the user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit log&lt;/strong&gt; — write an immutable record for compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analytics&lt;/strong&gt; — increment payment counters&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With SQS, my instinct was: fire five separate messages, one per job type. But that immediately creates problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What if the wallet update succeeds but the audit log message never gets enqueued?&lt;/li&gt;
&lt;li&gt;What if you add a sixth service next month? You have to go back and modify the producer code&lt;/li&gt;
&lt;li&gt;If the analytics service is down, does it block everything? Or does that message just disappear?&lt;/li&gt;
&lt;li&gt;You want to replay three days of events through a new fraud model — how do you do that if the messages were deleted after consumption?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The SQS mental model is a &lt;strong&gt;pipe&lt;/strong&gt;. One message goes in, one consumer picks it up, it's gone.&lt;/p&gt;

&lt;p&gt;Kafka's mental model is a &lt;strong&gt;log&lt;/strong&gt;. The message is written and &lt;em&gt;stays&lt;/em&gt;. Any number of independent consumers can read it at their own offset, at their own pace, completely unaware of each other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SQS:
Producer → [queue] → Consumer A
                   (message deleted)

Kafka:
Producer → [topic: payments.initiated]
                ↓              ↓              ↓              ↓              ↓
          Consumer A      Consumer B      Consumer C      Consumer D      Consumer E
          (fraud)         (wallet)        (notify)        (audit)         (analytics)
          offset: 42      offset: 38      offset: 42      offset: 41      offset: 40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each consumer has its own offset — its own "bookmark" in the log. They move independently. Consumer B being slow doesn't affect Consumer A. Consumer D being down for an hour doesn't lose any messages — it just falls behind, and catches up when it restarts.&lt;/p&gt;

&lt;p&gt;This is the thing that took me a week to fully internalise.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building TxFlow: The Architecture
&lt;/h2&gt;

&lt;p&gt;The system is simple by design. It runs entirely with &lt;code&gt;docker compose up&lt;/code&gt; — no cloud, no real money, no external services. The point is to understand the concepts, not manage infrastructure.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl → FastAPI (POST /payment)
         │
         ▼
    PostgreSQL        ← outbox_events table (more on this shortly)
         │
         ▼
    Redpanda (Kafka-compatible broker)
    Topic: payments.initiated (3 partitions, 7-day retention)
         │
    ─────┼──────────────────────────────────────────────────
         │         │           │            │           │
    fraud       wallet      notifier      audit     analytics
    consumer    consumer    consumer    consumer    consumer
    group:fraud group:wallet group:notify group:audit group:analytics
         │         │
         ▼         ▼
    PostgreSQL  PostgreSQL
    fraud_      wallets
    assessments table
         │
    failures → payments.dlq topic → dlq_handler consumer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Redpanda (Kafka-compatible, runs as a single Docker container — no Zookeeper)&lt;/li&gt;
&lt;li&gt;FastAPI + Python (producer + DLQ handler API)&lt;/li&gt;
&lt;li&gt;PostgreSQL (state: wallets, fraud assessments, audit log, outbox)&lt;/li&gt;
&lt;li&gt;Redis (deduplication keys + analytics counters)&lt;/li&gt;
&lt;li&gt;Next.js + TypeScript (observability dashboard)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let me walk through the three decisions that taught me the most.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 1: The Dual-Write Problem (and why the Outbox pattern exists)
&lt;/h2&gt;

&lt;p&gt;When I first wrote the payment producer, I did what felt natural:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The naive approach — DO NOT DO THIS
&lt;/span&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PaymentRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;save_to_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# Step 1: write to DB
&lt;/span&gt;    &lt;span class="nf"&gt;publish_to_kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# Step 2: publish to Kafka
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accepted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks fine. It has a critical flaw.&lt;/p&gt;

&lt;p&gt;These are two separate systems. They are not in the same transaction. If the app crashes between step 1 and step 2 — power cut, OOM kill, deployment — the DB has the payment record, but Kafka never received the event. Five consumers are waiting. None of them will ever process this payment. Silently. No error. The data just never flows.&lt;/p&gt;

&lt;p&gt;This is called the &lt;strong&gt;dual-write problem&lt;/strong&gt;. You cannot atomically write to two systems that don't share a transaction boundary.&lt;/p&gt;

&lt;p&gt;The solution is the &lt;strong&gt;Outbox pattern&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PaymentRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transaction&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;save_to_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Write to outbox IN THE SAME TRANSACTION
&lt;/span&gt;        event = insert_outbox_event(event_id=..., payload=req, published=False)

    &lt;span class="c1"&gt;# Transaction committed — the outbox record is the source of truth
&lt;/span&gt;    &lt;span class="c1"&gt;# NOW publish to Kafka (outside the transaction)
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;publish_to_kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;mark_outbox_published&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;KafkaException&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;  &lt;span class="c1"&gt;# Background poller will retry this
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;accepted&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a background poller runs every 30 seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;poll_outbox&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;unpublished&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_unpublished_events&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;  &lt;span class="c1"&gt;# WHERE published = FALSE
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;unpublished&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;publish_to_kafka&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;mark_outbox_published&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if the app crashes between the Kafka publish and the &lt;code&gt;mark_outbox_published&lt;/code&gt; call, the poller picks it up on the next cycle. The event might be published twice — but that's intentional. "At-least-once delivery" is the guarantee. The consumers handle idempotency (more on that next).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The insight:&lt;/strong&gt; the DB is your single source of truth. Kafka is your delivery mechanism. Never treat them as equals — make one subordinate to the other.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 2: Manual Offset Commits (the thing that actually guarantees delivery)
&lt;/h2&gt;

&lt;p&gt;This was the hardest concept to get right, and it's the one most tutorials gloss over.&lt;/p&gt;

&lt;p&gt;Kafka tracks where each consumer group is in the log via an &lt;strong&gt;offset&lt;/strong&gt; — a simple incrementing integer. If consumer group "wallet" is at offset 42, it has processed messages 0 through 41 and is waiting for message 42.&lt;/p&gt;

&lt;p&gt;By default, Kafka &lt;strong&gt;auto-commits&lt;/strong&gt; offsets on a timer (every 5 seconds). This creates a dangerous window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T=0s  — message 42 polled from Kafka
T=1s  — processing begins
T=3s  — AUTO-COMMIT fires — offset 42 committed as "done"
T=4s  — processing throws an exception
T=4s  — consumer restarts
T=4s  — consumer reads from offset 43
          message 42 is GONE. Permanently skipped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A payment was silently dropped because the offset was committed before the side effect completed.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;manual offset commit&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T=0s  — message 42 polled
T=1s  — processing begins
T=4s  — processing throws an exception
T=4s  — consumer restarts (offset NOT committed — still at 41)
T=4s  — consumer reads message 42 AGAIN and retries it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In code, the base consumer pattern looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;poll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;value&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# Check Redis dedup first
&lt;/span&gt;    &lt;span class="n"&gt;dedup_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wallet:processed:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dedup_key&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# already processed — commit and skip
&lt;/span&gt;        &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="c1"&gt;# Try to process
&lt;/span&gt;    &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MAX_RETRIES&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# DB write happens here
&lt;/span&gt;            &lt;span class="n"&gt;success&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;wait&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;BASE_DELAY_MS&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="n"&gt;attempt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
            &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dedup_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DEDUP_TTL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;publish_to_dlq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;consumer_group&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wallet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# ALWAYS commit offset — even on failure (DLQ handles the event now)
&lt;/span&gt;    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things to notice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Commit happens last, always.&lt;/strong&gt; Even on failure — once we've published to DLQ, we don't want to keep reprocessing a poison pill event that will never succeed. Commit it, move on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Redis deduplication is required.&lt;/strong&gt; Because we have at-least-once delivery (the producer can publish the same event twice via the outbox poller), and because a consumer might process an event and then crash before committing the offset, the same event can arrive at a consumer multiple times. Without Redis dedup, a payment could debit a wallet twice.&lt;/p&gt;

&lt;p&gt;The contract is: &lt;strong&gt;Kafka guarantees at-least-once. Your consumer guarantees exactly-once side effects via idempotency.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 3: The Fan-Out Test (the moment Kafka clicked)
&lt;/h2&gt;

&lt;p&gt;This is the test I ran that made everything concrete.&lt;/p&gt;

&lt;p&gt;I fired a single payment event:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8000/payment &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"user_id":"user_001","amount":500,"currency":"USD","idempotency_key":"test-fanout-001"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I checked every downstream system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Wallet debited&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;wallets&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'user_001'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- 9500.00 ✓&lt;/span&gt;

&lt;span class="c1"&gt;-- Fraud assessed&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;risk_level&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fraud_assessments&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- CLEAR ✓&lt;/span&gt;

&lt;span class="c1"&gt;-- Audit record written (immutable, append-only)&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;audit_log&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'...'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- row present ✓&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Notification logged&lt;/span&gt;
docker compose logs consumer-notifier | &lt;span class="nb"&gt;grep &lt;/span&gt;test-fanout-001
&lt;span class="c"&gt;# {"event":"notification_sent","user_id":"user_001","amount":500.0,...} ✓&lt;/span&gt;

&lt;span class="c"&gt;# Analytics incremented&lt;/span&gt;
docker &lt;span class="nb"&gt;exec &lt;/span&gt;txflow-redis redis-cli GET analytics:total_payments
&lt;span class="c"&gt;# 1 ✓&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One API call. Five side effects. Five completely independent services. Zero coupling — none of the consumers know the others exist. The fraud scorer doesn't call the wallet updater. The audit logger doesn't call the notifier. They all just read from the same Kafka topic, each at their own pace.&lt;/p&gt;
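
&lt;p&gt;That zero coupling falls out of consumer groups: same topic, different &lt;code&gt;group.id&lt;/code&gt;, independent offsets. A hedged sketch of the wiring, assuming the confluent-kafka Python client (the group names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from confluent_kafka import Consumer

def make_consumer(service: str) -&gt; Consumer:
    # Each service gets its own group.id, so each group receives
    # a full, independent copy of the payments.initiated stream.
    return Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": f"txflow-{service}",
        "auto.offset.reset": "earliest",   # new groups start from the beginning
        "enable.auto.commit": False,       # commit manually, after side effects
    })

consumers = {
    s: make_consumer(s) for s in ("wallet", "fraud", "notifier", "audit", "analytics")
}
for c in consumers.values():
    c.subscribe(["payments.initiated"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;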

&lt;p&gt;Then I ran the replayability test.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Stop the audit consumer&lt;/span&gt;
docker compose stop consumer-audit

&lt;span class="c"&gt;# Fire 10 more events&lt;/span&gt;
./scripts/fire_bulk.sh

&lt;span class="c"&gt;# Redpanda Console shows: audit consumer group lag = 10&lt;/span&gt;
&lt;span class="c"&gt;# Every other consumer: lag = 0 (they processed normally)&lt;/span&gt;

&lt;span class="c"&gt;# Restart audit consumer&lt;/span&gt;
docker compose start consumer-audit

&lt;span class="c"&gt;# Watch it replay all 10 missed events from its last committed offset&lt;/span&gt;
docker compose logs &lt;span class="nt"&gt;-f&lt;/span&gt; consumer-audit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every single missed event was processed. The audit log caught up completely.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This is impossible with SQS.&lt;/strong&gt; When a message is consumed from SQS, it's deleted. There's no log to replay. There's no offset to reset. If the audit service was down while the other workers consumed those messages, then from the queue's perspective the events are gone. You'd have to build your own replay mechanism from scratch.&lt;/p&gt;

&lt;p&gt;With Kafka, replay is not a feature you implement. It's the default behaviour. The log is the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lesson 4: Partitions Are the Scaling Unit
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;payments.initiated&lt;/code&gt; topic has 3 partitions. Every payment event is published with &lt;code&gt;user_id&lt;/code&gt; as the partition key.&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All events for &lt;code&gt;user_001&lt;/code&gt; always land on partition 0 (hash of "user_001" % 3 = 0)&lt;/li&gt;
&lt;li&gt;All events for &lt;code&gt;user_002&lt;/code&gt; always land on partition 2&lt;/li&gt;
&lt;li&gt;Within a partition, events are strictly ordered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why does this matter? Because the wallet consumer needs to process &lt;code&gt;user_001&lt;/code&gt;'s events in order. If two payments arrive near-simultaneously and get processed in the wrong order, the balance calculations could be wrong. Partitioning by &lt;code&gt;user_id&lt;/code&gt; gives you a per-user ordering guarantee for free.&lt;/p&gt;
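
&lt;p&gt;On the producer side, all it takes to claim that guarantee is setting the message key. A minimal sketch with the confluent-kafka client (the topic name matches the article; everything else is illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_payment(event: dict):
    producer.produce(
        "payments.initiated",
        key=event["user_id"].encode(),     # same key, same partition, same order
        value=json.dumps(event).encode(),
    )
    producer.flush()  # blocking; fine for a demo, batch deliveries in production
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;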

&lt;p&gt;The other thing partitions determine: &lt;strong&gt;how many parallel consumers you can have in a group.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Scale wallet consumer to 4 instances&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;--scale&lt;/span&gt; consumer-wallet&lt;span class="o"&gt;=&lt;/span&gt;4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens: 3 instances each own 1 partition. The 4th sits idle. You can never have more active consumers in a group than you have partitions. This is Kafka's fundamental scaling model — you scale by adding partitions, not just by adding consumers.&lt;/p&gt;

&lt;p&gt;I ran this test with 4 instances, watched the Redpanda Console, and saw exactly one instance sitting there doing nothing. Ten minutes of reading about this is worth less than watching it happen once.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Observability Dashboard
&lt;/h2&gt;

&lt;p&gt;One thing that separates "I ran Kafka" from "I understand Kafka operations" is knowing what to watch.&lt;/p&gt;

&lt;p&gt;The most important metric in Kafka is &lt;strong&gt;consumer lag&lt;/strong&gt; — the difference between the latest message in a topic and the consumer group's current position. Lag = 0 means the consumer is keeping up. Lag = 500 means the consumer is 500 messages behind the producer.&lt;/p&gt;

&lt;p&gt;Consumer lag is Kafka's equivalent of Sidekiq queue depth. If it's rising, something is wrong — either your consumer is too slow, there's an exception in the processing logic, or the consumer is down entirely.&lt;/p&gt;
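
&lt;p&gt;If you want the number programmatically rather than from the console, lag is just the high watermark minus the committed offset, summed over partitions. A sketch using confluent-kafka (group and topic names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from confluent_kafka import Consumer, TopicPartition

def group_lag(group_id, topic, num_partitions):
    c = Consumer({"bootstrap.servers": "localhost:9092", "group.id": group_id})
    tps = [TopicPartition(topic, p) for p in range(num_partitions)]
    lag = 0
    for tp in c.committed(tps, timeout=5):
        low, high = c.get_watermark_offsets(tp, timeout=5)
        position = tp.offset if tp.offset &gt;= 0 else low   # no commit yet
        lag += high - position
    c.close()
    return lag

print(group_lag("txflow-audit", "payments.initiated", 3))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;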

&lt;p&gt;The dashboard polls the Redpanda Admin API every 5 seconds and colour-codes lag per consumer group:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Green&lt;/strong&gt; — lag = 0, all caught up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amber&lt;/strong&gt; — lag 1–20, slight backlog&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red&lt;/strong&gt; — lag &amp;gt; 20, needs attention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also wired in the DLQ event table and analytics counters. Running &lt;code&gt;fire_bulk.sh&lt;/code&gt; (20 events in rapid succession) and watching the lag spike and drain in real time made the whole system feel alive in a way that just reading logs never does.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Use a schema registry from the start.&lt;/strong&gt; I serialised events as plain JSON. That's fine for a POC, but in production it's a trap: change the event schema and you silently break consumers. Redpanda ships with a built-in schema registry that speaks the Confluent Schema Registry API and supports Avro. I plan to add it as a stretch goal, but I wish I'd built it in from day one.&lt;/p&gt;
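
&lt;p&gt;For reference, registering an Avro schema against that registry is a single REST call to the Confluent-compatible API. A hedged sketch in Python (8081 is Redpanda's default registry port; the field list is my guess at the event shape, not the repo's actual schema):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import requests

schema = {
    "type": "record",
    "name": "PaymentInitiated",
    "fields": [
        {"name": "event_id", "type": "string"},
        {"name": "user_id", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "currency", "type": "string"},
    ],
}

resp = requests.post(
    "http://localhost:8081/subjects/payments.initiated-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schema": json.dumps(schema)},
)
print(resp.json())  # {"id": 1} on first registration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;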

&lt;p&gt;&lt;strong&gt;Add consumer lag alerting earlier.&lt;/strong&gt; Rising lag is the first signal of a problem. I'd add a simple threshold alert (lag &amp;gt; 50 for &amp;gt; 60 seconds = log a structured alert) as part of the base consumer, not as an afterthought.&lt;/p&gt;
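
&lt;p&gt;That alert needs almost no machinery. A sketch using the 50-message / 60-second thresholds from above (the state handling is mine):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import time

LAG_LIMIT, WINDOW_SECS = 50, 60
_breach_started = None

def check_lag(lag):
    """Log a structured alert when lag stays above LAG_LIMIT for WINDOW_SECS."""
    global _breach_started
    if lag &lt;= LAG_LIMIT:
        _breach_started = None                 # recovered: reset the window
        return
    now = time.time()
    if _breach_started is None:
        _breach_started = now                  # breach just started
    elif now - _breach_started &gt; WINDOW_SECS:
        print(json.dumps({"alert": "consumer_lag_high", "lag": lag}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;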

&lt;p&gt;&lt;strong&gt;Test the poison pill scenario deliberately.&lt;/strong&gt; A poison pill is a message that will never succeed — malformed data, a downstream service that's permanently broken for that record type. Without handling it explicitly, a poison pill will cause your consumer to retry forever, never advancing its offset, completely blocking that partition. I added DLQ handling but I should've stress-tested it earlier.&lt;/p&gt;
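
&lt;p&gt;The shape of the fix is bounded retries: try a few times, then route the message to the DLQ and commit so the partition keeps moving. A sketch (the retry count and DLQ topic name are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from confluent_kafka import Producer

dlq = Producer({"bootstrap.servers": "localhost:9092"})
MAX_ATTEMPTS = 3

def process_with_dlq(consumer, msg, process):
    for attempt in range(MAX_ATTEMPTS):
        try:
            process(msg)
            break
        except Exception as exc:
            if attempt == MAX_ATTEMPTS - 1:
                # Park the raw bytes plus the error for later inspection.
                dlq.produce("payments.dlq", key=msg.key(), value=msg.value(),
                            headers=[("error", str(exc).encode())])
                dlq.flush()
    consumer.commit(msg)  # always advance, even past the poison pill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;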




&lt;h2&gt;
  
  
  SQS vs Kafka — When to Use Which
&lt;/h2&gt;

&lt;p&gt;I want to be clear: this isn't "Kafka is better than SQS." They solve different problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use SQS when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have one consumer per job type&lt;/li&gt;
&lt;li&gt;Messages can be deleted after processing&lt;/li&gt;
&lt;li&gt;You don't need replayability&lt;/li&gt;
&lt;li&gt;You're already on AWS and want managed infrastructure&lt;/li&gt;
&lt;li&gt;The scale is modest and operational simplicity matters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use Kafka when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple independent consumers need to react to the same event&lt;/li&gt;
&lt;li&gt;You need replayability (new service, bug fix, backfill)&lt;/li&gt;
&lt;li&gt;Event ordering within a key matters&lt;/li&gt;
&lt;li&gt;You want to decouple producers from consumers at an architectural level&lt;/li&gt;
&lt;li&gt;You're building something that will grow to high throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The payment system I described — where fraud scoring, wallet updates, notifications, audit logging, and analytics all react to the same event independently — is a textbook Kafka use case. SQS would work, but you'd be fighting the tool.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Building This Taught Me About Backend Engineering
&lt;/h2&gt;

&lt;p&gt;The deeper lesson isn't about Kafka specifically. It's about what "event-driven architecture" actually means.&lt;/p&gt;

&lt;p&gt;Before this project, I used the phrase "event-driven" to mean "I use a message queue." That's not wrong, but it's incomplete. True event-driven architecture means the event is a first-class citizen. Services don't call each other; they react to facts that have already been recorded. The payment happened. The event is a statement of that fact. Any service that cares about it can react. Any service that doesn't, ignores it. Adding a new service next month requires touching zero existing code.&lt;/p&gt;

&lt;p&gt;The log is the system. Everything else is a view.&lt;/p&gt;

&lt;p&gt;That's the mental model shift that Kafka forces, and it's worth building a project from scratch just to internalise it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;Everything is on GitHub: &lt;strong&gt;github.com/Deonkar/txFlow_build_tasks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It runs with a single &lt;code&gt;docker compose up&lt;/code&gt;. Fire a payment with &lt;code&gt;./scripts/fire_event.sh&lt;/code&gt;. Watch five things happen simultaneously.&lt;/p&gt;

&lt;p&gt;If you're coming from SQS or RabbitMQ or Sidekiq, the thing to run first is the replayability test — stop one consumer, fire 10 events, restart it, watch it catch up. That 30-second test will reframe how you think about message processing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you found this useful or have questions about any of the implementation decisions, drop a comment. I'm particularly interested in hearing from people who've run Kafka at production scale — there's definitely more to learn about partition rebalancing, exactly-once semantics, and schema evolution.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; &lt;code&gt;kafka&lt;/code&gt; &lt;code&gt;backend&lt;/code&gt; &lt;code&gt;python&lt;/code&gt; &lt;code&gt;systemdesign&lt;/code&gt; &lt;code&gt;distributedsystems&lt;/code&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>python</category>
      <category>systemdesign</category>
      <category>distributedsystems</category>
    </item>
  </channel>
</rss>
