<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: HarmanPreet-Singh-XYT</title>
    <description>The latest articles on Forem by HarmanPreet-Singh-XYT (@harmanpreetsingh).</description>
    <link>https://forem.com/harmanpreetsingh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1191004%2Fe1592bcd-0a31-48c0-a921-ad7b89f11835.jpeg</url>
      <title>Forem: HarmanPreet-Singh-XYT</title>
      <link>https://forem.com/harmanpreetsingh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harmanpreetsingh"/>
    <language>en</language>
    <item>
      <title>How CodiLay Reads a Codebase the Way a Detective Reads a Crime Scene</title>
      <dc:creator>HarmanPreet-Singh-XYT</dc:creator>
      <pubDate>Thu, 19 Mar 2026 07:39:35 +0000</pubDate>
      <link>https://forem.com/harmanpreetsingh/how-codilay-reads-a-codebase-the-way-a-detective-reads-a-crime-scene-3lml</link>
      <guid>https://forem.com/harmanpreetsingh/how-codilay-reads-a-codebase-the-way-a-detective-reads-a-crime-scene-3lml</guid>
      <description>&lt;p&gt;Most documentation tools ask you to write the docs yourself, or they generate something so shallow it barely survives contact with the actual codebase. CodiLay takes a different approach. It reads the code the way an investigator reads evidence — tracing connections, holding open questions, resolving them when the right file comes along, and building a picture that gets more accurate as it goes.&lt;/p&gt;

&lt;p&gt;Here's how it actually works under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wire Model
&lt;/h2&gt;

&lt;p&gt;The central abstraction in CodiLay is the &lt;strong&gt;wire&lt;/strong&gt;. A wire represents an unresolved reference — a file imports something, calls something, or depends on something that hasn't been documented yet. The agent opens a wire when it sees the reference. That wire stays alive in the agent's active state, carried forward through subsequent files, until a later file explains the other end. At that point, the wire closes, the connection gets recorded, and it retires from active context permanently.&lt;/p&gt;

&lt;p&gt;This is deliberate. Closed wires are never re-injected into future LLM calls. As the codebase grows, the active context stays lean — only what's genuinely unresolved travels forward.&lt;/p&gt;

&lt;p&gt;Wires carry type information too. &lt;code&gt;import&lt;/code&gt;, &lt;code&gt;call&lt;/code&gt;, &lt;code&gt;model&lt;/code&gt;, &lt;code&gt;config&lt;/code&gt;, &lt;code&gt;event&lt;/code&gt;. A routes file importing a service opens an &lt;code&gt;import&lt;/code&gt; wire. That service calling into a payment processor opens a &lt;code&gt;call&lt;/code&gt; wire. Wires that reach external packages or reference deleted files stay permanently open and surface in the final output as &lt;strong&gt;Unresolved References&lt;/strong&gt; — which is useful information, not a failure.&lt;/p&gt;
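&lt;p&gt;As a rough sketch of that lifecycle (the class and field names here are invented for illustration, not CodiLay's actual internals):&lt;/p&gt;

```python
from dataclasses import dataclass
import itertools

# Illustrative sketch: names are invented, not CodiLay's real API.
@dataclass
class Wire:
    wire_id: int
    src: str       # file that opened the wire
    target: str    # the unresolved reference it points at
    kind: str      # "import", "call", "model", "config", "event"
    status: str = "open"
    summary: str = ""

class WireRegistry:
    def __init__(self):
        self._wires = {}
        self._ids = itertools.count(1)

    def open(self, src, target, kind):
        wire = Wire(next(self._ids), src, target, kind)
        self._wires[wire.wire_id] = wire
        return wire.wire_id

    def close(self, wire_id, summary):
        self._wires[wire_id].status = "closed"
        self._wires[wire_id].summary = summary

    def active(self):
        # Only open wires travel into the next LLM call; closed
        # wires retire from context permanently.
        return [w for w in self._wires.values() if w.status == "open"]
```

&lt;p&gt;The important property is in &lt;code&gt;active()&lt;/code&gt;: closed wires still exist for the final dependency graph, but they never re-enter prompt context.&lt;/p&gt;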




&lt;h2&gt;
  
  
  The Agent Loop, Phase by Phase
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Bootstrap&lt;/strong&gt; strips the project down to what matters: parse &lt;code&gt;.gitignore&lt;/code&gt;, merge any additional ignore patterns from config, walk the directory tree, preload existing markdown files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Triage&lt;/strong&gt; is a single LLM call that sees only filenames and paths, not file content. It categorizes every file as &lt;code&gt;core&lt;/code&gt; (document fully), &lt;code&gt;skim&lt;/code&gt; (extract key metadata), or &lt;code&gt;skip&lt;/code&gt; (ignore entirely). For a Flutter project this means &lt;code&gt;ios/&lt;/code&gt;, &lt;code&gt;android/&lt;/code&gt;, &lt;code&gt;build/&lt;/code&gt;, and all generated &lt;code&gt;.g.dart&lt;/code&gt; and &lt;code&gt;.freezed.dart&lt;/code&gt; files disappear before the planner ever sees them. For a Next.js project, &lt;code&gt;.next/&lt;/code&gt; and &lt;code&gt;out/&lt;/code&gt; vanish. The triage phase is autonomous — no user confirmation step.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Planning&lt;/strong&gt; makes one LLM call against the curated (post-triage) file list. The planner outputs an ordered queue, a list of parked files (too ambiguous to process yet), and a suggested document skeleton including section names and structure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Processing&lt;/strong&gt; runs file by file through that queue. For each file, the agent counts tokens with &lt;code&gt;tiktoken&lt;/code&gt;, decides whether it goes through single-call or large-file handling, loads the relevant doc chunks, builds a prompt with current wire state and section index, calls the LLM, applies the JSON diff to the docstore, and updates the wire state — closing resolved wires, opening new ones, checking whether any parked files can now be unparked given the new context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Finalization&lt;/strong&gt; runs a sequential sweep with full wire context. Parked files get documented with whatever context exists. The Unresolved References and Dependency Graph sections are assembled, &lt;code&gt;CODEBASE.md&lt;/code&gt; and &lt;code&gt;links.json&lt;/code&gt; are written out, and the current HEAD commit hash is saved to state for the next re-run.&lt;/p&gt;




&lt;h2&gt;
  
  
  Large File Handling
&lt;/h2&gt;

&lt;p&gt;Files are measured in tokens, not lines. A 500-line TypeScript generics file consumes more context than a 1,000-line config file. The default threshold is 6,000 tokens, configurable per project.&lt;/p&gt;

&lt;p&gt;Files over the threshold go through a skeleton pass first. The agent reads imports, function signatures, class definitions, and docstrings — no function bodies. This builds the section in the docstore, opens all detectable wires early, and marks the section as &lt;code&gt;detail_pending&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then the file splits along natural boundaries — class definitions, top-level functions, component edges. If no clean boundaries exist, it splits by token budget with 10–15% overlap between chunks. The overlap is the key detail. Without it, a function that starts at the bottom of chunk N and ends at the top of chunk N+1 gets half-documented or missed entirely. Overlap means the model always has trailing context from the previous chunk.&lt;/p&gt;
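&lt;p&gt;The token-budget fallback can be sketched in a few lines. This is a simplification over a flat token list; the real splitter prefers structural boundaries first:&lt;/p&gt;

```python
def chunk_with_overlap(tokens, budget=6000, overlap_ratio=0.12):
    # Illustrative sketch: chunk starts advance by (budget minus overlap),
    # so each chunk carries trailing context from the previous one and no
    # function is cut cleanly at a chunk boundary.
    step = budget - int(budget * overlap_ratio)
    starts = range(0, max(1, len(tokens) - budget + step), step)
    return [tokens[s:s + budget] for s in starts]
```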

&lt;p&gt;The skeleton-first order matters for a reason beyond chunking. By the time detail passes run, other files in the queue may already have been processed. The LLM reading a function body already knows what calls it and what it returns to — it reads with full context rather than in isolation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Docstore Architecture
&lt;/h2&gt;

&lt;p&gt;The docstore manages the document as independently addressable sections rather than a flat string. Each section carries invisible metadata:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- section:id=auth-middleware tags=auth,jwt,middleware deps=routes/users.js,routes/orders.js --&amp;gt;&lt;/span&gt;
&lt;span class="gu"&gt;## Auth Middleware&lt;/span&gt;
...content...
&lt;span class="c"&gt;&amp;lt;!-- /section --&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These markers are stripped from the final output. The section index — a lightweight JSON object always in context — holds only metadata: title, file, tags, deps, which wires it closed, whether it's &lt;code&gt;detail_pending&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When processing a new file, relevant sections are loaded in priority order: sections whose &lt;code&gt;deps&lt;/code&gt; list includes the current file path, sections whose tags overlap with imports found in the current file, sections flagged by open wires pointing to this file, and the top-level Overview (always loaded). Each LLM call stays bounded regardless of total document size.&lt;/p&gt;
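&lt;p&gt;A minimal sketch of that priority loading. The &lt;code&gt;deps&lt;/code&gt; and &lt;code&gt;tags&lt;/code&gt; fields mirror the section metadata above; the &lt;code&gt;section_id&lt;/code&gt; field on wires is an assumption, not the actual schema:&lt;/p&gt;

```python
def select_sections(current_file, current_imports, open_wires, index):
    # Illustrative sketch: field names approximate, not CodiLay's schema.
    chosen, seen = [], set()

    def take(section_id):
        if section_id not in seen:
            seen.add(section_id)
            chosen.append(section_id)

    for sec in index:                                   # 1. deps name this file
        if current_file in sec["deps"]:
            take(sec["id"])
    for sec in index:                                   # 2. tag/import overlap
        if set(sec["tags"]).intersection(current_imports):
            take(sec["id"])
    flagged = {w["section_id"] for w in open_wires      # 3. open wires point here
               if w["target"] == current_file}
    for sec in index:
        if sec["id"] in flagged:
            take(sec["id"])
    take("overview")                                    # 4. Overview always loads
    return chosen
```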




&lt;h2&gt;
  
  
  Structured Parallelism
&lt;/h2&gt;

&lt;p&gt;Sequential processing of a 50-file codebase means 50 serial LLM calls. Naive parallelism breaks the wire model — two workers processing related files simultaneously produce inconsistent context and miss connections.&lt;/p&gt;

&lt;p&gt;CodiLay solves this with a dependency tier model. Before the loop starts, the planner builds a lightweight DAG from structural inference (folder hierarchy, import patterns visible in skim files). Files are assigned to tiers by their depth in the DAG. Tier 0 is entry points. Tier 1 is files directly imported by tier 0. And so on.&lt;/p&gt;
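&lt;p&gt;Tier assignment reduces to a breadth-first walk over that DAG. A sketch, assuming the graph is already available as a plain dict:&lt;/p&gt;

```python
from collections import deque

def assign_tiers(edges, entry_points):
    # Illustrative sketch: edges maps each file to the files it imports;
    # BFS depth from the entry points becomes the tier number.
    tier = {f: 0 for f in entry_points}
    queue = deque(entry_points)
    while queue:
        f = queue.popleft()
        for dep in edges.get(f, []):
            if dep not in tier:
                tier[dep] = tier[f] + 1
                queue.append(dep)
    return tier
```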

&lt;p&gt;Processing happens tier by tier. Within a tier, all files run in parallel. Between tiers, a hard sync point waits for all workers to finish and reconciles the central wire bus before the next tier begins.&lt;/p&gt;

&lt;p&gt;The wire bus is the shared state that makes safe parallelism possible. All workers read from and write to it through atomic operations: &lt;code&gt;open(wire)&lt;/code&gt;, &lt;code&gt;close(wire_id, summary)&lt;/code&gt;, &lt;code&gt;peek(file_path)&lt;/code&gt;, &lt;code&gt;mark_pending(wire_id)&lt;/code&gt;. Each worker writes its own doc sections independently — only wire state is shared and locked.&lt;/p&gt;

&lt;p&gt;Every worker takes a frozen snapshot of the wire bus at job start. It processes its file against that snapshot, not a live-updating view. A wire that closes mid-call by another worker doesn't affect the current worker's LLM call. The finalize pass reconciles everything afterward.&lt;/p&gt;
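&lt;p&gt;A stripped-down sketch of the bus, showing just the lock discipline and the frozen snapshot (&lt;code&gt;peek&lt;/code&gt; and &lt;code&gt;mark_pending&lt;/code&gt; omitted; internal representation is an assumption):&lt;/p&gt;

```python
import copy
import threading

class WireBus:
    # Illustrative sketch, not CodiLay's real implementation.
    def __init__(self):
        self._lock = threading.Lock()
        self._wires = {}

    def open(self, wire_id, wire):
        with self._lock:
            self._wires[wire_id] = dict(wire, status="open")

    def close(self, wire_id, summary):
        with self._lock:
            self._wires[wire_id]["status"] = "closed"
            self._wires[wire_id]["summary"] = summary

    def snapshot(self):
        # Workers take a frozen deep copy at job start; a close by
        # another worker mid-run never changes an in-flight call's view.
        with self._lock:
            return copy.deepcopy(self._wires)
```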

&lt;p&gt;Sections generated during parallel processing carry a confidence tag: &lt;code&gt;partial&lt;/code&gt; if there were pending wires at generation time. The finalize pass always re-reviews &lt;code&gt;partial&lt;/code&gt; sections. Speedup expectations range from 1.5x on deep call-chain monoliths to 5–8x on flat utility repos.&lt;/p&gt;




&lt;h2&gt;
  
  
  Git Integration
&lt;/h2&gt;

&lt;p&gt;The current HEAD commit hash is saved to state after every run. On the next run, &lt;code&gt;git diff &amp;lt;last_commit&amp;gt; HEAD --name-status&lt;/code&gt; returns a typed change list — &lt;code&gt;M&lt;/code&gt; for modified, &lt;code&gt;A&lt;/code&gt; for added, &lt;code&gt;D&lt;/code&gt; for deleted, &lt;code&gt;R&lt;/code&gt; for renamed.&lt;/p&gt;

&lt;p&gt;Modified files re-enter the queue. Any wires that originated from or pointed to them re-open, and their doc sections are invalidated. Added files run through a single-file triage call before queuing. Deleted files leave their wires permanently open with a note referencing the commit hash. Renamed files get all their wire &lt;code&gt;from&lt;/code&gt;/&lt;code&gt;to&lt;/code&gt; fields updated along with section index &lt;code&gt;deps&lt;/code&gt; entries, then re-enter processing at the new path.&lt;/p&gt;
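&lt;p&gt;Parsing that &lt;code&gt;--name-status&lt;/code&gt; output is mechanical. A sketch of just the parsing step (the subprocess call itself is omitted):&lt;/p&gt;

```python
def parse_name_status(diff_output):
    # git prints one tab-separated line per change; rename lines carry a
    # similarity score ("R100") and two paths, old then new.
    changes = {"M": [], "A": [], "D": [], "R": []}
    for line in diff_output.splitlines():
        if not line.strip():
            continue
        parts = line.split("\t")
        status = parts[0][0]          # "R100" becomes "R"
        if status == "R":
            changes["R"].append((parts[1], parts[2]))
        elif status in changes:
            changes[status].append(parts[1])
    return changes
```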

&lt;p&gt;When git isn't available — no git history, no git binary — the fallback compares file mtimes against timestamps in the state file. Files newer than &lt;code&gt;last_run&lt;/code&gt; are treated as modified. Files in &lt;code&gt;processed&lt;/code&gt; but missing from disk are treated as deleted.&lt;/p&gt;




&lt;h2&gt;
  
  
  Parallelism Safety: Three Failure Modes Eliminated
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Missed wire&lt;/strong&gt; — Worker B documents a file without knowing Worker A opened a wire pointing to it. The wire never closes. The connection disappears from the final doc. The frozen-context snapshot eliminates this: every worker reads the wire bus before starting, and the finalize pass catches anything that slipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Partial read&lt;/strong&gt; — Worker B reads shared wire state mid-update by Worker A, gets half-formed context, and produces inconsistent output. The locked atomic operations on the wire bus eliminate this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Confident wrong&lt;/strong&gt; — Worker B has no open wire pointing to its file, assumes self-containment, and hallucinates relationships. The confidence tagging catches this: if pending wires existed at generation time, the section is marked &lt;code&gt;partial&lt;/code&gt; and finalize re-reviews it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resumption and Cost Protection
&lt;/h2&gt;

&lt;p&gt;The state file rotates through four copies: current, &lt;code&gt;.bak.1&lt;/code&gt;, &lt;code&gt;.bak.2&lt;/code&gt;, &lt;code&gt;.bak.3&lt;/code&gt;. On corruption, automatic fallback cascades through the backups. State files run 50–200KB — negligible disk footprint.&lt;/p&gt;
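&lt;p&gt;The rotation itself is a handful of copies, oldest dropped first. A sketch, assuming the &lt;code&gt;.bak.N&lt;/code&gt; naming above and rotation before each new state write:&lt;/p&gt;

```python
import os
import shutil

def rotate_backups(state_path, copies=3):
    # Illustrative sketch: .bak.2 slides to .bak.3, .bak.1 to .bak.2,
    # then the current state becomes .bak.1 before the new state writes.
    for i in range(copies, 1, -1):
        older = f"{state_path}.bak.{i}"
        newer = f"{state_path}.bak.{i - 1}"
        if os.path.exists(newer):
            shutil.copy2(newer, older)
    if os.path.exists(state_path):
        shutil.copy2(state_path, f"{state_path}.bak.1")
```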

&lt;p&gt;The LLM response cache is the deepest layer of cost protection. Before any LLM call, the system checks for a cached response keyed on &lt;code&gt;hash(file_content) + hash(prompt_template) + hash(wire_context_snapshot)&lt;/code&gt;. A crash between the API call and the docstore write doesn't lose the money — the response is in cache and replays on resume at zero cost.&lt;/p&gt;
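&lt;p&gt;The key derivation can be sketched with stdlib hashing (the exact hash function and serialization here are assumptions):&lt;/p&gt;

```python
import hashlib
import json

def cache_key(file_content, prompt_template, wire_snapshot):
    # Illustrative sketch: sha256 and sorted-key JSON are assumptions.
    # Any change to the file, the prompt, or the wire context produces
    # a new key; identical inputs replay the cached response.
    def h(data):
        return hashlib.sha256(data.encode("utf-8")).hexdigest()
    wire_blob = json.dumps(wire_snapshot, sort_keys=True)
    return h(h(file_content) + h(prompt_template) + h(wire_blob))
```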

&lt;p&gt;The two-phase docstore write ensures this: write LLM response to cache (atomic), write section to temp file, atomically move to final location via &lt;code&gt;os.replace()&lt;/code&gt;, update state, rotate backups. A crash at any step after the cache write means the response survives.&lt;/p&gt;
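&lt;p&gt;The atomic-move step looks roughly like this sketch:&lt;/p&gt;

```python
import os
import tempfile

def write_section(path, text):
    # Illustrative sketch of the second phase: stage in a temp file in the
    # same directory, then os.replace() moves it into place atomically on
    # the same filesystem. Readers never observe a half-written section.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```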

&lt;p&gt;Failure routing distinguishes retryable from user-actionable errors. Rate limits get exponential backoff with jitter, up to 5 retries. Timeouts retry once with a longer timeout, then park. Auth errors pause the run and prompt the user to fix the API key — then resume. Disk full pauses and waits. The distinction matters: retrying an auth error silently accomplishes nothing.&lt;/p&gt;
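&lt;p&gt;That routing can be sketched as a small wrapper; &lt;code&gt;classify&lt;/code&gt; here is an assumed caller-supplied hook, not CodiLay's real error taxonomy:&lt;/p&gt;

```python
import random
import time

RETRYABLE = {"rate_limit", "timeout"}

def call_with_backoff(call, classify, max_retries=5, base=1.0):
    # Illustrative sketch: classify() maps an exception to an error kind.
    # Retryable kinds back off exponentially with jitter; everything else
    # surfaces immediately for the user to act on.
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if classify(exc) not in RETRYABLE:
                raise   # user-actionable: retrying silently helps no one
            time.sleep(base * (2 ** attempt) + random.uniform(0, base))
    return call()       # final attempt propagates its own error
```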




&lt;h2&gt;
  
  
  The Web UI Architecture
&lt;/h2&gt;

&lt;p&gt;The server is a FastAPI application built with a factory pattern — &lt;code&gt;create_app(target_path, output_dir)&lt;/code&gt; scopes the entire app to a specific project via closures. All state is project-local.&lt;/p&gt;

&lt;p&gt;Four items are cached with mtime invalidation: agent state, wire links, &lt;code&gt;CODEBASE.md&lt;/code&gt;, and the TF-IDF retriever. When &lt;code&gt;codilay run&lt;/code&gt; or &lt;code&gt;codilay watch&lt;/code&gt; updates files on disk, the server picks up changes without restart.&lt;/p&gt;

&lt;p&gt;All synchronous operations — LLM calls, file I/O, search indexing — are wrapped in &lt;code&gt;asyncio.to_thread&lt;/code&gt; to keep the async event loop responsive under concurrent requests. Feature modules are lazily imported inside their endpoint handlers rather than at module level, which keeps startup fast and isolates import errors.&lt;/p&gt;

&lt;p&gt;The chat system runs three layers. Layer 1 renders the static output with an interactive dependency graph from &lt;code&gt;links.json&lt;/code&gt;. Layer 2 is a chatbot that answers questions from doc context only. Layer 3, the deep agent, activates when confidence drops below threshold — it reads actual source files, answers with precision, then patches the doc with what it found. The next time the same question arrives, Layer 2 handles it without escalation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Code Annotation
&lt;/h2&gt;

&lt;p&gt;The annotation system writes wire knowledge back into source files as comments and docstrings. It's the one feature that modifies actual code, which drives the guard design.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;requireGitClean&lt;/code&gt; blocks annotation if the working tree has uncommitted changes. &lt;code&gt;git checkout .&lt;/code&gt; always works as a clean rollback. &lt;code&gt;requireDryRunFirst&lt;/code&gt; forces a dry run on the first annotation of any project — nothing writes until the user has seen the unified diff preview.&lt;/p&gt;

&lt;p&gt;Annotations are inserted in descending line order — bottom of file first. This prevents line offset drift: inserting a block comment at line 10 shifts every subsequent target line number down by the height of that comment. Inserting from the bottom up means each insertion doesn't affect anything above it.&lt;/p&gt;

&lt;p&gt;Before writing any file, syntax validation runs. Python goes through &lt;code&gt;ast.parse()&lt;/code&gt;. Other languages get structural checks. If validation fails, the original file stays untouched and the annotation moves to the review queue.&lt;/p&gt;
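&lt;p&gt;Both guards together fit in a short sketch: insert bottom-up, then refuse to return anything that no longer parses:&lt;/p&gt;

```python
import ast

def insert_annotations(source, annotations):
    # Illustrative sketch. annotations: list of (line_number, text),
    # 1-indexed. Sorting descending means earlier targets never shift.
    lines = source.splitlines()
    for line_no, text in sorted(annotations, reverse=True):
        lines.insert(line_no - 1, text)
    result = "\n".join(lines) + "\n"
    ast.parse(result)   # refuse to produce a file that no longer parses
    return result
```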

&lt;p&gt;The wire connection block in an annotation is the output that no other doc generator produces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_payment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;retry_count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Charges the customer for a pending order via Stripe.

    Wire connections:
      &amp;lt;- Called by: routes/orders.py (checkout), scheduler/retry_jobs.py
      -&amp;gt; Calls:     stripe.charge.create, notify_fulfillment (Celery task)
      -&amp;gt; Reads:     Order model, Customer.stripe_id

    Retry logic: up to 3 attempts with exponential backoff.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cross-file relationship shows up directly in the code, not in a separate document that drifts out of sync.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conversation Search
&lt;/h2&gt;

&lt;p&gt;The search module is a custom TF-IDF implementation with no external dependencies. It tokenizes with a regex that extracts words and code identifiers (&lt;code&gt;get_user&lt;/code&gt;, &lt;code&gt;handleClick&lt;/code&gt;) while filtering stop words. Augmented TF — &lt;code&gt;0.5 + 0.5 * (term_count / max_term_in_doc)&lt;/code&gt; — prevents long documents from dominating by raw occurrence count. IDF uses &lt;code&gt;log((1 + N) / (1 + df)) + 1&lt;/code&gt; smoothing so terms appearing in all documents don't score to zero.&lt;/p&gt;
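&lt;p&gt;The two formulas translate directly into code (a sketch; the stop word list and tokenizer details are assumptions, not the shipped implementation):&lt;/p&gt;

```python
import math
import re

STOP = {"the", "a", "an", "of", "to", "and", "is"}   # assumed stop list
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def tokenize(text):
    # Keeps code identifiers like get_user intact while dropping stop words.
    return [t.lower() for t in TOKEN_RE.findall(text) if t.lower() not in STOP]

def augmented_tf(term, counts):
    # 0.5 + 0.5 * (term_count / max_term_in_doc): long documents stop
    # winning on raw occurrence count alone.
    return 0.5 + 0.5 * counts.get(term, 0) / max(counts.values())

def smoothed_idf(term_df, doc_count):
    # log((1 + N) / (1 + df)) + 1 keeps terms that appear in every
    # document above zero.
    return math.log((1 + doc_count) / (1 + term_df)) + 1
```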

&lt;p&gt;Snippets are extracted by scanning for the 120-character window with the highest query term density, which ensures the most relevant part of a message shows rather than just the opening.&lt;/p&gt;
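&lt;p&gt;A sketch of that window scan, using a fixed stride as an assumed simplification:&lt;/p&gt;

```python
def best_snippet(text, query_terms, window=120, stride=20):
    # Illustrative sketch: slide a window across the text and keep the
    # start position whose window contains the most query-term hits.
    lowered = text.lower()
    terms = [t.lower() for t in query_terms]

    def density(start):
        chunk = lowered[start:start + window]
        return sum(chunk.count(t) for t in terms)

    starts = range(0, max(1, len(text) - window + 1), stride)
    best = max(starts, key=density)
    return text[best:best + window]
```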

&lt;p&gt;The index is saved to &lt;code&gt;codilay/chat/search_index.json&lt;/code&gt;. It's a non-critical cache — if missing or corrupt, it auto-rebuilds from conversation files. The first search after a fresh install is slower; everything after uses the cached index.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Audit System
&lt;/h2&gt;

&lt;p&gt;Audits run against the wire graph and doc context with a specific analytical lens. The wire model already knows where auth happens, what touches it, where data enters and exits, what depends on what, where secrets get referenced. An audit agent reads this existing knowledge and does targeted deep dives into the files relevant to the audit type.&lt;/p&gt;

&lt;p&gt;Every finding includes the wire path that shows how the vulnerable code gets reached:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FINDING: Unsanitized user input in search handler
Severity:   HIGH
File:       src/routes/search.js  line 47
Wire path:  routes/search.js -&amp;gt; utils/query_builder.js (call)
Evidence:   req.query.term passed directly to buildQuery() with no sanitization
Impact:     buildQuery() constructs raw SQL — potential injection
Fix:        Sanitize req.query.term before passing to buildQuery()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The checklist system matters here. Each audit type loads a specific checklist into the system prompt. The LLM works through it item by item rather than free-associating. Nothing gets skipped, and every finding ties to a specific checklist item.&lt;/p&gt;

&lt;p&gt;Audit types split into three tiers. Tier 1 is pure code analysis — security, architecture, performance, dependency supply chain, license audit. CodiLay reads the code and produces findings. Tier 2 adds config file analysis — Dockerfiles, CI pipelines, IAM definitions, cloud resource config. Tier 3 generates specifications for external tooling — OWASP ZAP target lists from route wires, load test targets from high-traffic route analysis, chaos engineering blast radius maps from the dependency graph.&lt;/p&gt;

&lt;p&gt;Multiple audit types share file reads in one pass. A security and architecture audit running together surfaces things neither finds alone — a service boundary violation that also creates a security exposure.&lt;/p&gt;




&lt;p&gt;CodiLay sits at roughly 30k lines across 25+ source files, 500+ passing tests, and 30+ CLI commands. The wire model is the idea everything else builds on. Everything from parallelism to audit reports to annotation safety to cost protection traces back to the same core abstraction: track what you know, track what you don't, resolve unknowns as you go, and never carry dead weight into the next call.&lt;/p&gt;

&lt;p&gt;Website - &lt;a href="https://codilay.harmanita.com" rel="noopener noreferrer"&gt;codilay.harmanita.com&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>rag</category>
      <category>documentation</category>
    </item>
    <item>
      <title>The AI That Reads Your Codebase Like a Detective</title>
      <dc:creator>HarmanPreet-Singh-XYT</dc:creator>
      <pubDate>Mon, 16 Mar 2026 04:09:19 +0000</pubDate>
      <link>https://forem.com/harmanpreetsingh/the-ai-that-reads-your-codebase-like-a-detective-4c5a</link>
      <guid>https://forem.com/harmanpreetsingh/the-ai-that-reads-your-codebase-like-a-detective-4c5a</guid>
      <description>&lt;p&gt;Every engineering team has that one project. The one where no one knows exactly what &lt;code&gt;lib/helpers.js&lt;/code&gt; does. The one where onboarding takes two weeks because the documentation is a six-year-old Confluence page with broken screenshots. The one where you open the repo and feel the weight of someone else's decisions pressing down on you.&lt;/p&gt;

&lt;p&gt;CodiLay starts from the belief that documentation has to be generated, not written — and that the generator should think like an investigator, not a summarizer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"A detective doesn't read a crime scene and produce a bullet-point list. They trace the wires."&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Reading code the way a human does
&lt;/h2&gt;

&lt;p&gt;When an experienced engineer explores a new codebase, they don't read files alphabetically. They find the entry point, follow the imports, note what they haven't seen yet, and circle back. CodiLay replicates this exact behavior through a mechanism called &lt;em&gt;wires&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A wire opens the moment the agent encounters something it hasn't documented yet — an imported module, a called service, an unknown dependency. That wire gets tracked, carried forward in active memory, and prioritized. The agent reorders its reading queue around it. When a later file finally explains the other end of that reference, the wire closes, the link gets recorded, and it retires from memory permanently.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;Meaning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPEN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Reference seen, target unknown&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CLOSED&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Both ends documented, link formed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Closed wires never re-enter the LLM context. This matters more than it sounds. Most documentation agents bloat their prompts as the codebase grows. CodiLay's context window stays lean by design — each LLM call only carries what's genuinely unresolved.&lt;/p&gt;




&lt;h2&gt;
  
  
  Triage before analysis
&lt;/h2&gt;

&lt;p&gt;The first thing CodiLay does after reading the file tree is triage. A single LLM call looks at filenames and paths — no file content — and categorizes everything into three buckets: &lt;code&gt;core&lt;/code&gt;, &lt;code&gt;skim&lt;/code&gt;, and &lt;code&gt;skip&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A Flutter project, for example, gets its &lt;code&gt;ios/&lt;/code&gt;, &lt;code&gt;android/&lt;/code&gt;, and &lt;code&gt;build/&lt;/code&gt; folders automatically moved to skip. The agent recognizes standard scaffolding and doesn't waste budget analyzing generated code. You get to review the triage decision before anything else runs.&lt;/p&gt;

&lt;p&gt;Then the planner takes over — another single LLM call, this time with the curated file list — and produces a reading order. Files get ranked by how many open wires they're likely to close. The ones with no context yet go into a parked queue and get unparked as more of the codebase becomes clear.&lt;/p&gt;




&lt;h2&gt;
  
  
  Big files don't break it
&lt;/h2&gt;

&lt;p&gt;Files over 6,000 tokens get a two-pass treatment. The first pass extracts only symbols and docstrings, builds a skeleton, and surfaces whatever wires are detectable from the shape of the code. The second pass reads the actual implementation chunk by chunk, filling in the skeleton with real understanding.&lt;/p&gt;

&lt;p&gt;The threshold is configurable. The logic — read the shape before reading the substance — stays constant.&lt;/p&gt;




&lt;h2&gt;
  
  
  What comes out the other end
&lt;/h2&gt;

&lt;p&gt;The final output is a &lt;code&gt;CODEBASE.md&lt;/code&gt; — a living document with an overview, component sections for every file, a dependency graph rendered as a table, and an Unresolved References section that surfaces whatever wires stayed open at the end of the run.&lt;/p&gt;

&lt;p&gt;That last section is genuinely useful. Open wires that point to external packages are expected. Open wires that point inward — to files that should exist but don't — are potential dead code or missing modules. The agent hands you the signal. What you do with it is up to you.&lt;/p&gt;

&lt;p&gt;A machine-readable &lt;code&gt;links.json&lt;/code&gt; comes alongside it — a dependency graph you can query, render, or feed into other tools.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;When the run ends, the current git commit hash gets stored. Next time you run CodiLay, only the changed files get re-analyzed.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The web layer on top
&lt;/h2&gt;

&lt;p&gt;Beyond the CLI, CodiLay ships with a web reader that turns the generated documentation into an interactive interface. At the base layer, it renders &lt;code&gt;CODEBASE.md&lt;/code&gt; and the dependency graph statically. Above that, a chatbot answers questions scoped to the documentation context. When the documentation doesn't have the answer, a deeper agent escalates directly to the source code.&lt;/p&gt;

&lt;p&gt;Any insight the deep agent surfaces gets patched back into the permanent doc. The documentation improves itself through use.&lt;/p&gt;




&lt;h2&gt;
  
  
  What it's built from
&lt;/h2&gt;

&lt;p&gt;The core is Python 3.10+. Token counting runs through tiktoken. The CLI renders through Rich. A FastAPI backend serves the web UI and chat endpoints. Gitignore parsing uses pathspec for full glob compatibility. The LLM layer wraps a unified interface over multiple providers — the underlying model is swappable through config.&lt;/p&gt;




&lt;h2&gt;
  
  
  The honest part
&lt;/h2&gt;

&lt;p&gt;Documentation tools tend to fall into one of two failure modes: they're too shallow to be useful, or they require so much setup that no one uses them. CodiLay's bet is that a system designed around genuine code comprehension — not just summarization — produces output worth the setup cost.&lt;/p&gt;

&lt;p&gt;The wire model is the foundation of that bet. Treat every unknown as something to be resolved, carry only what's unresolved, and retire knowledge the moment it's complete. That's the circuit. CodiLay traces it until the map is done.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/HarmanPreet-Singh-XYT/codilay" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/p&gt;

</description>
      <category>developertools</category>
      <category>documentation</category>
      <category>llm</category>
      <category>codebase</category>
    </item>
    <item>
      <title>What Building Two Gemini-Powered Apps Taught Me About AI, Tokens, and Scope Discipline</title>
      <dc:creator>HarmanPreet-Singh-XYT</dc:creator>
      <pubDate>Tue, 03 Mar 2026 02:35:03 +0000</pubDate>
      <link>https://forem.com/harmanpreetsingh/what-building-two-gemini-powered-apps-taught-me-about-ai-tokens-and-scope-discipline-5e4o</link>
      <guid>https://forem.com/harmanpreetsingh/what-building-two-gemini-powered-apps-taught-me-about-ai-tokens-and-scope-discipline-5e4o</guid>
      <description>&lt;h1&gt;
  
  
  From Hackathon to Impact: What I Built with Google Gemini
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh/built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;I didn't build one thing with Google Gemini — I built two, each solving a completely different human problem. And honestly, that contrast is what made this experience so valuable.&lt;/p&gt;




&lt;h3&gt;
  
  
  Project 1: SunRes — The Resume That Actually Gets You the Interview
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Most people apply for jobs the same way — they spray one resume across dozens of listings and wonder why they never hear back. The resume isn't bad. It's just &lt;em&gt;wrong&lt;/em&gt; for the job. Mismatched keywords. Missing skills. Sections that ATS systems silently reject before a human ever sees them.&lt;/p&gt;

&lt;p&gt;I've felt this personally as a student applying for internships. So I built SunRes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;SunRes isn't just a resume generator — it's a &lt;em&gt;mismatch engine&lt;/em&gt;. Users build a master profile (all their projects, skills, achievements), then paste in a job description. Gemini then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parses the uploaded resume to auto-fill their profile&lt;/li&gt;
&lt;li&gt;Analyzes the job description to extract what the company actually wants&lt;/li&gt;
&lt;li&gt;Generates a job–profile match score and &lt;em&gt;explains why&lt;/em&gt; the user might get rejected&lt;/li&gt;
&lt;li&gt;Suggests targeted fixes to close those gaps&lt;/li&gt;
&lt;li&gt;Auto-selects the most relevant projects and achievements for that specific role&lt;/li&gt;
&lt;li&gt;Generates a tailored resume and cover letter&lt;/li&gt;
&lt;li&gt;Lets users refine everything through a chatbot interface before downloading&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: one profile, infinite tailored resumes. You never start from scratch again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Next.js · FastAPI · PostgreSQL · Google Gemini API&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo:&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=ghgJiWWdxfM" rel="noopener noreferrer"&gt;Watch it in action →&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Project 2: NorthStar — A Lifeline for People in Crisis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; When someone is in a genuine crisis — no food, no shelter, a medical emergency — they don't need a cluttered Google search. They need the nearest food bank, the nearest emergency shelter, the nearest free clinic. Right now. Standard map apps prioritize commercial results. That's not good enough.&lt;/p&gt;

&lt;p&gt;NorthStar strips away the noise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using the browser's Geolocation API, NorthStar instantly surfaces the most relevant community resources nearby — organized into four categories: Health, Shelter, Sustenance, and Support. A custom semantic mapping system on the backend translates simple icon clicks into precise, filtered queries (e.g., "Shelters" → &lt;code&gt;"homeless shelter emergency housing"&lt;/code&gt;), cutting through commercial noise to surface the places that actually help.&lt;/p&gt;
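&lt;p&gt;The mapping layer itself is simple to sketch. The shelter query below is the real example from above; the other category strings are illustrative stand-ins, not NorthStar's exact table:&lt;/p&gt;

```python
# Icon-to-query semantic mapping: each tapped category expands into a
# filtered text query for a Places-style search. Strings other than
# "shelter" are illustrative examples.
CATEGORY_QUERIES = {
    "health": "free clinic community health center urgent care",
    "shelter": "homeless shelter emergency housing",
    "sustenance": "food bank soup kitchen free meals",
    "support": "crisis hotline social services assistance",
}

def build_places_query(category: str) -> str:
    """Translate an icon tap into a precise, non-commercial search query."""
    query = CATEGORY_QUERIES.get(category.lower())
    if query is None:
        raise ValueError(f"unknown category: {category}")
    return query

print(build_places_query("Shelter"))  # homeless shelter emergency housing
```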

&lt;p&gt;But finding the resource is only half the battle. Once you see a result — a shelter, a clinic, a food bank — you immediately have follow-up questions. &lt;em&gt;Is it wheelchair accessible? What are the hours? Do they take walk-ins?&lt;/em&gt; That's where Gemini comes in. NorthStar has a built-in Gemini-powered chat on each result, so users can ask natural questions about any location and get instant, contextual answers — without leaving the app or opening a new search.&lt;/p&gt;

&lt;p&gt;The goal: from crisis moment to actionable resource in under two seconds, with the context you need to actually show up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; React.js · Python/FastAPI · Google Maps &amp;amp; Places APIs · Google Gemini API · Google Cloud&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo:&lt;/strong&gt; &lt;a href="https://www.youtube.com/watch?v=CbdO2o3ZIQU" rel="noopener noreferrer"&gt;Watch it in action →&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;These two projects taught me different things, and I think that's worth unpacking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SunRes taught me that AI is most powerful when it &lt;em&gt;explains&lt;/em&gt;, not just executes.&lt;/strong&gt; Early versions just generated a resume. That felt hollow — like magic with no instruction manual. The breakthrough was making Gemini articulate &lt;em&gt;why&lt;/em&gt; a profile was a weak match and &lt;em&gt;what specifically&lt;/em&gt; needed to change. That transparency turned a tool into a coach. Users didn't just get an output; they understood their problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NorthStar taught me that infrastructure is a product feature.&lt;/strong&gt; We spent real time on IAM permissions, API key restrictions, and quota management in Google Cloud Console. That's not glamorous work, but it's what makes something actually deployable and sustainable. Security isn't a post-launch task — it's part of the build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Both projects taught me scope discipline.&lt;/strong&gt; Hackathon timelines are brutal. The features you &lt;em&gt;don't&lt;/em&gt; build matter as much as the ones you do. Knowing when to stop is a skill.&lt;/p&gt;

&lt;p&gt;And perhaps most unexpectedly: &lt;strong&gt;clear storytelling matters as much as clean code.&lt;/strong&gt; A product that people understand and feel connected to will always outperform a technically superior one that nobody gets.&lt;/p&gt;




&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;p&gt;I'll be candid here, because I think honest feedback is more useful than praise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NorthStar (smooth experience):&lt;/strong&gt; Gemini plays a direct user-facing role here — once someone finds a resource on the map, they can ask Gemini questions about it right on the page: accessibility, working hours, walk-in availability, and so on. The queries are focused and conversational, the responses are short and practical, and it just... worked. Clean outputs, fast responses, no friction. When the scope of what you're asking Gemini is well-defined and contained, it really shines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SunRes (where things got real):&lt;/strong&gt; SunRes pushed Gemini much harder. The pipeline was: parse a resume → extract structured profile data → analyze a job description → cross-reference both → generate scored feedback with reasoning → produce a tailored resume → write a cover letter. That's a lot of tokens per request, and I hit walls.&lt;/p&gt;

&lt;p&gt;The honest truth: I misconfigured my &lt;code&gt;max_tokens&lt;/code&gt; values. I set them too conservatively early on, and responses were getting truncated mid-output — which broke my structured JSON parsing downstream and caused some ugly cascading failures. Once I understood the actual token budget I needed per stage and set appropriate limits, things stabilized.&lt;/p&gt;

&lt;p&gt;Was it a Gemini problem? Not really. It was a &lt;em&gt;me&lt;/em&gt; problem — I underestimated what I was asking for. But it did surface something worth noting: when you're building multi-step AI pipelines, token budgeting isn't an afterthought. It's architecture. I'd love better built-in tooling or clearer documentation around estimating token usage for complex, chained prompts.&lt;/p&gt;
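&lt;p&gt;To make "token budgeting is architecture" concrete, here's the kind of guard I ended up wanting: explicit per-stage output budgets, plus a check that catches truncated JSON and retries with a doubled budget. The stage names and numbers are illustrative, not my exact values:&lt;/p&gt;

```python
import json

# Illustrative per-stage output-token budgets for a chained pipeline.
STAGE_BUDGETS = {
    "parse_resume": 2048,
    "analyze_job": 1024,
    "match_feedback": 1536,
    "tailored_resume": 4096,
}

def parse_or_grow(raw_output, stage, retry_stage):
    """Catch truncated JSON from a too-small budget; retry with double."""
    try:
        return json.loads(raw_output)
    except json.JSONDecodeError:
        # Truncation mid-object is the classic symptom of an output cap.
        STAGE_BUDGETS[stage] = STAGE_BUDGETS[stage] * 2
        return retry_stage(stage, STAGE_BUDGETS[stage])
```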

&lt;p&gt;What worked exceptionally well in Gemini: &lt;strong&gt;structured reasoning over ambiguous inputs.&lt;/strong&gt; Resumes come in wildly different formats. Job descriptions are inconsistently written. Gemini handled both with a reliability I didn't fully expect — extracting structured data from messy, real-world documents without me needing to write elaborate parsing logic. That was genuinely impressive and saved significant development time.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SunRes:&lt;/strong&gt; I want to track application outcomes so the match scoring improves over time with real feedback loops. Interview prep tools based on what's actually on the user's resume. Integration with job boards. Skill-gap learning recommendations. The goal is to take candidates from &lt;em&gt;application → interview → offer&lt;/em&gt; — not just hand them a prettier PDF.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;NorthStar:&lt;/strong&gt; More resource categories, community-contributed listings to keep data current, and offline capability for users in areas with poor connectivity. The people who need this most are often the people with the worst internet access. That feels like the right problem to solve next.&lt;/p&gt;




&lt;p&gt;Building with Gemini forced me to think about AI not as a black box that produces outputs, but as a reasoning layer that can explain its work, handle messy real-world data, and fit into larger product pipelines. That mental shift — from &lt;em&gt;AI as autocomplete&lt;/em&gt; to &lt;em&gt;AI as collaborator&lt;/em&gt; — is what I'm taking forward.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
    </item>
    <item>
      <title>How Google Search Pagination Works: A Developer's Deep Dive</title>
      <dc:creator>HarmanPreet-Singh-XYT</dc:creator>
      <pubDate>Mon, 28 Jul 2025 16:19:24 +0000</pubDate>
      <link>https://forem.com/harmanpreetsingh/how-google-search-pagination-works-a-developers-deep-dive-4lmo</link>
      <guid>https://forem.com/harmanpreetsingh/how-google-search-pagination-works-a-developers-deep-dive-4lmo</guid>
      <description>&lt;p&gt;As developers, we often take pagination for granted—those simple "Next" and "Previous" buttons at the bottom of search results. But when you're dealing with billions of web pages and millions of queries per second like Google does, pagination becomes a fascinating engineering challenge. Let's explore how this actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Pagination, Really?
&lt;/h2&gt;

&lt;p&gt;Imagine you walk into the world's largest library and ask for "all books about cooking." The librarian could dump 10 million books on your desk, but that would crush you (literally). Instead, they bring you 10 books at a time. You can ask for the next 10 whenever you're ready. That's pagination—breaking large amounts of data into digestible chunks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Traditional Way vs. Google's Reality
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How Most Websites Do It
&lt;/h3&gt;

&lt;p&gt;Think of a regular website like a small bookstore. When you search for "mystery novels," the store:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Counts all 500 mystery books&lt;/li&gt;
&lt;li&gt;Shows you books 1-10 on page 1&lt;/li&gt;
&lt;li&gt;When you click page 2, it shows books 11-20&lt;/li&gt;
&lt;li&gt;Simple and straightforward&lt;/li&gt;
&lt;/ol&gt;
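&lt;p&gt;In code, the bookstore's approach is just an offset and a limit — &lt;code&gt;LIMIT 10 OFFSET 10&lt;/code&gt; in SQL. A Python sketch:&lt;/p&gt;

```python
def paginate(items, page, per_page=10):
    """Classic offset pagination: skip (page - 1) * per_page items, take per_page."""
    start = (page - 1) * per_page
    return items[start:start + per_page]

mysteries = [f"mystery #{n}" for n in range(1, 501)]  # the 500-book store
page_two = paginate(mysteries, 2)  # books 11-20
```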

&lt;h3&gt;
  
  
  Google's Challenge
&lt;/h3&gt;

&lt;p&gt;Now imagine you're not dealing with 500 books, but 100 billion web pages. And instead of one person asking for books, you have 100,000 people asking every second, each wanting different books in different languages, from different locations.&lt;/p&gt;

&lt;p&gt;Suddenly, that simple counting method breaks down completely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Distributed Library System
&lt;/h2&gt;

&lt;p&gt;Google doesn't have one giant computer holding all web pages. Instead, think of it like this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mega-Library Analogy&lt;/strong&gt;: &lt;br&gt;
Imagine splitting that massive library into 1,000 smaller libraries spread across the world. When you search for "pizza recipes":&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your request goes to a head librarian (query processor)&lt;/li&gt;
&lt;li&gt;The head librarian shouts your request to all 1,000 libraries simultaneously&lt;/li&gt;
&lt;li&gt;Each library quickly finds their best pizza recipe books&lt;/li&gt;
&lt;li&gt;All libraries report back their top 10 books&lt;/li&gt;
&lt;li&gt;The head librarian picks the absolute best 10 from those 10,000 options&lt;/li&gt;
&lt;li&gt;You see these as your first page of results&lt;/li&gt;
&lt;/ol&gt;
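&lt;p&gt;That scatter-gather step can be sketched in a few lines — each shard returns a local top 10, and the coordinator merges the candidates into a global top 10 (scores here are toy numbers, not real ranking signals):&lt;/p&gt;

```python
import heapq

def shard_top_k(shard_docs, k=10):
    """One 'library' ranks its own documents and returns a local top k."""
    return heapq.nlargest(k, shard_docs, key=lambda doc: doc["score"])

def coordinator_search(shards, k=10):
    """The head librarian merges every shard's top k into a global top k."""
    candidates = []
    for shard in shards:
        candidates.extend(shard_top_k(shard, k))  # k results per shard
    return heapq.nlargest(k, candidates, key=lambda doc: doc["score"])
```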

&lt;h2&gt;
  
  
  The Deep Pagination Problem
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. What happens when you click on page 50 of Google results?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Stadium Analogy&lt;/strong&gt;: &lt;br&gt;
Imagine you're looking for the 500th tallest person in a packed football stadium:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can't just skip to person #500&lt;/li&gt;
&lt;li&gt;You need to measure everyone, rank them by height, then find who's #500&lt;/li&gt;
&lt;li&gt;Even worse, people keep entering and leaving the stadium (new web pages appear/disappear)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why Google shows "About 2,340,000 results" but won't actually let you click through to page 234,000. It would require ranking millions of results just to show you 10 of them—computationally insane!&lt;/p&gt;
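&lt;p&gt;A simplified cost model makes the pain obvious: to guarantee a correct global page N, every shard must hand back its own top &lt;code&gt;N × 10&lt;/code&gt; candidates, and the coordinator has to re-rank all of them just to emit 10 results:&lt;/p&gt;

```python
def deep_page_cost(num_shards, page, per_page=10):
    """Candidates the coordinator must collect and rank to serve one page."""
    per_shard = page * per_page  # each shard returns its top page * per_page
    return num_shards * per_shard

print(deep_page_cost(1000, 1))   # 10000 candidates for page 1
print(deep_page_cost(1000, 50))  # 500000 candidates for page 50
```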

&lt;h2&gt;
  
  
  The Clever Workarounds
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The Snapshot Approach
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Restaurant Menu Analogy&lt;/strong&gt;: &lt;br&gt;
When you search, Google takes a "snapshot" of results at that moment—like photographing a restaurant's daily specials board. Even if the specials change 5 minutes later, you're still looking at that original photo as you flip through pages. This ensures consistency but means you might miss super-fresh content.&lt;/p&gt;
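&lt;p&gt;A toy version of the snapshot idea: rank once on the first query, freeze the ordering under a cursor ID, and serve later pages from that frozen list. The details here are illustrative, not Google's implementation:&lt;/p&gt;

```python
import time
import uuid

SNAPSHOTS = {}  # snapshot_id maps to a frozen, ranked result list

def first_page(query, rank_fn, per_page=10):
    """Rank once, freeze the ordering, hand back page 1 plus a snapshot cursor."""
    results = rank_fn(query)
    snap_id = str(uuid.uuid4())
    SNAPSHOTS[snap_id] = {"results": results, "created": time.time()}
    return results[:per_page], snap_id

def next_page(snap_id, page, per_page=10):
    """Later pages read the frozen snapshot, ignoring index changes since."""
    results = SNAPSHOTS[snap_id]["results"]
    start = (page - 1) * per_page
    return results[start:start + per_page]
```

&lt;p&gt;A real system would also expire snapshots after a few minutes — the &lt;code&gt;created&lt;/code&gt; timestamp is where that check would hang.&lt;/p&gt;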

&lt;h3&gt;
  
  
  2. The Priority System
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Emergency Room Triage&lt;/strong&gt;: &lt;br&gt;
Just like hospitals prioritize critical patients, Google prioritizes computing results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First page: Computed with maximum effort (like treating critical patients)&lt;/li&gt;
&lt;li&gt;Pages 2-5: Still high priority (urgent care)&lt;/li&gt;
&lt;li&gt;Pages 10+: Lower priority (regular checkup)&lt;/li&gt;
&lt;li&gt;Page 50+: "Are you sure you need this?" (elective procedure)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. The Estimation Game
&lt;/h3&gt;

&lt;p&gt;When Google says "About 45,700,000 results," they're not actually counting. It's like estimating crowd size at a concert:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Look at a small section&lt;/li&gt;
&lt;li&gt;Count people in that section&lt;/li&gt;
&lt;li&gt;Multiply by the total area&lt;/li&gt;
&lt;li&gt;Give an approximate number&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's good enough for users to understand the scale, without counting every single person.&lt;/p&gt;
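&lt;p&gt;The crowd-counting trick translates directly into code: count matches on a small sample of shards, then extrapolate. A deliberately naive sketch:&lt;/p&gt;

```python
def estimate_total(shards, count_matches, sample_size=3):
    """Concert-style estimate: count a few sections, multiply up."""
    sample = shards[:sample_size]
    counted = sum(count_matches(shard) for shard in sample)
    per_shard = counted / len(sample)
    return round(per_shard * len(shards))  # the "About N results" number
```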

&lt;h2&gt;
  
  
  Why You Can't Go Past Page 40-50
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Highway Analogy&lt;/strong&gt;: &lt;br&gt;
Imagine a highway system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90% of drivers exit in the first 5 miles (pages 1-5)&lt;/li&gt;
&lt;li&gt;9% exit in miles 5-20 (pages 5-20)&lt;/li&gt;
&lt;li&gt;Only 1% go further&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Would you build and maintain a 1,000-mile highway for that 1%? Google makes the same calculation—it's not worth the computational cost to support deep pagination when almost nobody uses it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Trade-offs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Speed vs. Completeness
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Pizza Delivery Analogy&lt;/strong&gt;: &lt;br&gt;
You can either:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deliver hot pizza to 99% of customers in 30 minutes&lt;/li&gt;
&lt;li&gt;Or deliver to 100% of customers, but everyone waits 2 hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google chooses speed for the majority over completeness for everyone.&lt;/p&gt;

&lt;h3&gt;
  
  
  Freshness vs. Consistency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;News Stand Analogy&lt;/strong&gt;: &lt;br&gt;
When you're browsing newspapers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Option 1: Keep showing you the same papers as you browse (consistent but potentially outdated)&lt;/li&gt;
&lt;li&gt;Option 2: Constantly update with new editions (fresh but confusing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Google typically chooses consistency within a search session.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost vs. Coverage
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Buffet Restaurant Analogy&lt;/strong&gt;: &lt;br&gt;
A buffet could offer every dish in the world, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It would cost millions to maintain&lt;/li&gt;
&lt;li&gt;99% of food would go to waste&lt;/li&gt;
&lt;li&gt;Most people just want common dishes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarly, Google limits deep pagination because the cost doesn't justify the rare usage.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Implications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What This Means for Users
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The first few pages matter most&lt;/strong&gt;: Google puts maximum effort into getting these right&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep diving has limits&lt;/strong&gt;: You can't browse all 2 million results—and honestly, you wouldn't want to&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Freshness varies&lt;/strong&gt;: Breaking news might not show up if you're on page 10 of yesterday's search&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What Other Sites Learn from Google
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Small Restaurant Principle&lt;/strong&gt;: &lt;br&gt;
Your local restaurant doesn't need a 50-page menu. Similarly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most sites don't need to paginate through millions of results&lt;/li&gt;
&lt;li&gt;Focus on helping users find things quickly on page 1&lt;/li&gt;
&lt;li&gt;Add filters instead of endless pages&lt;/li&gt;
&lt;li&gt;Consider "Load More" buttons instead of page numbers&lt;/li&gt;
&lt;/ul&gt;
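&lt;p&gt;That last point deserves a sketch. "Load More" is usually backed by keyset pagination — &lt;code&gt;WHERE id &amp;gt; :cursor ORDER BY id LIMIT 10&lt;/code&gt; in SQL — whose cost stays flat no matter how deep the user scrolls, because the database seeks straight to the cursor instead of counting past skipped rows:&lt;/p&gt;

```python
def load_more(items, after_id=None, limit=10):
    """Keyset-style 'Load More': return items whose id is past the cursor.
    Assumes items are sorted by id, like an indexed primary key."""
    if after_id is not None:
        items = [item for item in items if item["id"] > after_id]
    page = items[:limit]
    next_cursor = page[-1]["id"] if page else None
    return page, next_cursor
```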

&lt;h2&gt;
  
  
  The Evolution: Mobile and Infinite Scroll
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Escalator vs. Elevator Analogy&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Traditional pagination: Like an elevator (discrete floors/pages)&lt;/li&gt;
&lt;li&gt;Infinite scroll: Like an escalator (continuous movement)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mobile Google often uses infinite scroll because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thumb-scrolling is easier than clicking tiny page numbers&lt;/li&gt;
&lt;li&gt;It loads results as needed, saving bandwidth&lt;/li&gt;
&lt;li&gt;Users feel like they're making progress&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Google's pagination is like a master chef preparing a meal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They could show you every possible dish (all results)&lt;/li&gt;
&lt;li&gt;But they know you can only eat so much (cognitive limits)&lt;/li&gt;
&lt;li&gt;So they carefully prepare the best dishes first (top results)&lt;/li&gt;
&lt;li&gt;And limit the menu size (pagination limits)&lt;/li&gt;
&lt;li&gt;While keeping everything fresh and fast (performance)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The genius isn't in showing you everything—it's in showing you just enough, just in time, without overwhelming you or their servers. It's a delicate balance of user experience, technical constraints, and business efficiency that Google has refined over decades.&lt;/p&gt;

&lt;p&gt;Next time you click "Next page" on Google, remember: you're not just moving through a list. You're experiencing one of the most sophisticated distributed computing systems ever built, disguised as a simple button.&lt;/p&gt;

</description>
      <category>pagination</category>
      <category>systemdesign</category>
      <category>hiddencomplexities</category>
      <category>searchengine</category>
    </item>
    <item>
      <title>Building Scalable Authentication: The Smart Way to Handle Tokens with Redis and Database Storage</title>
      <dc:creator>HarmanPreet-Singh-XYT</dc:creator>
      <pubDate>Mon, 28 Jul 2025 16:10:58 +0000</pubDate>
      <link>https://forem.com/harmanpreetsingh/building-scalable-authentication-the-smart-way-to-handle-tokens-with-redis-and-database-storage-1lcf</link>
      <guid>https://forem.com/harmanpreetsingh/building-scalable-authentication-the-smart-way-to-handle-tokens-with-redis-and-database-storage-1lcf</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: Why Traditional Authentication Falls Short
&lt;/h2&gt;

&lt;p&gt;Imagine you're at a high-security office building. Every time you want to enter a room, you need to show your ID to the security desk, wait for them to verify it against their records, and then get permission. Now imagine doing this hundreds of times per day, with thousands of other employees doing the same. The security desk would be overwhelmed, and everyone would waste time waiting.&lt;/p&gt;

&lt;p&gt;This is exactly what happens when we rely solely on database-driven authentication in modern applications. Every API request needs verification, and if each verification hits the database, we create a massive bottleneck. Today, let's explore how combining JWT tokens, Redis, and strategic database usage can solve this problem elegantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two-Token System: Your Digital Security Badge
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding Access and Refresh Tokens
&lt;/h3&gt;

&lt;p&gt;Think of the two-token system like having both a daily visitor badge and a master keycard:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access Token (The Daily Badge):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Short-lived (typically 15-30 minutes)&lt;/li&gt;
&lt;li&gt;Used for every API request&lt;/li&gt;
&lt;li&gt;Contains user permissions and basic info&lt;/li&gt;
&lt;li&gt;Like a temporary pass that expires quickly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Refresh Token (The Master Keycard):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long-lived (days or weeks)&lt;/li&gt;
&lt;li&gt;Used only to get new access tokens&lt;/li&gt;
&lt;li&gt;More secure, stored carefully&lt;/li&gt;
&lt;li&gt;Like your employee ID that you use to get new daily badges&lt;/li&gt;
&lt;/ul&gt;
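&lt;p&gt;Here's what issuing that badge pair can look like. This sketch signs tokens by hand with stdlib &lt;code&gt;hmac&lt;/code&gt; to stay self-contained — in production you'd reach for a proper JWT library — and the lifetimes (30 minutes, 14 days) are illustrative:&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

SECRET = b"demo-secret"  # illustrative only; load from a secret manager

def sign(payload: dict) -> str:
    """Minimal HMAC-signed token (a stand-in for a real JWT library)."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def issue_tokens(user_id: str):
    """The daily badge (short-lived) and the master keycard (long-lived)."""
    now = int(time.time())
    access = sign({"sub": user_id, "typ": "access", "exp": now + 30 * 60})
    refresh = sign({"sub": user_id, "typ": "refresh",
                    "exp": now + 14 * 86400, "jti": secrets.token_hex(8)})
    return access, refresh
```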

&lt;h2&gt;
  
  
  The Architecture: Redis + Database Hybrid Approach
&lt;/h2&gt;

&lt;h3&gt;
  
  
  System Design Overview
&lt;/h3&gt;

&lt;p&gt;Our authentication system works like a well-organized office building with multiple security checkpoints:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Frontend Security Desk (Client Application)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Holds both tokens&lt;/li&gt;
&lt;li&gt;Presents access token for every request&lt;/li&gt;
&lt;li&gt;Uses refresh token only when access token expires&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fast-Check Station (Redis Cache)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Validates access tokens quickly&lt;/li&gt;
&lt;li&gt;Maintains active session information&lt;/li&gt;
&lt;li&gt;Acts like a digital bouncer with a guest list&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Main Security Office (Database)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores refresh tokens securely&lt;/li&gt;
&lt;li&gt;Handles long-term user data&lt;/li&gt;
&lt;li&gt;Processes token renewal requests&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The Authentication Flow
&lt;/h3&gt;

&lt;p&gt;Let me walk you through the process like a story:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Login - Getting Your Badges:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User provides credentials (username/password)&lt;/li&gt;
&lt;li&gt;System verifies against database (the slow but necessary check)&lt;/li&gt;
&lt;li&gt;System generates both tokens&lt;/li&gt;
&lt;li&gt;Access token info goes to Redis (the fast-access list)&lt;/li&gt;
&lt;li&gt;Refresh token goes to database (the secure vault)&lt;/li&gt;
&lt;li&gt;Both tokens sent to user&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Making Requests - Using Your Daily Badge:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends request with access token&lt;/li&gt;
&lt;li&gt;System checks Redis (lightning fast)&lt;/li&gt;
&lt;li&gt;If valid, request proceeds&lt;/li&gt;
&lt;li&gt;If invalid/expired, rejection with "token expired" message&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Token Refresh - Getting a New Daily Badge:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends refresh token to renewal endpoint&lt;/li&gt;
&lt;li&gt;System checks database for refresh token validity&lt;/li&gt;
&lt;li&gt;If valid, generates new access token&lt;/li&gt;
&lt;li&gt;Updates Redis with new access token info&lt;/li&gt;
&lt;li&gt;Sends new access token to user&lt;/li&gt;
&lt;/ol&gt;
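&lt;p&gt;The whole story fits in a few functions. Here the "Redis" and "database" are plain dictionaries so the sketch runs anywhere; the shape of the calls is what matters:&lt;/p&gt;

```python
import time

redis_store = {}   # access_token maps to (session_data, expires_at); Redis stand-in
db_refresh = {}    # refresh_token maps to user_id; database stand-in

def login(user_id, access_token, refresh_token, ttl=1800):
    """Initial login: fast-access list gets the badge, vault gets the keycard."""
    redis_store[access_token] = ({"user": user_id}, time.time() + ttl)
    db_refresh[refresh_token] = user_id  # the slow-but-rare write

def check_request(access_token):
    """The hot path: one Redis lookup, no database involved."""
    entry = redis_store.get(access_token)
    if entry is None or time.time() > entry[1]:
        return None  # reject: missing or expired
    return entry[0]

def refresh(refresh_token, new_access_token, ttl=1800):
    """The cold path: hits the database, then repopulates Redis."""
    user_id = db_refresh.get(refresh_token)
    if user_id is None:
        return None
    redis_store[new_access_token] = ({"user": user_id}, time.time() + ttl)
    return new_access_token
```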

&lt;h2&gt;
  
  
  The Benefits: Why This Design Shines
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Drastically Reduced Database Load
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1000 users × 100 requests/day = 100,000 database hits&lt;/li&gt;
&lt;li&gt;Each hit takes ~50ms = 5,000 seconds of database time daily&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Redis Hybrid Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access token checks: 100,000 Redis hits (&amp;lt; 1ms each)&lt;/li&gt;
&lt;li&gt;Refresh token checks: ~3,000 database hits (assuming 30-min tokens)&lt;/li&gt;
&lt;li&gt;Total database time: ~150 seconds (97% reduction!)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Blazing Fast Performance
&lt;/h3&gt;

&lt;p&gt;Redis operates in-memory, making token validation almost instantaneous. It's like the difference between checking a list on your phone versus driving to the library to look it up in a book.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Enhanced Security Through Isolation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compromised access tokens expire quickly&lt;/li&gt;
&lt;li&gt;Refresh tokens are accessed rarely, reducing exposure&lt;/li&gt;
&lt;li&gt;Each token serves a specific purpose, following the principle of least privilege&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Scalability Built-In
&lt;/h3&gt;

&lt;p&gt;As your user base grows, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add more Redis instances (horizontal scaling)&lt;/li&gt;
&lt;li&gt;Keep the database focused on critical operations&lt;/li&gt;
&lt;li&gt;Handle millions of validations without breaking a sweat&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Challenges: Honest Discussion of Drawbacks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Increased Complexity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Reality:&lt;/strong&gt; You're now managing two storage systems instead of one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigation:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use established patterns and libraries&lt;/li&gt;
&lt;li&gt;Document the flow clearly&lt;/li&gt;
&lt;li&gt;Implement comprehensive monitoring&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Redis Dependency
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Risk:&lt;/strong&gt; If Redis goes down, authentication fails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement Redis clustering for high availability&lt;/li&gt;
&lt;li&gt;Have a fallback mechanism to database (with rate limiting)&lt;/li&gt;
&lt;li&gt;Use Redis persistence features for quick recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Token Synchronization Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Ensuring Redis and database stay in sync.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement proper TTL (Time To Live) in Redis&lt;/li&gt;
&lt;li&gt;Use event-driven updates when tokens are revoked&lt;/li&gt;
&lt;li&gt;Regular cleanup jobs for orphaned entries&lt;/li&gt;
&lt;/ul&gt;
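&lt;p&gt;The TTL point is the workhorse here: if the Redis entry's TTL matches the token's own expiry (plus a small clock-skew buffer), stale sessions clean themselves up with no sync job at all. With redis-py this is a &lt;code&gt;setex&lt;/code&gt; call; the sketch below fakes Redis in memory so it runs as-is:&lt;/p&gt;

```python
import time

class FakeRedis:
    """In-memory stand-in for redis-py's setex/get/delete with TTLs."""
    def __init__(self):
        self.data = {}
    def setex(self, key, ttl, value):
        self.data[key] = (value, time.time() + ttl)
    def get(self, key):
        entry = self.data.get(key)
        if entry is None or time.time() > entry[1]:
            self.data.pop(key, None)  # lazy expiry, like Redis
            return None
        return entry[0]
    def delete(self, key):
        self.data.pop(key, None)

ACCESS_TTL = 30 * 60
r = FakeRedis()
# TTL matches the access token's lifetime plus a 60-second skew buffer.
r.setex("session:abc123", ACCESS_TTL + 60, "user:42")
```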

&lt;h3&gt;
  
  
  4. Storage Overhead
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Cost:&lt;/strong&gt; Storing session data in two places.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimization:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store minimal data in Redis (user ID, permissions, expiry)&lt;/li&gt;
&lt;li&gt;Use Redis memory optimization techniques&lt;/li&gt;
&lt;li&gt;Implement intelligent cache eviction policies&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Scenarios and Outcomes
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: E-Commerce Platform During Flash Sale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Without Redis:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database overwhelmed with authentication queries&lt;/li&gt;
&lt;li&gt;Legitimate users face timeouts&lt;/li&gt;
&lt;li&gt;Lost sales due to poor performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;With Redis:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication remains snappy&lt;/li&gt;
&lt;li&gt;Database focuses on order processing&lt;/li&gt;
&lt;li&gt;Happy customers, successful sale&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 2: Social Media App with Sudden Viral Content
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt; A post goes viral, bringing 10x normal traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result with Redis Architecture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Authentication layer handles the spike effortlessly&lt;/li&gt;
&lt;li&gt;Users experience no login delays&lt;/li&gt;
&lt;li&gt;System administrators sleep peacefully&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario 3: Financial Services App
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Security Requirement:&lt;/strong&gt; Immediate token revocation for compromised accounts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove access token from Redis → Instant block&lt;/li&gt;
&lt;li&gt;Invalidate refresh token in database → Permanent revocation&lt;/li&gt;
&lt;li&gt;User forced to re-authenticate → Security restored&lt;/li&gt;
&lt;/ul&gt;
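&lt;p&gt;That two-step revocation is worth seeing as code — one delete for the instant block, one flag for the permanent one (storage is again faked with dictionaries):&lt;/p&gt;

```python
active_sessions = {"acc_tok_1": "user:42"}  # Redis stand-in
refresh_tokens = {"ref_tok_1": {"user": 42, "revoked": False}}  # DB stand-in

def revoke_account(access_token, refresh_token):
    """Kill the session now (Redis) and forever (database)."""
    active_sessions.pop(access_token, None)  # instant block
    if refresh_token in refresh_tokens:
        refresh_tokens[refresh_token]["revoked"] = True  # permanent revocation
```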

&lt;h2&gt;
  
  
  Best Practices for Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Token Lifecycle Management
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Access Tokens:&lt;/strong&gt; 15-30 minutes (balance between security and UX)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refresh Tokens:&lt;/strong&gt; 7-30 days (based on security requirements)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redis TTL:&lt;/strong&gt; Match access token expiry + small buffer&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Security Considerations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Encrypt sensitive data in tokens&lt;/li&gt;
&lt;li&gt;Use secure random generators for token creation&lt;/li&gt;
&lt;li&gt;Implement refresh token rotation for extra security&lt;/li&gt;
&lt;li&gt;Monitor for unusual refresh patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Performance Optimization
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Batch Redis operations where possible&lt;/li&gt;
&lt;li&gt;Use Redis pipelining for multiple checks&lt;/li&gt;
&lt;li&gt;Implement connection pooling&lt;/li&gt;
&lt;li&gt;Monitor Redis memory usage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Error Handling
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Graceful fallbacks for Redis failures&lt;/li&gt;
&lt;li&gt;Clear error messages for token issues&lt;/li&gt;
&lt;li&gt;Automatic retry mechanisms with exponential backoff&lt;/li&gt;
&lt;li&gt;Comprehensive logging for debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Monitoring and Maintenance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Key Metrics to Track
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Redis Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hit/miss ratio&lt;/li&gt;
&lt;li&gt;Response times&lt;/li&gt;
&lt;li&gt;Memory usage&lt;/li&gt;
&lt;li&gt;Connection pool health&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Token Usage:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refresh frequency&lt;/li&gt;
&lt;li&gt;Token expiry patterns&lt;/li&gt;
&lt;li&gt;Failed authentication attempts&lt;/li&gt;
&lt;li&gt;Unusual access patterns&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;System Health:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database query times&lt;/li&gt;
&lt;li&gt;Redis availability&lt;/li&gt;
&lt;li&gt;Error rates&lt;/li&gt;
&lt;li&gt;User experience metrics&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion: The Path Forward
&lt;/h2&gt;

&lt;p&gt;Implementing a Redis-backed authentication system with separate access and refresh tokens isn't just a technical optimization—it's a fundamental architectural decision that pays dividends as your application scales.&lt;/p&gt;

&lt;p&gt;The beauty of this approach lies in its elegance: frequently accessed data lives in fast storage, while sensitive, rarely-accessed data remains in secure, persistent storage. It's like organizing your home where everyday items are within arm's reach, while valuable documents stay in a safe.&lt;/p&gt;

&lt;p&gt;As developers, we often face the trade-off between complexity and performance. This authentication pattern represents one of those rare cases where a modest increase in complexity yields exponential benefits in performance, security, and scalability.&lt;/p&gt;

&lt;p&gt;Whether you're building the next social media giant or a modest SaaS application, implementing this pattern early will save you from painful refactoring later. Your future self (and your users) will thank you when your authentication system handles that unexpected viral moment with grace.&lt;/p&gt;

&lt;p&gt;Remember: great authentication is invisible to users but robust for developers. This architecture achieves both, making it a powerful tool in your system design arsenal.&lt;/p&gt;

</description>
      <category>systemdesign</category>
      <category>sessionswithredis</category>
      <category>tokenbasedauthentication</category>
      <category>jwt</category>
    </item>
  </channel>
</rss>
