Forem: Sujal Gupta

I Fed React's Entire Hooks Transition History to Gemma 4. Here's What It Found That We Missed.

Sujal Gupta — Sat, 23 May 2026 10:20:38 +0000

"Which commit broke everything?"
Every developer who has inherited a legacy codebase has asked this. We just never had a good way to answer it.

The Problem That Started This

Six months ago I was debugging a production issue in a codebase I'd inherited. The bug had been there for a long time — I could tell because the workarounds had workarounds. But I couldn't figure out when it started, or what change introduced it.

I opened git log. 2,847 commits. Three years of history. Everything was in there — every decision, every mistake, every refactor — locked inside commit messages that ranged from "fix critical auth bug" to "stuff".

I needed a historian, not a search engine.

That's why I built CodeDNA.

What I Wanted to Know

The question I couldn't answer manually: When did this codebase's quality start degrading, and what caused it?

Standard git tools answer how much happened. Commit graphs show velocity. git blame shows who touched what. But none of them answer why a period was chaotic, or connect a March 2019 API change to a June 2019 bug cluster.

That connection requires reasoning across time — holding 180 commits in context and tracing causal chains between them. That's exactly what Gemma 4's Thinking Mode is designed for.

Why Gemma 4 Specifically

I want to be honest about this, because "I used an LLM" is not the same as "I used Gemma 4 intentionally."

Thinking Mode is the reason this project exists.

I tested the same prompt against several models during development. Standard instruction-tuned models produce summaries. They count keywords and report patterns. Gemma 4 with Thinking Mode reasons about patterns — it traces why a cluster of fix commits appeared after a specific API change, not just that they appeared.

The live reasoning stream in the UI is not a gimmick. It's the proof. When you paste your git log and watch the right panel stream Gemma's analysis in real time, you're watching it build the causal chain before it outputs the structured result. That's not post-hoc storytelling — that's the actual analysis process made visible.

128K context is the other prerequisite.

180 commits with file stat data is roughly 35,000–40,000 tokens of compressed history. The only way to detect that a March pivot caused a June bug storm is to have both in the same context window. Without 128K, you're forced into chunking — which destroys the causal chain entirely.

Privacy is structural, not optional.

Your git history contains proprietary module names, security patch descriptions, unreleased feature branches, and often enough context to reverse-engineer your business logic. I built CodeDNA to run under your own API key with zero data retention. This isn't a feature toggle — it's the only way I could imagine a real engineering team actually using it with their private repos.

The Experiment: React's Hooks Transition

I chose the React repository's 2018–2019 Hooks transition period as my primary test case for one specific reason: any developer who knows React can verify the output in 2 minutes.

Every other project idea I considered failed this verifiability test. Financial anomaly detection? A judge would need domain expertise. CVE scanning? Knowledge-cutoff problems. Food photo analysis? Blurry curry images break the demo. Git history? The raw commits are public. Anyone can check.

I fed Gemma 4 the commits from September 2018 through June 2019. 24 commits in the demo, roughly 180 in a fuller run. The Hooks era: one of the most architecturally significant transitions in any major open-source project.

Here's what it found:

What Gemma 4 Said About React's History

The milestone Gemma 4 identified first: A feature burst in July–September 2018 — Scheduler time-slicing infrastructure (Scheduler.js, 144 insertions in one commit), then React.lazy, Suspense, and createContext v2 added within 6 weeks of each other.

This is factually accurate. Any React developer recognizes this as the foundation-laying period before Hooks went public.

The milestone that surprised me: Gemma 4 flagged January–February 2019 as a stability → bug storm transition, citing ca53456 (fix for useRef) and cb54567 (fix for infinite useEffect loops) within days of the 16.8.0 release. It specifically noted that ReactFiberHooks.js had 8 modifications in this period versus 2 in the preceding stable phase.

I had to look this up to verify. It's correct. The Hooks release in 16.8.0 (February 6, 2019) was followed by a cluster of hotfixes addressing edge cases in the hooks implementation that weren't caught before release. The file-level evidence in the commit stats makes this visible — but only if you're looking across all 24 commits simultaneously, not one at a time.

The health score: 79/100. Breakdown: +15 for high commit message quality, +10 for clear refactor era visible in 2019-05, -10 for 21% bug-fix ratio, and a neutral note for concentrated churn in ReactFiberHooks.js. Every factor is displayed, with evidence. No black-box number.

The Prompt Engineering That Took the Longest

Getting Gemma 4 to produce specific insights — not corporate-speak — was the hardest part of this project. It took three major iterations.

Iteration 1 (bad): Asked the model to "tell me the story of this codebase." It produced beautifully written hallucinations. "The team worked hard to address technical debt during a difficult refactoring period." Sounds good. Means nothing. Not a single commit hash cited.

Iteration 2 (better): Added a list of forbidden phrases: "technical debt", "the team", "seems like", "likely indicates", "possibly", "perhaps". Insights became slightly more factual but still vague. The model would say "there were many fixes in this period" without specifying how many, which period, or what was being fixed.

The breakthrough (Iteration 3): Injecting pre-computed metadata. Before any model call, my preprocessor now extracts:

A monthly commit histogram: 2019-03: 47 commits, 2019-02: 12 commits
Top changed files: ReactFiberHooks.js modified 8 times
Bug-fix ratio: 21%

When this metadata is in the prompt, the model's job changes from counting to interpreting. Instead of asking "were there a lot of fixes in early 2019?" I'm telling it "there were 14 fix commits in 6 weeks targeting ReactFiberHooks.js — what does that mean?" The insights became specific and verifiable almost immediately.
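A minimal sketch of such a preprocessor, assuming the default `git log --stat` layout (a `commit <hash>` line, a `Date:` line, a four-space-indented message, then ` path | N ++--` stat lines). The function `extract_metadata`, its returned keys, and the fix-detection regex are hypothetical illustrations, not CodeDNA's actual `preprocessor.py`.

```python
import re
from collections import Counter

# Hypothetical patterns for the default `git log --stat` layout.
COMMIT_RE = re.compile(r"^commit ([0-9a-f]{7,40})")
STAT_RE = re.compile(r"^ (\S.*?)\s+\|\s+(?:\d+|Bin)")
FIX_RE = re.compile(r"\b(?:fix(?:es|ed)?|bug|hotfix|revert)\b", re.IGNORECASE)
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def extract_metadata(log_text: str, top_n: int = 3) -> dict:
    """Summarise `git log --stat` text into a monthly histogram, file hotspots and a fix ratio."""
    commits = []
    for line in log_text.splitlines():
        if COMMIT_RE.match(line):
            commits.append({"month": None, "subject": None, "files": []})
            continue
        if not commits:
            continue
        current = commits[-1]
        if line.startswith("Date:"):
            # Default format: "Date:   Thu Feb 7 09:15:22 2019 -0800"
            parts = line.split()
            current["month"] = f"{parts[5]}-{MONTHS[parts[2]]:02d}"
        elif line.startswith("    ") and current["subject"] is None:
            current["subject"] = line.strip()  # first message line is the subject
        else:
            stat = STAT_RE.match(line)  # " path/to/file.js | 12 ++++--"
            if stat:
                current["files"].append(stat.group(1))

    months = Counter(c["month"] for c in commits if c["month"])
    hotspots = Counter(path.rsplit("/", 1)[-1] for c in commits for path in c["files"])
    fixes = sum(1 for c in commits if c["subject"] and FIX_RE.search(c["subject"]))
    return {
        "total_commits": len(commits),
        "monthly_counts": dict(sorted(months.items())),
        "top_changed_files": hotspots.most_common(top_n),
        "bug_fix_ratio_pct": round(100 * fixes / len(commits)) if commits else 0,
    }
```

Hotspots are counted by file basename, matching how names such as `ReactFiberHooks.js` appear in the metadata; a repository with several files sharing one name would need full paths instead.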

The map-reduce split was the second major insight. Asking Gemma 4 to simultaneously produce flowing Thinking Mode prose and valid structured JSON produced neither well. Splitting into Step 1 (reasoning stream — clean markdown, no JSON) and Step 2 (JSON structuring from the reasoning trace) dramatically improved both. The live reasoning panel now shows actual analytical prose. The timeline data is reliably structured.

What I Learned About Gemma 4's Limitations

Small repos produce low-confidence analysis. Under 50 commits, the patterns aren't there yet. CodeDNA handles this gracefully — it labels the output clearly as "micro-analysis" rather than inventing dramatic narratives. But the wow factor is absent.

Vague commit messages are the real enemy. A repo full of "fix", "update", "wip" commits gives Gemma very little signal to reason from. The model tries — it picks up on date clustering and file patterns — but the confidence is honest: "data_quality: low", "insufficient evidence for causal narrative". I considered trying to work around this but decided against it. Honest uncertainty is more valuable than confident fabrication.
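The article does not say how vague messages are detected. One way to produce a label like `data_quality: low` (and a percentage like the `Vg:` field in the metadata header) is a simple ratio of filler-only subjects. The word list, the three-word minimum, and the 20% and 50% thresholds below are assumptions chosen for illustration.

```python
import re

# Hypothetical vocabulary and thresholds; tune them against your own repositories.
FILLER_WORDS = {"fix", "fixes", "update", "updates", "wip", "stuff", "changes", "misc", "tweaks"}


def is_vague(subject: str, min_words: int = 3) -> bool:
    """A subject is vague if it is very short or consists only of filler words."""
    words = re.findall(r"[a-z0-9']+", subject.lower())
    return len(words) < min_words or all(w in FILLER_WORDS for w in words)


def data_quality(subjects: list[str]) -> tuple[str, int]:
    """Return (quality label, percentage of vague subjects)."""
    if not subjects:
        return "low", 100
    vague_pct = round(100 * sum(is_vague(s) for s in subjects) / len(subjects))
    if vague_pct >= 50:
        return "low", vague_pct
    if vague_pct >= 20:
        return "medium", vague_pct
    return "high", vague_pct
```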

Model size vs. speed tradeoff is real. The 31B model produces noticeably richer reasoning. The 26B model is faster and more reliable on the free API tier. I defaulted to 26B primary with 31B fallback, and added OpenRouter as a second fallback layer with dynamic model discovery. For a solo developer on a laptop with no GPU, the API-first approach was the only realistic path.

The reasoning stream can't gracefully fall back mid-stream. If the primary model fails after the SSE connection is open, the stream errors rather than switching providers. The JSON structuring call (which runs in parallel) handles fallback correctly, but the stream panel may show an error while the timeline still renders successfully. I documented this honestly rather than hiding it.
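The constraint comes from the transport: once chunks have reached the browser over SSE, switching providers would splice two different reports together. A sketch of the narrowest fallback that stays consistent, assuming each provider is exposed as a hypothetical zero-argument factory returning an async iterator of text chunks: switch only when a stream fails before forwarding its first chunk, and surface any later failure as an error.

```python
from typing import AsyncIterator, Callable, Optional, Sequence

# Hypothetical provider type: a zero-argument factory returning an async stream of text chunks.
StreamFactory = Callable[[], AsyncIterator[str]]


async def stream_with_fallback(providers: Sequence[StreamFactory]) -> AsyncIterator[str]:
    """Switch providers only if a stream fails before its first chunk is forwarded."""
    last_error: Optional[Exception] = None
    for open_stream in providers:
        forwarded = False
        try:
            async for chunk in open_stream():
                forwarded = True
                yield chunk
            return
        except Exception as exc:
            if forwarded:
                # The client already rendered part of this report; splicing in a
                # second provider's text would mix two analyses, so surface the error.
                raise
            last_error = exc
    raise RuntimeError("every provider failed before streaming any text") from last_error
```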

The Moment That Made It Worth Building

There's a specific interaction that convinced me this tool has real value beyond the hackathon.

I ran CodeDNA on my own FundTrace project — a fraud detection system my team built for a hackathon earlier this year. 47 commits, 3 months.

Gemma 4 flagged a stability period followed by a pivot, then a concentrated feature burst in the final week before submission. The health score was 61/100. The main reason: 38% bug-fix ratio in the final 10 days.

That's exactly what happened. The last week was a scramble. We knew it at the time. Seeing it reflected back as a pattern — with specific commit references and a calculated ratio — was oddly clarifying. Not because it told us something we didn't know, but because it quantified what we felt.

That's the use case I want to build toward: not just forensics on old code, but a continuous mirror for engineering health.

Try It on Your Repo

git clone https://github.com/acchasujal/codeDNA.git
cd codeDNA/backend
pip install -r requirements.txt
cp .env.example .env
# Add your Google AI Studio key to .env as GEMINI_API_KEY
uvicorn main:app --reload

# New terminal:
cd ../frontend && npm install && npm run dev

Then run this in any repository you're curious about:

git log --stat | head -3000 > my_history.txt

Upload the file. Click Analyze.

What would Gemma 4 find in your codebase?

Post your results in the comments — I'm genuinely curious what it finds in different projects and domains.

What's Next

This is v1. The things I want to build next:

GitHub URL input — analyze any public repo without manual log export
Trend alerts — "your current sprint's bug-fix ratio is 2x your baseline"
Team patterns — author-level analysis (with appropriate consent and privacy controls)
CI/CD integration — flag risky commit pattern spikes as part of a PR check

The core insight, that Gemma 4's Thinking Mode + 128K context makes this class of analysis possible, still has a lot of room to run.

GitHub:

acchasujal / codeDNA

CodeDNA — AI Codebase Archaeologist

Feed Gemma 4 your git history. Discover exactly when — and why — your codebase evolved.

Every codebase has a turning point. The moment before is clean commits and clear intent. The moment after is hotfixes, reverts, and growing entropy. CodeDNA finds it.

What It Does

Maps your codebase history with Gemma 4 — up to 400 commits, preprocessed and compressed for maximum analytical signal. The preprocessor extracts monthly commit histograms and per-file change frequency before sending to the model, so insights are grounded in observable data.

Returns a structured archaeological report — health score with transparent breakdown, milestone timeline (bug storms, refactors, pivots, feature bursts), and key metrics. Every claim cites a specific commit hash, date, or metadata value.

Streams Gemma 4's live reasoning — watch the Thinking Mode trace in real-time as the model identifies causal patterns across years of history. Verifiable: the…

View on GitHub

Built solo in 4 days for the Google Gemma 4 Challenge. The tool's git history is its own best demo.

If this was useful or interesting, a reaction means a lot — it's the tiebreaker in this challenge and tells me the problem resonates.

CodeDNA: AI Codebase Archaeologist Built with Gemma 4 Thinking Mode

Sujal Gupta — Fri, 22 May 2026 22:01:12 +0000

You inherited this codebase 6 months ago. You can feel something went wrong around 2021. Bug reports spiked. Velocity dropped. The original authors left. The commit history has 3,000 entries — and every answer is in there.

Nobody has time to read 3,000 commits.

CodeDNA does.

What I Built

CodeDNA is an AI Codebase Archaeologist. You paste your git log, and Gemma 4 — using Thinking Mode — reconstructs the story of your codebase: bug storms, architectural pivots, refactor eras, feature bursts, and an overall health score with a transparent breakdown.

The output is 100% verifiable. You can check every milestone against your actual commit history. No hallucinated CVEs, no unverifiable financial claims — just pattern-extracted facts from structured text you already own.

The Problem It Solves

You inherit a codebase. Something went wrong around late 2021 — you can feel it. Bug reports spiked, velocity dropped, the original authors left. The commit history has everything, but nobody has time to read 3,000 commits manually.

Traditional tools give you graphs of commit frequency. That tells you how much happened, not what happened or why one period was chaotic and another stable.

CodeDNA uses Gemma 4's Thinking Mode to reason across your entire commit history and surface the narrative that was always there.

Live Demo

The live demo in action: CodeDNA processing the React repository’s architectural transition history.

Core Features

Feature	Description
Animated timeline	Color-coded milestones — red = bug storm, yellow = refactor, green = pivot, blue = feature burst
Health score + breakdown	0–100 score with transparent factor table (not a black-box number)
Live Thinking Mode stream	Watch Gemma 4 reason step-by-step as it analyzes your history
Smart preprocessing	Caps at 180 commits, extracts monthly histograms and file hotspots before inference
Multi-provider fallback	Google AI Studio (26B → 31B) → OpenRouter (gemma-2-27b-it → gemma-3-12b-it → more)
Analysis caching	Same git log = instant results on repeat runs
Markdown export	Download a complete archaeological report
Messy commit handling	Detects vague history and gives honest, low-confidence analysis instead of hallucinating
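The caching row can be sketched as a content-addressed disk cache: hash the pasted log and reuse a stored result when the same hash appears again. The helper names, the JSON-file layout, and the choice to include the model name in the key (so switching models does not return a stale result) are assumptions of this sketch; the article only says the cache is in-memory plus disk.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(git_log: str, model: str) -> str:
    """Content hash of the exact input; any change to the log yields a new key."""
    return hashlib.sha256(f"{model}\n{git_log}".encode("utf-8")).hexdigest()


def cached_analysis(git_log: str, model: str,
                    analyze: Callable[[str], dict], cache_dir: Path) -> dict:
    """Return a stored result for identical input, otherwise run `analyze` and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(git_log, model)}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = analyze(git_log)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```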

Screenshots

The timeline builds milestone by milestone. Red = bug storm, yellow = refactor, green = pivot.

Health Score is never a black-box number. Every factor cites commit evidence.

The reasoning panel shows Gemma 4's step-by-step analysis as it happens. This is Thinking Mode — not post-hoc summarization.

Architecture

git log --stat (your paste or .txt upload)
        ↓
preprocessor.py
  → parse commits, build monthly histogram, extract file hotspots
  → metadata header injected: MONTHLY_COUNTS, TOP_CHANGED_FILES, BUG_FIX_RATIO
        ↓
Step 1: Reasoning Stream (REASONING_SYSTEM_PROMPT)
  → Gemma 4 Thinking Mode streams clean markdown report
  → Visible live in right panel
        ↓
Step 2: JSON Structuring (JSON_SYSTEM_PROMPT)
  → Separate Gemma call converts reasoning → typed AnalysisResult JSON
  → Pydantic v2 validates schema
        ↓
React UI
  → Health Score ring + breakdown table (center, always visible)
  → Animated vertical timeline (left)
  → Live reasoning stream (right)
  → Markdown export

Map-reduce design: By splitting reasoning (Step 1) from JSON structuring (Step 2), the Thinking Mode output stays clean prose instead of being polluted by schema-enforcement constraints. Insight quality is significantly higher.
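A minimal sketch of the two-step split, assuming a hypothetical `call_model(system_prompt, user_content)` coroutine that returns the full completion text. In CodeDNA, Step 1 is streamed to the UI; this sketch awaits the complete report to keep the control flow visible, and the prompt strings are abbreviated stand-ins for `REASONING_SYSTEM_PROMPT` and `JSON_SYSTEM_PROMPT`.

```python
import json
from typing import Awaitable, Callable

# Hypothetical model-call signature: (system_prompt, user_content) -> completion text.
ModelCall = Callable[[str, str], Awaitable[str]]

# Abbreviated stand-ins for the real prompts.
REASONING_SYSTEM_PROMPT = "Output markdown prose only. No JSON. No code fences."
JSON_SYSTEM_PROMPT = "Convert the report into AnalysisResult JSON. Output JSON only."


async def run_pipeline(history: str, call_model: ModelCall) -> tuple[str, dict]:
    # Step 1: unconstrained reasoning over the preprocessed history.
    report = await call_model(REASONING_SYSTEM_PROMPT, history)
    # Step 2: a separate call whose only job is to structure the finished report.
    raw_json = await call_model(JSON_SYSTEM_PROMPT, report)
    return report, json.loads(raw_json)
```

Because each call has a single job, a schema failure in Step 2 never degrades the prose in Step 1, and Step 2 can be retried without regenerating the report.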

Stack:

Backend: FastAPI + httpx async + SSE streaming
Frontend: React 18 + Vite + Tailwind CSS
LLM: Gemma 4 via Google AI Studio (primary) + OpenRouter (fallback)
State: In-memory + disk cache (no database)

Why Gemma 4 — Not "Just Any LLM"

This is the most important section for me to get right.

1. Thinking Mode for causal chain reasoning — not summarization

Standard completion models count keywords. Gemma 4's Thinking Mode traces why patterns emerged. When it sees 14 "fix" commits targeting ReactFiberHooks.js in a 3-week window after a large API change, it connects them causally — it doesn't just report a spike.

The live reasoning stream in the UI makes this directly observable. Judges (and users) can watch Gemma's chain-of-thought in real time. This is the intentional use criterion — not decorative AI, but AI whose reasoning process is the deliverable.

2. 128K context — the archaeology window

180 commits × ~200 tokens each = ~36K tokens of compressed history in one request. No chunking, no context loss, no multi-call stitching. Gemma 4 holds the full narrative arc in one reasoning window, which is the only way to detect multi-month causal patterns (e.g., a March 2019 API change causing a June 2019 bug cluster).
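The arithmetic behind that claim, written out as a budget check. The 200-tokens-per-commit figure is the article's own estimate; treating "128K" as 128,000 tokens and reserving 16,000 tokens for the model's reply are assumptions.

```python
CONTEXT_WINDOW = 128_000   # approximate "128K" window cited above
TOKENS_PER_COMMIT = 200    # the article's rough per-commit estimate
OUTPUT_RESERVE = 16_000    # hypothetical budget kept free for the model's reply


def history_tokens(n_commits: int) -> int:
    return n_commits * TOKENS_PER_COMMIT


def fits_in_one_call(n_commits: int) -> bool:
    """True if the compressed history plus the output reserve fits one context window."""
    return history_tokens(n_commits) + OUTPUT_RESERVE <= CONTEXT_WINDOW


def max_commits_per_call() -> int:
    return (CONTEXT_WINDOW - OUTPUT_RESERVE) // TOKENS_PER_COMMIT
```

Under these assumptions the 180-commit cap uses 36,000 tokens, and about 560 commits would fit before the reply reserve is touched, which is consistent with the Limitations note that 1,000+ commit histories need more aggressive preprocessing. The system prompt and metadata header also consume tokens, so the practical ceiling is somewhat lower.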

3. Structured output drives the UI deterministically

The JSON schema is strict (Pydantic v2 validated). If Gemma returns valid JSON, the timeline renders. If not, the error is surfaced honestly. No post-processing guesswork.
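CodeDNA validates with Pydantic v2. To keep this sketch dependency-free, it shows the same fail-loudly behavior with the standard library's `json` and `dataclasses`: either the payload matches the schema exactly or a `ValueError` is raised for the UI to surface. The field names are hypothetical; the milestone types come from the reasoning prompt's allowed types and the confidence levels from the Technical Highlights.

```python
import json
from dataclasses import dataclass

MILESTONE_TYPES = {"bug_storm", "refactor", "pivot", "feature_burst", "stability"}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass(frozen=True)
class Milestone:
    month: str        # "YYYY-MM"
    type: str         # one of MILESTONE_TYPES
    evidence: str
    confidence: str   # one of CONFIDENCE_LEVELS


@dataclass(frozen=True)
class AnalysisResult:
    health_score: int
    milestones: list[Milestone]


def _milestone(item: dict) -> Milestone:
    if item["type"] not in MILESTONE_TYPES:
        raise ValueError(f"unknown milestone type: {item['type']!r}")
    if item["confidence"] not in CONFIDENCE_LEVELS:
        raise ValueError(f"unknown confidence: {item['confidence']!r}")
    return Milestone(item["month"], item["type"], item["evidence"], item["confidence"])


def parse_analysis(raw: str) -> AnalysisResult:
    """Validate model output strictly; any problem raises ValueError, never a guess."""
    try:
        data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
        score = data["health_score"]
        if isinstance(score, bool) or not isinstance(score, int) or not 0 <= score <= 100:
            raise ValueError(f"health_score must be an int in 0..100, got {score!r}")
        milestones = [_milestone(item) for item in data["milestones"]]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"malformed AnalysisResult: {exc!r}") from exc
    return AnalysisResult(score, milestones)
```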

4. Privacy-first by design

Git history contains proprietary code, unreleased feature names, security patches, and competitive intelligence in commit messages. CodeDNA sends everything through your own API key. Zero data retention. This is not a UX choice; it's the only architecture engineering teams will actually trust with real repositories.

Demo: React Hooks Era (2018–2019)

I ran CodeDNA on React's public git history during the Hooks transition — one of the most architecturally significant periods in any major open-source project.

What Gemma 4 found:

2018-07: Feature burst — Scheduler time-slicing and Fiber pool infrastructure added (5 commits, Scheduler.js dominant)
2018-09–10: Pivot — React.lazy, Suspense, and createContext v2 introduced across 6 commits
2019-01–02: Stability → Bug storm — 4 rapid fixes for useRef and useEffect infinite loops following the 16.8.0 release
2019-05: Feature burst — useTransition, useDeferredValue, unstable_createRoot (5 commits, ReactFiberHooks.js dominant)

Health score: 58/100 — justified by 21% bug-fix ratio, two high-severity bug storms in 2019-01 and 2019-02, partially offset by clear feature burst eras and high commit message quality (83% of commits have descriptive messages ≥8 words).

Quick Start

# Clone
git clone https://github.com/acchasujal/codeDNA.git
cd codeDNA

# Backend
cd backend
pip install -r requirements.txt
cp .env.example .env
# Add your Google AI Studio key as GEMINI_API_KEY
uvicorn main:app --reload

# Frontend (new terminal)
cd ../frontend
npm install
npm run dev
# Opens http://localhost:5173

Get your git log:

# Any repo you have locally:
git log --stat | head -3000 > my_history.txt
# Upload the .txt file or paste directly

# React demo (what the screenshots use):
git clone https://github.com/facebook/react
cd react
git log --stat --after="2018-09-01" --before="2019-06-01" | head -3000 > react_hooks.txt

.env.example:

GEMINI_API_KEY=your_google_ai_studio_key_here
GEMMA_MODEL=models/gemma-4-26b-a4b-it
MAX_COMMITS=180
OPENROUTER_API_KEY=optional_for_fallback

Technical Highlights

Multi-provider fallback chain — At startup, CodeDNA queries the OpenRouter API to dynamically discover available Gemma models and builds a priority chain. Google AI Studio is primary; OpenRouter provides up to 9 additional Gemma models as fallback. The chain is logged at startup so you always know what's running.
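A sketch of how such a priority chain might be assembled and walked. The OpenRouter model ids are passed in as plain strings (no network call is shown), and the `(provider, model, prompt)` call signature and the `"gemma"` substring filter are assumptions; the cap of 9 extra models mirrors the number quoted above.

```python
from typing import Awaitable, Callable, Sequence

# Hypothetical call signature: (provider, model, prompt) -> completion text.
ProviderCall = Callable[[str, str, str], Awaitable[str]]


def build_chain(google_models: Sequence[str], openrouter_ids: Sequence[str],
                max_extra: int = 9) -> list[tuple[str, str]]:
    """Google AI Studio models first, then Gemma models discovered on OpenRouter."""
    chain = [("google", m) for m in google_models]
    gemma_ids = [m for m in openrouter_ids if "gemma" in m.lower()]
    chain += [("openrouter", m) for m in gemma_ids[:max_extra]]
    return chain


async def call_with_fallback(chain: Sequence[tuple[str, str]], call: ProviderCall,
                             prompt: str) -> tuple[str, str, str]:
    """Return (provider, model, text) from the first entry that succeeds."""
    errors = []
    for provider, model in chain:
        try:
            return provider, model, await call(provider, model, prompt)
        except Exception as exc:  # timeouts, rate limits, HTTP errors, ...
            errors.append(f"{provider}/{model}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider and model alongside the text makes it easy to log which entry actually answered, in the same spirit as logging the chain at startup.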

Preprocessor intelligence — Before any model call, the preprocessor extracts a MONTHLY_COMMIT_COUNTS histogram and TOP_CHANGED_FILES list from the raw git log. This ground-truth metadata is injected directly into the prompt, so Gemma cites real numbers ("commit count tripled to 47 in March 2019") rather than inferring from prose.

Anti-fluff enforcement — The system prompt contains an explicit FORBIDDEN_PHRASES list ("technical debt", "the team", "seems like", "likely indicates", and 12 others). Every insight must cite a specific commit hash, date, file name, or count — or say "insufficient evidence."

Honest confidence — Every milestone includes a confidence field (high | medium | low) with a justification sentence. Low-quality commit histories get a QUALITY_WARNING header and produce conservative, clearly-labeled micro-analyses rather than dramatic fabrications.

The Reasoning System Prompt

The full prompt that drives Step 1 (the reasoning stream):


You are CodeDNA, a concise git-history analyst.
Produce a clean public report, not private reasoning.

Rules:
- Output markdown prose only. No JSON. No code fences.
- No meta-commentary, self-correction, planning notes, or internal monologue.
- Never write "wait", "I used", "the prompt says", or any phrase from this
  forbidden list: technical debt, the team, engineers, developers, working hard,
  prioritized, decided to, management, business logic, seems like, appears to,
  it looks like, likely indicates, possibly, perhaps, might have.
- Use only observable evidence from the metadata header and commit log.
- Cite commit hashes, dates/months, file names, commit counts, and ratios
  whenever making a claim.
- If evidence is thin, say "insufficient evidence" and name the missing signal.
  Do not invent intent, people, architecture, risk, or causality.
- Keep every sentence useful. Avoid repetition.

Format exactly:
## Overview
Two to three factual sentences covering commit count, date range,
most changed files or file types, and BUG_FIX_RATIO.

## Milestones
Four to eight bullets when evidence allows. Each bullet:
- **YYYY-MM** - type - concise evidence sentence with commit hash(es),
  changed file(s), and count(s).
  Allowed types: bug_storm, refactor, pivot, feature_burst, stability.

## Health Signals
Three bullets: one positive signal, one negative signal, one confidence note.
Each bullet must cite evidence.

## Churn Summary
One concise sentence naming the peak period and the files or commits behind it.

The Hardest Problem: Making Gemma Say Something Real

The biggest technical challenge wasn't the UI, the SSE streaming, or the fallback chain. It was getting Gemma 4 to produce specific, verifiable insights instead of confident-sounding nonsense.

Here's what the first version produced on a repo with commits like "fix navbar bug", "update readme", "refactor utils":

"This period reflects a time of organizational growth and technical maturity. The team worked hard to address accumulated complexity while balancing feature delivery with stability concerns."

That output is useless. It contains zero commit references, zero file names, zero numbers. A junior consultant could have written it without looking at the code. A judge would mark it dead on arrival.

Three iterations to fix it.

Iteration 1 — Forbidden phrases list.
Added an explicit blocklist to the system prompt:

FORBIDDEN PHRASES — never use these:
"technical debt", "the team", "engineers", "developers",
"working hard", "prioritized", "decided to", "management",
"seems like", "appears to", "it looks like", "likely indicates",
"possibly", "perhaps", "might have"

The output became less flowery but still vague: "There were many fixes in early 2019." How many? Which files? Which period exactly?
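A prompt blocklist is an instruction, not a guarantee, so the same list can also be checked against the model's output after generation. A minimal sketch of such a check, not something the article describes; the word boundaries keep "management" from matching inside a word like "mismanagement".

```python
import re

FORBIDDEN_PHRASES = [
    "technical debt", "the team", "engineers", "developers",
    "working hard", "prioritized", "decided to", "management",
    "seems like", "appears to", "it looks like", "likely indicates",
    "possibly", "perhaps", "might have",
]
FORBIDDEN_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in FORBIDDEN_PHRASES) + r")\b",
    re.IGNORECASE,
)


def find_forbidden(text: str) -> list[str]:
    """Forbidden phrases found in model output, lowercased, in order of appearance."""
    return [m.group(1).lower() for m in FORBIDDEN_RE.finditer(text)]
```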

Iteration 2 — Mandatory evidence citation.
Added to the prompt: "Every milestone description must cite at least one commit hash, date/month, file name, count, or ratio. If you cannot cite evidence, write 'insufficient evidence' and stop."

Better, but Gemma was still counting commits itself — and sometimes miscounting.
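The citation rule can likewise be checked mechanically. A sketch, assuming each milestone description arrives as a single sentence; the regular expressions for hashes, months, file names, and counts are hypothetical approximations (for example, a hash must contain at least one digit so ordinary words are not mistaken for one).

```python
import re

# Hypothetical patterns for the evidence types the prompt asks for.
EVIDENCE = {
    "hash": re.compile(r"\b(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b"),          # short or full SHA containing a digit
    "month": re.compile(r"\b\d{4}-\d{2}\b"),                           # YYYY-MM (also the start of YYYY-MM-DD)
    "file": re.compile(r"\b[\w-]+\.(?:js|jsx|ts|tsx|py|json|md)\b"),   # common file extensions only
    "count": re.compile(r"(?<![\w.-])\d+(?:\.\d+)?%?(?![\w-]|\.\d)"),  # standalone number or percentage
}


def evidence_kinds(sentence: str) -> set[str]:
    """Which kinds of checkable evidence a milestone sentence cites."""
    return {kind for kind, pattern in EVIDENCE.items() if pattern.search(sentence)}


def is_grounded(sentence: str) -> bool:
    """Grounded means citing at least one kind of evidence, or explicitly admitting it cannot."""
    return bool(evidence_kinds(sentence)) or "insufficient evidence" in sentence.lower()
```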

Iteration 3 — Pre-computed metadata injection (the breakthrough).

Instead of asking Gemma to figure out what happened, I tell it what happened and ask it to interpret it.

The preprocessor now builds a metadata header before any model call:

# META: 180tot|180ana|Q:HIGH|Fx:21%|Vg:0%
# DATES: 2019-06-20..2018-07-02
# MONTHS: 2018-09:3,2018-10:3,2019-01:4,2019-02:2,2019-05:5,2019-06:2
# HOTSPOTS: ReactFiberHooks.js:8,Scheduler.js:5,package.json:4
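The header packs the preprocessor's numbers into four short lines to save tokens. A sketch of a renderer for that format; reading `tot`/`ana` as total and analyzed commit counts, `Fx` as the bug-fix ratio, and `Vg` as the vague-message ratio is an interpretation of the abbreviations, and `render_header` is a hypothetical name.

```python
def render_header(total: int, analyzed: int, quality: str, fix_pct: int, vague_pct: int,
                  newest: str, oldest: str, months: dict[str, int],
                  hotspots: list[tuple[str, int]]) -> str:
    """Render preprocessor metadata as the compact '# KEY: value' header lines."""
    lines = [
        f"# META: {total}tot|{analyzed}ana|Q:{quality.upper()}|Fx:{fix_pct}%|Vg:{vague_pct}%",
        f"# DATES: {newest}..{oldest}",  # git log lists newest commits first
        "# MONTHS: " + ",".join(f"{m}:{n}" for m, n in sorted(months.items())),
        "# HOTSPOTS: " + ",".join(f"{name}:{n}" for name, n in hotspots),
    ]
    return "\n".join(lines)
```

Called with the React demo's values, it reproduces the four header lines above exactly.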

Now instead of asking "were there a lot of fixes in early 2019?", I'm asking "given that commits spiked to 5 in 2019-05 and ReactFiberHooks.js was modified 8 times — what does that pattern indicate?"

The model's job shifted from counting to interpreting. The output became:

"2019-01 through 2019-02 saw 6 commits (bf32345, ca53456, cb54567, cc55678, cd56789, ce57890) concentrated in ReactFiberHooks.js and ReactFiberBeginWork.js. ca53456 fixed an incorrect useRef identity across re-renders; cb54567 resolved an infinite useEffect loop triggered by object dependency comparison. The 16.8.0 release on 2019-02-06 (cd56789) was followed two days later by ce57890 — a hooks state regression fix, indicating at least one edge case reached production."

Every claim is checkable. Every hash is real. That's the difference.

The map-reduce split was the second breakthrough.

Asking Gemma 4 to simultaneously produce flowing Thinking Mode prose and valid JSON produces neither well. I split it:

Step 1 (stream): REASONING_SYSTEM_PROMPT — output clean markdown only, no JSON, no schema constraints
Step 2 (analyze): JSON_SYSTEM_PROMPT — read the reasoning trace, output strict AnalysisResult JSON

The reasoning panel now shows actual analytical prose. The timeline data is reliably structured. Both improved dramatically when separated.

Limitations (Honest)

Works best with 100–200 commits. Very large histories (1000+) need more aggressive preprocessing.
Commit message quality determines insight quality. A repo full of "fix", "wip", "update" commits will produce low-confidence analysis (CodeDNA tells you this clearly rather than inventing drama).
The reasoning stream uses the primary model; fallback models handle JSON structuring. If all Google models are slow, the stream may be empty — but the timeline will still render from the fallback result.
Currently runs locally only. Cloud deployment would require careful handling of API key security.

What's Next

Actual GitHub API integration (analyze any public repo by URL, no manual log export)
Branch comparison (main vs. feature branch health)
Team velocity metrics (authors per period, bus factor analysis)
CI/CD integration — run CodeDNA as a PR check to flag risky commit patterns

Built solo in 4 days for the Google Gemma 4 Challenge. Every commit in this repo is real — you can run CodeDNA on its own history.
