Forem: Kumaraswamy Chavvakula

I built a tool that catches misleading charts using Gemma 4 running locally

Kumaraswamy Chavvakula — Mon, 25 May 2026 06:57:50 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

You know the charts that look dramatic but are actually showing a 3% change? The Y-axis that conveniently starts at 95 instead of 0. The 3D pie chart whose slices somehow add up to 108%. The stock line that’s “up 59.5%!” — over a five-month window hand-picked from a bad year.

I see these constantly — news, earnings decks, social posts — and it bugs me every time. So I built DataDetective: drop in any chart image and Gemma 4 gives you a forensic breakdown — what manipulation tricks are in play, an integrity score from 0–100, what the chart actually shows vs. what it wants you to think, and how to fix it.

The whole thing runs locally through Ollama. No API keys, no cloud, nothing leaves your machine — which matters when the thing you’re analyzing is an internal financial chart or a competitor’s deck.

It ships with three intentionally misleading sample charts (a truncated bar chart, a cherry-picked line, and that impossible 108% pie) so you can see it work in one click, plus drag-and-drop for your own images.

Demo

Repo: github.com/kumarsparkz/datadetective

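# start the Ollama server first; it blocks, so give it its own terminal
# (skip this if the Ollama desktop app or service is already running)
ollama serve

# in a second terminal: grab Gemma 4 through Ollama (e4b runs comfortably on a laptop)
ollama pull gemma4:e4b      # ~9.6 GB; or gemma4:26b if you have the RAM

# serve the app from the repo root (it's just static files)
python3 -m http.server 8080
# open http://localhost:8080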

Green dot = Ollama connected and a Gemma 4 model detected. Click a sample or upload a chart.
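For readers curious what a check like that green dot involves, here is a minimal sketch in Python (the app itself does this in browser JavaScript, so this is an illustration, not the repo's code). It relies on Ollama's `/api/tags` endpoint, which lists locally installed models. The function names and the "name starts with gemma4" rule are assumptions made for the example.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Ollama's endpoint that lists locally installed models.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"


def find_gemma4_models(tags_payload: dict) -> list[str]:
    """Return installed model names that start with 'gemma4' (assumed naming rule)."""
    models = tags_payload.get("models", [])
    return [m["name"] for m in models if m.get("name", "").startswith("gemma4")]


def connection_status(url: str = OLLAMA_TAGS_URL, timeout: float = 2.0) -> str:
    """'green' if Ollama answers and a Gemma 4 model is installed, otherwise 'red'."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (URLError, OSError, ValueError):
        # Server not running, connection refused, or a non-JSON reply.
        return "red"
    return "green" if find_gemma4_models(payload) else "red"
```

Calling `connection_status()` with Ollama stopped returns `"red"`; with the server up and `gemma4:e4b` pulled it returns `"green"`.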

Here’s what it actually returns (measured on gemma4:e4b, not aspirational):

Chart	Trust score	What Gemma 4 flagged
108% pie chart	35 / 100	`[high] Inconsistent totals (sum > 100%)` — “the parts sum to 108%, not 100%”
Cherry-picked stock line	35 / 100	`[high] Cherry-picked time range` + promotional language
An honest bar chart (control)	95 / 100	nothing — “a highly effective and honest visualization”

That last row is the one I’m proudest of. A tool that flags everything is useless. The honest chart scoring 95 next to the pie scoring 35 is what makes this feel like it’s reasoning, not pattern-matching for keywords.

How I Used Gemma 4

Why local, why Gemma 4

Privacy is the real reason. If you’re analyzing your company’s revenue charts or a competitor’s investor deck, shipping those images to a cloud API feels wrong. Local means the data never leaves the machine. Gemma 4’s open weights make that possible, and it handles multimodal input natively: you POST to localhost:11434/api/chat with the model, your messages, and an images: [base64] array on the user message. No separate vision model to host, no extra plumbing.

The thing that actually made it work: let the model think first

Here’s the part worth reading if you build on local models.

My first version used Ollama’s format: 'json' flag. It felt great — guaranteed parseable JSON, no regex-ing it out of markdown. But the analysis quality was quietly terrible on the subtle cases. I fed it the classic truncated-axis bar chart (Y-axis starting at $95M so a 5% rise looks enormous) and it returned a trust score of 90–95 and didn’t flag the axis at all — three times in a row. It would read the axis labels correctly and then conclude the chart “accurately represents the increase.”

The problem wasn’t the prompt. It was that format: 'json' forces the model to emit the JSON object immediately, with no room to reason first. A small model like e4b needs to work through “the axis starts at 95, not 0, therefore the bars exaggerate a 5% change” in plain text. JSON mode amputates exactly that step.
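To put numbers on the reasoning step the model has to make, here is a small worked sketch in Python. The two revenue values are hypothetical (the post only gives the $95M baseline and a roughly 5% rise); the point is the ratio of drawn bar heights, not the exact figures.

```python
def drawn_height(value: float, baseline: float) -> float:
    """Height of a bar as drawn when the Y-axis starts at `baseline`."""
    return value - baseline


def visual_ratio(first: float, second: float, baseline: float) -> float:
    """How many times taller the second bar looks than the first."""
    return drawn_height(second, baseline) / drawn_height(first, baseline)


# Hypothetical values in $M: a 5% rise from 96.0 to 100.8.
before, after = 96.0, 100.8

honest = visual_ratio(before, after, baseline=0.0)      # axis starts at 0
truncated = visual_ratio(before, after, baseline=95.0)  # axis starts at 95

print(f"axis from 0:  second bar looks {honest:.2f}x the first")     # 1.05x
print(f"axis from 95: second bar looks {truncated:.1f}x the first")  # 5.8x
```

With these made-up numbers, the same 5% change is drawn as a bar almost six times taller once the axis starts at 95.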

So I dropped format: 'json' and restructured the prompt into an explicit procedure — reason out loud through axis baseline, pie totals, time window, and language, then output the final answer inside a JSON code fence. Same model, same chart:

108% pie: caught the bad total as a high-severity flag, score dropped to 35.
Truncated axis: started naming the non-zero baseline instead of waving it through.
Honest chart: still scored 95 — so the extra scrutiny didn’t make it paranoid.

The core call now looks like this — note the absence of format:

const response = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'gemma4:e4b',
    messages: [
      { role: 'system', content: FORENSICS_PROMPT },   // includes a step-by-step procedure
      { role: 'user', content: 'Reason step by step, then return JSON...', images: [base64Data] }
    ],
    stream: false,
    options: { temperature: 0.3, num_predict: 4096 }   // low temp = consistent forensics
  })
});

Then I parse the JSON out of the fenced block — and keep the reasoning text before it.
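A minimal sketch of that parsing step, written in Python for illustration (the app does it in vanilla JS, and this is not the repo's code). It assumes the model ends its reply with a fenced block, takes the last fence as the final answer, and treats everything before it as the reasoning text. The `trust_score` field in the usage note is a hypothetical name.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# A fenced block, optionally tagged "json"; DOTALL lets the body span lines.
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)


def split_reasoning_and_json(reply: str) -> tuple[str, dict]:
    """Return (reasoning text, parsed JSON) from a reason-then-answer reply."""
    matches = list(FENCE_RE.finditer(reply))
    if not matches:
        raise ValueError("no fenced JSON block found in model reply")
    last = matches[-1]                      # the final answer comes last
    reasoning = reply[: last.start()].strip()
    result = json.loads(last.group(1))      # raises ValueError on malformed JSON
    return reasoning, result
```

Given a reply like "The slices sum to 108." followed by a fenced `{"trust_score": 35}`, this returns the sentence as the reasoning and the dict as the result. Raising on a missing fence lets the caller retry instead of silently showing an empty report.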

Bonus: the reasoning became a feature

Because the model now thinks in plain text before answering, I had its actual forensic reasoning sitting right there. So I surface it in a collapsible “Gemma 4 Reasoning” panel. You can watch it add up the pie slices and catch the 108% itself. That transparency — showing the work, not just a verdict — turned out to be the most compelling thing in the whole app.

Being honest about the limits

e4b is the small variant, and it shows. It nails cherry-picking and impossible pie totals every time, but it catches the truncated-axis case in maybe 3 runs out of 4, and then only as a medium-severity issue rather than a high one. gemma4:26b (26B params, only ~3.8B active per pass thanks to the MoE design) handles it far more decisively. The approach scales up cleanly; you just trade RAM and a few seconds of latency. I built and tuned everything against e4b specifically to prove the concept works on hardware people actually have.

The frontend

Zero dependencies — HTML, CSS, vanilla JS. Dark glassmorphism theme, an animated SVG trust gauge (stroke-dasharray), staggered result cards, and system/light/dark themes. The three sample misleading charts are drawn with the Canvas API so there are no external image assets.
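The stroke-dasharray trick behind a gauge like this is plain arithmetic: draw a dash covering `score` percent of the circle's circumference, followed by a gap for the rest. Here is a hedged sketch of that calculation in Python (the app computes it in JS; the radius of 54 and the function name are assumptions for the example).

```python
import math


def gauge_dasharray(score: float, radius: float = 54.0) -> str:
    """stroke-dasharray value that fills `score` percent of a circle outline."""
    clamped = max(0.0, min(100.0, score))     # keep the score within 0..100
    circumference = 2 * math.pi * radius
    filled = circumference * clamped / 100.0  # dash length
    return f"{filled:.1f} {circumference - filled:.1f}"  # "dash gap"


print(gauge_dasharray(35))
print(gauge_dasharray(95))
```

Animating the gauge then comes down to transitioning the dash length from 0 to the computed value.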

What I Learned

The headline lesson: on local models, JSON mode is a trap for any task that needs reasoning. Convenience at the parsing layer cost me the model’s entire analytical capacity. Letting Gemma 4 think out loud first — and parsing the JSON out of the tail — was the difference between a tool that rubber-stamps misleading charts and one that actually catches them. And it handed me a transparency feature for free.

Team

Solo project — just me and an unreasonable number of misleading charts I’ve been annoyed by over the years.

Building LinguaLive: A Real-Time AI Language Tutor with Gemini Live API

Kumaraswamy Chavvakula — Mon, 16 Mar 2026 20:40:45 +0000

This blog post was created for the purposes of entering the Gemini Live Agent Challenge hackathon. #GeminiLiveAgentChallenge

The Problem: Language Learning Feels Disconnected

We've all downloaded Duolingo, done the first week religiously, and then... stopped. Why? Because language learning apps are fundamentally disconnected from real life. You're matching words on a screen when what you actually need is someone patient sitting next to you, pointing at things, and helping you build vocabulary from your own world.

Immersion is widely regarded as one of the most effective approaches to language acquisition — but it typically requires expensive human tutors or living abroad. What if AI could bring that immersive experience to everyone?

The Idea: Point Your Camera, Learn a Language

LinguaLive is a real-time AI language tutor named Luna that:

Sees through your camera and teaches you words for objects in your environment
Hears your pronunciation and gives specific, actionable feedback
Speaks back with native-sounding voices via the Gemini Live API
Generates custom visual flashcards using Imagen 3
Adapts to the learner's pace — the system prompt instructs Luna to simplify when the learner struggles and increase difficulty when they're doing well

No text boxes. No multiple choice. Just a natural conversation where you point your camera at your kitchen and Luna teaches you cooking vocabulary in Spanish.

The Tech Stack

Here's what powers LinguaLive:

Component	Technology
Real-time AI	Gemini 2.0 Flash Live API (bidirectional audio/video streaming)
Agent Definition	Google ADK (Agent Development Kit) for agent structure and tool registration
Live Streaming	Google GenAI SDK (`client.aio.live.connect()`) for real-time bidirectional streaming
Image Generation	Imagen 3 on Vertex AI
Backend	Python 3.11, FastAPI, WebSocket
Data Persistence	Cloud Firestore
Asset Storage	Cloud Storage
Hosting	Cloud Run (auto-scaling, session affinity)
CI/CD	Cloud Build
Frontend	Vanilla HTML/JS, Web Audio API, MediaDevices API

How It Works: The Multimodal Loop

The core of LinguaLive is a multimodal streaming loop:

Voice In → User speaks in their target language (PCM 16kHz via Web Audio API)
Camera In → Browser captures JPEG frames at ~1fps and sends to Gemini
Voice Out → Gemini responds with native audio (PCM 24kHz)
Image Out → Imagen 3 generates flashcard illustrations for key vocabulary on demand

This happens over a single WebSocket connection. The browser captures audio via AudioWorklet (with a ScriptProcessor fallback) and camera frames via getUserMedia(). The FastAPI backend bridges these to the Gemini Live API's bidirectional streaming endpoint.

Key Technical Decisions

Why WebSocket Instead of REST?

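The Gemini Live API uses bidiGenerateContent, a bidirectional streaming endpoint, so the browser-to-backend leg needs to stream as well. Putting REST request/response calls in front of it would add a round trip for every audio chunk, which is too much latency for real-time conversation. Our WebSocket carries audio chunks (~250ms each) and video frames interleaved, keeping the conversation feeling natural and responsive.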
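For a sense of scale, here is a back-of-the-envelope sketch of what one ~250ms microphone chunk weighs on the wire. It assumes 16-bit mono PCM, which the post does not state explicitly; only the 16kHz rate and the chunk duration come from the text.

```python
def pcm_chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                    bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size in bytes of one raw PCM chunk (16-bit mono assumed by default)."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


mic_chunk = pcm_chunk_bytes(16_000, 250)      # one 250 ms chunk of 16 kHz input
mic_per_second = pcm_chunk_bytes(16_000, 1000)

print(mic_chunk)       # 8000 bytes per chunk
print(mic_per_second)  # 32000 bytes per second, before any framing overhead
```

Under those assumptions the uplink audio is small compared with a JPEG frame per second, so chunk cadence tends to matter more than bandwidth.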

ADK + GenAI SDK: Why Both?

We use ADK to define the agent — Luna's persona, system instruction, and 7 registered tools. But for the actual Live API streaming, ADK's standard runner doesn't support real-time bidirectional audio/video, so we use the GenAI SDK's client.aio.live.connect() directly. This gives us:

Real-time PCM audio streaming in both directions
Live video frame ingestion
Function calling mid-stream
Input and output audio transcription

Of the 7 registered tools, 5 are active Gemini tools (the ones declared to the model):

get_session_progress — returns real-time learning stats
get_vocabulary_quiz — generates adaptive quizzes from learned words
detect_scene — identifies environments for themed vocabulary lessons
identify_objects_in_view — processes camera object detection
generate_flashcard_image — creates Imagen 3 visual flashcards

(Vocabulary and pronunciation tracking happen automatically via output transcription to avoid interrupting the audio stream with tool calls.)

Grounding to Reduce Hallucinations

A language tutor that invents translations is worse than no tutor at all. We added explicit grounding rules to Luna's system prompt: only teach words she's confident about, only identify camera objects she can clearly see, and acknowledge uncertainty rather than guessing. This doesn't eliminate hallucination entirely, but it significantly reduces it in practice.

Firestore for Returning Learners

Session data persists to Cloud Firestore, enabling a "welcome back" experience. When a learner returns, Luna knows what words they learned last time and builds on that foundation rather than starting over.

Keeping the Audio Stream Smooth

In a real-time voice app, anything that blocks the event loop causes audible stuttering. Two patterns were key to keeping audio smooth:

Async-safe Firestore initialization. Multiple WebSocket connections can arrive simultaneously at startup. Without protection, each could try to create a Firestore client at the same time. We used asyncio.Lock() with a double-check pattern inside _init_firestore() to ensure the client is created exactly once, without blocking the event loop.
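The double-check pattern can be sketched as follows. This is an illustration under stated assumptions, not the project's `_init_firestore()`: `FakeFirestoreClient` and `_build_client` are hypothetical stand-ins so the example runs without Google Cloud credentials.

```python
import asyncio


class FakeFirestoreClient:
    """Hypothetical stand-in for the real Firestore client; counts constructions."""
    created = 0

    def __init__(self) -> None:
        FakeFirestoreClient.created += 1


_client = None
_lock = None  # created lazily so it belongs to the running event loop


async def _build_client() -> FakeFirestoreClient:
    await asyncio.sleep(0.01)  # simulate slow setup without blocking the loop
    return FakeFirestoreClient()


async def init_firestore() -> FakeFirestoreClient:
    global _client, _lock
    if _client is not None:        # first check: fast path, no lock needed
        return _client
    if _lock is None:
        _lock = asyncio.Lock()     # safe: no await between the check and the set
    async with _lock:
        if _client is None:        # second check: another task may have won
            _client = await _build_client()
    return _client


async def demo() -> tuple[int, int]:
    # Ten "connections" arrive at once; all should share one client.
    clients = await asyncio.gather(*(init_firestore() for _ in range(10)))
    return FakeFirestoreClient.created, len({id(c) for c in clients})
```

Running `asyncio.run(demo())` gives `(1, 1)`: one construction and one shared instance. Without the second check, every task that was waiting on the lock would build its own client after acquiring it.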

Background flashcard generation. Imagen 3 takes 3–8 seconds to generate an image. If we awaited that inside the receive loop, audio would freeze. Instead, we respond to Gemini immediately with a "pending" status and spin up the actual generation as a background task via asyncio.create_task(). When the image is ready, it's pushed to the client over the WebSocket independently of the audio stream.
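A minimal sketch of that respond-now, generate-later flow, with assumptions labeled: `generate_image` is a hypothetical placeholder for the Imagen call, and an `asyncio.Queue` stands in for the WebSocket push to the client.

```python
import asyncio

# Keep strong references so pending tasks are not garbage-collected mid-flight.
background_tasks: set = set()


async def generate_image(word: str) -> str:
    """Hypothetical placeholder for the slow image generation call."""
    await asyncio.sleep(0.05)
    return f"flashcard-for-{word}.png"


async def _generate_and_push(word: str, outbox: asyncio.Queue) -> None:
    image = await generate_image(word)
    # In the real app this would be a WebSocket send; a queue stands in here.
    await outbox.put({"type": "flashcard", "word": word, "image": image})


def handle_flashcard_tool_call(word: str, outbox: asyncio.Queue) -> dict:
    """Return immediately with 'pending' and generate in the background."""
    task = asyncio.create_task(_generate_and_push(word, outbox))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return {"status": "pending", "word": word}


async def demo() -> tuple[dict, dict]:
    outbox: asyncio.Queue = asyncio.Queue()
    reply = handle_flashcard_tool_call("manzana", outbox)  # does not wait
    pushed = await asyncio.wait_for(outbox.get(), timeout=1)
    return reply, pushed
```

The tool reply is available before the image exists, so the receive loop keeps draining audio. Holding the task in a set matters because the event loop keeps only weak references to tasks.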

The Hardest Part: Audio Reliability

Getting real-time audio working reliably across browsers was the biggest challenge. Key issues we solved:

AudioWorklet vs ScriptProcessor — AudioWorklet runs off the main thread for better performance. We use it as the primary approach with a ScriptProcessor fallback for broader compatibility.
Sample Rate Mismatch — Requesting 16kHz from the browser doesn't guarantee it. We added runtime resampling in the AudioWorklet to ensure Gemini always receives 16kHz PCM regardless of the device's native sample rate.
Barge-in Handling — When the user interrupts Luna mid-speech, we immediately stop audio playback, clear the queue, and let the new response stream through. We also suppress mic forwarding while the model is speaking to prevent speaker echo from causing false interruptions.
Receive Loop Re-entry — We discovered that the Live API's receive() generator completes after each model turn. The fix is to re-enter it in a while True loop for multi-turn conversations.
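The sample-rate fix above comes down to a small piece of arithmetic. Here is a hedged sketch of a linear-interpolation resampler in Python; the app does this inside the AudioWorklet in JavaScript, and its actual algorithm is not shown in the post, so treat this as one plausible approach. Plain linear interpolation has no low-pass filter, so a production version may want one before downsampling.

```python
def resample_linear(samples: list[float], src_rate: int,
                    dst_rate: int = 16_000) -> list[float]:
    """Resample mono audio from src_rate to dst_rate by linear interpolation."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate            # source samples per output sample
    out = []
    for i in range(out_len):
        pos = i * step                    # fractional position in the source
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# A device running at 48 kHz: every third sample survives at 16 kHz.
ramp = [float(n) for n in range(48)]
print(resample_linear(ramp, 48_000)[:4])  # [0.0, 3.0, 6.0, 9.0]
```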

What I Learned

The Gemini Live API is remarkably capable — bidirectional audio + video + function calling in a single streaming session opens up experiences that weren't possible before.
Grounding matters more for educational AI — users trust a tutor implicitly. Teaching a wrong translation erodes that trust fast, so explicit anti-hallucination prompting is essential.
Imagen 3 adds a visual dimension — generated flashcard illustrations make vocabulary tangible and give learners something to revisit later.
Cloud Run with session affinity works well for WebSocket-based apps — the session affinity flag ensures long-lived WebSocket connections stick to the same instance. One thing to watch: the in-memory session cache works perfectly with sticky sessions, but if you ever scale to multiple instances without affinity, you'd need to handle cache coherence with Firestore.
The Live API's receive() generator ending per turn was the most subtle bug — it looked like sessions were dropping after one exchange until we figured out the re-entry pattern.
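The re-entry pattern from the last point can be sketched without the real SDK. `FakeLiveSession` below is a hypothetical stand-in whose `receive()` generator ends after each model turn, which is the behavior described above; the `ConnectionError` used to signal a closed session is also an assumption, and the real SDK may signal it differently.

```python
import asyncio


class FakeLiveSession:
    """Hypothetical stand-in: receive() yields one model turn, then completes."""

    def __init__(self, turns):
        self._turns = list(turns)

    async def receive(self):
        if not self._turns:
            raise ConnectionError("session closed")  # assumed close signal
        for message in self._turns.pop(0):
            await asyncio.sleep(0)   # yield control, like real network I/O
            yield message


async def receive_forever(session, on_message) -> None:
    while True:                      # re-enter receive() after every turn
        try:
            async for message in session.receive():
                on_message(message)
        except ConnectionError:
            break                    # session really ended; stop looping


async def demo() -> list:
    seen = []
    session = FakeLiveSession([["hola", "turn_complete"],
                               ["muy bien", "turn_complete"]])
    await receive_forever(session, seen.append)
    return seen
```

With a single `async for` and no outer loop, `seen` would stop after the first `turn_complete`, which matches the "session drops after one exchange" symptom.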

Try It Yourself

The code is open source: github.com/kumarsparkz/lingualive

git clone https://github.com/kumarsparkz/lingualive.git
cd lingualive
pip install -r requirements.txt
gcloud auth application-default login
python -m app.main

Or deploy to Cloud Run with the automated script:

export GCP_PROJECT_ID=your-project-id
./deploy.sh

Built for the Gemini Live Agent Challenge 2026. #GeminiLiveAgentChallenge