<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Omkar tripathi</title>
    <description>The latest articles on Forem by Omkar tripathi (@code_triggered_).</description>
    <link>https://forem.com/code_triggered_</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F785393%2F2b3d5d0a-549b-4d07-9586-e10252136fa4.png</url>
      <title>Forem: Omkar tripathi</title>
      <link>https://forem.com/code_triggered_</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/code_triggered_"/>
    <language>en</language>
    <item>
      <title>Kizuna 絆 the Gemini-web to Local Environment Bridge</title>
      <dc:creator>Omkar tripathi</dc:creator>
      <pubDate>Wed, 04 Mar 2026 11:05:10 +0000</pubDate>
      <link>https://forem.com/code_triggered_/kizuna-5hc8</link>
      <guid>https://forem.com/code_triggered_/kizuna-5hc8</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ║
║                                                              ║
║   ██╗  ██╗ ██╗ ███████╗ ██╗   ██╗ ███╗   ██╗  █████╗         ║
║   ██║ ██╔╝ ██║ ╚══███╔╝ ██║   ██║ ████╗  ██║ ██╔══██╗        ║
║   █████╔╝  ██║   ███╔╝  ██║   ██║ ██╔██╗ ██║ ███████║        ║
║   ██╔═██╗  ██║  ███╔╝   ██║   ██║ ██║╚██╗██║ ██╔══██║        ║
║   ██║  ██╗ ██║ ███████╗ ╚██████╔╝ ██║ ╚████║ ██║  ██║        ║
║   ╚═╝  ╚═╝ ╚═╝ ╚══════╝  ╚═════╝  ╚═╝  ╚═══╝ ╚═╝  ╚═╝        ║
║                                                              ║
║          [ The Gemini to Local Environment Bridge ]          ║
║ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integrating AI into our daily coding workflows is a recurring discussion. The discourse focuses heavily on context windows, reasoning models, and whether AI will replace or augment engineers. But I think centering this debate purely on the models themselves is reductive.&lt;/p&gt;

&lt;p&gt;The bigger question to me is the environment. How do we actually connect these floating, cloud-based brains to our physical work?&lt;/p&gt;

&lt;p&gt;For a long time, my workflow was &lt;em&gt;agonizing&lt;/em&gt;. I was stuck in what I call "&lt;strong&gt;&lt;em&gt;Copy-Paste Torture.&lt;/em&gt;&lt;/strong&gt;" I would give the AI context, copy a file from my IDE, paste it into Google Gemini, ask for a change, copy the resulting code, paste it back, run it, hit an error, and start over.&lt;/p&gt;

&lt;p&gt;Gemini was incredibly capable, but the friction of constant context-switching was killing the momentum. A brain in a browser jar has no hands.&lt;/p&gt;

&lt;p&gt;So, I decided to build it a nervous system. I called it &lt;strong&gt;Kizuna&lt;/strong&gt; (絆 - meaning &lt;em&gt;bond&lt;/em&gt; or &lt;em&gt;connection&lt;/em&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built with Google Gemini
&lt;/h2&gt;

&lt;p&gt;Kizuna is an end-to-end toolchain that transforms the standard Google Gemini web interface into a localized, agentic IDE companion. I didn't want to just build a wrapper; I wanted to create a system that felt intentional, granting the web-based LLM the ability to read, search, and safely patch a local codebase without compromising my machine.&lt;/p&gt;

&lt;p&gt;To achieve this, I broke the system down into three foundational pillars:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Engine (Local Daemon) ⚙️
&lt;/h3&gt;

&lt;p&gt;Your local filesystem is where the real work lives, but you can't just give an AI raw shell access; that's a massive security risk. I built a local backend service to act as a secure sandbox. It translates strict JSON intents from the browser into validated local file reads, writes, and Git operations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░ THE PATH JAIL (SANDBOX) ░░░░░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  ╭── [ ALLOWED WORKSPACE ] ──────────────────────────────╮   ║
║  │                                                       │   ║
║  │   📂 /workspace/my-app/     (Sandbox Root)            │   ║
║  │    ├── 📄 src/main.js             [ ✓ ] OK            │   ║
║  │    └── 📄 package.json            [ ✓ ] OK            │   ║
║  │                                                       │   ║
║  ╰───────────────────────────────────────────────────────╯   ║
║                                                              ║
║  ╭── [ BLOCKED EXTERNALS ] ──────────────────────────────╮   ║
║  │                                                       │   ║
║  │   🚨 /etc/passwd                  [ ⨉ ] BLOCKED: 403  │   ║
║  │                                                       │   ║
║  ╰───────────────────────────────────────────────────────╯   ║
║                                                              ║
║  ▪ Engine drops all path traversal requests (../)            ║
║  ▪ Symlinks are resolved prior to boundary validation        ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;The Engine resolves all symlinks before executing. If an AI hallucinates a path traversal, the system structurally drops the request.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Bridge (Chrome Extension)
&lt;/h3&gt;

&lt;p&gt;A web extension that sits quietly on the right side of the Gemini window. Building this was an architectural challenge. Because Gemini streams text, scraping the DOM naively crashes the parser with incomplete JSON. I had to build a &lt;code&gt;MutationObserver&lt;/code&gt; that waits for the absolute "completion" state of the chat UI before parsing. It captures the AI's outputs, relays them to my local engine via a Background Worker (to bypass strict browser CORS restrictions), and injects the results back into the chat.&lt;/p&gt;
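&lt;p&gt;The completion check reduces to a small, testable piece of logic: treat the reply as finished only once its text has stopped changing for a few consecutive ticks. This is a simplified sketch of the idea; the real extension drives it from &lt;code&gt;MutationObserver&lt;/code&gt; callbacks rather than the snapshot calls shown here:&lt;/p&gt;

```javascript
// Sketch of the "wait for completion" logic. In the extension this is
// fed by a MutationObserver on the chat container; here it is reduced
// to a pure function so the idea is easy to test.
function createCompletionWatcher(stableTicksNeeded) {
  let last = null;
  let stableTicks = 0;
  return function onSnapshot(text) {
    if (text === last) {
      stableTicks += 1;
    } else {
      // Text changed: the model is still streaming. Reset the counter.
      stableTicks = 0;
      last = text;
    }
    // Only report "complete" once the text has been stable for N
    // consecutive ticks; parsing earlier would hit partial JSON.
    return stableTicks >= stableTicksNeeded;
  };
}
```

Waiting for stability instead of a fixed delay keeps the parser from ever seeing a half-streamed JSON object.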

&lt;h3&gt;
  
  
  3. The Protocol (Documentation &amp;amp; Rules)
&lt;/h3&gt;

&lt;p&gt;A deterministic system prompt fed to Gemini at the start of every chat. This acts as the "Operating Manual." LLMs are chaotic; they need boundaries. The protocol forces Gemini to use a strict JSON schema instead of conversational markdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Here is a look at the complete, fully-boxed architectural flow of Kizuna:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ BROWSER ENVIRONMENT ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ║
║                                                              ║
║  ┌────────────────┐   [ DOM ]     ┌───────────────────────┐  ║
║  │  🧠 Gemini UI  │ ◀───────────▶ │  🧩 Chrome Extension   │  ║
║  └────────────────┘               └───────────────────────┘  ║
║          ▲                                    │              ║
║          │ (Injects UI Data)                  │ (WebSockets) ║
╠══════════╪════════════════════════════════════╪══════════════╣
║          │                                    ▼              ║
║  ┌───────┴────────────────────────────────────┴───────────┐  ║
║  │  💻 KIZUNA ENGINE (Local Daemon :8080)                 │  ║
║  │                                                        │  ║
║  │  ╭─────────────────╮          ╭─────────────────────╮  │  ║
║  │  │ 🛡️ Path Sandbox │ ───────▶ │ 🛠️ Tool Dispatcher   │  │  ║
║  │  ╰─────────────────╯          ╰───┬──────┬──────┬───╯  │  ║
║  │                                   │      │      │      │  ║
║  │                                [Read] [Write] [Git]    │  ║
║  │                                   │      │      │      │  ║
║  │                                 ╭─┴──────┴──────┴─╮    │  ║
║  │                                 │ 📂 Local Storage│    │  ║
║  │                                 ╰─────────────────╯    │  ║
║  └────────────────────────────────────────────────────────┘  ║
║                                                              ║
║ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ LOCAL ENVIRONMENT ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When I ask Gemini to "Update the database connection," it doesn't write me a markdown tutorial. It outputs its intent directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"patch_file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"src/config.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"search"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"replace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"process.env.DB_HOST"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The extension picks this up, the engine verifies the path hasn't escaped the workspace, the file is safely patched, and the success message is injected straight back into the Gemini chat (though occasionally I pasted it in manually).&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Through this work, I learned to question the constraints of LLMs first, rather than accept them as givens.&lt;/p&gt;

&lt;h3&gt;
  
  
  Abstract Syntax Trees (ASTs) vs. Raw Text
&lt;/h3&gt;

&lt;p&gt;The naive approach to the context window is to just send the AI the whole file. But why send a 1,000-line file when the AI just needs to know what the file &lt;em&gt;does&lt;/em&gt;? Sending raw text is a massive waste of tokens and scatters the model's focus.&lt;/p&gt;

&lt;p&gt;I built a tool into the engine that parses code into an Abstract Syntax Tree (AST). Instead of returning raw code, it strips the implementation logic and returns a "Skeleton" of class names, imports, and function signatures.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░░░ AST PARSER ROUTING LOGIC ░░░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║                 [ 📄 Gemini Requests File ]                  ║
║                              │                               ║
║                              ▼                               ║
║                  ╭───────────────────────╮                   ║
║                  │  Is File &amp;gt; 300 Lines? │                   ║
║                  ╰───────────┬───────────╯                   ║
║                              │                               ║
║            ┌───( YES )───────┴───────( NO )────┐             ║
║            │                                   │             ║
║            ▼                                   ▼             ║
║    ╭───────────────╮                   ╭───────────────╮     ║
║    │  🌳 AST Parse │                   │  📝 Raw Read   │    ║
║    ╰───────┬───────╯                   ╰───────┬───────╯     ║
║            │                                   │             ║
║    ╭───────▼───────╮                   ╭───────▼───────╮     ║
║    │ ▪ Classes     │                   │ ▪ Full Logic  │     ║
║    │ ▪ Signatures  │                   │ ▪ Implement.  │     ║
║    │ ▪ Docstrings  │                   │ ▪ Variables   │     ║
║    ╰───────┬───────╯                   ╰───────┬───────╯     ║
║            │                                   │             ║
║    █▓▒░ Token Cost: ~5%                Token Cost: 100% ░▒▓█ ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make this work, I learned to rely on highly descriptive function names and docstrings. With good docstrings, the AI could understand the codebase's intent just by pinging the AST, mapping out the architecture flawlessly.&lt;/p&gt;
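&lt;p&gt;As a rough illustration of the skeleton idea, a toy extractor might keep only imports, signatures, and doc comments. The real engine walks a proper AST; this regex pass is only a sketch of what gets kept versus dropped:&lt;/p&gt;

```javascript
// Toy skeleton extractor (illustrative only; the engine uses a real
// AST parser). Keeps imports, class/function signatures, and doc
// comments, and drops implementation bodies.
function extractSkeleton(source) {
  const lines = source.split("\n");
  const keep = [];
  for (const line of lines) {
    const t = line.trim();
    if (
      /^(import|export|class |function |async function )/.test(t) ||
      /^\/\*\*/.test(t) || /^\*/.test(t)
    ) {
      keep.push(t);
    }
  }
  return keep.join("\n");
}
```

Shipping only this skeleton is where the roughly 5% token cost in the diagram comes from: the model sees what a file exposes, not how it does it.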

&lt;h3&gt;
  
  
  Sandboxing and Heuristics
&lt;/h3&gt;

&lt;p&gt;I actually used Gemini itself to help build a dataset classifying hundreds of developer CLI commands as "safe" or "harmful." This let me bake heuristic judgments into the local sandbox and disable &lt;code&gt;shell: true&lt;/code&gt; evaluation in Node entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Ultimate Safety Net (Auto-Commits)
&lt;/h3&gt;

&lt;p&gt;You can never trust an AI blindly, and I didn't want it making untracked changes. So I built a feature into the engine that automatically runs a local &lt;code&gt;git commit&lt;/code&gt; after &lt;em&gt;almost every file change&lt;/em&gt;, and I kept the Git metadata backed up while experimenting with commits and refactors. If the AI messed up, I had an instant, local undo button. Git became the AI's eyes and my safety net.&lt;/p&gt;

&lt;h2&gt;
  
  
  Google Gemini Feedback
&lt;/h2&gt;

&lt;p&gt;I think of designing with AI in two stages: when the system hums, and when the material fights back.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ The 70% Magic
&lt;/h3&gt;

&lt;p&gt;When it worked, it felt like magic. About 70% of the time, the system operated perfectly. &lt;strong&gt;Reading code worked phenomenally well.&lt;/strong&gt; Gemini 2.5 Pro could absorb the AST skeletons, navigate the directory, and reason about system design with striking clarity.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚠️ When the material fights back (The 30% Chaos)
&lt;/h3&gt;

&lt;p&gt;The remaining 30% of the time was a battle against the realities of the medium: hallucinations, syntax errors, and context degradation.&lt;/p&gt;

&lt;h4&gt;
  
  
  1. The Writing Problem: Surgical Diffs vs. End-to-End
&lt;/h4&gt;

&lt;p&gt;While reading was elegant, writing was clumsy. Gemini 2.5 Pro struggled heavily with surgical, line-by-line insertions. Standard &lt;code&gt;diff&lt;/code&gt; patching is notoriously flaky with LLMs—they hallucinate line numbers or forget indentation. When I asked for complex changes across a large file, it would misplace the code entirely.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Workaround:&lt;/strong&gt; The solution wasn't to push harder; it was to change the abstraction. I stopped asking for line manipulations. Instead, I updated the protocol to force verbatim block patching or end-to-end rewrites of the entire file. The &lt;code&gt;search&lt;/code&gt; block had to match the local file &lt;em&gt;exactly&lt;/em&gt;, down to the space, or the Engine rejected it.&lt;/p&gt;
&lt;/blockquote&gt;
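&lt;p&gt;The verbatim rule is easy to enforce mechanically. A sketch of the check the Engine might run, simplified to a single-occurrence replace:&lt;/p&gt;

```javascript
// Sketch of verbatim block patching: the search block must match the
// file byte-for-byte, or the Engine rejects the request outright.
function applyBlockPatch(fileText, searchBlock, replaceBlock) {
  const at = fileText.indexOf(searchBlock);
  if (at === -1) {
    // No exact match: the model hallucinated whitespace or content.
    return { ok: false, error: "search block not found verbatim" };
  }
  const patched =
    fileText.slice(0, at) + replaceBlock + fileText.slice(at + searchBlock.length);
  return { ok: true, text: patched };
}
```

Rejecting inexact matches turns a silent mis-patch into a loud, retryable error the model can respond to.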

&lt;h4&gt;
  
  
  2. Amnesia and The Autonomous Self-Healing Loop
&lt;/h4&gt;

&lt;p&gt;Sometimes, the AI would just forget its own rules. It would hallucinate bad JSON syntax (trailing commas, unescaped quotes) or start outputting raw markdown. If I fixed it manually in my IDE, the sync between the "brain" and the "hands" broke.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Workaround:&lt;/strong&gt; I built an &lt;strong&gt;Autonomous Self-Correction Loop&lt;/strong&gt;. If the Chrome extension failed to &lt;code&gt;JSON.parse()&lt;/code&gt; the output, it automatically generated an error payload and injected it directly back into the chat. Gemini would immediately apologize, fix the syntax, and re-emit the tool call without me typing a single word. If this happened too often, I re-injected the Protocol docs.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░░░░░ AUTONOMOUS SELF-HEALING LOOP ░░░░░░░░░░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║   [🧠 Gemini UI]         [🧩 Extension]          [💻 Engine] ║
║         │                      │                      │      ║
║         │ 1. Invalid JSON      │                      │      ║
║         │ ───────────────────▶ │                      │      ║
║         │                      │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   │      ║
║         │ 2. Catch Error       │ ▓ JSON Parse Fails ▓ │      ║
║         │ ◀─────────────────── │ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓   │      ║
║         │                      │                      │      ║
║         │ 3. Fixes Syntax      │                      │      ║
║         │ ───────────────────▶ │                      │      ║
║         │                      │ 4. Valid Request     │      ║
║         │                      │ ━━━━━━━━━━━━━━━━━━━▶ │      ║
║         │                      │                      │      ║
║         │                      │  ╭────────────────╮  │      ║
║         │                      │  │ ⚙️ Validates   │  │      ║
║         │                      │  │ ⚡ Executes    │  │      ║
║         │                      │  ╰────────────────╯  │      ║
║         │                      │ 5. Returns Data      │      ║
║         │                      │ ◀━━━━━━━━━━━━━━━━━━━ │      ║
║         │ 6. Inject Status     │                      │      ║
║         │ ◀━━━━━━━━━━━━━━━━━━━ │                      │      ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
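&lt;p&gt;In code, the catch-and-reinject step is tiny. A sketch, where the wording of the re-injected message is illustrative:&lt;/p&gt;

```javascript
// Sketch of the self-correction hook: if the model's output fails
// JSON.parse, the extension feeds a structured error straight back
// into the chat instead of waiting for a human.
function handleModelOutput(rawText) {
  try {
    return { kind: "intent", intent: JSON.parse(rawText) };
  } catch (err) {
    // This payload gets typed back into the Gemini input box.
    return {
      kind: "reinject",
      message:
        "SYSTEM: your last message was not valid JSON (" +
        err.message +
        "). Re-emit the tool call using the protocol schema only.",
    };
  }
}
```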



&lt;h4&gt;
  
  
  3. Context Degradation (The Final Boss)
&lt;/h4&gt;

&lt;p&gt;This compounds when the conversation gets too long. When dealing with detailed codebases, the LLM eventually loses its understanding of the current situation. The context window fills up, attention mechanisms drift, and it makes bad decisions.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The Master Workaround:&lt;/strong&gt; To tackle this, I created a specific &lt;strong&gt;"Summarization Prompt"&lt;/strong&gt; within the Chrome extension. When a chat got too long and the AI started losing the plot, I didn't try to salvage it.&lt;/p&gt;

&lt;p&gt;I would run the prompt, instructing Gemini to condense its current understanding of the architecture, the problem, and our progress into one cohesive document. I would then open a brand new chat, paste that summary along with my base JSON rules, and resume.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╔══════════════════════════════════════════════════════════════╗
║ ░░░░░░░░░ CONTEXT HANDOVER PROTOCOL (SUMMARIZATION) ░░░░░░░░ ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  [ ⏳ STAGE 1: Degradation ]                                 ║
║   │                                                          ║
║   ╰─▶ 💬 Long Chat Session ──▶ ⚠️ Hallucinations Begin       ║
║                                                              ║
║  [ 💉 STAGE 2: The Handoff ]                                 ║
║   │                                                          ║
║   ├─▶ 1. Inject "Summarization Prompt" via Extension         ║
║   ╰─▶ 2. Gemini Generates Cohesive 'State Document' 🗂️        ║
║                                                              ║
║  [ ✨ STAGE 3: Resurrection ]                                ║
║   │                                                          ║
║   ├─▶ 1. Close Degraded Session 🗑️                            ║
║   ├─▶ 2. Open Empty Chat Session 🆕                          ║
║   ├─▶ 3. Inject [ Base Rules + State Document ] 📥           ║
║   ╰─▶ 4. Fresh AI Instance with Perfect Context 🚀           ║
║                                                              ║
╚══════════════════════════════════════════════════════════════╝
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;It was like giving the AI a fresh cup of coffee and a perfect handover document.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Elevating the craft
&lt;/h3&gt;

&lt;p&gt;The Kizuna system did not work flawlessly out of the box. But building it taught me that modern LLMs are not a deterministic pipeline; they are a search process. They hallucinate, they forget, and they make errors.&lt;/p&gt;

&lt;p&gt;But by building the right scaffolding—strict constraints, local safety nets, AST parsers, and clever workarounds like the self-healing loop and summarization prompt—you can harness them to do incredible things. It forced me to be intentional about how software is written, and it laid a profound foundation for the systems I want to build next.&lt;/p&gt;

&lt;h1&gt;
  
  
  #GeminiChallenge #AI #WebDev #Productivity #Agents #Automation #SoftwareDesign
&lt;/h1&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
    </item>
    <item>
      <title>DialogueAI: Interactive Playground for assemblyai with automatic code generation</title>
      <dc:creator>Omkar tripathi</dc:creator>
      <pubDate>Sat, 23 Nov 2024 13:47:18 +0000</pubDate>
      <link>https://forem.com/code_triggered_/dialogueai-interactive-playground-for-assemblyai-speech-to-text-api-and-lemur-api-and-generate-30de</link>
      <guid>https://forem.com/code_triggered_/dialogueai-interactive-playground-for-assemblyai-speech-to-text-api-and-lemur-api-and-generate-30de</guid>
      <description>&lt;p&gt;This is a submission for the &lt;a href="https://dev.to/challenges/assemblyai"&gt;AssemblyAI Challenge:&lt;/a&gt; : Sophisticated Speech-to-text and No More Monkey Business.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built: &lt;a href="https://github.com/triggeredcode/DialogueAI" rel="noopener noreferrer"&gt;DialogueAI (GitHub)&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;I built DialogueAI, an interactive platform that leverages the powerful capabilities of AssemblyAI's sophisticated speech-to-text API and their LeMUR summarization model. The primary goal of this platform is to simplify the process for users who are new to these APIs, helping them overcome the steep learning curve typically associated with diving into new documentation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features of the Platform&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Interactive Playground&lt;/strong&gt;: Users can explore and experiment with various API functionalities through an intuitive interface. Input boxes, selection options, model selection, and summary types are all easily adjustable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Instant Results&lt;/strong&gt;: With a single click, users can execute API calls and see the results immediately. This feature helps bridge the gap between learning and actual implementation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code Generation&lt;/strong&gt;: For those who prefer to handle API calls manually, the platform generates the necessary code snippets, which can be directly run on their systems. This feature significantly reduces the time and effort required to understand and use the API.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Smart Summary Page&lt;/strong&gt;: Similar to the main playground, this page offers various configuration options and examples to help users generate summaries of transcripts quickly. Users can also get the generated code to use by themselves.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
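&lt;p&gt;To give a flavor of what the code generation produces, here is a sketch of a request builder targeting AssemblyAI's v2 transcript endpoint. The field coverage is partial and illustrative; the platform's actual generated snippets may differ:&lt;/p&gt;

```javascript
// Illustrative sketch of a generated snippet: build a request for
// AssemblyAI's v2 transcript endpoint from playground settings.
// Only a few of the playground's options are shown here.
function buildTranscriptRequest(apiKey, config) {
  return {
    url: "https://api.assemblyai.com/v2/transcript",
    method: "POST",
    headers: {
      authorization: apiKey,
      "content-type": "application/json",
    },
    body: JSON.stringify({
      audio_url: config.audioUrl,
      speech_model: config.speechModel,
      filter_profanity: Boolean(config.profanityFilter),
    }),
  };
}
```

The returned object can be passed directly to &lt;code&gt;fetch&lt;/code&gt;, which is what makes the generated snippets runnable as-is.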

&lt;p&gt;By providing these features, the platform ensures that users can quickly and efficiently learn how to use AssemblyAI's APIs, reducing the frustration and time typically spent navigating complex documentation. This makes it an invaluable tool for developers and anyone looking to incorporate speech-to-text and summarization capabilities into their projects.&lt;/p&gt;

&lt;h2&gt;
  
  
  Journey
&lt;/h2&gt;

&lt;p&gt;The inspiration for this platform came from my own experience when I first encountered AssemblyAI's API. I found it a bit confusing to get started with the documentation and the API usage. So, I set out to solve this problem not just for myself but for everyone else who might face the same challenge.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Used
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: React, TypeScript, Tailwind CSS&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: AssemblyAI Speech-to-Text, LeMUR LLM model summary API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Animations&lt;/strong&gt;: Framer Motion&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Working Features
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Speech-to-Text Configurations&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Users can easily configure and experiment with various settings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single Click Run&lt;/strong&gt;: Execute the configuration and see results immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single Click Code Generation&lt;/strong&gt;: Generates the code based on the configuration for users to use directly.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Configurations Available&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Key&lt;/li&gt;
&lt;li&gt;Speech Model&lt;/li&gt;
&lt;li&gt;Word Boost&lt;/li&gt;
&lt;li&gt;Profanity Filter&lt;/li&gt;
&lt;li&gt;Audio Range&lt;/li&gt;
&lt;li&gt;Audio Intelligence&lt;/li&gt;
&lt;li&gt;Summary Model&lt;/li&gt;
&lt;li&gt;Summary Type&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdy3i857awn7gcagyd66.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdy3i857awn7gcagyd66.png" alt="Interactive Speech-to-Text Configurations" width="800" height="580"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnocbuprkv0chetc8oksv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnocbuprkv0chetc8oksv.png" alt="Interactive Speech-to-Text Configurations" width="800" height="782"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx02awcqmd34mypjg7b1e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx02awcqmd34mypjg7b1e.png" alt="Interactive Speech-to-Text Configurations" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx71uvjj0gqdk68us9wfa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx71uvjj0gqdk68us9wfa.png" alt="Interactive Speech-to-Text Configurations" width="800" height="759"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv348kbgt4e2fi0vtgr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgv348kbgt4e2fi0vtgr9.png" alt="Interactive Speech-to-Text Configurations" width="800" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmpst5ngfabyrn1jpbky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flmpst5ngfabyrn1jpbky.png" alt="Generted code and summary" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xrqfn5u7mzg5s8wre01.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9xrqfn5u7mzg5s8wre01.png" alt="Generted code and summary" width="800" height="842"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol start="2"&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Summary Generation with LeMUR&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;Users can generate summaries with various options and configurations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single Click Run&lt;/strong&gt;: Instantly generate summaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single Click Code Generation&lt;/strong&gt;: Provides the code for generating summaries.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Configurations Available&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API Key&lt;/li&gt;
&lt;li&gt;Summary Type (Basic, Custom)&lt;/li&gt;
&lt;li&gt;Transcript ID&lt;/li&gt;
&lt;li&gt;Model&lt;/li&gt;
&lt;li&gt;Prompt&lt;/li&gt;
&lt;li&gt;Custom Prompt&lt;/li&gt;
&lt;li&gt;Max Output Tokens (Example Pre-coded)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzom6koz439t2yeyymp3p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzom6koz439t2yeyymp3p.png" alt="Image description" width="800" height="817"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb41b3qm3srzvokf5d8h9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb41b3qm3srzvokf5d8h9.png" alt="Image description" width="800" height="790"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs71aqh9fddn9eory9zn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs71aqh9fddn9eory9zn4.png" alt="Image description" width="800" height="826"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1smmt2kluqj1giglqee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk1smmt2kluqj1giglqee.png" alt="Image description" width="800" height="758"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  In Development
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat with the Transcript&lt;/strong&gt;: Using LeMUR API to enable interactions with the generated transcript.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Quiz Generation&lt;/strong&gt;: Generate quizzes based on the transcript.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2o3u1j1vuli16ygw7yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2o3u1j1vuli16ygw7yb.png" alt="Image description" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Progress and Next Steps
&lt;/h3&gt;

&lt;p&gt;So far, I've successfully addressed the initial problem statements for the speech-to-text API and LeMUR summary model. This project has been incredibly exciting to work on, pushing the boundaries of what can be done with API interactions and user interface design.&lt;/p&gt;

&lt;p&gt;Looking ahead, I plan to expand the platform to include interactive playgrounds and code generation capabilities for real-time APIs and more sophisticated use cases of LeMUR. This will further streamline the learning and implementation process for developers and enhance the overall user experience.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>assemblyaichallenge</category>
      <category>ai</category>
      <category>developertool</category>
    </item>
  </channel>
</rss>
