<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Akshat Jain</title>
    <description>The latest articles on Forem by Akshat Jain (@akshat_ilen).</description>
    <link>https://forem.com/akshat_ilen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3565730%2F99b76ff8-4dd3-4b89-a191-a15b4129dfa4.jpg</url>
      <title>Forem: Akshat Jain</title>
      <link>https://forem.com/akshat_ilen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/akshat_ilen"/>
    <language>en</language>
    <item>
      <title>Beyond Grep and Vectors: Reimagining Code Retrieval for AI Agents</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 27 Oct 2025 13:48:56 +0000</pubDate>
      <link>https://forem.com/akshat_ilen/beyond-grep-and-vectors-reimagining-code-retrieval-for-ai-agents-4pb2</link>
      <guid>https://forem.com/akshat_ilen/beyond-grep-and-vectors-reimagining-code-retrieval-for-ai-agents-4pb2</guid>
      <description>&lt;p&gt;Not long ago, the idea of an AI assistant refactoring an entire application felt like a distant future. Today, that future is arriving, driven by language models that can use tools to execute complex tasks. However, a critical lesson has emerged from the first wave of agentic systems: even the most advanced model is only as effective as the context it is given.&lt;/p&gt;

&lt;p&gt;The core challenge is not the agent's reasoning ability but its access to information. When an AI coding agent fails, it's often because we have fed it irrelevant, incomplete, or outdated code snippets. The shift from copilot-style autocompletion to autonomous agents isn't incremental—it's a phase change in how code touches code. And our retrieval layer hasn't caught up. It's time to rebuild our approach to retrieval from the ground up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Friction Point: When Legacy Search Meets Agentic Workloads
&lt;/h2&gt;

&lt;p&gt;Consider a common scenario: you ask a coding agent, "Where is our login logic actually rate-limited?" The response you get reveals the limitations of our current tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;grep&lt;/code&gt;-based search dumps pages of literal matches—unrelated constants, comments in test files, and deprecated code.&lt;/li&gt;
&lt;li&gt;A semantic or vector search returns "things that are like rate limits," surfacing conceptually similar but functionally incorrect parts of the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You paste these fragmented results into the agent's context window. The model generates a confident-sounding response, but the subsequent continuous integration (CI) pipeline disagrees. The problem wasn't the model; it was the quality of the information we fed it.&lt;/p&gt;

&lt;p&gt;Here's a simple test: ask your current setup to "find where we throttle login attempts and increase the backoff by 50%." Does it return a surgical package or a scavenger hunt? The answer reveals everything about whether your retrieval system is ready for agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79f4lbtcq9pfyy8sw1tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79f4lbtcq9pfyy8sw1tv.png" alt="Cartoon showing an AI agent confidently returning hundreds of irrelevant search results while a developer looks defeated" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Old Search Habits Fail in the Agentic Era
&lt;/h2&gt;

&lt;p&gt;Grep was a miracle when codebases fit in memory. Vector search unlocked semantic understanding we never had before. But both were designed for human-in-the-loop workflows, where tolerance for noise is high and iteration is slow. &lt;/p&gt;

&lt;p&gt;Search tools built for humans operate on the assumption of human pacing. A developer might issue one or two queries, skim the results, and use their own intuition to synthesize an answer. Agentic workflows are fundamentally different.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Volume and Speed:&lt;/strong&gt; An agent fires off dozens of micro-queries in seconds as it explores a codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision over Volume:&lt;/strong&gt; It requires just enough context to perform a specific action, not an exhaustive list of every possible match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verifiability:&lt;/strong&gt; It must be able to demonstrate &lt;em&gt;why&lt;/em&gt; a particular code snippet is relevant to the immediate task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the retrieval layer doesn't respect these requirements, everything downstream becomes fragile and unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Limitations of Our Toolkit
&lt;/h2&gt;

&lt;p&gt;Our standard tools, &lt;code&gt;grep&lt;/code&gt; and vector search, were designed for a different era and create hidden costs when applied to agentic systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdn3ado7v5fcffhskuby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdn3ado7v5fcffhskuby.png" alt="Cartoon depicting grep and vector search as outdated tools being applied to modern agentic problems" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grep: The Literal Search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grep is excellent for finding exact string matches. If you already know the precise function or variable name you're looking for, it's unparalleled. However, for the exploratory tasks common in agentic work, its limitations become clear. It has no understanding of indirection or semantic meaning, and it often returns large, noisy blocks of code that pollute the context window and degrade reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Search: The Semantic Search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector search excels at finding "things like this," making it a powerful tool for conceptual exploration. Yet, this same fuzziness becomes a liability when surgical precision is required. It can easily surface lookalike functions while missing the one critical implementation that needs to be changed. Snippets often arrive decontextualized, shorn from their callers, tests, or configuration files. Furthermore, its reliance on embeddings means it is perpetually at risk of operating on a stale map of a rapidly evolving repository.&lt;/p&gt;

&lt;p&gt;These approaches create downstream "taxes" in the form of latency from bloated context windows, fragility as minor code changes break brittle heuristics, and a fundamental lack of explainability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Window Illusion
&lt;/h2&gt;

&lt;p&gt;You might think: just give the agent the entire codebase. After all, aren't context windows growing exponentially? But context windows aren't free—they're quadratic in cost and linear in confusion. More isn't better; &lt;strong&gt;relevant&lt;/strong&gt; is better. The real win isn't cramming more in; it's delivering exactly what's needed, exactly when it's needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb42110diu11pjcwdzpfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb42110diu11pjcwdzpfw.png" alt="Cartoon showing the futility of cramming entire codebases into context windows instead of providing precise, relevant snippets" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Principles for Agent-Ready Retrieval
&lt;/h2&gt;

&lt;p&gt;To build reliable agents, we need a new retrieval paradigm guided by a set of practical principles. The goal is no longer to return the most hits, but the most complete and actionable context.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Return Whole Behaviors:&lt;/strong&gt; Instead of fragmented lines, retrieval should provide complete, edit-safe units, such as an entire function, class, or API handler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserve Adjacency:&lt;/strong&gt; Code should be delivered with its immediate neighbors—the callers, tests, and configuration files that are essential for making a safe and effective change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aim for Less, But Complete:&lt;/strong&gt; Two precise, context-aware snippets are far more valuable than twenty fuzzy matches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay Fresh by Default:&lt;/strong&gt; The retrieval system must treat recent changes as a primary signal for relevance, not as an afterthought.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explain the Relevance:&lt;/strong&gt; Every item returned should be accompanied by a justification for why it was selected in response to the specific query, right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operate in Loops:&lt;/strong&gt; Retrieval should be an interactive process that helps the agent propose, get feedback, and narrow its focus, rather than a one-shot "dump and pray" operation.&lt;/li&gt;
&lt;/ol&gt;
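&lt;p&gt;To make these principles concrete, here is a minimal sketch of the shape an agent-ready retrieval result could take. All names here (&lt;code&gt;RetrievalResult&lt;/code&gt;, &lt;code&gt;adjacency&lt;/code&gt;, &lt;code&gt;rationale&lt;/code&gt;, the example paths) are illustrative assumptions, not from any existing API:&lt;/p&gt;

```typescript
// Illustrative sketch only: the shape an agent-ready retrieval result
// might take. Every name and path below is hypothetical.
interface Adjacency {
  callers: string[]; // call sites that invoke the retrieved unit
  tests: string[];   // test files covering it
  config: string[];  // configuration it reads
}

interface RetrievalResult {
  unit: string;      // a whole, edit-safe behavior (e.g. a full function)
  path: string;      // where it lives in the repository
  adjacency: Adjacency; // the neighbors needed for a safe change
  freshness: string; // recency signal, e.g. the last commit touching it
  rationale: string; // why this was selected for the current query
}

// Example: what the login-throttling query might return.
const result: RetrievalResult = {
  unit: "function throttleLogin(attempts) { /* full body */ }",
  path: "src/auth/rateLimit.ts",
  adjacency: {
    callers: ["src/auth/loginHandler.ts"],
    tests: ["test/auth/rateLimit.test.ts"],
    config: ["config/security.yaml"],
  },
  freshness: "last touched 2 days ago",
  rationale: "Only implementation that throttles login attempts with backoff",
};
```

&lt;p&gt;A result shaped like this hands the agent a whole behavior, its neighbors, and a stated reason for inclusion in one package, instead of a scavenger hunt.&lt;/p&gt;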

&lt;h2&gt;
  
  
  A Simple Litmus Test
&lt;/h2&gt;

&lt;p&gt;Remember that test from earlier? &lt;em&gt;"Find where we throttle login attempts and increase the backoff by 50%."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Does the system return the rate-limiting function, its direct call site, its configuration, and its unit tests as a single, cohesive package? Or does it return a list of keyword hits and semantic lookalikes? The difference in output will directly correlate to how quickly and safely the agent can propose a valid change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva3pmjp8uglf5bcgo5lq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva3pmjp8uglf5bcgo5lq.png" alt="Cartoon showing the futility of cramming entire codebases into context windows instead of providing precise, relevant snippets" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Engine for the Agentic Era
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Grep&lt;/code&gt; isn't flawed, and neither are vectors. They are simply tools from a world where a human was responsible for stitching the context together. The next generation of AI agents requires a retrieval engine that does the stitching first, enabling the agent to land the correct fix on the first try.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical exercise. At Vyazen, we're building retrieval infrastructure that treats these principles as requirements, not aspirations. Our approach is founded on delivering complete, fresh, and verifiable context so that your agents can ship code, not just suggestions.&lt;/p&gt;

&lt;p&gt;If you're wrestling with the same questions, we'd love to learn from your toughest use cases.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To share a story where an agent missed the mark, please reach out.&lt;/li&gt;
&lt;li&gt;To try our focused beta, visit us at &lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;https://vyazen.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For direct inquiries, you can email us at &lt;a href="mailto:akshat@vyazen.dev"&gt;akshat@vyazen.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Solution Wasn’t Smarter Prompts, It Was Better Context</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Wed, 15 Oct 2025 04:04:45 +0000</pubDate>
      <link>https://forem.com/akshat_ilen/the-solution-wasnt-smarter-prompts-it-was-better-context-2c4h</link>
      <guid>https://forem.com/akshat_ilen/the-solution-wasnt-smarter-prompts-it-was-better-context-2c4h</guid>
      <description>&lt;p&gt;&lt;em&gt;How I spent 8+ hours debugging a "compatible" integration—and solved it in 15 minutes with the right context&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It was 2 AM, and I was staring at a spinning loader that refused to stop. The UI was frozen, the stream had died silently, and I had no idea why.&lt;/p&gt;

&lt;p&gt;I was upgrading our backend to use &lt;a href="https://mastra.ai/" rel="noopener noreferrer"&gt;Mastra&lt;/a&gt;—a TypeScript agent framework for building AI applications with agentic workflows—to work with our existing frontend that consumed &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt; v5 streams. On paper, this should have been straightforward. Mastra advertised compatibility with Vercel v5, and our React frontend was already configured to consume &lt;code&gt;UIMessage&lt;/code&gt; streams over Server-Sent Events (SSE).&lt;/p&gt;

&lt;p&gt;But the first test run? Complete failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gutsjob7esvy75myeuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gutsjob7esvy75myeuw.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: What I Was Building
&lt;/h2&gt;

&lt;p&gt;Let me give you some context. We were building a production application with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: React app using Vercel AI SDK's &lt;code&gt;useChat&lt;/code&gt; hook to consume streaming AI responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: NestJS running on Express (not Next.js)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Upgrade&lt;/strong&gt;: Integrating Mastra's powerful agentic workflows to replace our simpler LLM calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://mastra.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Mastra&lt;/strong&gt;&lt;/a&gt; is a TypeScript framework from the team behind Gatsby that lets you build AI agents with workflows, memory, and tool selection. It's designed for production use and can deploy anywhere—not just Next.js.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Vercel AI SDK&lt;/strong&gt;&lt;/a&gt; is the TypeScript toolkit for building AI applications, providing standardized APIs for streaming text, structured objects, and building chat interfaces across multiple model providers.&lt;/p&gt;

&lt;p&gt;Both claimed compatibility with each other. Both had great documentation. This should have worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Non-Negotiables
&lt;/h2&gt;

&lt;p&gt;I had three constraints I couldn't compromise on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frontend Contract&lt;/strong&gt;: Our production UI consumed Vercel AI SDK v5 &lt;code&gt;UIMessage&lt;/code&gt; streams over SSE. Rewiring the entire interface wasn't an option—we'd break existing user experiences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Backend Ambition&lt;/strong&gt;: Mastra's agentic workflows were the whole point of this upgrade. We needed the agent orchestration, memory management, and tool selection capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server Infrastructure&lt;/strong&gt;: Our application stack was NestJS on Express. Adopting Next.js solely to support streaming wasn't viable—we had existing middleware, authentication, and infrastructure that couldn't be rewritten.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These weren't preferences. They were hard boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Attempts: When "Compatible" Isn't Enough
&lt;/h2&gt;

&lt;p&gt;I started with what seemed obvious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In my NestJS controller&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aisdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Now what? How do I pipe this to Express response?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every example assumed Next.js with Web &lt;code&gt;Response&lt;/code&gt; objects. I was using Express with Node.js &lt;code&gt;ServerResponse&lt;/code&gt;. Fundamentally different APIs.&lt;/p&gt;
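&lt;p&gt;In hindsight, the bridge between those two worlds is conceptually small. A minimal sketch (my own illustration, assuming Node 18+ where the Web &lt;code&gt;ReadableStream&lt;/code&gt; is a global) would pull chunks from the Web stream and write them imperatively, the way a Node response expects. At 2 AM, though, none of this was obvious:&lt;/p&gt;

```typescript
// Minimal bridge between the two stream worlds (illustrative sketch,
// assuming Node 18+ where the Web ReadableStream is a global).
// nodeWritable stands in for an Express/Node ServerResponse here.
export async function writeWebStreamToNode(
  webStream: ReadableStream,
  nodeWritable: { write(chunk: any): void; end(): void }
) {
  const reader = webStream.getReader();
  for (;;) {
    // Pull each chunk from the Web stream and write it imperatively,
    // since a Node response has write()/end() rather than pipeTo().
    const { done, value } = await reader.read();
    if (done) break;
    nodeWritable.write(value);
  }
  nodeWritable.end();
}
```

&lt;p&gt;The sketch also shows why the naive attempt fails: a Web &lt;code&gt;ReadableStream&lt;/code&gt; exposes &lt;code&gt;pipeTo()&lt;/code&gt; and &lt;code&gt;pipeThrough()&lt;/code&gt;, while a Node &lt;code&gt;ServerResponse&lt;/code&gt; exposes neither.&lt;/p&gt;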

&lt;p&gt;I tried everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct piping → &lt;code&gt;TypeError: res.pipeTo is not a function&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Manual SSE headers → streams died silently&lt;/li&gt;
&lt;li&gt;Reading Mastra's source → found &lt;code&gt;toUIMessageStream()&lt;/code&gt; but still couldn't connect it to Express&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hours passed. Nothing worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtpn26f5ydm405hnunb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtpn26f5ydm405hnunb.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Documentation Trap
&lt;/h2&gt;

&lt;p&gt;Here's what was maddening: every answer I got from AI assistants was &lt;em&gt;technically correct&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;"Yes, Mastra is compatible with Vercel v5." ✓&lt;br&gt;&lt;br&gt;
"Yes, you can stream UIMessages." ✓&lt;br&gt;&lt;br&gt;
"Yes, you need SSE headers." ✓&lt;/p&gt;

&lt;p&gt;But none of this solved my actual problem. Why? Because all the examples, all the documentation, all the Stack Overflow answers assumed a Next.js environment with Web &lt;code&gt;Response&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then came the really bad advice.&lt;/strong&gt; One AI assistant, after analyzing my problem, confidently told me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You need to build a custom parser to convert Mastra's output format to Vercel AI SDK's UIMessage format. Here's a 200-line implementation..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I spent hours implementing and debugging this custom parser. It was complex, error-prone, and felt wrong. I kept thinking: "Surely someone has solved this before?"&lt;/p&gt;

&lt;p&gt;Spoiler: They had. Mastra &lt;strong&gt;already had&lt;/strong&gt; the conversion logic built-in (&lt;code&gt;toUIMessageStream&lt;/code&gt;). I was building something that already existed because the AI assistant couldn't see across both codebases to know it was there.&lt;/p&gt;

&lt;p&gt;But I didn't know that yet. I didn't even know what the real problem was. Was it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data format? (Protocol issue)&lt;/li&gt;
&lt;li&gt;The streaming mechanism? (Transport issue)&lt;/li&gt;
&lt;li&gt;My implementation? (Code issue)&lt;/li&gt;
&lt;li&gt;Some Next.js-specific magic I was missing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had no visibility into what was actually broken. I was debugging blind.&lt;/p&gt;

&lt;p&gt;I started questioning if I was missing something obvious. Was I the only person trying to use Mastra with Express? Was I supposed to just rewrite everything in Next.js?&lt;/p&gt;
&lt;h2&gt;
  
  
  The Turning Point: A Different Approach
&lt;/h2&gt;

&lt;p&gt;After 8+ hours of debugging (including wasting time on that unnecessary custom parser), I realized something: &lt;strong&gt;I wasn't asking the wrong questions—I was asking them with the wrong context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every AI assistant I'd consulted had partial knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One knew about Mastra's API surface&lt;/li&gt;
&lt;li&gt;Another knew about Vercel's streaming format&lt;/li&gt;
&lt;li&gt;Another knew about Express SSE patterns&lt;/li&gt;
&lt;li&gt;One even suggested building a custom parser (that already existed!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But none of them could see the complete picture across all three codebases simultaneously. They couldn't tell me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Mastra &lt;strong&gt;already provided&lt;/strong&gt; internally&lt;/li&gt;
&lt;li&gt;What Vercel &lt;strong&gt;already had&lt;/strong&gt; for similar use cases&lt;/li&gt;
&lt;li&gt;Where exactly the gap was in my specific setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I realized the problem: &lt;strong&gt;the issue wasn't in my code—it was somewhere in how Mastra and Vercel AI SDK were supposed to work together.&lt;/strong&gt; To debug this, I needed to look at the actual source code of both frameworks, not just their documentation.&lt;/p&gt;

&lt;p&gt;But reading through two large codebases manually? That would take days.&lt;/p&gt;

&lt;p&gt;I needed a different approach. I used &lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;Vyazen&lt;/a&gt; to index all three repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our application code (NestJS/Express)&lt;/li&gt;
&lt;li&gt;Mastra's source code (the agent framework)&lt;/li&gt;
&lt;li&gt;Vercel AI SDK's source code (the streaming protocol)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now I could ask an AI agent questions with &lt;strong&gt;complete context&lt;/strong&gt; across all three codebases. Not documentation. Not examples. The actual source code.&lt;/p&gt;

&lt;p&gt;With this unified context, I asked a precise question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Identify where Mastra converts its internal stream into Vercel v5 UIMessage chunks and where that output is written to a Node/Express response."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The Breakthrough: Seeing the Seam
&lt;/h2&gt;

&lt;p&gt;With access to all three codebases, the AI agent could now give me a precise answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File&lt;/strong&gt;: &lt;code&gt;packages/core/src/stream/aisdk/v5/output.ts&lt;/code&gt; in Mastra&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function&lt;/strong&gt;: &lt;code&gt;toUIMessageStream()&lt;/code&gt; - converts Mastra's stream to Vercel UIMessage format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport for Next.js&lt;/strong&gt;: Exists in Mastra for Web Response objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport for Express&lt;/strong&gt;: &lt;strong&gt;Does not exist&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was the gap. Within minutes, I could see what I didn't need to build:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Mastra provided&lt;/strong&gt;: A function &lt;code&gt;toUIMessageStream()&lt;/code&gt; to convert agent streams into Vercel-compatible &lt;code&gt;UIMessage&lt;/code&gt; chunks (I didn't need my custom parser!)&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Mastra provided&lt;/strong&gt;: A transport layer for Next.js/Web &lt;code&gt;Response&lt;/code&gt; to stream these chunks to the browser&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Mastra did NOT provide&lt;/strong&gt;: An equivalent Node.js/Express transport for &lt;code&gt;ServerResponse&lt;/code&gt;&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Vercel AI SDK included&lt;/strong&gt;: Helpers implementing the required SSE semantics—headers, framing, and terminal signals&lt;/p&gt;

&lt;p&gt;This was the revelation. &lt;strong&gt;I finally understood what the actual problem was:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The problem wasn't the protocol. The protocol was fine. Both sides spoke the same language (&lt;code&gt;UIMessage&lt;/code&gt; format).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem was the transport layer.&lt;/strong&gt; There was no mechanism to correctly pipe the &lt;code&gt;UIMessage&lt;/code&gt; stream over Express/NestJS with the expected SSE semantics.&lt;/p&gt;

&lt;p&gt;I stared at the screen. All those hours building a parser... for nothing. The gap was so obvious once I could see it.&lt;/p&gt;

&lt;p&gt;I had been debugging the wrong thing. I had been building parsers I didn't need. I had been trying to fix the protocol when the protocol was already working.&lt;/p&gt;

&lt;p&gt;All I needed was a simple transport adapter—about 50 lines of code.&lt;/p&gt;

&lt;p&gt;I felt stupid and relieved at the same time.&lt;/p&gt;

&lt;p&gt;This is what I needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no

data: {"type":"text","content":"Hello"}\n\n
data: {"type":"text","content":" world"}\n\n
data: [DONE]\n\n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chunk needed to be framed as &lt;code&gt;data: {JSON}\n\n&lt;/code&gt;, concluding with a &lt;code&gt;data: [DONE]\n\n&lt;/code&gt; sentinel to signal completion.&lt;/p&gt;
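&lt;p&gt;As a sketch, that framing step can be expressed as a &lt;code&gt;TransformStream&lt;/code&gt;: wrap each JSON chunk as a &lt;code&gt;data:&lt;/code&gt; event and append the &lt;code&gt;[DONE]&lt;/code&gt; sentinel when the upstream closes. This is my reconstruction of the behavior, not Mastra's or Vercel's actual code:&lt;/p&gt;

```typescript
// Hedged sketch of the SSE framing step (assumes Node 18+, where
// TransformStream is a global). This reconstructs the behavior the
// transport needs; the real helper's implementation may differ.
export function jsonToSseTransform() {
  return new TransformStream({
    transform(chunk: unknown, controller) {
      // Frame one JSON payload as a single SSE event: data: {JSON}\n\n
      controller.enqueue("data: " + JSON.stringify(chunk) + "\n\n");
    },
    flush(controller) {
      // Emit the terminal sentinel so the client parser stops reading.
      controller.enqueue("data: [DONE]\n\n");
    },
  });
}
```

&lt;p&gt;Piping the &lt;code&gt;UIMessage&lt;/code&gt; stream through a transform like this yields exactly the wire format shown above.&lt;/p&gt;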

&lt;h2&gt;
  
  
  The Fix: 15 Minutes of Coding
&lt;/h2&gt;

&lt;p&gt;Once I understood the boundary, the solution was simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The key insight: use Mastra's existing converter + add Express transport&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pipeUIMessageStreamToResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;uiStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUIMessageStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Mastra already had this!&lt;/span&gt;

  &lt;span class="c1"&gt;// Set SSE headers&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-cache, no-transform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Pipe with proper SSE framing: data: {JSON}\n\n&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sseStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uiStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeThrough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;jsonToSseTransform&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="c1"&gt;// ... stream to Express response&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My controller became one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(@&lt;/span&gt;&lt;span class="nd"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Res&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aisdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeUIMessageStreamToResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// Done.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I refreshed the page. Messages streamed smoothly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It worked.&lt;/strong&gt; 15 minutes of coding vs 8+ hours of debugging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mastra-ai/mastra/pull/8720" rel="noopener noreferrer"&gt;See the full implementation in PR #8720&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Giving Back: Upstreaming the Solution
&lt;/h2&gt;

&lt;p&gt;I realized this couldn't just be my problem. If I hit this issue, others would too. So I decided to contribute it back to Mastra.&lt;/p&gt;

&lt;p&gt;I opened &lt;a href="https://github.com/mastra-ai/mastra/pull/8720" rel="noopener noreferrer"&gt;PR #8720&lt;/a&gt; with the helper function, comprehensive tests, and documentation. The PR is in review and will help other developers avoid the same 8-hour debugging session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Precise Context Changed Everything
&lt;/h2&gt;

&lt;p&gt;Let me show you the difference context makes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without source code context (what other AI assistants told me):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Mastra supports Vercel v5 format" ✓ (technically true, but unhelpful)&lt;/li&gt;
&lt;li&gt;"You need SSE headers" ✓ (I already knew this)&lt;/li&gt;
&lt;li&gt;"Build a custom parser to convert formats" ✗ (completely wrong—it already existed!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These answers were based on documentation and general knowledge. The AI couldn't see what was actually in the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With source code context (using Vyazen):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Mastra has &lt;code&gt;toUIMessageStream()&lt;/code&gt; in &lt;code&gt;packages/core/src/stream/aisdk/v5/output.ts&lt;/code&gt;" ✓&lt;/li&gt;
&lt;li&gt;"Mastra has transport for Next.js Web Response, but not for Node.js ServerResponse" ✓&lt;/li&gt;
&lt;li&gt;"You need to create a ~50 line adapter, reusing Vercel's SSE framing pattern" ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These answers were based on the actual source code. The AI could see exactly what existed and what was missing.&lt;/p&gt;
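&lt;p&gt;To make that concrete, here is a rough sketch of what such an adapter can look like. Everything here is illustrative: &lt;code&gt;toSSEFrame&lt;/code&gt;, &lt;code&gt;SSESink&lt;/code&gt;, and &lt;code&gt;pipeUIMessageStreamToResponse&lt;/code&gt; are names I'm using for the sketch, not Mastra's actual API; the real implementation is in PR #8720.&lt;/p&gt;

```typescript
// Illustrative sketch of the ~50-line adapter: bridge an async-iterable
// UI message stream onto a Node.js ServerResponse-style object using
// Vercel-style SSE framing. Names are assumptions, not Mastra's real API.

// One SSE frame per stream part: "data: {json}" followed by a blank line.
function toSSEFrame(part: unknown): string {
  return "data: " + JSON.stringify(part) + "\n\n";
}

// Minimal structural type, so the adapter accepts http.ServerResponse
// or any compatible mock in tests.
interface SSESink {
  writeHead(status: number, headers: { [k: string]: string }): void;
  write(chunk: string): void;
  end(): void;
}

async function pipeUIMessageStreamToResponse(res: SSESink, stream: any) {
  // SSE headers: tell the client to hold the connection open.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    "Connection": "keep-alive",
  });
  // Write each stream part as its own SSE frame as it arrives.
  for await (const part of stream) {
    res.write(toSSEFrame(part));
  }
  // Terminal sentinel, mirroring the Vercel AI SDK's framing convention.
  res.write("data: [DONE]\n\n");
  res.end();
}

// Demo: collect frames from a tiny async generator into memory.
async function* demoStream() {
  yield { type: "text-delta", delta: "Hello" };
}

const chunks: string[] = [];
const sink: SSESink = {
  writeHead: () => {},
  write: (c: string) => { chunks.push(c); },
  end: () => {},
};

pipeUIMessageStreamToResponse(sink, demoStream()).then(() => {
  console.log(chunks.join(""));
});
```

&lt;p&gt;The point is not the exact code, which will drift as both libraries evolve, but that the whole adapter is just headers plus a loop over frames. That is only obvious once you can see both sides of the boundary.&lt;/p&gt;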

&lt;h3&gt;
  
  
  The Key Insight
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI agents aren't dumb. They're just working with incomplete information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask an AI agent a question with only documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It guesses based on common patterns&lt;/li&gt;
&lt;li&gt;It suggests rebuilding things that already exist&lt;/li&gt;
&lt;li&gt;It can't tell you what's in the actual source code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But when you give an AI agent &lt;strong&gt;access to the actual source code&lt;/strong&gt; across multiple repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can trace exact function calls and data flows&lt;/li&gt;
&lt;li&gt;It can identify what exists vs what's missing&lt;/li&gt;
&lt;li&gt;It can point you to specific files and functions&lt;/li&gt;
&lt;li&gt;It can suggest minimal solutions that reuse existing code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially critical when integrating third-party frameworks. The issue isn't in your code—it's in understanding how the frameworks work internally. And for that, you need source code context, not just documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xe9ru38s6wrjso55llz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xe9ru38s6wrjso55llz.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Need Source Code Context
&lt;/h2&gt;

&lt;p&gt;Use this approach when:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Integrating third-party frameworks&lt;/strong&gt; - Documentation tells you what's possible; source code tells you what's actually implemented.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Debugging "compatible but broken" issues&lt;/strong&gt; - The problem usually lives at the boundary between systems, and diagnosing it requires seeing both codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Working across multiple repositories&lt;/strong&gt; - Microservices, internal libraries, SDKs—understanding how they connect requires unified context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Avoiding duplicate work&lt;/strong&gt; - Before building something, check if it already exists in another repo.&lt;/p&gt;

&lt;p&gt;The pattern is simple: &lt;strong&gt;Give your AI agent precise context, get precise answers.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The resolution to my 8-hour debugging nightmare didn't come from better prompts or a smarter AI. It came from &lt;strong&gt;better context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once I could see across all three repositories, the invisible seam became visible. And once visible, it was solvable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ 50 lines of code&lt;/li&gt;
&lt;li&gt;✅ Zero breaking changes&lt;/li&gt;
&lt;li&gt;✅ 15 minutes to implement&lt;/li&gt;
&lt;li&gt;✅ Production-ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context &amp;gt; cleverness, every time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg33u8io15reingmumtb0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg33u8io15reingmumtb0.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you're working across multiple repositories—microservices, SDKs, internal libraries—&lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;Vyazen&lt;/a&gt; helps you give AI agents the source code context they need to provide accurate answers.&lt;/p&gt;

&lt;p&gt;And remember: when you’re stuck on a “compatible but broken” integration, the answer isn’t a smarter AI. It’s giving your AI the complete context.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>mcp</category>
      <category>codesearch</category>
      <category>contextengineering</category>
    </item>
  </channel>
</rss>
