<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Akshat Jain</title>
    <description>The latest articles on Forem by Akshat Jain (@akshat_ilen).</description>
    <link>https://forem.com/akshat_ilen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3565730%2F99b76ff8-4dd3-4b89-a191-a15b4129dfa4.jpg</url>
      <title>Forem: Akshat Jain</title>
      <link>https://forem.com/akshat_ilen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/akshat_ilen"/>
    <language>en</language>
    <item>
      <title>Beyond Grep and Vectors: Reimagining Code Retrieval for AI Agents</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Mon, 27 Oct 2025 13:48:56 +0000</pubDate>
      <link>https://forem.com/akshat_ilen/beyond-grep-and-vectors-reimagining-code-retrieval-for-ai-agents-4pb2</link>
      <guid>https://forem.com/akshat_ilen/beyond-grep-and-vectors-reimagining-code-retrieval-for-ai-agents-4pb2</guid>
      <description>&lt;p&gt;Not long ago, the idea of an AI assistant refactoring an entire application felt like a distant future. Today, that future is arriving, driven by language models that can use tools to execute complex tasks. However, a critical lesson has emerged from the first wave of agentic systems: even the most advanced model is only as effective as the context it is given.&lt;/p&gt;

&lt;p&gt;The core challenge is not the agent's reasoning ability but its access to information. When an AI coding agent fails, it's often because we have fed it irrelevant, incomplete, or outdated code snippets. The shift from copilot-style autocompletion to autonomous agents isn't incremental—it's a phase change in how code touches code. And our retrieval layer hasn't caught up. It's time to rebuild our approach to retrieval from the ground up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Friction Point: When Legacy Search Meets Agentic Workloads
&lt;/h2&gt;

&lt;p&gt;Consider a common scenario: you ask a coding agent, "Where is our login logic actually rate-limited?" The response you get reveals the limitations of our current tools.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;code&gt;grep&lt;/code&gt;-based search dumps pages of literal matches—unrelated constants, comments in test files, and deprecated code.&lt;/li&gt;
&lt;li&gt;A semantic or vector search returns "things that are like rate limits," surfacing conceptually similar but functionally incorrect parts of the codebase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You paste these fragmented results into the agent's context window. The model generates a confident-sounding response, but the subsequent continuous integration (CI) pipeline disagrees. The problem wasn't the model; it was the quality of the information we fed it.&lt;/p&gt;

&lt;p&gt;Here's a simple test: ask your current setup to "find where we throttle login attempts and increase the backoff by 50%." Does it return a surgical package or a scavenger hunt? The answer reveals everything about whether your retrieval system is ready for agents.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79f4lbtcq9pfyy8sw1tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79f4lbtcq9pfyy8sw1tv.png" alt="Cartoon showing an AI agent confidently returning hundreds of irrelevant search results while a developer looks defeated" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Old Search Habits Fail in the Agentic Era
&lt;/h2&gt;

&lt;p&gt;Grep was a miracle when codebases fit in memory. Vector search unlocked semantic understanding we never had before. But both were designed for human-in-the-loop workflows, where tolerance for noise is high and iteration is slow. &lt;/p&gt;

&lt;p&gt;Search tools built for humans operate on the assumption of human pacing. A developer might issue one or two queries, skim the results, and use their own intuition to synthesize an answer. Agentic workflows are fundamentally different.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Volume and Speed:&lt;/strong&gt; An agent fires off dozens of micro-queries in seconds as it explores a codebase.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precision over Volume:&lt;/strong&gt; It requires just enough context to perform a specific action, not an exhaustive list of every possible match.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verifiability:&lt;/strong&gt; It must be able to demonstrate &lt;em&gt;why&lt;/em&gt; a particular code snippet is relevant to the immediate task.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the retrieval layer doesn't respect these requirements, everything downstream becomes fragile and unreliable.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Limitations of Our Toolkit
&lt;/h2&gt;

&lt;p&gt;Our standard tools, &lt;code&gt;grep&lt;/code&gt; and vector search, were designed for a different era and create hidden costs when applied to agentic systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdn3ado7v5fcffhskuby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsdn3ado7v5fcffhskuby.png" alt="Cartoon depicting grep and vector search as outdated tools being applied to modern agentic problems" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grep: The Literal Search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Grep is excellent for finding exact string matches. If you already know the precise function or variable name you're looking for, it's unparalleled. However, for the exploratory tasks common in agentic work, its limitations become clear. It has no understanding of indirection or semantic meaning, and it often returns large, noisy blocks of code that pollute the context window and degrade reasoning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector Search: The Semantic Search&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Vector search excels at finding "things like this," making it a powerful tool for conceptual exploration. Yet, this same fuzziness becomes a liability when surgical precision is required. It can easily surface lookalike functions while missing the one critical implementation that needs to be changed. Snippets often arrive decontextualized, shorn from their callers, tests, or configuration files. Furthermore, its reliance on embeddings means it is perpetually at risk of operating on a stale map of a rapidly evolving repository.&lt;/p&gt;

&lt;p&gt;These approaches create downstream "taxes" in the form of latency from bloated context windows, fragility as minor code changes break brittle heuristics, and a fundamental lack of explainability.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Window Illusion
&lt;/h2&gt;

&lt;p&gt;You might think: just give the agent the entire codebase. After all, aren't context windows growing exponentially? But context windows aren't free—they're quadratic in cost and linear in confusion. More isn't better; &lt;strong&gt;relevant&lt;/strong&gt; is better. The real win isn't cramming more in; it's delivering exactly what's needed, exactly when it's needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb42110diu11pjcwdzpfw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb42110diu11pjcwdzpfw.png" alt="Cartoon showing the futility of cramming entire codebases into context windows instead of providing precise, relevant snippets" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Principles for Agent-Ready Retrieval
&lt;/h2&gt;

&lt;p&gt;To build reliable agents, we need a new retrieval paradigm guided by a set of practical principles. The goal is no longer to return the most hits, but the most complete and actionable context.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Return Whole Behaviors:&lt;/strong&gt; Instead of fragmented lines, retrieval should provide complete, edit-safe units, such as an entire function, class, or API handler.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserve Adjacency:&lt;/strong&gt; Code should be delivered with its immediate neighbors—the callers, tests, and configuration files that are essential for making a safe and effective change.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aim for Less, But Complete:&lt;/strong&gt; Two precise, context-aware snippets are far more valuable than twenty fuzzy matches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay Fresh by Default:&lt;/strong&gt; The retrieval system must treat recent changes as a primary signal for relevance, not as an afterthought.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explain the Relevance:&lt;/strong&gt; Every item returned should be accompanied by a justification for why it was selected in response to the specific query, right now.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operate in Loops:&lt;/strong&gt; Retrieval should be an interactive process that helps the agent propose, get feedback, and narrow its focus, rather than a one-shot "dump and pray" operation.&lt;/li&gt;
&lt;/ol&gt;
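&lt;p&gt;To make these principles concrete, here is a minimal sketch of the shape an agent-ready retrieval result could take. All names here (&lt;code&gt;RetrievalResult&lt;/code&gt;, &lt;code&gt;adjacency&lt;/code&gt;, &lt;code&gt;rationale&lt;/code&gt;, the example paths) are illustrative assumptions, not from any existing API:&lt;/p&gt;

```typescript
// Illustrative sketch only: the shape an agent-ready retrieval result
// might take. Every name and path below is hypothetical.
interface Adjacency {
  callers: string[]; // call sites that invoke the retrieved unit
  tests: string[];   // test files covering it
  config: string[];  // configuration it reads
}

interface RetrievalResult {
  unit: string;      // a whole, edit-safe behavior (e.g. a full function)
  path: string;      // where it lives in the repository
  adjacency: Adjacency; // the neighbors needed for a safe change
  freshness: string; // recency signal, e.g. the last commit touching it
  rationale: string; // why this was selected for the current query
}

// Example: what the login-throttling query might return.
const result: RetrievalResult = {
  unit: "function throttleLogin(attempts) { /* full body */ }",
  path: "src/auth/rateLimit.ts",
  adjacency: {
    callers: ["src/auth/loginHandler.ts"],
    tests: ["test/auth/rateLimit.test.ts"],
    config: ["config/security.yaml"],
  },
  freshness: "last touched 2 days ago",
  rationale: "Only implementation that throttles login attempts with backoff",
};
```

&lt;p&gt;A result shaped like this hands the agent a whole behavior, its neighbors, and a stated reason for inclusion in one package, instead of a scavenger hunt.&lt;/p&gt;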

&lt;h2&gt;
  
  
  A Simple Litmus Test
&lt;/h2&gt;

&lt;p&gt;Remember that test from earlier? &lt;em&gt;"Find where we throttle login attempts and increase the backoff by 50%."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Does the system return the rate-limiting function, its direct call site, its configuration, and its unit tests as a single, cohesive package? Or does it return a list of keyword hits and semantic lookalikes? The difference in output will directly correlate to how quickly and safely the agent can propose a valid change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva3pmjp8uglf5bcgo5lq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fva3pmjp8uglf5bcgo5lq.png" alt="Cartoon showing the futility of cramming entire codebases into context windows instead of providing precise, relevant snippets" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Building the Engine for the Agentic Era
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Grep&lt;/code&gt; isn't flawed, and neither are vectors. They are simply tools from a world where a human was responsible for stitching the context together. The next generation of AI agents requires a retrieval engine that does the stitching first, enabling the agent to land the correct fix on the first try.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical exercise. At Vyazen, we're building retrieval infrastructure that treats these principles as requirements, not aspirations. Our approach is founded on delivering complete, fresh, and verifiable context so that your agents can ship code, not just suggestions.&lt;/p&gt;

&lt;p&gt;If you're wrestling with the same questions, we'd love to learn from your toughest use cases.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To share a story where an agent missed the mark, please reach out.&lt;/li&gt;
&lt;li&gt;To try our focused beta, visit us at &lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;https://vyazen.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;For direct inquiries, you can email us at &lt;a href="mailto:akshat@vyazen.dev"&gt;akshat@vyazen.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>The Solution Wasn’t Smarter Prompts, It Was Better Context</title>
      <dc:creator>Akshat Jain</dc:creator>
      <pubDate>Wed, 15 Oct 2025 04:04:45 +0000</pubDate>
      <link>https://forem.com/akshat_ilen/the-solution-wasnt-smarter-prompts-it-was-better-context-2c4h</link>
      <guid>https://forem.com/akshat_ilen/the-solution-wasnt-smarter-prompts-it-was-better-context-2c4h</guid>
      <description>&lt;p&gt;&lt;em&gt;How I spent 8+ hours debugging a "compatible" integration—and solved it in 15 minutes with the right context&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;It was 2 AM, and I was staring at a spinning loader that refused to stop. The UI was frozen, the stream had died silently, and I had no idea why.&lt;/p&gt;

&lt;p&gt;I was upgrading our backend to use &lt;a href="https://mastra.ai/" rel="noopener noreferrer"&gt;Mastra&lt;/a&gt;—a TypeScript agent framework for building AI applications with agentic workflows—to work with our existing frontend that consumed &lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;Vercel AI SDK&lt;/a&gt; v5 streams. On paper, this should have been straightforward. Mastra advertised compatibility with Vercel v5, and our React frontend was already configured to consume &lt;code&gt;UIMessage&lt;/code&gt; streams over Server-Sent Events (SSE).&lt;/p&gt;

&lt;p&gt;But the first test run? Complete failure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gutsjob7esvy75myeuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gutsjob7esvy75myeuw.png" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup: What I Was Building
&lt;/h2&gt;

&lt;p&gt;Let me give you some context. We were building a production application with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: React app using Vercel AI SDK's &lt;code&gt;useChat&lt;/code&gt; hook to consume streaming AI responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: NestJS running on Express (not Next.js)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Upgrade&lt;/strong&gt;: Integrating Mastra's powerful agentic workflows to replace our simpler LLM calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://mastra.ai/" rel="noopener noreferrer"&gt;&lt;strong&gt;Mastra&lt;/strong&gt;&lt;/a&gt; is a TypeScript framework from the team behind Gatsby that lets you build AI agents with workflows, memory, and tool selection. It's designed for production use and can deploy anywhere—not just Next.js.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ai-sdk.dev/" rel="noopener noreferrer"&gt;&lt;strong&gt;Vercel AI SDK&lt;/strong&gt;&lt;/a&gt; is the TypeScript toolkit for building AI applications, providing standardized APIs for streaming text, structured objects, and building chat interfaces across multiple model providers.&lt;/p&gt;

&lt;p&gt;Both claimed compatibility with each other. Both had great documentation. This should have worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Non-Negotiables
&lt;/h2&gt;

&lt;p&gt;I had three constraints I couldn't compromise on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Frontend Contract&lt;/strong&gt;: Our production UI consumed Vercel AI SDK v5 &lt;code&gt;UIMessage&lt;/code&gt; streams over SSE. Rewiring the entire interface wasn't an option—we'd break existing user experiences.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Backend Ambition&lt;/strong&gt;: Mastra's agentic workflows were the whole point of this upgrade. We needed the agent orchestration, memory management, and tool selection capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Server Infrastructure&lt;/strong&gt;: Our application stack was NestJS on Express. Adopting Next.js solely to support streaming wasn't viable—we had existing middleware, authentication, and infrastructure that couldn't be rewritten.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These weren't preferences. They were hard boundaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  First Attempts: When "Compatible" Isn't Enough
&lt;/h2&gt;

&lt;p&gt;I started with what seemed obvious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In my NestJS controller&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aisdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="c1"&gt;// Now what? How do I pipe this to Express response?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every example assumed Next.js with Web &lt;code&gt;Response&lt;/code&gt; objects. I was using Express with Node.js &lt;code&gt;ServerResponse&lt;/code&gt;. Fundamentally different APIs.&lt;/p&gt;
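&lt;p&gt;In hindsight, the bridge between those two worlds is conceptually small. A minimal sketch (my own illustration, assuming Node 18+ where the Web &lt;code&gt;ReadableStream&lt;/code&gt; is a global) would pull chunks from the Web stream and write them imperatively, the way a Node response expects. At 2 AM, though, none of this was obvious:&lt;/p&gt;

```typescript
// Minimal bridge between the two stream worlds (illustrative sketch,
// assuming Node 18+ where the Web ReadableStream is a global).
// nodeWritable stands in for an Express/Node ServerResponse here.
export async function writeWebStreamToNode(
  webStream: ReadableStream,
  nodeWritable: { write(chunk: any): void; end(): void }
) {
  const reader = webStream.getReader();
  for (;;) {
    // Pull each chunk from the Web stream and write it imperatively,
    // since a Node response has write()/end() rather than pipeTo().
    const { done, value } = await reader.read();
    if (done) break;
    nodeWritable.write(value);
  }
  nodeWritable.end();
}
```

&lt;p&gt;The sketch also shows why the naive attempt fails: a Web &lt;code&gt;ReadableStream&lt;/code&gt; exposes &lt;code&gt;pipeTo()&lt;/code&gt; and &lt;code&gt;pipeThrough()&lt;/code&gt;, while a Node &lt;code&gt;ServerResponse&lt;/code&gt; exposes neither.&lt;/p&gt;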

&lt;p&gt;I tried everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct piping → &lt;code&gt;TypeError: res.pipeTo is not a function&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Manual SSE headers → streams died silently&lt;/li&gt;
&lt;li&gt;Reading Mastra's source → found &lt;code&gt;toUIMessageStream()&lt;/code&gt; but still couldn't connect it to Express&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hours passed. Nothing worked.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtpn26f5ydm405hnunb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtpn26f5ydm405hnunb.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Documentation Trap
&lt;/h2&gt;

&lt;p&gt;Here's what was maddening: every answer I got from AI assistants was &lt;em&gt;technically correct&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;"Yes, Mastra is compatible with Vercel v5." ✓&lt;br&gt;&lt;br&gt;
"Yes, you can stream UIMessages." ✓&lt;br&gt;&lt;br&gt;
"Yes, you need SSE headers." ✓&lt;/p&gt;

&lt;p&gt;But none of this solved my actual problem. Why? Because all the examples, all the documentation, all the Stack Overflow answers assumed a Next.js environment with Web &lt;code&gt;Response&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Then came the really bad advice.&lt;/strong&gt; One AI assistant, after analyzing my problem, confidently told me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You need to build a custom parser to convert Mastra's output format to Vercel AI SDK's UIMessage format. Here's a 200-line implementation..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I spent hours implementing and debugging this custom parser. It was complex, error-prone, and felt wrong. I kept thinking: "Surely someone has solved this before?"&lt;/p&gt;

&lt;p&gt;Spoiler: They had. Mastra &lt;strong&gt;already had&lt;/strong&gt; the conversion logic built-in (&lt;code&gt;toUIMessageStream&lt;/code&gt;). I was building something that already existed because the AI assistant couldn't see across both codebases to know it was there.&lt;/p&gt;

&lt;p&gt;But I didn't know that yet. I didn't even know what the real problem was. Was it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data format? (Protocol issue)&lt;/li&gt;
&lt;li&gt;The streaming mechanism? (Transport issue)&lt;/li&gt;
&lt;li&gt;My implementation? (Code issue)&lt;/li&gt;
&lt;li&gt;Some Next.js-specific magic I was missing?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I had no visibility into what was actually broken. I was debugging blind.&lt;/p&gt;

&lt;p&gt;I started questioning if I was missing something obvious. Was I the only person trying to use Mastra with Express? Was I supposed to just rewrite everything in Next.js?&lt;/p&gt;
&lt;h2&gt;
  
  
  The Turning Point: A Different Approach
&lt;/h2&gt;

&lt;p&gt;After 8+ hours of debugging (including wasting time on that unnecessary custom parser), I realized something: &lt;strong&gt;I wasn't asking the wrong questions—I was asking them with the wrong context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every AI assistant I'd consulted had partial knowledge:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One knew about Mastra's API surface&lt;/li&gt;
&lt;li&gt;Another knew about Vercel's streaming format&lt;/li&gt;
&lt;li&gt;Another knew about Express SSE patterns&lt;/li&gt;
&lt;li&gt;One even suggested building a custom parser (that already existed!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But none of them could see the complete picture across all three codebases simultaneously. They couldn't tell me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What Mastra &lt;strong&gt;already provided&lt;/strong&gt; internally&lt;/li&gt;
&lt;li&gt;What Vercel &lt;strong&gt;already had&lt;/strong&gt; for similar use cases&lt;/li&gt;
&lt;li&gt;Where exactly the gap was in my specific setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I realized the problem: &lt;strong&gt;the issue wasn't in my code—it was somewhere in how Mastra and Vercel AI SDK were supposed to work together.&lt;/strong&gt; To debug this, I needed to look at the actual source code of both frameworks, not just their documentation.&lt;/p&gt;

&lt;p&gt;But reading through two large codebases manually? That would take days.&lt;/p&gt;

&lt;p&gt;I needed a different approach. I used &lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;Vyazen&lt;/a&gt; to index all three repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Our application code (NestJS/Express)&lt;/li&gt;
&lt;li&gt;Mastra's source code (the agent framework)&lt;/li&gt;
&lt;li&gt;Vercel AI SDK's source code (the streaming protocol)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now I could ask an AI agent questions with &lt;strong&gt;complete context&lt;/strong&gt; across all three codebases. Not documentation. Not examples. The actual source code.&lt;/p&gt;

&lt;p&gt;With this unified context, I asked a precise question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Identify where Mastra converts its internal stream into Vercel v5 UIMessage chunks and where that output is written to a Node/Express response."&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  The Breakthrough: Seeing the Seam
&lt;/h2&gt;

&lt;p&gt;With access to all three codebases, the AI agent could now give me a precise answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File&lt;/strong&gt;: &lt;code&gt;packages/core/src/stream/aisdk/v5/output.ts&lt;/code&gt; in Mastra&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Function&lt;/strong&gt;: &lt;code&gt;toUIMessageStream()&lt;/code&gt; - converts Mastra's stream to Vercel UIMessage format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport for Next.js&lt;/strong&gt;: Exists in Mastra for Web Response objects&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transport for Express&lt;/strong&gt;: &lt;strong&gt;Does not exist&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was the gap. Within minutes, I could see what I didn't need to build:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Mastra provided&lt;/strong&gt;: A function &lt;code&gt;toUIMessageStream()&lt;/code&gt; to convert agent streams into Vercel-compatible &lt;code&gt;UIMessage&lt;/code&gt; chunks (I didn't need my custom parser!)&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Mastra provided&lt;/strong&gt;: A transport layer for Next.js/Web &lt;code&gt;Response&lt;/code&gt; to stream these chunks to the browser&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Mastra did NOT provide&lt;/strong&gt;: An equivalent Node.js/Express transport for &lt;code&gt;ServerResponse&lt;/code&gt;&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Vercel AI SDK included&lt;/strong&gt;: Helpers implementing the required SSE semantics—headers, framing, and terminal signals&lt;/p&gt;

&lt;p&gt;This was the revelation. &lt;strong&gt;I finally understood what the actual problem was:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The problem wasn't the protocol. The protocol was fine. Both sides spoke the same language (&lt;code&gt;UIMessage&lt;/code&gt; format).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem was the transport layer.&lt;/strong&gt; There was no mechanism to correctly pipe the &lt;code&gt;UIMessage&lt;/code&gt; stream over Express/NestJS with the expected SSE semantics.&lt;/p&gt;

&lt;p&gt;I stared at the screen. All those hours building a parser... for nothing. The gap was so obvious once I could see it.&lt;/p&gt;

&lt;p&gt;I had been debugging the wrong thing. I had been building parsers I didn't need. I had been trying to fix the protocol when the protocol was already working.&lt;/p&gt;

&lt;p&gt;All I needed was a simple transport adapter—about 50 lines of code.&lt;/p&gt;

&lt;p&gt;I felt stupid and relieved at the same time.&lt;/p&gt;

&lt;p&gt;This is what I needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Content-Type: text/event-stream
Cache-Control: no-cache, no-transform
Connection: keep-alive
X-Accel-Buffering: no

data: {"type":"text","content":"Hello"}\n\n
data: {"type":"text","content":" world"}\n\n
data: [DONE]\n\n
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each chunk needed to be framed as &lt;code&gt;data: {JSON}\n\n&lt;/code&gt;, concluding with a &lt;code&gt;data: [DONE]\n\n&lt;/code&gt; sentinel to signal completion.&lt;/p&gt;
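&lt;p&gt;As a sketch, that framing step can be expressed as a &lt;code&gt;TransformStream&lt;/code&gt;: wrap each JSON chunk as a &lt;code&gt;data:&lt;/code&gt; event and append the &lt;code&gt;[DONE]&lt;/code&gt; sentinel when the upstream closes. This is my reconstruction of the behavior, not Mastra's or Vercel's actual code:&lt;/p&gt;

```typescript
// Hedged sketch of the SSE framing step (assumes Node 18+, where
// TransformStream is a global). This reconstructs the behavior the
// transport needs; the real helper's implementation may differ.
export function jsonToSseTransform() {
  return new TransformStream({
    transform(chunk: unknown, controller) {
      // Frame one JSON payload as a single SSE event: data: {JSON}\n\n
      controller.enqueue("data: " + JSON.stringify(chunk) + "\n\n");
    },
    flush(controller) {
      // Emit the terminal sentinel so the client parser stops reading.
      controller.enqueue("data: [DONE]\n\n");
    },
  });
}
```

&lt;p&gt;Piping the &lt;code&gt;UIMessage&lt;/code&gt; stream through a transform like this yields exactly the wire format shown above.&lt;/p&gt;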

&lt;h2&gt;
  
  
  The Fix: 15 Minutes of Coding
&lt;/h2&gt;

&lt;p&gt;Once I understood the boundary, the solution was simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// The key insight: use Mastra's existing converter + add Express transport&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;pipeUIMessageStreamToResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;uiStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toUIMessageStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Mastra already had this!&lt;/span&gt;

  &lt;span class="c1"&gt;// Set SSE headers&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text/event-stream&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setHeader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Cache-Control&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-cache, no-transform&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Pipe with proper SSE framing: data: {JSON}\n\n&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sseStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;uiStream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeThrough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;jsonToSseTransform&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="c1"&gt;// ... stream to Express response&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My controller became one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(@&lt;/span&gt;&lt;span class="nd"&gt;Body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nd"&gt;Res&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aisdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeUIMessageStreamToResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;output&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// Done.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I refreshed the page. Messages streamed smoothly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It worked.&lt;/strong&gt; 15 minutes of coding vs 8+ hours of debugging.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/mastra-ai/mastra/pull/8720" rel="noopener noreferrer"&gt;See the full implementation in PR #8720&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Giving Back: Upstreaming the Solution
&lt;/h2&gt;

&lt;p&gt;I realized this couldn't just be my problem. If I hit this issue, others would too. So I decided to contribute it back to Mastra.&lt;/p&gt;

&lt;p&gt;I opened &lt;a href="https://github.com/mastra-ai/mastra/pull/8720" rel="noopener noreferrer"&gt;PR #8720&lt;/a&gt; with the helper function, comprehensive tests, and documentation. The PR is in review and will help other developers avoid the same 8-hour debugging session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Precise Context Changed Everything
&lt;/h2&gt;

&lt;p&gt;Let me show you the difference context makes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Without source code context (what other AI assistants told me):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Mastra supports Vercel v5 format" ✓ (technically true, but unhelpful)&lt;/li&gt;
&lt;li&gt;"You need SSE headers" ✓ (I already knew this)&lt;/li&gt;
&lt;li&gt;"Build a custom parser to convert formats" ✗ (completely wrong—it already existed!)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These answers were based on documentation and general knowledge. The AI couldn't see what was actually in the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;With source code context (using Vyazen):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Mastra has &lt;code&gt;toUIMessageStream()&lt;/code&gt; in &lt;code&gt;packages/core/src/stream/aisdk/v5/output.ts&lt;/code&gt;" ✓&lt;/li&gt;
&lt;li&gt;"Mastra has transport for Next.js Web Response, but not for Node.js ServerResponse" ✓&lt;/li&gt;
&lt;li&gt;"You need to create a ~50 line adapter, reusing Vercel's SSE framing pattern" ✓&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These answers were based on the actual source code. The AI could see exactly what existed and what was missing.&lt;/p&gt;
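&lt;p&gt;To make that concrete, here is a rough sketch of what such an adapter can look like. Everything here is illustrative: &lt;code&gt;toSSEFrame&lt;/code&gt;, &lt;code&gt;SSESink&lt;/code&gt;, and &lt;code&gt;pipeUIMessageStreamToResponse&lt;/code&gt; are names I'm using for the sketch, not Mastra's actual API; the real implementation is in PR #8720.&lt;/p&gt;

```typescript
// Illustrative sketch of the ~50-line adapter: bridge an async-iterable
// UI message stream onto a Node.js ServerResponse-style object using
// Vercel-style SSE framing. Names are assumptions, not Mastra's real API.

// One SSE frame per stream part: "data: {json}" followed by a blank line.
function toSSEFrame(part: unknown): string {
  return "data: " + JSON.stringify(part) + "\n\n";
}

// Minimal structural type, so the adapter accepts http.ServerResponse
// or any compatible mock in tests.
interface SSESink {
  writeHead(status: number, headers: { [k: string]: string }): void;
  write(chunk: string): void;
  end(): void;
}

async function pipeUIMessageStreamToResponse(res: SSESink, stream: any) {
  // SSE headers: tell the client to hold the connection open.
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    "Connection": "keep-alive",
  });
  // Write each stream part as its own SSE frame as it arrives.
  for await (const part of stream) {
    res.write(toSSEFrame(part));
  }
  // Terminal sentinel, mirroring the Vercel AI SDK's framing convention.
  res.write("data: [DONE]\n\n");
  res.end();
}

// Demo: collect frames from a tiny async generator into memory.
async function* demoStream() {
  yield { type: "text-delta", delta: "Hello" };
}

const chunks: string[] = [];
const sink: SSESink = {
  writeHead: () => {},
  write: (c: string) => { chunks.push(c); },
  end: () => {},
};

pipeUIMessageStreamToResponse(sink, demoStream()).then(() => {
  console.log(chunks.join(""));
});
```

&lt;p&gt;The point is not the exact code, which will drift as both libraries evolve, but that the whole adapter is just headers plus a loop over frames. That is only obvious once you can see both sides of the boundary.&lt;/p&gt;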

&lt;h3&gt;
  
  
  The Key Insight
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;AI agents aren't dumb. They're just working with incomplete information.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you ask an AI agent a question with only documentation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It guesses based on common patterns&lt;/li&gt;
&lt;li&gt;It suggests rebuilding things that already exist&lt;/li&gt;
&lt;li&gt;It can't tell you what's in the actual source code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But when you give an AI agent &lt;strong&gt;access to the actual source code&lt;/strong&gt; across multiple repositories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can trace exact function calls and data flows&lt;/li&gt;
&lt;li&gt;It can identify what exists vs what's missing&lt;/li&gt;
&lt;li&gt;It can point you to specific files and functions&lt;/li&gt;
&lt;li&gt;It can suggest minimal solutions that reuse existing code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is especially critical when integrating third-party frameworks. The issue isn't in your code—it's in understanding how the frameworks work internally. And for that, you need source code context, not just documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xe9ru38s6wrjso55llz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8xe9ru38s6wrjso55llz.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Need Source Code Context
&lt;/h2&gt;

&lt;p&gt;Use this approach when:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Integrating third-party frameworks&lt;/strong&gt; - Documentation tells you what's possible; source code tells you what's actually implemented.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Debugging "compatible but broken" issues&lt;/strong&gt; - The problem usually lives at the boundary between systems, and diagnosing it requires seeing both codebases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Working across multiple repositories&lt;/strong&gt; - Microservices, internal libraries, SDKs—understanding how they connect requires unified context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Avoiding duplicate work&lt;/strong&gt; - Before building something, check if it already exists in another repo.&lt;/p&gt;

&lt;p&gt;The pattern is simple: &lt;strong&gt;Give your AI agent precise context, get precise answers.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The resolution to my 8-hour debugging nightmare didn't come from better prompts or a smarter AI. It came from &lt;strong&gt;better context&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Once I could see across all three repositories, the invisible seam became visible. And once visible, it was solvable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ 50 lines of code&lt;/li&gt;
&lt;li&gt;✅ Zero breaking changes&lt;/li&gt;
&lt;li&gt;✅ 15 minutes to implement&lt;/li&gt;
&lt;li&gt;✅ Production-ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Context &amp;gt; cleverness, every time.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg33u8io15reingmumtb0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg33u8io15reingmumtb0.png" alt=" " width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you're working across multiple repositories—microservices, SDKs, internal libraries—&lt;a href="https://vyazen.dev" rel="noopener noreferrer"&gt;Vyazen&lt;/a&gt; helps you give AI agents the source code context they need to provide accurate answers.&lt;/p&gt;

&lt;p&gt;And remember: when you’re stuck on a “compatible but broken” integration, the answer isn’t a smarter AI. It’s giving your AI the complete context.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>mcp</category>
      <category>codesearch</category>
      <category>contextengineering</category>
    </item>
  </channel>
</rss>
