<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Mukunda Rao Katta</title>
    <description>The latest articles on Forem by Mukunda Rao Katta (@mukundakatta).</description>
    <link>https://forem.com/mukundakatta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3915555%2F5abcf328-9b6f-487e-a7ed-d61ebab7e3b1.jpeg</url>
      <title>Forem: Mukunda Rao Katta</title>
      <link>https://forem.com/mukundakatta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mukundakatta"/>
    <language>en</language>
    <item>
      <title>Six Reliability Primitives for LLM Agents</title>
      <dc:creator>Mukunda Rao Katta</dc:creator>
      <pubDate>Fri, 08 May 2026 08:31:49 +0000</pubDate>
      <link>https://forem.com/mukundakatta/six-reliability-primitives-for-llm-agents-m13</link>
      <guid>https://forem.com/mukundakatta/six-reliability-primitives-for-llm-agents-m13</guid>
      <description>&lt;p&gt;Reliability concerns for LLM agents are typically bundled into one heavy framework that asks you to adopt prompting, tool routing, and runtime governance as a single dependency. Production teams want them à la carte. They want small primitives they can drop in around existing tool calls without buying into a new programming model.&lt;/p&gt;

&lt;p&gt;That observation is the design centre of &lt;strong&gt;agent-stack&lt;/strong&gt;: six small, single-concern reliability libraries published independently to &lt;strong&gt;npm&lt;/strong&gt;, &lt;strong&gt;PyPI&lt;/strong&gt;, and the &lt;strong&gt;Model Context Protocol&lt;/strong&gt; registry. Each library is zero-dependency, under 500 lines of code, and addresses one specific failure mode that production agent teams have to handle.&lt;/p&gt;

&lt;p&gt;This post is a tour of the six primitives, the cross-cutting invariants they enforce, and the trade-offs of "composable by inclusion" instead of "composable by framework."&lt;/p&gt;

&lt;h2&gt;
  
  
  The six primitives
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;What it does about it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentFit&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Context-window fitting&lt;/td&gt;
&lt;td&gt;Token-aware truncation. Pluggable tokenizers for OpenAI / Anthropic / open models.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentGuard&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Network egress allowlist&lt;/td&gt;
&lt;td&gt;Blocks "agent suddenly POSTs PHI/secrets to attacker.com" at the network layer.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentSnap&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Snapshot tests for tool calls&lt;/td&gt;
&lt;td&gt;Catches silent regressions when a model's tool-call shape changes after a deploy.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentVet&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tool-arg validation&lt;/td&gt;
&lt;td&gt;Throws &lt;code&gt;ToolArgError&lt;/code&gt; with an LLM-friendly retry hint, so the next turn can self-correct.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentCast&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured-output retry&lt;/td&gt;
&lt;td&gt;BYO-LLM JSON validator + retry loop for malformed responses during brown-outs.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AgentBudget&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Token + dollar caps&lt;/td&gt;
&lt;td&gt;Prevents runaway loops billing $1000 on a single user query.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
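&lt;p&gt;To make the AgentVet row concrete, here is a minimal sketch of the validate-then-retry-hint pattern in plain JavaScript. This illustrates the pattern, not the library's published API; the &lt;code&gt;vetArgs&lt;/code&gt; helper and the error fields are assumptions.&lt;/p&gt;

```javascript
// Sketch of the AgentVet idea: validate tool args before execution and,
// on failure, throw a typed error carrying a hint the LLM can act on.
// Illustrative only -- the real library's surface may differ.
class ToolArgError extends Error {
  constructor(message, retryHint) {
    super(message);
    this.name = 'ToolArgError';
    this.retryHint = retryHint; // fed back to the model on the next turn
  }
}

function vetArgs(schema, args) {
  for (const [key, check] of Object.entries(schema)) {
    if (!check(args[key])) {
      throw new ToolArgError(
        `invalid value for "${key}"`,
        `Call the tool again with a valid "${key}".`
      );
    }
  }
  return args;
}

// Usage: reject a malformed patient id before the tool ever runs.
const schema = {
  patient_id: (v) => typeof v === 'string' && /^[A-Za-z0-9\-.]{1,64}$/.test(v),
};
try {
  vetArgs(schema, { patient_id: '???' });
} catch (e) {
  console.log(e.name, '-', e.retryHint);
}
```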

&lt;h2&gt;
  
  
  Three runtimes per primitive
&lt;/h2&gt;

&lt;p&gt;Every library ships in three runtime forms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# TypeScript (npm)&lt;/span&gt;
npm i @mukundakatta/agentvet @mukundakatta/agentguard @mukundakatta/agentbudget

&lt;span class="c"&gt;# Python (PyPI)&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;agentvet agentguard agentbudget
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;MCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;server&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(Claude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Desktop&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;config)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agentvet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mukundakatta/agentvet-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"agentguard"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@mukundakatta/agentguard-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Python and TypeScript surfaces are 1:1 by design — a Python team and a TypeScript team can interoperate on the same primitives. The MCP variant lets a remote LLM use the primitive as a tool: ask AgentVet to validate a tool call before executing it, ask AgentGuard to check an outbound URL before sending the request.&lt;/p&gt;

&lt;h2&gt;
  
  
  Composable by inclusion, not by framework
&lt;/h2&gt;

&lt;p&gt;Every primitive has the same shape: a single class or function as the public surface, a typed error carrying retry-friendly context, and an opt-in automatic adapter for popular provider response shapes. Nothing depends on anything else. You can use AgentBudget without AgentFit. You can use AgentVet's MCP variant without ever touching the npm package.&lt;/p&gt;
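&lt;p&gt;That common shape is easy to picture with a budget cap as the example: one class, one typed error, nothing else. A sketch of the idea behind AgentBudget; the class and field names here are illustrative assumptions, not the published API.&lt;/p&gt;

```javascript
// Minimal sketch of the "single class + typed error" shape, using a
// token + dollar cap as the example. Hypothetical API, not the real AgentBudget.
class BudgetExceededError extends Error {
  constructor(kind, used, cap) {
    super(`${kind} budget exceeded: ${used} > ${cap}`);
    this.name = 'BudgetExceededError';
    this.kind = kind;
  }
}

class Budget {
  constructor({ maxTokens, maxDollars }) {
    this.maxTokens = maxTokens;
    this.maxDollars = maxDollars;
    this.tokens = 0;
    this.dollars = 0;
  }
  // Record usage after each model call; throw once a cap is crossed.
  record({ tokens = 0, dollars = 0 }) {
    this.tokens += tokens;
    this.dollars += dollars;
    if (this.tokens > this.maxTokens) {
      throw new BudgetExceededError('token', this.tokens, this.maxTokens);
    }
    if (this.dollars > this.maxDollars) {
      throw new BudgetExceededError('dollar', this.dollars, this.maxDollars);
    }
  }
}

const budget = new Budget({ maxTokens: 10000, maxDollars: 1.5 });
budget.record({ tokens: 4000, dollars: 0.4 }); // under both caps
```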

&lt;p&gt;Compare with framework-style libraries that bundle reliability concerns: you get them all or none, you adopt their orchestration model, you migrate to their abstractions. agent-stack inverts that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters in practice
&lt;/h2&gt;

&lt;p&gt;Healthcare amplifies every reliability concern. A FHIR-querying agent has to be defensive about: only calling sanctioned endpoints, never leaking PHI in logs, abstaining when the right tool is not on the list, validating that a &lt;code&gt;patient_id&lt;/code&gt; looks like a real FHIR id before it hits the tool, never exceeding a budget on a single patient query, and producing structured outputs the downstream system can actually parse. Existing agent frameworks address one or two of those concerns. agent-stack gives you all six in libraries you adopt independently.&lt;/p&gt;
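&lt;p&gt;The sanctioned-endpoints half of that list fits in a dozen lines. A sketch in the spirit of AgentGuard, with a hypothetical allowlist and helper name:&lt;/p&gt;

```javascript
// Sketch of an egress allowlist: refuse any outbound request whose host
// is not explicitly sanctioned. The hosts and the checkEgress name are
// illustrative assumptions, not the library's real API.
const SANCTIONED_HOSTS = new Set([
  'fhir.example-hospital.org',
  'terminology.hl7.org',
]);

function checkEgress(url) {
  const host = new URL(url).hostname;
  if (!SANCTIONED_HOSTS.has(host)) {
    throw new Error(`egress blocked: ${host} is not on the allowlist`);
  }
  return url;
}

checkEgress('https://fhir.example-hospital.org/Patient/123'); // allowed
```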

&lt;p&gt;The same pattern applies to any production setting where the cost of an agent silently failing is higher than the cost of a slightly-too-defensive primitive: financial services, internal corporate tools, customer support automation, anything billable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Artifact paper
&lt;/h2&gt;

&lt;p&gt;The full design rationale, the cross-cutting invariants, and the operational questions that emerge when reliability is split across many small dependencies are documented in a peer-reviewable artifact paper:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zenodo DOI:&lt;/strong&gt; &lt;a href="https://doi.org/10.5281/zenodo.20074702" rel="noopener noreferrer"&gt;10.5281/zenodo.20074702&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HuggingFace Hub:&lt;/strong&gt; &lt;a href="https://huggingface.co/mukunda1729/agent-stack" rel="noopener noreferrer"&gt;huggingface.co/mukunda1729/agent-stack&lt;/a&gt; (HF DOI &lt;a href="https://doi.org/10.57967/hf/8720" rel="noopener noreferrer"&gt;10.57967/hf/8720&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Source:&lt;/strong&gt; &lt;a href="https://github.com/MukundaKatta/agent-stack" rel="noopener noreferrer"&gt;github.com/MukundaKatta/agent-stack&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The paper is currently under review at the &lt;strong&gt;ASE 2026 Tools track&lt;/strong&gt;. Source for every library is archived in &lt;strong&gt;Software Heritage&lt;/strong&gt;. Every package has CI on GitHub Actions, snapshot tests, and a &lt;code&gt;CITATION.cff&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;The natural next primitive is &lt;strong&gt;AgentTrace&lt;/strong&gt; for cost and latency telemetry. After that, a combined &lt;code&gt;@mukundakatta/agent-stack&lt;/code&gt; meta-package that imports all six with sensible defaults for production agents. The healthcare-aware AgentGuard preset (curated allowlist of FHIR endpoints, PHI redaction policies) is an obvious specialization.&lt;/p&gt;

&lt;p&gt;If you ship LLM agents in production and you want one of the primitives without buying into a framework, install only the library you need. Each one stands alone.&lt;/p&gt;

&lt;p&gt;Source on GitHub. Paper on Zenodo. DOI on HuggingFace. Six primitives, three runtimes, one unified surface for production reliability.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>mcp</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Choosing Gemma 4 for Local AI Apps: A Builder's Field Guide</title>
      <dc:creator>Mukunda Rao Katta</dc:creator>
      <pubDate>Thu, 07 May 2026 23:34:58 +0000</pubDate>
      <link>https://forem.com/mukundakatta/choosing-gemma-4-for-local-ai-apps-a-builders-field-guide-5105</link>
      <guid>https://forem.com/mukundakatta/choosing-gemma-4-for-local-ai-apps-a-builders-field-guide-5105</guid>
      <description>&lt;p&gt;This is my submission for the &lt;strong&gt;Write About Gemma 4&lt;/strong&gt; prompt in the DEV Gemma 4 Challenge.&lt;/p&gt;

&lt;p&gt;Gemma 4 is exciting because it gives builders a real model family to reason about, not just a single model name to drop into a README. The useful question is not "Can I call an AI model?" The useful question is "Which Gemma 4 model belongs at the center of this workflow, and why?"&lt;/p&gt;

&lt;p&gt;Here is the field guide I wish I had before building with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start With the Job
&lt;/h2&gt;

&lt;p&gt;Before picking a model size, write down the job in plain language.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize long, messy project notes&lt;/li&gt;
&lt;li&gt;classify a short message on-device&lt;/li&gt;
&lt;li&gt;reason across several files&lt;/li&gt;
&lt;li&gt;read screenshots or mixed media&lt;/li&gt;
&lt;li&gt;generate a structured plan from incomplete context&lt;/li&gt;
&lt;li&gt;help a user make a decision without exposing private data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That first sentence matters. If the job is tiny, you probably do not need the biggest model. If the job involves long context, ambiguity, tradeoffs, and user trust, a larger reasoning model can be worth the extra cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  When I Would Reach for Gemma 4 31B
&lt;/h2&gt;

&lt;p&gt;I would start with Gemma 4 31B when the task needs richer reasoning.&lt;/p&gt;

&lt;p&gt;Good fits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long notes or documents&lt;/li&gt;
&lt;li&gt;planning and critique&lt;/li&gt;
&lt;li&gt;multi-step analysis&lt;/li&gt;
&lt;li&gt;structured outputs with explanations&lt;/li&gt;
&lt;li&gt;product workflows where the model must surface assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key is that 31B should be doing more than autocomplete. It should help the user see something more clearly: a risk, a tradeoff, a missing fact, or a next step.&lt;/p&gt;

&lt;p&gt;In my BriefBench project, I used 31B as the default reasoning path because the app asks Gemma 4 to turn messy context into a decision brief. The model has to preserve constraints, explain a model choice, identify risks, and return sections the UI can render.&lt;/p&gt;

&lt;h2&gt;
  
  
  When Smaller Gemma 4 Models Make More Sense
&lt;/h2&gt;

&lt;p&gt;Smaller Gemma 4 models are not lesser choices. They are different product choices.&lt;/p&gt;

&lt;p&gt;I would look at smaller or edge-friendly Gemma 4 options when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency matters more than perfect prose&lt;/li&gt;
&lt;li&gt;the task is repeated many times&lt;/li&gt;
&lt;li&gt;the user is on modest hardware&lt;/li&gt;
&lt;li&gt;privacy requires local execution&lt;/li&gt;
&lt;li&gt;the output is narrow and easy to validate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tagging support tickets&lt;/li&gt;
&lt;li&gt;drafting short replies&lt;/li&gt;
&lt;li&gt;extracting fields from predictable text&lt;/li&gt;
&lt;li&gt;running an offline assistant on a small device&lt;/li&gt;
&lt;li&gt;doing first-pass filtering before a larger model reviews only the hard cases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where local AI becomes more than a slogan. A smaller local model can be the right answer when the app should feel instant, private, and cheap to run.&lt;/p&gt;
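&lt;p&gt;The first-pass-filtering item above can be sketched with two stub classifiers standing in for real Gemma 4 calls; the threshold and the stub logic are placeholders:&lt;/p&gt;

```javascript
// Cascade sketch: a small local model handles the easy cases, and only
// low-confidence items escalate to the larger model. Both classify
// functions are stubs standing in for real Gemma 4 calls.
function smallModelClassify(text) {
  // Stand-in: pretend short messages are easy and classified confidently.
  return text.length < 40
    ? { label: 'routine', confidence: 0.95 }
    : { label: 'unknown', confidence: 0.3 };
}

function largeModelClassify(text) {
  // Stand-in for the larger reasoning model reviewing the hard cases.
  return { label: 'needs-review', confidence: 0.9 };
}

function classify(text, threshold = 0.8) {
  const first = smallModelClassify(text);
  return first.confidence >= threshold ? first : largeModelClassify(text);
}
```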

&lt;h2&gt;
  
  
  Design the UI Around Model Uncertainty
&lt;/h2&gt;

&lt;p&gt;A lot of AI apps make the same mistake: they treat model output like a final answer.&lt;/p&gt;

&lt;p&gt;For practical tools, I prefer interfaces that expose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;assumptions&lt;/li&gt;
&lt;li&gt;confidence boundaries&lt;/li&gt;
&lt;li&gt;missing inputs&lt;/li&gt;
&lt;li&gt;risks&lt;/li&gt;
&lt;li&gt;alternate options&lt;/li&gt;
&lt;li&gt;human review moments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes the model more useful because the user can inspect the reasoning instead of simply trusting the final paragraph.&lt;/p&gt;

&lt;p&gt;Gemma 4 is especially useful in workflows where the output becomes structured UI. Instead of asking for one long answer, ask for JSON sections:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"modelRationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"risks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"buildPlan"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"assumptions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That small shift changes the product. The model is no longer just writing. It is powering an interface.&lt;/p&gt;
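&lt;p&gt;If the UI is going to trust those sections, it helps to check the shape before rendering. A minimal sketch using the field names from the JSON above:&lt;/p&gt;

```javascript
// Validate the brief JSON before rendering it as UI sections.
// Returns the parsed object, or null if the shape is wrong.
function parseBrief(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // malformed JSON never reaches the renderer
  }
  const stringFields = ['summary', 'modelRationale'];
  const arrayFields = ['risks', 'buildPlan', 'assumptions'];
  if (!stringFields.every((f) => typeof data[f] === 'string')) return null;
  if (!arrayFields.every((f) => Array.isArray(data[f]))) return null;
  return data;
}
```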

&lt;h2&gt;
  
  
  Hosted Prototype, Local Production
&lt;/h2&gt;

&lt;p&gt;One practical pattern is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;prototype with a hosted API&lt;/li&gt;
&lt;li&gt;learn the exact prompts and output schema&lt;/li&gt;
&lt;li&gt;move privacy-sensitive workflows toward local inference&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This keeps early development fast while still respecting the reason many people care about Gemma: the ability to build useful AI without sending every sensitive note, image, or document to a remote service.&lt;/p&gt;

&lt;p&gt;The important thing is to be honest in the product. If the demo uses a hosted API, say so. If the production vision is local-first, explain which data should stay on-device and why.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Simple Selection Checklist
&lt;/h2&gt;

&lt;p&gt;When choosing a Gemma 4 model, I would ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How long is the input?&lt;/li&gt;
&lt;li&gt;How complex is the reasoning?&lt;/li&gt;
&lt;li&gt;Does the user need the answer instantly?&lt;/li&gt;
&lt;li&gt;Can the output be automatically checked?&lt;/li&gt;
&lt;li&gt;Is the data sensitive?&lt;/li&gt;
&lt;li&gt;Will this run once, or thousands of times?&lt;/li&gt;
&lt;li&gt;Is the model generating content, acting as a decision aid, or controlling a workflow?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the answer involves long context, messy reasoning, and explanation, start larger.&lt;/p&gt;

&lt;p&gt;If the answer involves fast repeated classification, extraction, or local privacy, start smaller.&lt;/p&gt;

&lt;p&gt;If the task is safety-critical, design the product so Gemma 4 assists a human rather than silently deciding for one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes a Good Gemma 4 Project
&lt;/h2&gt;

&lt;p&gt;A strong Gemma 4 project should make the model choice visible.&lt;/p&gt;

&lt;p&gt;That means the README, demo, or article should answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What does Gemma 4 do that is central to the product?&lt;/li&gt;
&lt;li&gt;Which model did you choose?&lt;/li&gt;
&lt;li&gt;What tradeoff did you accept?&lt;/li&gt;
&lt;li&gt;What would you change for production?&lt;/li&gt;
&lt;li&gt;How does the user inspect or correct the output?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those questions are more interesting than a generic "AI-powered" label. They show that the builder understood the model as part of the system, not as decoration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;The best Gemma 4 apps will not all use the largest model. They will use the right model for the job.&lt;/p&gt;

&lt;p&gt;Sometimes that means 31B for deeper reasoning. Sometimes it means a smaller local model because privacy, latency, and cost are the real product requirements.&lt;/p&gt;

&lt;p&gt;That is what makes Gemma 4 fun to build with: it invites us to think like product engineers again.&lt;/p&gt;


</description>
      <category>gemma</category>
      <category>gemmachallenge</category>
      <category>devchallenge</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built BriefBench: A Gemma 4 Tool That Turns Messy Notes Into Model Decisions</title>
      <dc:creator>Mukunda Rao Katta</dc:creator>
      <pubDate>Thu, 07 May 2026 23:10:00 +0000</pubDate>
      <link>https://forem.com/mukundakatta/i-built-briefbench-a-gemma-4-tool-that-turns-messy-notes-into-model-decisions-3m5p</link>
      <guid>https://forem.com/mukundakatta/i-built-briefbench-a-gemma-4-tool-that-turns-messy-notes-into-model-decisions-3m5p</guid>
      <description>&lt;p&gt;This is my submission for the &lt;strong&gt;Build With Gemma 4&lt;/strong&gt; prompt in the DEV Gemma 4 Challenge.&lt;/p&gt;

&lt;p&gt;BriefBench is a small local-first web app that helps builders turn rough project notes into a structured decision brief. The goal is not to hide Gemma 4 behind a generic chat interface. The goal is to make the model's work visible: read context, explain the model choice, surface risks, propose a build plan, and suggest a story angle for a technical post.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;You paste messy notes about a project: users, constraints, privacy concerns, available hardware, and what you are trying to decide. BriefBench asks Gemma 4 to return strict JSON with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summary&lt;/li&gt;
&lt;li&gt;model rationale&lt;/li&gt;
&lt;li&gt;risks&lt;/li&gt;
&lt;li&gt;build plan&lt;/li&gt;
&lt;li&gt;user experience notes&lt;/li&gt;
&lt;li&gt;DEV post angle&lt;/li&gt;
&lt;li&gt;assumptions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The UI then renders those sections as a scannable brief and lets you copy the result as Markdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Try the app here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gemma-4-briefbench.vercel.app" rel="noopener noreferrer"&gt;https://gemma-4-briefbench.vercel.app&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The public demo runs in demo mode by default so anyone can test the full workflow immediately. If you run it locally with a Gemma 4 API key, the same interface can call Gemma 4 through Google AI Studio or OpenRouter.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;Repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/MukundaKatta/gemma-4-briefbench" rel="noopener noreferrer"&gt;https://github.com/MukundaKatta/gemma-4-briefbench&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;The challenge page asks builders to make a clear case for model selection, so I designed the app around that idea.&lt;/p&gt;

&lt;p&gt;Gemma 4 is a good fit because the work is context-heavy but not just raw generation. The model has to read incomplete notes, preserve constraints, reason about tradeoffs, and produce structured output that a user can inspect.&lt;/p&gt;

&lt;p&gt;For the prototype, I defaulted to &lt;code&gt;gemma-4-31b-it&lt;/code&gt; because the 31B dense model is the best fit for richer reasoning over messy planning notes. For production edge scenarios, the app also talks about when Gemma 4 E2B or E4B would be more appropriate, especially for privacy-sensitive workflows running on phones, small devices, or a Raspberry Pi.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;The project is intentionally simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a no-framework Node server&lt;/li&gt;
&lt;li&gt;a static HTML/CSS/JS frontend&lt;/li&gt;
&lt;li&gt;one &lt;code&gt;/api/analyze&lt;/code&gt; endpoint&lt;/li&gt;
&lt;li&gt;provider options for demo mode, Google AI Studio, and OpenRouter&lt;/li&gt;
&lt;li&gt;API keys stay on the server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system prompt asks Gemma 4 to behave as a local-first project brief analyzer and return strict JSON. The frontend turns that JSON into sections instead of displaying a wall of text.&lt;/p&gt;

&lt;p&gt;Demo mode exists so anyone can review the app immediately, even without an API key. With a key configured, the same interface can call Gemma 4 through Google AI Studio or OpenRouter.&lt;/p&gt;
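&lt;p&gt;A sketch of how that provider switch might look; &lt;code&gt;GEMMA_PROVIDER&lt;/code&gt; and the three provider names come from the app's setup, while the helper itself is illustrative:&lt;/p&gt;

```javascript
// Pick the provider named in the environment, defaulting to demo mode so
// the app always starts without a key. The real app would dispatch to
// Google AI Studio or OpenRouter from here.
function chooseProvider(env = process.env) {
  const name = env.GEMMA_PROVIDER || 'demo';
  const known = ['demo', 'google', 'openrouter'];
  if (!known.includes(name)) {
    throw new Error(`unknown GEMMA_PROVIDER: ${name}`);
  }
  return name;
}
```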

&lt;h2&gt;
  
  
  What Gemma 4 Unlocks
&lt;/h2&gt;

&lt;p&gt;The useful part is not that the app can summarize notes. The useful part is that it makes the decision process visible.&lt;/p&gt;

&lt;p&gt;For hackathon builders, this matters. It is easy to say "I used an AI model." It is harder, and more valuable, to explain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;why this model was chosen&lt;/li&gt;
&lt;li&gt;what constraints shaped that choice&lt;/li&gt;
&lt;li&gt;what risks remain&lt;/li&gt;
&lt;li&gt;what the user should do next&lt;/li&gt;
&lt;li&gt;how the model output becomes product UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;BriefBench turns that reasoning into the core product experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Would Add Next
&lt;/h2&gt;

&lt;p&gt;The next version would add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local Ollama or llama.cpp support for fully offline Gemma 4 runs&lt;/li&gt;
&lt;li&gt;file upload for Markdown, CSV, and PDF notes&lt;/li&gt;
&lt;li&gt;side-by-side model comparison between 31B, 26B MoE, and smaller edge models&lt;/li&gt;
&lt;li&gt;saved briefs for teams evaluating multiple AI project ideas&lt;/li&gt;
&lt;li&gt;export templates for DEV posts, README files, and product specs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;http://localhost:4174
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The app runs in demo mode by default. To use Gemma 4 through an API, set &lt;code&gt;GEMMA_PROVIDER&lt;/code&gt; and the matching API key in &lt;code&gt;.env&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing Thought
&lt;/h2&gt;

&lt;p&gt;The best local AI tools should make users feel more capable, not more dependent. BriefBench uses Gemma 4 as a thinking partner for the unglamorous but important middle step: turning rough context into a decision someone can actually act on.&lt;/p&gt;

</description>
      <category>gemma</category>
      <category>gemmachallenge</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>I Got Burned by Prompt Injection in Production. Here Are 2 Tiny npm Libs That Stopped It.</title>
      <dc:creator>Mukunda Rao Katta</dc:creator>
      <pubDate>Wed, 06 May 2026 19:18:22 +0000</pubDate>
      <link>https://forem.com/mukundakatta/i-got-burned-by-prompt-injection-in-production-here-are-2-tiny-npm-libs-that-stopped-it-438i</link>
      <guid>https://forem.com/mukundakatta/i-got-burned-by-prompt-injection-in-production-here-are-2-tiny-npm-libs-that-stopped-it-438i</guid>
      <description>&lt;p&gt;A user pasted a help article into our agent. Three minutes later the agent silently rewrote a customer email, leaked an internal URL, and tried to fetch a &lt;code&gt;.zip&lt;/code&gt; from a domain none of us had ever seen.&lt;/p&gt;

&lt;p&gt;Nothing in the LLM was wrong. The problem was upstream. Retrieved text walked into the prompt with no inspection, and the agent treated it as gospel.&lt;/p&gt;

&lt;p&gt;I wrote up the lessons as a short preprint. The two npm libs below are the working code behind it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two libs
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;@mukundakatta/prompt-injection-shield&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;A small-rule scanner for prompt-injection patterns in untrusted text. No ML, no weights. Just regex-grade rules with a typed &lt;code&gt;risk_reasons&lt;/code&gt; array so you can log, gate, or strip lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @mukundakatta/prompt-injection-shield
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;scan&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mukundakatta/prompt-injection-shield&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scan&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;retrievedDoc&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;risk_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;blocked:&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;risk_reasons&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it catches:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"ignore previous instructions" and family&lt;/li&gt;
&lt;li&gt;system-prompt impersonation&lt;/li&gt;
&lt;li&gt;tool-call hijack patterns&lt;/li&gt;
&lt;li&gt;url-based exfil hints&lt;/li&gt;
&lt;li&gt;secret patterns the model should not see&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a rule fires, you get the line, the rule id, and a recommendation. Strip, redact, drop, or feed it to your audit trail. Up to you.&lt;/p&gt;
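&lt;p&gt;Stripping is the simplest of those options. A sketch, assuming each finding carries a 1-based line number (the exact shape of &lt;code&gt;risk_reasons&lt;/code&gt; entries shown here is an assumption):&lt;/p&gt;

```javascript
// Drop any line that triggered a rule before the text reaches the prompt.
// Assumes each finding looks like { line: number, rule: string } --
// illustrative, not necessarily the library's exact shape.
function stripFlaggedLines(text, findings) {
  const flagged = new Set(findings.map((f) => f.line));
  return text
    .split('\n')
    .filter((_, i) => !flagged.has(i + 1)) // findings use 1-based line numbers
    .join('\n');
}

const doc = 'How to reset a password\nIgnore previous instructions\nStep 1: open settings';
const cleaned = stripFlaggedLines(doc, [{ line: 2, rule: 'override-instructions' }]);
```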

&lt;h3&gt;
  
  
  &lt;code&gt;@mukundakatta/vector-poison-score&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Same idea, retrieval side. Score chunks before they go into context.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @mukundakatta/vector-poison-score
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@mukundakatta/vector-poison-score&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;poison_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nf"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What it scores:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;oversized chunks (token bloat attacks)&lt;/li&gt;
&lt;li&gt;secret-exfiltration patterns inside retrieved text&lt;/li&gt;
&lt;li&gt;suspicious link clusters&lt;/li&gt;
&lt;li&gt;mixed-language anomalies in technical docs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Weights are tunable. Defaults are conservative. Both libs have &lt;strong&gt;zero runtime dependencies&lt;/strong&gt;.&lt;/p&gt;
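&lt;p&gt;The "tunable weights" idea is simple enough to sketch: each heuristic emits a 0/1 signal, and the score is a weighted sum clamped to 1. Signal names, thresholds, and weights below are illustrative stand-ins, not the library's defaults.&lt;/p&gt;

```javascript
// Sketch of a weighted scoring pass over one retrieved chunk.
// Signal names, thresholds, and weights are illustrative, not the real defaults.
const DEFAULT_WEIGHTS = { oversized: 0.3, exfil: 0.5, linkCluster: 0.2 };

function poisonScore(chunk, weights = DEFAULT_WEIGHTS) {
  // Each heuristic is a binary signal; tuning means editing one object literal.
  const signals = {
    oversized: chunk.length > 4000 ? 1 : 0,
    exfil: /(api[_-]?key|secret|password)\s*[:=]/i.test(chunk) ? 1 : 0,
    linkCluster: (chunk.match(/https?:\/\//g) || []).length >= 3 ? 1 : 0,
  };
  let score = 0;
  for (const name of Object.keys(signals)) {
    score += signals[name] * (weights[name] || 0);
  }
  return { poison_score: Math.min(score, 1), signals };
}
```

&lt;p&gt;Swapping in your own threat model means editing one weights object, which is the whole argument for small rules.&lt;/p&gt;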

&lt;h2&gt;
  
  
  Why "small rules"
&lt;/h2&gt;

&lt;p&gt;Big ML defenses are expensive, opaque, and hard to audit when something slips. Small rules are the opposite. You can read them. You can grep them. You can fork the file when your threat model is different from mine.&lt;/p&gt;

&lt;p&gt;Same logic as a linter. Not perfect. Not sexy. Catches a huge chunk of the dumb stuff before the model has to think about it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where they sit in the pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;retrieval -&amp;gt; [vector-poison-score] -&amp;gt; reranker
                                      |
                                      v
              tool output -&amp;gt; [prompt-injection-shield] -&amp;gt; prompt
                                                          |
                                                          v
                                                         LLM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two checkpoints. Cheap. Easy to disable per request. No effect on latency above the noise floor.&lt;/p&gt;

&lt;h2&gt;
  
  
  The preprint
&lt;/h2&gt;

&lt;p&gt;Full writeup with threat model, rule design, and limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zenodo DOI: &lt;a href="https://doi.org/10.5281/zenodo.20057056" rel="noopener noreferrer"&gt;10.5281/zenodo.20057056&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Figshare DOI: &lt;a href="https://doi.org/10.6084/m9.figshare.32193543" rel="noopener noreferrer"&gt;10.6084/m9.figshare.32193543&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;GitHub bundle: &lt;a href="https://github.com/MukundaKatta/rag-guardrails-paper" rel="noopener noreferrer"&gt;MukundaKatta/rag-guardrails-paper&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;License is CC BY 4.0 on the paper, MIT on the code. Both libs are tiny. Both are forkable in five minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is not
&lt;/h2&gt;

&lt;p&gt;Not a replacement for a full security review. Not a benchmark claim. Not a model. The whole thesis is that an inspectable, boring baseline between retrieval and prompt construction is worth more than nothing, and most teams ship with nothing.&lt;/p&gt;

&lt;p&gt;If you build agentic RAG, drop these in front of your prompt. Then run a real audit later.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>rag</category>
      <category>javascript</category>
    </item>
    <item>
      <title>I Built 5 Tiny Libraries to Stop My AI Agents from Misbehaving in Production</title>
      <dc:creator>Mukunda Rao Katta</dc:creator>
      <pubDate>Wed, 06 May 2026 08:28:15 +0000</pubDate>
      <link>https://forem.com/mukundakatta/i-built-5-tiny-libraries-to-stop-my-ai-agents-from-misbehaving-in-production-3oni</link>
      <guid>https://forem.com/mukundakatta/i-built-5-tiny-libraries-to-stop-my-ai-agents-from-misbehaving-in-production-3oni</guid>
      <description>&lt;p&gt;I've been building production AI agents for a while now. RAG pipelines, agentic workflows, multi-model routers. I kept hitting the same five problems over and over. Not architectural problems. Plumbing problems.&lt;/p&gt;

&lt;p&gt;The kind that don't show up in tutorials but wreck you in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent stuffs too much into the context window and the API errors out&lt;/li&gt;
&lt;li&gt;A tool fires an HTTP request to some random domain it hallucinated&lt;/li&gt;
&lt;li&gt;You change a prompt and have no idea if the agent's behavior changed&lt;/li&gt;
&lt;li&gt;The LLM passes malformed args to a tool and it blows up downstream&lt;/li&gt;
&lt;li&gt;The model returns &lt;code&gt;{ "result": "here is the JSON: {...}" }&lt;/code&gt; instead of clean JSON and your parser chokes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I fixed each of these once, copy-pasted the fix into three other projects, and then got tired of that. So I extracted them into five small libraries. Zero dependencies each. TypeScript-first. Under 300 lines per package.&lt;/p&gt;

&lt;p&gt;Here's the stack, in the order you'd use it at runtime:&lt;/p&gt;




&lt;h2&gt;
  
  
  1. agentfit - fit messages to the context window
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your conversation history grows until the API throws a &lt;code&gt;context_length_exceeded&lt;/code&gt; error at the worst possible moment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Token-aware truncation before the API call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;fitMessages&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mukundakatta/agentfit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fitMessages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;maxTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;strategy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;drop-middle&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// keeps system + recent, drops stale middle&lt;/span&gt;
  &lt;span class="na"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;cl100k&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Strategies: &lt;code&gt;drop-oldest&lt;/code&gt;, &lt;code&gt;drop-middle&lt;/code&gt;, &lt;code&gt;summarize-oldest&lt;/code&gt;. Pluggable tokenizer if you're not on OpenAI's encoding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@mukundakatta/agentfit" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://github.com/MukundaKatta/agentfit" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
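&lt;p&gt;If you're curious what a &lt;code&gt;drop-middle&lt;/code&gt; pass looks like under the hood, here's a rough sketch. It uses a crude characters-divided-by-four token estimate in place of a real tokenizer, so treat it as an illustration of the strategy, not the package's implementation.&lt;/p&gt;

```javascript
// Sketch of a drop-middle truncation pass. The chars/4 token estimate is a
// crude stand-in for a real tokenizer (the package's tokenizer is pluggable).
const estimateTokens = (msg) => Math.ceil(msg.content.length / 4);

function dropMiddle(history, maxTokens) {
  const kept = history.slice();
  let total = kept.reduce((sum, m) => sum + estimateTokens(m), 0);
  // Drop from the middle outward: the system message (front) and the most
  // recent turns (back) survive, stale mid-conversation turns go first.
  while (total > maxTokens) {
    if (kept.length > 2) {
      const middle = Math.floor(kept.length / 2);
      total -= estimateTokens(kept[middle]);
      kept.splice(middle, 1);
    } else {
      break; // nothing droppable left
    }
  }
  return kept;
}
```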




&lt;h2&gt;
  
  
  2. agentguard - network egress firewall
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Your agent calls a tool that makes an HTTP request. The LLM hallucinates a URL. Your server pings something it shouldn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Declare an allowlist. Throw on anything outside it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createGuard&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mukundakatta/agentguard&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createGuard&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;allow&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;api.openai.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;your-internal-api.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;s3.amazonaws.com&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// wrap your fetch&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;safeFetch&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;guard&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wrap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// now any attempt to hit an unlisted domain throws AgentGuardError&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;safeFetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://hallucinated-domain.xyz/exfiltrate&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// throws&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful in CI too - run your agent tests with a strict allowlist and catch unexpected egress before it hits prod.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@mukundakatta/agentguard" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://github.com/MukundaKatta/agentguard" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
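&lt;p&gt;The wrapper pattern itself is small enough to sketch. This version does exact hostname matching against a &lt;code&gt;Set&lt;/code&gt; and throws on everything else; it illustrates the idea, not the package's code (the error class name here is made up, and a real guard would also handle subdomains).&lt;/p&gt;

```javascript
// Sketch of an egress allowlist wrapper around any fetch-like function.
// EgressError is a hypothetical name; exact-hostname matching only.
class EgressError extends Error {}

function wrapFetch(fetchImpl, allow) {
  const allowed = new Set(allow);
  return (url, opts) => {
    const host = new URL(url).hostname;
    if (!allowed.has(host)) {
      // Refuse before any network activity happens.
      throw new EgressError(`blocked egress to ${host}`);
    }
    return fetchImpl(url, opts);
  };
}
```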




&lt;h2&gt;
  
  
  3. agentsnap - snapshot tests for tool-call traces
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; You tweak a prompt and have no idea if it changed the agent's tool-call behavior. No diff, no signal, just vibes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Snapshot the tool-call trace, not just the final output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;snap&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mukundakatta/agentsnap&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;research agent calls search before summarize&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;trace&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runMyAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;summarize recent AI papers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;snap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;trace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;research-agent-trace&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// First run: writes the snapshot.&lt;/span&gt;
  &lt;span class="c1"&gt;// Subsequent runs: diffs against it. Fails if tool calls changed.&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Snapshots store the ordered sequence of tool names, arg shapes, and return types, not the raw values (which are flaky). Works with any agent runner: LangGraph, LlamaIndex, raw OpenAI function calls, whatever.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@mukundakatta/agentsnap" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://github.com/MukundaKatta/agentsnap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
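&lt;p&gt;The key design decision, snapshotting shapes instead of values, can be sketched like this. It illustrates only the normalization step, not the package's actual snapshot format.&lt;/p&gt;

```javascript
// Sketch of trace normalization: keep tool names and argument *shapes*
// (field name to type), discard the flaky raw values before snapshotting.
function normalizeTrace(trace) {
  return trace.map((call) => ({
    tool: call.tool,
    argShape: Object.fromEntries(
      Object.entries(call.args).map(([key, value]) => [key, typeof value])
    ),
  }));
}
```

&lt;p&gt;Two runs with different concrete values normalize identically, so the snapshot diff only fails when the call sequence or the argument shapes actually change.&lt;/p&gt;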




&lt;h2&gt;
  
  
  4. agentvet - validate tool args before execution
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; The LLM calls &lt;code&gt;send_email&lt;/code&gt; but forgets the &lt;code&gt;subject&lt;/code&gt; field. Your handler throws a cryptic error. The agent retries blindly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Validate before execution. Return an LLM-friendly error message so it can self-correct.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;vetTool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mukundakatta/agentvet&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sendEmail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vetTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;send_email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mailer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;subject&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// If the LLM omits `subject`, the call returns:&lt;/span&gt;
&lt;span class="c1"&gt;// "send_email rejected your args: missing required field: subject.&lt;/span&gt;
&lt;span class="c1"&gt;//  Please call again with the corrected arguments."&lt;/span&gt;
&lt;span class="c1"&gt;// - ready to feed straight back into the next turn.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@mukundakatta/agentvet" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://github.com/MukundaKatta/agentvet" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
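&lt;p&gt;The validation step is essentially a schema walk that collects human-readable problems instead of throwing on the first one. Here's an illustrative sketch; the message wording is an assumption, not the package's exact output.&lt;/p&gt;

```javascript
// Sketch of pre-execution arg validation that produces an LLM-readable
// message (instead of a cryptic downstream throw) so the model can retry.
function vetArgs(toolName, schema, args) {
  const problems = [];
  for (const [field, spec] of Object.entries(schema)) {
    if (spec.required) {
      if (args[field] === undefined) {
        problems.push(`missing required field: ${field}`);
        continue;
      }
    }
    if (args[field] !== undefined) {
      if (typeof args[field] !== spec.type) {
        problems.push(`field ${field} should be ${spec.type}`);
      }
    }
  }
  if (problems.length === 0) return { ok: true };
  return {
    ok: false,
    message: `${toolName} rejected your args: ${problems.join("; ")}. ` +
      "Please call again with the corrected arguments.",
  };
}
```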




&lt;h2&gt;
  
  
  5. agentcast - structured output enforcer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; You ask for JSON. You get &lt;code&gt;Sure! Here's the JSON you asked for: { ... }&lt;/code&gt;. Or worse, truncated JSON. Your parser throws. The agent moves on as if nothing happened.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Validate-and-retry loop with schema enforcement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cast&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@mukundakatta/agentcast&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Extract the company name and founding year from this text: ...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;founded&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;number&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;maxRetries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// result is guaranteed to match the schema or throw after maxRetries&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;company&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;founded&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On a bad response it builds a corrective prompt automatically and retries. BYO LLM client and BYO validator. The library is the loop, not the dependencies.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.npmjs.com/package/@mukundakatta/agentcast" rel="noopener noreferrer"&gt;npm&lt;/a&gt; · &lt;a href="https://github.com/MukundaKatta/agentcast" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;
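&lt;p&gt;The validate-and-retry loop at the core fits in a few lines. This sketch only checks top-level field types and doesn't try to salvage JSON embedded in prose, so it's a simplified illustration of the loop, not the package's parser.&lt;/p&gt;

```javascript
// Sketch of the validate-and-retry loop: parse, check against the schema,
// and on failure feed a corrective message back through the caller's llm fn.
async function castLoop(prompt, schema, llm, maxRetries = 3) {
  const messages = [{ role: "user", content: prompt }];
  for (let attempt = 0; attempt !== maxRetries; attempt++) {
    const raw = await llm(messages);
    try {
      const parsed = JSON.parse(raw);
      const bad = Object.keys(schema).filter(
        (key) => typeof parsed[key] !== schema[key].type
      );
      if (bad.length === 0) return parsed; // schema satisfied
      throw new Error(`wrong types for: ${bad.join(", ")}`);
    } catch (err) {
      // Build the corrective turn and go around again.
      messages.push({ role: "assistant", content: raw });
      messages.push({
        role: "user",
        content: `That was not valid JSON matching the schema (${err.message}). ` +
          "Reply with ONLY the corrected JSON object.",
      });
    }
  }
  throw new Error("castLoop: no valid response after retries");
}
```

&lt;p&gt;Because &lt;code&gt;llm&lt;/code&gt; is just an async function you pass in, the loop is trivially testable with a canned-reply fake.&lt;/p&gt;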




&lt;h2&gt;
  
  
  How they fit together
&lt;/h2&gt;

&lt;p&gt;At runtime, the order makes sense:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fitMessages   →  trim history before the API call
    ↓
agentguard    →  wrap fetch so tools can't call arbitrary URLs
    ↓
agentvet      →  validate tool args before the handler runs
    ↓
agentcast     →  enforce structured output after the LLM responds
    ↓
agentsnap     →  in tests, snapshot the trace to catch regressions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You don't have to use all five. Each is a drop-in. I use all five in my production pipelines and two or three in smaller scripts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why not one big framework?
&lt;/h2&gt;

&lt;p&gt;Because one big framework becomes the thing you fight. Each of these solves exactly one problem, has no opinions about the rest of your stack, and disappears when you don't need it.&lt;/p&gt;

&lt;p&gt;All five are on npm under &lt;code&gt;@mukundakatta/&lt;/code&gt;. MIT licensed. PRs open.&lt;/p&gt;

&lt;p&gt;If you're building agents in production and hitting these same walls, or different ones entirely, I'd like to hear about it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>javascript</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
