<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sammegh Banjara</title>
    <description>The latest articles on Forem by Sammegh Banjara (@sammegh_banjara_4fdf6241f).</description>
    <link>https://forem.com/sammegh_banjara_4fdf6241f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842747%2F1ae95a0a-82ce-46c9-be89-a63685d5e141.jpg</url>
      <title>Forem: Sammegh Banjara</title>
      <link>https://forem.com/sammegh_banjara_4fdf6241f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sammegh_banjara_4fdf6241f"/>
    <language>en</language>
    <item>
      <title>How I built a "Gatekeeper" for AI Agents (And why prompt filtering isn't enough)</title>
      <dc:creator>Sammegh Banjara</dc:creator>
      <pubDate>Sun, 29 Mar 2026 12:41:47 +0000</pubDate>
      <link>https://forem.com/sammegh_banjara_4fdf6241f/how-i-built-a-gatekeeper-for-ai-agents-and-why-prompt-filtering-isnt-enough-1lio</link>
      <guid>https://forem.com/sammegh_banjara_4fdf6241f/how-i-built-a-gatekeeper-for-ai-agents-and-why-prompt-filtering-isnt-enough-1lio</guid>
      <description>&lt;p&gt;We spend a lot of time securing the inputs to our LLMs—filtering prompts, checking for injections.&lt;/p&gt;

&lt;p&gt;But in the world of AI Agents, we have a new blind spot: Tool Outputs.&lt;/p&gt;

&lt;p&gt;When an agent calls get_jira_ticket, the response often contains a dump of raw text. In my case, that text contained user emails and internal secrets.&lt;/p&gt;

&lt;p&gt;If I logged that context window to an observability tool, I was essentially persisting secrets in a dashboard.&lt;/p&gt;

&lt;p&gt;So, I built QuiGuard to solve this. Here is how it works under the hood.&lt;/p&gt;

&lt;p&gt;The Architecture&lt;br&gt;
I didn't want to rewrite the agent frameworks (LangChain/AutoGen). I needed something that sat transparently in the middle.&lt;/p&gt;

&lt;p&gt;The solution was a Reverse Proxy.&lt;/p&gt;

&lt;p&gt;Interception: The proxy accepts the OpenAI-compatible API request.&lt;br&gt;
Traversal: It recursively walks the messages array.&lt;br&gt;
The Gatekeeper Logic: If it sees a message with role: "tool", it knows this is data coming back from an API.&lt;/p&gt;

&lt;p&gt;The Challenge: Recursive JSON&lt;br&gt;
Tool responses aren't always clean strings. Sometimes they are stringified JSON inside JSON.&lt;/p&gt;

&lt;p&gt;To handle this, I wrote a recursive scrubber:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# It's a string. Is it stringified JSON? Try to parse.
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;nested_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;scrubbed_nested&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nested_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scrubbed_nested&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Not JSON, just a normal string. Scrub PII.
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sanitize_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that even if a tool returns &lt;code&gt;{"body": "{\"user\": \"secret@...\"}"}&lt;/code&gt;, we catch the secret.&lt;/p&gt;
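&lt;p&gt;To show the gatekeeper logic end to end, here is a minimal, self-contained sketch of the message walk. The regex-based sanitize_text below is a hypothetical stand-in for the real PII scrubber, and scrub_tool_messages is an illustrative name, not QuiGuard's actual API:&lt;/p&gt;

```python
import json
import re

# Stand-in for the real PII scrubber: redact anything that looks like an email.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def sanitize_text(text):
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def _recursive_scrub(data):
    # Walk dicts and lists; re-parse stringified JSON so nested payloads get scrubbed too.
    if isinstance(data, dict):
        return {k: _recursive_scrub(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [_recursive_scrub(item) for item in data]
    elif isinstance(data, str):
        try:
            nested = json.loads(data)
            return json.dumps(_recursive_scrub(nested))
        except json.JSONDecodeError:
            # Not JSON, just a normal string. Scrub PII.
            return sanitize_text(data)
    else:
        return data

def scrub_tool_messages(messages):
    # Only role: "tool" entries are scrubbed; user and assistant turns pass through.
    return [
        {**m, "content": _recursive_scrub(m["content"])} if m.get("role") == "tool" else m
        for m in messages
    ]
```

&lt;p&gt;Running scrub_tool_messages over a context window leaves the user's turn untouched while redacting an email buried two JSON layers deep inside a tool response.&lt;/p&gt;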

&lt;p&gt;The Result&lt;br&gt;
Clean Logs: My LangSmith traces now show redacted placeholders instead of real emails.&lt;br&gt;
Safe Context: The LLM processes the logic without "seeing" the sensitive data.&lt;br&gt;
Restoration: The user sees the real data in the final reply.&lt;/p&gt;

&lt;p&gt;I open-sourced the project (MIT).&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/somegg90-blip/quiguard-gateway" rel="noopener noreferrer"&gt;https://github.com/somegg90-blip/quiguard-gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious if others have run into the "messy tool output" problem? Let me know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>automation</category>
    </item>
    <item>
      <title>AI Agents are doing more than you think.</title>
      <dc:creator>Sammegh Banjara</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:01:44 +0000</pubDate>
      <link>https://forem.com/sammegh_banjara_4fdf6241f/agents-are-doing-more-than-you-think-11p3</link>
      <guid>https://forem.com/sammegh_banjara_4fdf6241f/agents-are-doing-more-than-you-think-11p3</guid>
      <description>&lt;h2&gt;
  
  
  Why your PII redaction tool is useless for AI Agents (and the fix I built)
&lt;/h2&gt;

&lt;p&gt;I watched my agent try to email a production API key. Here is the post-mortem.&lt;/p&gt;

&lt;p&gt;If you are building AI agents, you are likely sleeping on a massive security hole.&lt;/p&gt;

&lt;p&gt;We’ve all added "PII Redaction" to our stacks. It’s standard procedure now. You spin up a middleware, scan the prompt for emails or SSNs, and redact them.&lt;br&gt;
Job done, right?&lt;/p&gt;
&lt;h2&gt;
  
  
  Wrong.
&lt;/h2&gt;

&lt;p&gt;I learned this the hard way last week.&lt;/p&gt;

&lt;p&gt;The "Oh Sh*t" Moment&lt;br&gt;
I was testing a "Jira Summarizer" agent. The premise was simple: Read a ticket, summarize it, and email the summary to the team.&lt;/p&gt;

&lt;p&gt;I fed it a test ticket that contained a dummy AWS key (AKIA...) inside the description.&lt;/p&gt;

&lt;p&gt;My PII filter scanned the incoming prompt: "Summarize ticket ID-123."&lt;br&gt;
Result: Clean. No PII found.&lt;/p&gt;

&lt;p&gt;The agent read the ticket (via a tool call), processed the text, and decided to act.&lt;br&gt;
It called the send_email tool.&lt;/p&gt;

&lt;p&gt;I checked the logs. My stomach dropped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send_email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"team@company.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Here is the summary. The user provided the key: AKIAIOSFODNN7EXAMPLE..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My security layer had completely missed it.&lt;/p&gt;

&lt;p&gt;The Blind Spot: Tool Call Arguments&lt;br&gt;
The problem isn't that PII filters don't work. It's that they are looking in the wrong place.&lt;/p&gt;

&lt;p&gt;Most security tools focus on the Prompt (what the human types).&lt;br&gt;
But Agents operate in the Arguments (what the AI decides to do).&lt;/p&gt;

&lt;p&gt;Agents don't just "talk." They execute.&lt;/p&gt;

&lt;p&gt;Read: The agent fetches data from a database or ticket.&lt;br&gt;
Think: The agent decides that data is "relevant."&lt;br&gt;
Act: The agent injects that data into a tool (Email, HTTP Request, SQL Query).&lt;/p&gt;

&lt;p&gt;Your PII filter checks step 1. It ignores step 3.&lt;/p&gt;
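&lt;p&gt;Checking step 3 means scanning the arguments the model is about to execute, not the prompt the human typed. A minimal sketch (the two patterns are illustrative; a production scanner covers far more secret formats):&lt;/p&gt;

```python
import json
import re

# Illustrative detector patterns; real scanners cover many more secret formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key IDs
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses
]

def find_leaks(tool_call):
    # Serialize the arguments (step 3) and scan the whole blob, keys and values alike.
    blob = json.dumps(tool_call.get("arguments", {}))
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(blob)]
```

&lt;p&gt;Pointed at the send_email call from the log above, this flags both the recipient address and the AWS key, even though the original prompt ("Summarize ticket ID-123") was clean.&lt;/p&gt;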

&lt;h2&gt;
  
  
  The Fix: "Actionable Security"
&lt;/h2&gt;

&lt;p&gt;I realized I needed a security layer that understood the agent's execution loop. I needed something that didn't just scan text, but scanned intent.&lt;/p&gt;

&lt;p&gt;I ended up building QuiGuard to solve this.&lt;/p&gt;

&lt;p&gt;It’s a proxy that sits between your agent and the LLM provider, but instead of just checking prompts, it recursively inspects tool_calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works:
&lt;/h2&gt;

&lt;p&gt;Intercept: It captures the API request before it leaves your network.&lt;br&gt;
Parse: It identifies tool_calls in the JSON body.&lt;br&gt;
Scrub: It recursively scans every argument for PII and secrets.&lt;br&gt;
Restore: It replaces secrets with placeholder tokens, lets the AI work, and swaps the real values back into the response.&lt;/p&gt;

&lt;p&gt;This "Round-Trip Restoration" means the AI can process the data (e.g., send an email to a redacted recipient) without ever seeing the real address.&lt;/p&gt;
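&lt;p&gt;The scrub-and-restore round trip can be sketched like this (the placeholder format and helper names are illustrative, not QuiGuard's actual API):&lt;/p&gt;

```python
import re

# Illustrative: redact emails on the way in, restore them on the way out.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(text):
    """Replace secrets with numbered placeholders; return scrubbed text plus a mapping."""
    mapping = {}

    def _swap(match):
        token = "[PII_{}]".format(len(mapping))
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_swap, text), mapping

def restore(text, mapping):
    """Swap the real values back into the model's reply before it reaches the user."""
    for token, real in mapping.items():
        text = text.replace(token, real)
    return text
```

&lt;p&gt;The LLM only ever sees the placeholder tokens; the mapping stays on your side of the proxy, and restore() patches the real values into the final reply.&lt;/p&gt;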

&lt;p&gt;The Future of Agent Security&lt;br&gt;
We are moving from "Chatbots" (passive) to "Agents" (active).&lt;br&gt;
Our security models must evolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you are deploying agents into production:
&lt;/h2&gt;

&lt;p&gt;Stop trusting prompt filters alone.&lt;br&gt;
Inspect your tool outputs.&lt;br&gt;
Implement "Action Gates" (block high-risk actions like DELETE or external emails).&lt;/p&gt;

&lt;p&gt;I open-sourced the fix I built. It’s a self-hosted Docker container that plugs into any OpenAI-compatible endpoint.&lt;/p&gt;
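&lt;p&gt;An "Action Gate" is just a policy check that runs before a tool call is executed. A minimal sketch with a hypothetical policy (the tool names and allow-list are examples, not a shipped configuration):&lt;/p&gt;

```python
# Hypothetical policy: which tools are dangerous, and which recipients are trusted.
HIGH_RISK_TOOLS = {"delete_record", "send_email", "http_request"}
ALLOWED_EMAIL_DOMAINS = {"company.com"}

def action_gate(tool_call):
    """Return (allowed, reason). Block high-risk actions instead of just logging them."""
    name = tool_call.get("tool", "")
    args = tool_call.get("arguments", {})
    if name not in HIGH_RISK_TOOLS:
        return True, "low-risk tool"
    if name == "send_email":
        # Only the recipient's domain decides; an external address is refused outright.
        domain = args.get("to", "").rsplit("@", 1)[-1]
        if domain in ALLOWED_EMAIL_DOMAINS:
            return True, "internal recipient"
        return False, "external email blocked"
    return False, "high-risk tool requires approval"
```

&lt;p&gt;Read-only tools pass straight through; a destructive call or an email to an outside domain is stopped before it ever leaves your network.&lt;/p&gt;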

&lt;p&gt;GitHub: &lt;a href="https://github.com/somegg90-blip/quiguard-gateway" rel="noopener noreferrer"&gt;https://github.com/somegg90-blip/quiguard-gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://quiguardweb.vercel.app/" rel="noopener noreferrer"&gt;https://quiguardweb.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are building agents, stay safe. The leaks aren't coming from the users anymore. They are coming from the agents themselves.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
  </channel>
</rss>
