<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sammegh Banjara</title>
    <description>The latest articles on Forem by Sammegh Banjara (@sammegh_banjara_4fdf6241f).</description>
    <link>https://forem.com/sammegh_banjara_4fdf6241f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842747%2F1ae95a0a-82ce-46c9-be89-a63685d5e141.jpg</url>
      <title>Forem: Sammegh Banjara</title>
      <link>https://forem.com/sammegh_banjara_4fdf6241f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sammegh_banjara_4fdf6241f"/>
    <language>en</language>
    <item>
      <title>How I built a "Gatekeeper" for AI Agents (And why prompt filtering isn't enough)</title>
      <dc:creator>Sammegh Banjara</dc:creator>
      <pubDate>Sun, 29 Mar 2026 12:41:47 +0000</pubDate>
      <link>https://forem.com/sammegh_banjara_4fdf6241f/how-i-built-a-gatekeeper-for-ai-agents-and-why-prompt-filtering-isnt-enough-1lio</link>
      <guid>https://forem.com/sammegh_banjara_4fdf6241f/how-i-built-a-gatekeeper-for-ai-agents-and-why-prompt-filtering-isnt-enough-1lio</guid>
      <description>&lt;p&gt;We spend a lot of time securing the inputs to our LLMs—filtering prompts, checking for injections.&lt;/p&gt;

&lt;p&gt;But in the world of AI Agents, we have a new blind spot: Tool Outputs.&lt;/p&gt;

&lt;p&gt;When an agent calls get_jira_ticket, the response often contains a dump of raw text. In my case, that text contained user emails and internal secrets.&lt;/p&gt;

&lt;p&gt;If I logged that context window to an observability tool, I was essentially persisting secrets in a dashboard.&lt;/p&gt;

&lt;p&gt;So, I built QuiGuard to solve this. Here is how it works under the hood.&lt;/p&gt;

&lt;p&gt;The Architecture&lt;br&gt;
I didn't want to rewrite the agent frameworks (LangChain/AutoGen). I needed something that sat transparently in the middle.&lt;/p&gt;

&lt;p&gt;The solution was a Reverse Proxy.&lt;/p&gt;

&lt;p&gt;Interception: The proxy accepts the OpenAI-compatible API request.&lt;br&gt;
Traversal: It recursively walks the messages array.&lt;br&gt;
The Gatekeeper Logic: If it sees a message with role: "tool", it knows this is data coming back from an API.&lt;/p&gt;

&lt;p&gt;The Challenge: Recursive JSON&lt;br&gt;
Tool responses aren't always clean strings. Sometimes they are stringified JSON inside JSON.&lt;/p&gt;

&lt;p&gt;To handle this, I wrote a recursive scrubber:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# It's a string. Is it stringified JSON? Try to parse.
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;nested_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;scrubbed_nested&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_recursive_scrub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;nested_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scrubbed_nested&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;JSONDecodeError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Not JSON, just a normal string. Scrub PII.
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sanitize_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures that even if a tool returns &lt;code&gt;{"body": "{\"user\": \"secret@...\"}"}&lt;/code&gt;, we catch the secret.&lt;/p&gt;
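&lt;p&gt;To show the gatekeeper logic end to end, here is a minimal, self-contained sketch of the message walk. The regex-based sanitize_text below is a hypothetical stand-in for the real PII scrubber, and scrub_tool_messages is an illustrative name, not QuiGuard's actual API:&lt;/p&gt;

```python
import json
import re

# Stand-in for the real PII scrubber: redact anything that looks like an email.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def sanitize_text(text):
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)

def _recursive_scrub(data):
    # Walk dicts and lists; re-parse stringified JSON so nested payloads get scrubbed too.
    if isinstance(data, dict):
        return {k: _recursive_scrub(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [_recursive_scrub(item) for item in data]
    elif isinstance(data, str):
        try:
            nested = json.loads(data)
            return json.dumps(_recursive_scrub(nested))
        except json.JSONDecodeError:
            # Not JSON, just a normal string. Scrub PII.
            return sanitize_text(data)
    else:
        return data

def scrub_tool_messages(messages):
    # Only role: "tool" entries are scrubbed; user and assistant turns pass through.
    return [
        {**m, "content": _recursive_scrub(m["content"])} if m.get("role") == "tool" else m
        for m in messages
    ]
```

&lt;p&gt;Running scrub_tool_messages over a context window leaves the user's turn untouched while redacting an email buried two JSON layers deep inside a tool response.&lt;/p&gt;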

&lt;p&gt;The Result&lt;br&gt;
Clean Logs: My LangSmith traces now show redacted placeholders instead of real emails.&lt;br&gt;
Safe Context: The LLM processes the logic without "seeing" the sensitive data.&lt;br&gt;
Restoration: The user sees the real data in the final reply.&lt;/p&gt;

&lt;p&gt;I open-sourced the project (MIT).&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/somegg90-blip/quiguard-gateway" rel="noopener noreferrer"&gt;https://github.com/somegg90-blip/quiguard-gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Curious if others have run into the "messy tool output" problem? Let me know in the comments!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
      <category>automation</category>
    </item>
    <item>
      <title>AI Agents are doing more than you think.</title>
      <dc:creator>Sammegh Banjara</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:01:44 +0000</pubDate>
      <link>https://forem.com/sammegh_banjara_4fdf6241f/agents-are-doing-more-than-you-think-11p3</link>
      <guid>https://forem.com/sammegh_banjara_4fdf6241f/agents-are-doing-more-than-you-think-11p3</guid>
      <description>&lt;h2&gt;
  
  
  Why your PII redaction tool is useless for AI Agents (and the fix I built)
&lt;/h2&gt;

&lt;p&gt;I watched my agent try to email a production API key. Here is the post-mortem.&lt;/p&gt;

&lt;p&gt;If you are building AI agents, you are likely sleeping on a massive security hole.&lt;/p&gt;

&lt;p&gt;We’ve all added "PII Redaction" to our stacks. It’s standard procedure now. You spin up a middleware, scan the prompt for emails or SSNs, and redact them.&lt;br&gt;
Job done, right?&lt;/p&gt;
&lt;h2&gt;
  
  
  Wrong.
&lt;/h2&gt;

&lt;p&gt;I learned this the hard way last week.&lt;/p&gt;

&lt;p&gt;The "Oh Sh*t" Moment&lt;br&gt;
I was testing a "Jira Summarizer" agent. The premise was simple: Read a ticket, summarize it, and email the summary to the team.&lt;/p&gt;

&lt;p&gt;I fed it a test ticket that contained a dummy AWS key (AKIA...) inside the description.&lt;/p&gt;

&lt;p&gt;My PII filter scanned the incoming prompt: "Summarize ticket ID-123."&lt;br&gt;
Result: Clean. No PII found.&lt;/p&gt;

&lt;p&gt;The agent read the ticket (via a tool call), processed the text, and decided to act.&lt;br&gt;
It called the send_email tool.&lt;/p&gt;

&lt;p&gt;I checked the logs. My stomach dropped.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"send_email"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"arguments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"to"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"team@company.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Here is the summary. The user provided the key: AKIAIOSFODNN7EXAMPLE..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My security layer had completely missed it.&lt;/p&gt;

&lt;p&gt;The Blind Spot: Tool Call Arguments&lt;br&gt;
The problem isn't that PII filters don't work. It's that they are looking in the wrong place.&lt;/p&gt;

&lt;p&gt;Most security tools focus on the Prompt (what the human types).&lt;br&gt;
But Agents operate in the Arguments (what the AI decides to do).&lt;/p&gt;

&lt;p&gt;Agents don't just "talk." They execute.&lt;/p&gt;

&lt;p&gt;Read: The agent fetches data from a database or ticket.&lt;br&gt;
Think: The agent decides that data is "relevant."&lt;br&gt;
Act: The agent injects that data into a tool (Email, HTTP Request, SQL Query).&lt;/p&gt;

&lt;p&gt;Your PII filter checks step 1. It ignores step 3.&lt;/p&gt;
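&lt;p&gt;Checking step 3 means scanning the arguments the model is about to execute, not the prompt the human typed. A minimal sketch (the two patterns are illustrative; a production scanner covers far more secret formats):&lt;/p&gt;

```python
import json
import re

# Illustrative detector patterns; real scanners cover many more secret formats.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),          # AWS access key IDs
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses
]

def find_leaks(tool_call):
    # Serialize the arguments (step 3) and scan the whole blob, keys and values alike.
    blob = json.dumps(tool_call.get("arguments", {}))
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(blob)]
```

&lt;p&gt;Pointed at the send_email call from the log above, this flags both the recipient address and the AWS key, even though the original prompt ("Summarize ticket ID-123") was clean.&lt;/p&gt;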

&lt;h2&gt;
  
  
  The Fix: "Actionable Security"
&lt;/h2&gt;

&lt;p&gt;I realized I needed a security layer that understood the agent's execution loop. I needed something that didn't just scan text, but scanned intent.&lt;/p&gt;

&lt;p&gt;I ended up building QuiGuard to solve this.&lt;/p&gt;

&lt;p&gt;It’s a proxy that sits between your agent and the LLM provider, but instead of just checking prompts, it recursively inspects tool_calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works:
&lt;/h2&gt;

&lt;p&gt;Intercept: It captures the API request before it leaves your network.&lt;br&gt;
Parse: It identifies tool_calls in the JSON body.&lt;br&gt;
Scrub: It recursively scans every argument for PII and secrets.&lt;br&gt;
Restore: It replaces secrets with placeholder tokens, lets the AI work, and swaps the real values back into the response.&lt;/p&gt;

&lt;p&gt;This "Round-Trip Restoration" means the AI can process the data (e.g., send an email to a redacted recipient) without ever seeing the real address.&lt;/p&gt;
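&lt;p&gt;The scrub-and-restore round trip can be sketched like this (the placeholder format and helper names are illustrative, not QuiGuard's actual API):&lt;/p&gt;

```python
import re

# Illustrative: redact emails on the way in, restore them on the way out.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(text):
    """Replace secrets with numbered placeholders; return scrubbed text plus a mapping."""
    mapping = {}

    def _swap(match):
        token = "[PII_{}]".format(len(mapping))
        mapping[token] = match.group(0)
        return token

    return EMAIL_RE.sub(_swap, text), mapping

def restore(text, mapping):
    """Swap the real values back into the model's reply before it reaches the user."""
    for token, real in mapping.items():
        text = text.replace(token, real)
    return text
```

&lt;p&gt;The LLM only ever sees the placeholder tokens; the mapping stays on your side of the proxy, and restore() patches the real values into the final reply.&lt;/p&gt;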

&lt;p&gt;The Future of Agent Security&lt;br&gt;
We are moving from "Chatbots" (passive) to "Agents" (active).&lt;br&gt;
Our security models must evolve.&lt;/p&gt;

&lt;h2&gt;
  
  
  If you are deploying agents into production:
&lt;/h2&gt;

&lt;p&gt;Stop trusting prompt filters alone.&lt;br&gt;
Inspect your tool outputs.&lt;br&gt;
Implement "Action Gates" (block high-risk actions like DELETE or external emails).&lt;/p&gt;

&lt;p&gt;I open-sourced the fix I built. It’s a self-hosted Docker container that plugs into any OpenAI-compatible endpoint.&lt;/p&gt;
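&lt;p&gt;An "Action Gate" is just a policy check that runs before a tool call is executed. A minimal sketch with a hypothetical policy (the tool names and allow-list are examples, not a shipped configuration):&lt;/p&gt;

```python
# Hypothetical policy: which tools are dangerous, and which recipients are trusted.
HIGH_RISK_TOOLS = {"delete_record", "send_email", "http_request"}
ALLOWED_EMAIL_DOMAINS = {"company.com"}

def action_gate(tool_call):
    """Return (allowed, reason). Block high-risk actions instead of just logging them."""
    name = tool_call.get("tool", "")
    args = tool_call.get("arguments", {})
    if name not in HIGH_RISK_TOOLS:
        return True, "low-risk tool"
    if name == "send_email":
        # Only the recipient's domain decides; an external address is refused outright.
        domain = args.get("to", "").rsplit("@", 1)[-1]
        if domain in ALLOWED_EMAIL_DOMAINS:
            return True, "internal recipient"
        return False, "external email blocked"
    return False, "high-risk tool requires approval"
```

&lt;p&gt;Read-only tools pass straight through; a destructive call or an email to an outside domain is stopped before it ever leaves your network.&lt;/p&gt;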

&lt;p&gt;GitHub: &lt;a href="https://github.com/somegg90-blip/quiguard-gateway" rel="noopener noreferrer"&gt;https://github.com/somegg90-blip/quiguard-gateway&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://quiguardweb.vercel.app/" rel="noopener noreferrer"&gt;https://quiguardweb.vercel.app/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are building agents, stay safe. The leaks aren't coming from the users anymore. They are coming from the agents themselves.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
  </channel>
</rss>
