Forem: Rodrigo Fernandez

The AI Orchestrator: Governing Autonomous Systems

Rodrigo Fernandez — Wed, 11 Feb 2026 17:08:15 +0000

We’re no longer deploying AI as a feature.

We’re deploying systems that act.

Modern AI doesn’t just generate responses. It selects tools, calls APIs, chains models, writes data, triggers workflows, and makes decisions that directly affect production environments. Once you move from “AI as assistant” to “AI as actor,” your architecture has to change.

Most teams are not designing for that shift yet.

From Deterministic Code to Behavioral Systems

Traditional software is predictable. Even in distributed systems, execution paths are defined ahead of time. You can trace what happened because the logic is explicit.

Agent-based AI systems are different.

An agent can decide which tool to call, which model to use, what intermediate reasoning to follow, and whether to take an action. The system is no longer just executing predefined logic. It is making choices within constraints.

At small scale, this feels powerful. At large scale, it becomes hard to reason about.

The problem is no longer model accuracy. It’s coordination.

When you have multiple agents interacting with tools, memory layers, and external systems, you are effectively running a distributed decision engine. Each component might behave correctly in isolation, yet the overall system can still produce outcomes that are risky, unpredictable, or simply opaque.

That’s where orchestration becomes essential.

What an AI Orchestrator Actually Is

An AI Orchestrator is not just another agent in the stack.

It’s the governance and control layer that sits above your agents. It gives you visibility into what they’re doing, defines what they’re allowed to do, and enforces those limits at runtime.

If agents are the workers, the orchestrator is the control plane.

Think about how Kubernetes manages containers. Containers run independently, but the control plane ensures policies, scaling rules, and boundaries are respected. An AI Orchestrator plays a similar role for intelligent components that are probabilistic by nature.

It provides system-level guarantees in an environment where individual decisions are not fully deterministic.

How Orchestration Works in Practice

In real-world systems, orchestration usually revolves around four capabilities: discovery, control, testing, and protection.

The first step is discovery.

You can’t govern what you can’t see. Most organizations don’t actually know how many AI agents are running across teams, which models they rely on, which tools they can access, or what data they touch. And that landscape changes constantly. New prompts are deployed. Permissions evolve. Teams experiment.

Discovery can’t be a one-time audit. It has to be continuous. If new AI behavior appears and your governance layer doesn’t detect it, you’re always reacting too late.

Once you have visibility, the next step is control.

Autonomous systems need boundaries. Not every agent should have write access to production databases. Not every tool should be callable from every context. Not every workflow should be allowed to execute irreversible actions.

This is where principles like least privilege and scoped permissions matter again. Without explicit constraints, intelligent systems will explore edge cases. That’s not a flaw. It’s how they optimize. But optimization without boundaries turns into risk.

After control comes testing.

It’s not enough to define policies. You need to challenge them. Can an agent be manipulated through prompt injection? Can it escalate its privileges through tool chaining? Can it indirectly leak sensitive data? And if something goes wrong, does your system actually detect it?

As agents grow more capable, their attack surface grows too. Stress-testing the orchestration layer is just as important as evaluating model quality.

Finally, protection must happen in real time.

When an agent attempts to exceed its permissions, misuse a tool, or access restricted data, the system has to intervene automatically. Detection without enforcement is just observability. In production, governance must translate into runtime control, ideally without introducing unacceptable latency.

That’s the difference between having policies documented and having them enforced.

Why Agent-Centric Thinking Is Not Enough

Agent frameworks make it easy to automate workflows and connect tools. But they don’t solve accountability at the system level.

As agents move closer to high-impact domains such as financial operations, infrastructure management, healthcare decisions, or customer-facing automation, mistakes stop being minor bugs.

A misaligned action can trigger financial loss, regulatory exposure, reputational damage, or safety risks. And the system might have followed its logic correctly. The agent optimized its objective. It did what it was designed to do.

But the organization still absorbs the consequences.

Agents do not understand legal exposure or long-term strategic tradeoffs unless explicitly encoded. They operate within their scope. That scope must be governed externally.

What matters is not whether an individual agent behaved rationally. What matters is whether the overall system behaved responsibly.

Keeping Humans in the Loop Without Slowing Everything Down

Full human supervision of every action is impossible at scale. But removing humans entirely from the decision loop creates systemic risk.

The solution is not constant monitoring. It’s intelligent escalation.

A well-orchestrated system defines thresholds. When confidence is high and impact is low, agents act autonomously. When uncertainty increases or the consequences become irreversible, control shifts to a human.

For that shift to work, humans need context. They need traceability, reasoning logs, and visibility into what the system is trying to do. Otherwise, intervention becomes guesswork.

The role of the AI Orchestrator is to make that handoff explicit. It structures autonomy instead of replacing it. It defines when machines act alone and when they must defer to human judgment.

In high-stakes systems, that boundary is not optional. It’s architectural.

Orchestration as a Foundational Layer

The teams that scale AI successfully won’t just be the ones with better models or more agents. They’ll be the ones who understand how decisions flow through their systems, where risk accumulates, and how accountability is enforced.

An AI Orchestrator is not a final add-on after everything is built. It’s the layer that allows everything else to scale safely.

Without it, systems become opaque. Trust erodes. Shipping slows down because no one can clearly explain what the AI is doing or why.

With it, autonomy becomes usable. Risk becomes bounded. Humans remain meaningfully in control, even as systems operate at machine speed.

We are entering a phase where AI doesn’t just assist. It acts.

The critical design question is no longer how powerful your model is.

It’s whether you have built the system that governs it once it starts making decisions in the real world.

How Agentic Browsers Can Break Your Security Model

Rodrigo Fernandez — Tue, 28 Oct 2025 16:39:52 +0000

When you first give your AI agent browsing capabilities, it feels like a superpower. Now it can read the latest articles, retrieve fresh data, and search for information beyond its training window. But there’s a lurking risk: that same browsing feature can quietly shatter your security assumptions.

Let’s walk through what agentic browsers are, where things can go wrong, and how you can protect your stack before it’s too late.

What Is an Agentic Browser?

In the world of LLM-powered agents, an “agentic browser” refers to a tool that allows the model to autonomously follow links, read web content, and use that information to make decisions or generate responses.

You’re likely using tools like:

LangChain’s WebBrowserTool
OpenAI’s function calling that fetches URLs
HuggingFace’s Transformers Agents
Custom wrappers around requests or headless browsers like Puppeteer or Playwright

All of these give the model a deceptively simple yet powerful skill: “If you don’t know something, go look it up.”

But here’s the problem: letting a model decide where to go and what to read is not a neutral feature. It’s a security decision, one that often goes unexamined.

These agentic browsers often run with elevated trust: the system assumes the content retrieved is valid, relevant, and clean. But the modern internet isn’t clean. It’s dynamic, unpredictable, and occasionally hostile.

The Hidden Attack Chain

At a glance, browsing seems harmless, especially if you sanitize user inputs. But the moment your agent follows a link, you’ve expanded the attack surface. Let’s break it down:

A user provides a prompt that includes or results in a URL

It may be directly embedded (“Go read this: [URL]”) or indirectly retrieved via a search function.
The model follows the URL using its browsing tool

This step often feels safe because it’s system-controlled. But it’s also unverified.
The URL leads to hostile content: crafted HTML with embedded prompt injections or misleading instructions

This is where the attacker gains influence. They may host jailbreak payloads, encode misleading prompts, or structure their pages to influence the model.
The model reads the hostile content and uses it as part of its response or future decisions

The LLM assumes the content is part of its safe context window. Even without visible signs, the model’s output is now manipulated.

Real-World Examples

Jailbreak payloads hosted on public URLs

Attacks that instruct the model to ignore safety guidelines.
Links to HTML pages with prompt instructions hidden in metadata or <script> blocks

These may never be rendered visually but still influence model behavior.
SEO-optimized malicious pages

Designed to surface in LLM-enabled search tools, ensuring the agent is more likely to stumble into them.
Chain of redirections

A safe-looking URL may redirect to a secondary location hosting dangerous content.

In short, by letting your agent browse, you’re exposing your model to the worst of the internet, without a human-in-the-loop to vet what it’s seeing.

Why Traditional Safeguards Don’t Work

Most developers approach LLM security by:

Sanitizing prompt inputs
Filtering out unsafe output
Using safety-tuned models (e.g., OpenAI’s GPT-4 with moderation layer)

But none of these defenses apply when the model is consuming external, unpredictable content.

The LLM sees external content as part of its normal working memory. It doesn’t know whether that content was created by a well-meaning user or a malicious actor.

Even content that looks benign can be encoded with prompt injection attacks:

Off-screen instructions: Using CSS to hide text but still render it in the DOM.
Zero-width characters or unicode tricks to bypass token-based filters.
Clever language framing: Telling the model “you’re in a sandbox simulation” can override its usual guardrails.

Unless you’re deeply inspecting every token of fetched content, and doing it before it hits the model, you’re at risk.

The False Sense of Control

Agent frameworks make it easy to combine tools:

agent = initialize_agent([
  web_browsing_tool,
  calculator_tool,
  vector_search_tool
], llm=chat_model)

It feels composable, modular, and safe. But each tool is a trust boundary. The more autonomous the agent becomes, the less visibility you have into what it’s actually doing.

Giving an agent a browser is like giving a junior developer root access to production, and no code review.

Developers often assume:

“I built the tools, I know what the agent can do.”

But once the model starts making decisions, it’s not just your code executing—it’s its own reasoning process. And reasoning can be hijacked.

This is especially risky when agents:

Chain multiple tools together
Extract content from arbitrary pages
Use that content to make calls, summaries, or decisions

At this point, the developer is no longer in control. The model is.

How to Secure Agentic Browsing

Here are five things you can do right now:

Whitelist Only Trusted Domains

Don’t let your agents browse arbitrary URLs. Maintain an allowlist of trusted sites your agent is allowed to visit. Think in terms of explicit trust, not implicit reachability.

You can even combine this with URL fingerprinting or certificate pinning to guard against redirection and spoofing.
Strip and Sanitize Fetched Content

Never pass raw HTML to a language model. Use a parser like BeautifulSoup or a headless browser to extract only the visible, meaningful text.

Before passing content into the model:
- Remove <script>, <meta>, and hidden elements
- Normalize character encodings
- Strip invisible unicode This gives you a chance to clean payloads before they hit the model’s context window.
Use Browsing Only for Internal Workflows

Public-facing assistants should not have unbounded browsing capabilities. Instead, browsing should be an internal system tool with guardrails and monitoring.

For example:

   if task.user_id in admin_users:
       enable_browsing(agent)
   else:
       agent.disable_tool('browser')

Limit exposure by tying capability to role or user tier.

Introduce Review and Delay Layers

Instead of immediate model ingestion, route fetched content through a queue or review system. This is especially important in enterprise deployments.

You can:
- Queue browsing outputs for manual approval
- Use classifiers to detect suspicious content
- Apply delay-based rate limiting to reduce fast exploitation loops
Monitor and Audit Tool Usage

Track every tool invocation your agent performs. When did it browse? What URL? What response did it get?

Feed this telemetry into your logging or SIEM system:

   {
     "tool": "browser",
     "url": "https://some-site.com",
     "user_prompt": "summarize this",
     "model_response": "...",
     "timestamp": "2025-10-28T14:02:00Z"
   }

Once you track it, you can enforce policies—or at least spot misuse.

Final Thoughts

The browser isn’t just another LLM plugin. It fundamentally alters your system’s threat model.

Giving agents the ability to browse adds depth and power—but also real danger. This isn’t just about prompt injection anymore. It’s about content injection, environment manipulation, and indirect system compromise.

If your AI agents can browse, ask yourself:

Do I know what they’re seeing?
Do I control what they’re allowed to read?
Do I have a fallback when things go wrong?

Autonomy is great. But in agent systems, autonomy without guardrails is just vulnerability in disguise.