<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: ZiLing</title>
    <description>The latest articles on Forem by ZiLing (@ziling-failcore).</description>
    <link>https://forem.com/ziling-failcore</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3678173%2F93143944-c0ca-49d5-9007-851ffe10bb9c.png</url>
      <title>Forem: ZiLing</title>
      <link>https://forem.com/ziling-failcore</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/ziling-failcore"/>
    <language>en</language>
    <item>
      <title>Why Execution Boundaries Matter More Than AI Guardrails</title>
      <dc:creator>ZiLing</dc:creator>
      <pubDate>Wed, 14 Jan 2026 18:18:39 +0000</pubDate>
      <link>https://forem.com/ziling-failcore/why-execution-boundaries-matter-more-than-ai-guardrails-2421</link>
      <guid>https://forem.com/ziling-failcore/why-execution-boundaries-matter-more-than-ai-guardrails-2421</guid>
      <description>&lt;h2&gt;
  
  
  Why Execution Boundaries Matter More Than AI Guardrails
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Probabilistic Prompts vs. Deterministic Runtime Safety&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The problem isn’t that AI models are “careless”
&lt;/h3&gt;

&lt;p&gt;Over the past year, we’ve seen rapid improvements in AI guardrails built directly into models — better refusals, safer completions, and increasingly aggressive alignment tuning.&lt;/p&gt;

&lt;p&gt;And yet, something still feels fundamentally off.&lt;/p&gt;

&lt;p&gt;When an AI agent is allowed to read files, make network requests, or spawn processes, we are no longer dealing with a purely conversational system.&lt;/p&gt;

&lt;p&gt;We are dealing with &lt;strong&gt;code execution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At that point, the question is no longer:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;“Will the model behave responsibly?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The real question becomes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Where does responsibility actually live?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Guardrails inside the model are probabilistic by design
&lt;/h3&gt;

&lt;p&gt;Model-level guardrails operate on probabilities.&lt;/p&gt;

&lt;p&gt;They rely on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;pattern recognition,&lt;/li&gt;
&lt;li&gt;learned safety heuristics,&lt;/li&gt;
&lt;li&gt;statistical correlations between inputs and “safe” outputs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works reasonably well for tasks like text generation or summarization.&lt;/p&gt;

&lt;p&gt;But probabilistic systems have an unavoidable property:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;They can never guarantee correctness on a single execution.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;“Most of the time” is not good enough when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a wrong file path deletes data,&lt;/li&gt;
&lt;li&gt;a misinterpreted URL triggers SSRF,&lt;/li&gt;
&lt;li&gt;a subtle prompt variation bypasses a refusal.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can prompt better.&lt;br&gt;&lt;br&gt;
You can fine-tune more.&lt;br&gt;&lt;br&gt;
You can stack system messages.&lt;/p&gt;

&lt;p&gt;But in the end, you are still asking a probabilistic system to police itself.&lt;/p&gt;
&lt;h3&gt;
  
  
  Execution changes everything
&lt;/h3&gt;

&lt;p&gt;The moment an agent can &lt;em&gt;act&lt;/em&gt;, not just respond, the safety model must change.&lt;/p&gt;

&lt;p&gt;Execution has characteristics that language does not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it is stateful,&lt;/li&gt;
&lt;li&gt;it has side effects,&lt;/li&gt;
&lt;li&gt;it is often irreversible.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once a process is spawned or a file is deleted, there is no “retry with a better prompt”.&lt;/p&gt;

&lt;p&gt;This is where the concept of an &lt;strong&gt;execution boundary&lt;/strong&gt; becomes critical.&lt;/p&gt;

&lt;p&gt;An execution boundary is the point where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;intent becomes action,&lt;/li&gt;
&lt;li&gt;language becomes effect,&lt;/li&gt;
&lt;li&gt;probability must give way to determinism.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Deterministic safety belongs at the execution boundary
&lt;/h3&gt;

&lt;p&gt;Execution boundaries are enforced by code, not by intent.&lt;/p&gt;

&lt;p&gt;They answer binary questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Is this file path allowed?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is this network address private or public?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is this process permitted under the current policy?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These checks are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;explicit,&lt;/li&gt;
&lt;li&gt;repeatable,&lt;/li&gt;
&lt;li&gt;and free of ambiguity.&lt;/li&gt;
&lt;/ul&gt;
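&lt;p&gt;The second question, for example, collapses into a few lines of standard-library code. A minimal sketch using Python’s &lt;code&gt;ipaddress&lt;/code&gt; module (illustrative only, not FailCore’s implementation):&lt;/p&gt;

```python
import ipaddress

def is_private_address(host):
    """Deterministic check: the same input yields the same answer, every time."""
    addr = ipaddress.ip_address(host)
    # Covers RFC 1918 ranges, loopback, and link-local (e.g. 169.254.0.0/16)
    return addr.is_private or addr.is_link_local or addr.is_loopback

print(is_private_address("169.254.169.254"))  # True: a link-local metadata endpoint
print(is_private_address("140.82.112.3"))     # False: a public address
```

&lt;p&gt;There is no temperature, no sampling, no phrasing that can change the verdict.&lt;/p&gt;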

&lt;p&gt;This is not about distrusting AI models.&lt;/p&gt;

&lt;p&gt;It is about placing guarantees &lt;strong&gt;where guarantees are actually possible&lt;/strong&gt;.&lt;/p&gt;
&lt;h2&gt;
  
  
  What a deterministic boundary looks like
&lt;/h2&gt;

&lt;p&gt;Here is a simplified, conceptual example of what a deterministic execution boundary might look like in practice.&lt;/p&gt;

&lt;p&gt;This example is not about how the model reasons — it’s about what the runtime enforces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;deterministic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;policy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;does&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"think"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;it&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;enforces.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"policy"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"enforce"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fs_write_limit"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/app/data/temp/*"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"block_sensitive_paths"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pattern"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/etc/*"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/usr/bin/*"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A model cannot reliably allow access to&lt;br&gt;&lt;br&gt;
&lt;code&gt;/app/data/temp/file.txt&lt;/code&gt;&lt;br&gt;&lt;br&gt;
while blocking&lt;br&gt;&lt;br&gt;
&lt;code&gt;/etc/passwd&lt;/code&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;100% of the time&lt;/strong&gt; via prompts alone.&lt;/p&gt;

&lt;p&gt;A runtime execution boundary can.&lt;/p&gt;
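&lt;p&gt;To see why, here is a hedged sketch of how a runtime could evaluate the policy above. The deny-before-allow ordering and the default deny are my assumptions for illustration, not FailCore’s documented semantics:&lt;/p&gt;

```python
from fnmatch import fnmatch

# Rules mirroring the JSON policy above (patterns normalized to lists)
RULES = [
    {"id": "fs_write_limit", "action": "allow", "patterns": ["/app/data/temp/*"]},
    {"id": "block_sensitive_paths", "action": "deny", "patterns": ["/etc/*", "/usr/bin/*"]},
]

def evaluate(path):
    """Deny rules win; anything unmatched is denied by default."""
    for action in ("deny", "allow"):
        for rule in RULES:
            if rule["action"] == action and any(fnmatch(path, p) for p in rule["patterns"]):
                return (action, rule["id"])
    return ("deny", "default")

print(evaluate("/app/data/temp/file.txt"))  # ('allow', 'fs_write_limit')
print(evaluate("/etc/passwd"))              # ('deny', 'block_sensitive_paths')
```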

&lt;h3&gt;
  
  
  Why “fail-fast” matters more than “self-correct”
&lt;/h3&gt;

&lt;p&gt;A common argument is that agents can detect and fix their own mistakes.&lt;/p&gt;

&lt;p&gt;In practice, this breaks down quickly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the agent may not realize it crossed a boundary,&lt;/li&gt;
&lt;li&gt;the context explaining the violation may be lost,&lt;/li&gt;
&lt;li&gt;retries may amplify damage instead of preventing it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fail-fast systems behave differently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;unsafe actions are rejected immediately,&lt;/li&gt;
&lt;li&gt;no partial side effects occur,&lt;/li&gt;
&lt;li&gt;the system state remains consistent.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not an AI-specific idea.&lt;/p&gt;

&lt;p&gt;We don’t let databases “try their best” to enforce constraints.&lt;br&gt;&lt;br&gt;
We don’t let operating systems “probably” respect permissions.&lt;/p&gt;

&lt;p&gt;Agent runtimes should not be an exception.&lt;/p&gt;

&lt;h2&gt;
  
  
  Auditability is not optional
&lt;/h2&gt;

&lt;p&gt;When something goes wrong, you need clear answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What was attempted?&lt;/li&gt;
&lt;li&gt;Why was it blocked?&lt;/li&gt;
&lt;li&gt;Which rule triggered the decision?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Probabilistic refusals are hard to audit.&lt;br&gt;&lt;br&gt;
They often explain &lt;em&gt;what&lt;/em&gt; was refused, but not &lt;em&gt;why&lt;/em&gt; at a system level.&lt;/p&gt;

&lt;p&gt;Deterministic execution boundaries produce artifacts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;traces,&lt;/li&gt;
&lt;li&gt;decision logs,&lt;/li&gt;
&lt;li&gt;rule evaluations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These artifacts matter for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;debugging,&lt;/li&gt;
&lt;li&gt;compliance,&lt;/li&gt;
&lt;li&gt;incident response.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If an agent operates in a real environment, its actions must be explainable &lt;strong&gt;after the fact&lt;/strong&gt;, not just “well-intended” at runtime.&lt;/p&gt;
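&lt;p&gt;A decision log does not need to be complicated to be useful. As a sketch (these field names are illustrative, not FailCore’s actual trace schema), one JSON line per decision already answers all three questions above:&lt;/p&gt;

```python
import io
import json
import time

def log_decision(logfile, tool, params, decision, rule_id):
    """Append one structured, machine-readable record per attempted action."""
    record = {
        "ts": time.time(),      # when the action was attempted
        "tool": tool,           # what was attempted
        "params": params,       # with which arguments
        "decision": decision,   # "allow" or "deny"
        "rule": rule_id,        # which rule triggered the decision
    }
    logfile.write(json.dumps(record) + "\n")

# Usage: an in-memory buffer standing in for a real log file
buf = io.StringIO()
log_decision(buf, "fs.delete", {"path": "/etc/passwd"}, "deny", "block_sensitive_paths")
print(buf.getvalue().strip())
```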

&lt;h3&gt;
  
  
  Closing thoughts
&lt;/h3&gt;

&lt;p&gt;As AI agents gain more autonomy, the cost of a single mistake increases.&lt;/p&gt;

&lt;p&gt;At that scale, safety cannot live entirely inside the model.&lt;/p&gt;

&lt;p&gt;It must live at the execution boundary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enforced by deterministic code,&lt;/li&gt;
&lt;li&gt;observable through audit logs,&lt;/li&gt;
&lt;li&gt;designed to fail fast rather than recover late.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not a philosophical position.&lt;br&gt;&lt;br&gt;
It is a systems engineering one.&lt;/p&gt;

&lt;p&gt;And systems tend to punish us quickly when we ignore their boundaries.&lt;/p&gt;

&lt;h3&gt;
  
  
  Epilogue
&lt;/h3&gt;

&lt;p&gt;This line of thinking is what led me to build&lt;br&gt;
&lt;a href="https://github.com/zi-ling/failcore" rel="noopener noreferrer"&gt;FailCore&lt;/a&gt; —&lt;br&gt;
an open-source, fail-fast execution boundary for AI agents.&lt;/p&gt;

&lt;p&gt;The project is still evolving, but its core goal is simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make unsafe actions impossible to execute, regardless of how they are generated.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>devops</category>
    </item>
    <item>
      <title>I Thought It Was Refactoring My Code. It Actually Wiped It Out.</title>
      <dc:creator>ZiLing</dc:creator>
      <pubDate>Wed, 31 Dec 2025 13:34:51 +0000</pubDate>
      <link>https://forem.com/ziling-failcore/i-thought-it-was-refactoring-my-code-it-actually-wiped-it-out-25m3</link>
      <guid>https://forem.com/ziling-failcore/i-thought-it-was-refactoring-my-code-it-actually-wiped-it-out-25m3</guid>
      <description>&lt;h2&gt;
  
  
  3 Months of Code, Gone in 5 Seconds
&lt;/h2&gt;

&lt;p&gt;I’m still a bit shaky as I type this.&lt;/p&gt;

&lt;p&gt;A few weeks ago, I was using an LLM-based automation to refactor a project’s directory structure.&lt;br&gt;&lt;br&gt;
The goal was simple: clean things up, reorganize a few core modules — nothing risky.&lt;/p&gt;

&lt;p&gt;During the planning stage, everything looked perfect.&lt;br&gt;&lt;br&gt;
Clear reasoning. Careful steps. It even reassured me:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“For safety, I will scan the directories first.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I let it run in the background and went to do something else.&lt;/p&gt;

&lt;p&gt;When I came back, my code was gone.&lt;/p&gt;

&lt;p&gt;Not moved.&lt;br&gt;&lt;br&gt;
Not misplaced.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Physically deleted.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because of a subtle path hallucination, the model interpreted my project root as a temporary directory.&lt;br&gt;&lt;br&gt;
There was no warning. No error. Nothing suspicious before execution.&lt;/p&gt;

&lt;p&gt;In about &lt;strong&gt;5 seconds&lt;/strong&gt;, it “optimized” &lt;strong&gt;3 months of my work&lt;/strong&gt; into a blank screen.&lt;/p&gt;

&lt;p&gt;That was the moment I realized:&lt;br&gt;&lt;br&gt;
the word &lt;strong&gt;“refactor”&lt;/strong&gt; in the title was doing a lot of lying.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Prompt Engineering Isn’t Enough
&lt;/h2&gt;

&lt;p&gt;This accident taught me a hard lesson:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;AI failures don’t usually happen during “thinking” —&lt;br&gt;&lt;br&gt;
they happen during “doing.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We spend an enormous amount of time designing prompt guardrails, trying to &lt;em&gt;convince&lt;/em&gt; models to behave safely.&lt;br&gt;&lt;br&gt;
But in practice:&lt;/p&gt;

&lt;h3&gt;
  
  
  Hallucinations are inevitable
&lt;/h3&gt;

&lt;p&gt;A model can promise safety in text, then hallucinate a destructive path at the exact millisecond it generates a tool call.&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution is irreversible
&lt;/h3&gt;

&lt;p&gt;Once an AI has filesystem or network access, every action produces real-world side effects.&lt;br&gt;&lt;br&gt;
There is no “undo” button.&lt;/p&gt;

&lt;p&gt;Running AI automation without execution-time protection is basically&lt;br&gt;&lt;br&gt;
&lt;strong&gt;running barefoot on broken glass.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  FailCore: Not a Framework, Just a Safety Belt
&lt;/h2&gt;

&lt;p&gt;I didn’t want to build another heavy framework.&lt;/p&gt;

&lt;p&gt;FailCore exists for one reason:&lt;br&gt;&lt;br&gt;
that incident made it obvious what I was missing.&lt;/p&gt;

&lt;p&gt;After the failure, I realized I needed three very concrete things.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Execution-Time Interception
&lt;/h3&gt;

&lt;p&gt;That path hallucination made one thing clear:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;safety checks can’t stop at the prompt layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FailCore hooks into tool calls at the Python runtime level.&lt;br&gt;&lt;br&gt;
If an automated process tries to touch an unauthorized directory or a dangerous network target&lt;br&gt;&lt;br&gt;
(for example, an internal IP that could trigger SSRF),&lt;br&gt;&lt;br&gt;
the circuit is broken &lt;strong&gt;before&lt;/strong&gt; the side effect happens.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. A “Black Box” Audit Trail
&lt;/h3&gt;

&lt;p&gt;During those 5 seconds, I had no idea what the system was actually doing.&lt;/p&gt;

&lt;p&gt;So I needed evidence.&lt;/p&gt;

&lt;p&gt;FailCore turns raw execution traces into an HTML audit report, showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;when an action happened
&lt;/li&gt;
&lt;li&gt;what parameters were used
&lt;/li&gt;
&lt;li&gt;which resource was targeted
&lt;/li&gt;
&lt;li&gt;and why it was allowed or blocked
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was the first time I could actually see what the AI did, step by step.&lt;/p&gt;




&lt;h3&gt;
  
  
  3. Deterministic Replay
&lt;/h3&gt;

&lt;p&gt;I didn’t want to burn tokens or risk my environment just to reproduce a failure.&lt;/p&gt;

&lt;p&gt;With FailCore, you can take a recorded execution trace and replay it locally —&lt;br&gt;&lt;br&gt;
&lt;strong&gt;without re-running dangerous operations&lt;/strong&gt; —&lt;br&gt;&lt;br&gt;
to pinpoint exactly where the logic went wrong.&lt;/p&gt;
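&lt;p&gt;The idea, sketched in miniature (this trace format is invented for illustration, not FailCore’s actual schema): feed the recorded calls back through the policy check instead of through the real tools.&lt;/p&gt;

```python
from fnmatch import fnmatch

# A recorded trace: what the agent tried, in order (illustrative format)
trace = [
    {"step": 1, "tool": "fs.write", "path": "/app/data/temp/out.txt"},
    {"step": 2, "tool": "fs.delete", "path": "/home/me/project/src"},
]

DENY_PATTERNS = ["/home/*", "/etc/*"]

def replay(recorded):
    """Re-evaluate every recorded step; execute nothing."""
    verdicts = []
    for step in recorded:
        denied = any(fnmatch(step["path"], p) for p in DENY_PATTERNS)
        verdicts.append((step["step"], "deny" if denied else "allow"))
    return verdicts

print(replay(trace))  # [(1, 'allow'), (2, 'deny')]: step 2 is where it went wrong
```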




&lt;h2&gt;
  
  
  Opening the Black Box
&lt;/h2&gt;

&lt;p&gt;Below is a prototype of the HTML audit report generated by FailCore:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvy70ixh8ba8cod4c05a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzvy70ixh8ba8cod4c05a.png" alt="FailCore execution audit report showing blocked unsafe actions and risk summary" width="681" height="922"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This isn’t just about preventing accidents.&lt;br&gt;&lt;br&gt;
It’s about &lt;strong&gt;observability&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For developers&lt;/strong&gt;: debugging non-deterministic failures with 100% replay accuracy
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For teams&lt;/strong&gt;: maintaining an auditable trail of automated actions
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For AI systems&lt;/strong&gt;: operating within explicit, enforceable boundaries
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;AI is incredibly good at writing code.&lt;br&gt;&lt;br&gt;
But we shouldn’t let it be the &lt;strong&gt;judge, jury, and executioner&lt;/strong&gt; of our local file systems.&lt;/p&gt;

&lt;p&gt;FailCore is still a work in progress, but it’s what allows me to keep running AI automation on my own machine without fear.&lt;/p&gt;

&lt;p&gt;If you’re letting AI touch the real world,&lt;br&gt;&lt;br&gt;
&lt;strong&gt;execution safety deserves its own layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/Zi-Ling/failcore" rel="noopener noreferrer"&gt;https://github.com/Zi-Ling/failcore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If there’s interest, I can write a follow-up post explaining how the Python runtime hooks actually work.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Monkey-Patched Python to Stop AI Agents from Accessing Private Networks</title>
      <dc:creator>ZiLing</dc:creator>
      <pubDate>Thu, 25 Dec 2025 11:31:20 +0000</pubDate>
      <link>https://forem.com/ziling-failcore/i-monkey-patched-python-to-stop-ai-agents-from-accessing-private-networks-24h2</link>
      <guid>https://forem.com/ziling-failcore/i-monkey-patched-python-to-stop-ai-agents-from-accessing-private-networks-24h2</guid>
      <description>&lt;p&gt;Most AI agent failures aren’t caused by bad plans.&lt;/p&gt;

&lt;p&gt;They’re caused by &lt;strong&gt;unsafe execution&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After building and debugging multiple agent systems, I kept running into the same problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Tools being called with unexpected arguments&lt;/li&gt;
&lt;li&gt;Network or filesystem side effects happening too early&lt;/li&gt;
&lt;li&gt;Agents “succeeding” while silently doing the wrong thing&lt;/li&gt;
&lt;li&gt;Failures that were impossible to reproduce after the fact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built &lt;strong&gt;FailCore&lt;/strong&gt; — a small execution-time safety runtime for AI agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is FailCore?
&lt;/h2&gt;

&lt;p&gt;FailCore is &lt;strong&gt;not&lt;/strong&gt; an agent framework.&lt;br&gt;
It doesn’t plan, reason, or store memory.&lt;/p&gt;

&lt;p&gt;Instead, it focuses on one thing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Enforcing safety at the Python execution boundary.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Rather than relying on better prompts or smarter planning, FailCore intercepts tool execution &lt;em&gt;before&lt;/em&gt; side effects happen.&lt;/p&gt;

&lt;p&gt;This allows it to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Block unsafe filesystem access&lt;/li&gt;
&lt;li&gt;Prevent private network / SSRF-style calls&lt;/li&gt;
&lt;li&gt;Validate tool inputs and outputs&lt;/li&gt;
&lt;li&gt;Record deterministic execution traces for replay and audit&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Why execution-time safety?
&lt;/h2&gt;

&lt;p&gt;Most agent systems try to solve safety &lt;em&gt;upstream&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better prompts&lt;/li&gt;
&lt;li&gt;More constraints&lt;/li&gt;
&lt;li&gt;More planning logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, that’s brittle.&lt;/p&gt;

&lt;p&gt;Execution is where real damage happens — file writes, HTTP calls, system commands.&lt;br&gt;
Once those occur, it’s already too late.&lt;/p&gt;

&lt;p&gt;FailCore takes a different approach:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Assume plans can be wrong.&lt;br&gt;&lt;br&gt;
Make execution boring, strict, and observable.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  A quick demo
&lt;/h2&gt;

&lt;p&gt;Below is a short demo showing FailCore blocking a real tool-use attempt &lt;strong&gt;before any side effect occurs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The agent believes the call succeeded.&lt;br&gt;
The system never lets the unsafe action run.&lt;/p&gt;

&lt;p&gt;This is hard to achieve with prompt-level constraints alone,&lt;br&gt;
because by the time the model realizes it was wrong, the side effect has already been triggered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg6vhljtbqovbtejhywl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyg6vhljtbqovbtejhywl.gif" alt="FailCore Demo: Blocking an SSRF attack in the terminal" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Show me the code
&lt;/h2&gt;

&lt;p&gt;Instead of wrapping every tool manually, FailCore lets you define a secure &lt;code&gt;Session&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;failcore&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;presets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ToolMetadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;RiskLevel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SideEffect&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DefaultPolicy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;SecurityError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Initialize a secure session
# We enforce a strict policy: No private IPs, No local file access
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;validator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;presets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;net_safe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;strict&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Register a tool with explicit risk metadata
# This tells FailCore: "This function touches the network, watch it closely."
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;http_get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# assuming this is your wrapper function
&lt;/span&gt;    &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ToolMetadata&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;risk_level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;RiskLevel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HIGH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;side_effect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SideEffect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NETWORK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;default_policy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;DefaultPolicy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BLOCK&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Scenario: The Agent tries an SSRF Attack
# Target: AWS Metadata Endpoint (169.254.169.254)
&lt;/span&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://169.254.169.254/latest/meta-data/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;SecurityError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 🛡️ FailCore intercepts the call BEFORE the socket opens
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attack Neutralized: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="c1"&gt;# Output: "SecurityError: Access to private IP range 169.254.0.0/16 is blocked."
&lt;/span&gt;
&lt;span class="c1"&gt;# 4. Scenario: Legitimate Traffic
# Target: Public Internet
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_get&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.github.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Success:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How it works (high level)
&lt;/h2&gt;

&lt;p&gt;FailCore hooks into the Python execution layer and wraps tool calls with:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Pre-execution validation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Policy-based permission checks&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Side-effect interception&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Structured trace recording&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
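&lt;p&gt;In plain Python, those four steps amount to a wrapper along these lines (a conceptual sketch of the pattern, not FailCore’s internals; all the names here are mine):&lt;/p&gt;

```python
def guarded_call(tool_name, func, policy, trace, **params):
    """Validate, check policy, record, and only then execute."""
    # 1. Pre-execution validation: reject malformed input outright
    if not params:
        raise ValueError("no parameters supplied")
    # 2. Policy-based permission check, decided before any side effect
    allowed, rule = policy(tool_name, params)
    # 4. Structured trace recording, for allowed AND blocked calls
    trace.append({"tool": tool_name, "params": params, "allowed": allowed, "rule": rule})
    # 3. Side-effect interception: the underlying function runs only if allowed
    if not allowed:
        raise PermissionError(tool_name + " blocked by rule " + rule)
    return func(**params)

# Usage: a toy policy that only permits paths under /app/data/temp/
def demo_policy(tool_name, params):
    if params.get("path", "").startswith("/app/data/temp/"):
        return True, "fs_write_limit"
    return False, "default_deny"

trace = []
guarded_call("fs.touch", lambda path: "touched", demo_policy, trace, path="/app/data/temp/a.txt")
try:
    guarded_call("fs.touch", lambda path: "touched", demo_policy, trace, path="/etc/passwd")
except PermissionError as e:
    print(e)
print(len(trace))  # 2: both attempts were recorded
```

&lt;p&gt;Recording the trace before the permission check raises is deliberate: blocked attempts are exactly the ones you want in the audit log.&lt;/p&gt;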

&lt;p&gt;The trace format is deterministic and replayable, which makes it possible to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Debug agent failures after the fact&lt;/li&gt;
&lt;li&gt;Audit what &lt;em&gt;would&lt;/em&gt; have happened&lt;/li&gt;
&lt;li&gt;Re-run executions without re-triggering side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Design details are documented here:&lt;br&gt;
👉 &lt;a href="https://github.com/zi-ling/failcore/blob/main/DESIGN.md" rel="noopener noreferrer"&gt;https://github.com/zi-ling/failcore/blob/main/DESIGN.md&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What FailCore is &lt;em&gt;not&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;To set expectations clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Not a sandbox&lt;/li&gt;
&lt;li&gt;❌ Not a VM or container&lt;/li&gt;
&lt;li&gt;❌ Not a replacement for OS-level security&lt;/li&gt;
&lt;li&gt;❌ Not an agent framework&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a &lt;strong&gt;small, composable execution safety layer&lt;/strong&gt; that can sit underneath existing agent stacks.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why open source?
&lt;/h2&gt;

&lt;p&gt;I’m sharing this because execution-time safety feels like a missing layer in many agent systems.&lt;/p&gt;

&lt;p&gt;If you’ve ever dealt with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Non-reproducible agent bugs&lt;/li&gt;
&lt;li&gt;“It worked yesterday” failures&lt;/li&gt;
&lt;li&gt;Unsafe tool calls slipping through&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might find this useful.&lt;br&gt;
If not today, maybe later.&lt;/p&gt;




&lt;h2&gt;
  
  
  Source code
&lt;/h2&gt;

&lt;p&gt;GitHub:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://github.com/zi-ling/failcore" rel="noopener noreferrer"&gt;https://github.com/zi-ling/failcore&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’ve run into similar execution-layer issues in agent systems, I’d love to hear how you handled them.&lt;br&gt;
And if this sounds like a problem you’ll hit again,&lt;br&gt;
you might want to star the repo and come back to it later.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>security</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
