<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Claudio Basckeira</title>
    <description>The latest articles on Forem by Claudio Basckeira (@claudiobasckeira).</description>
    <link>https://forem.com/claudiobasckeira</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3836805%2F057b68d0-1de8-4e89-86b4-9fa014f7df59.png</url>
      <title>Forem: Claudio Basckeira</title>
      <link>https://forem.com/claudiobasckeira</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/claudiobasckeira"/>
    <language>en</language>
    <item>
      <title>Anthropic's Two Security Incidents Confirmed a Held-Back Frontier Model Called Mythos</title>
      <dc:creator>Claudio Basckeira</dc:creator>
      <pubDate>Tue, 07 Apr 2026 14:25:40 +0000</pubDate>
      <link>https://forem.com/claudiobasckeira/anthropics-two-security-incidents-confirmed-a-held-back-frontier-model-called-mythos-3efn</link>
      <guid>https://forem.com/claudiobasckeira/anthropics-two-security-incidents-confirmed-a-held-back-frontier-model-called-mythos-3efn</guid>
      <description>&lt;p&gt;Anthropic had two security incidents in five days. The combination revealed something unprecedented: a frontier AI model the company built and then deliberately decided not to release, on safety grounds.&lt;/p&gt;

&lt;h2&gt;Two Leaks, Five Days&lt;/h2&gt;

&lt;p&gt;The first incident broke on March 26. &lt;a href="https://fortune.com/2026/03/26/anthropic-says-testing-mythos-powerful-new-ai-model-after-data-leak-reveals-its-existence-step-change-in-capabilities/" rel="noopener noreferrer"&gt;Fortune reported&lt;/a&gt; that close to 3,000 files belonging to Anthropic had been sitting in an unsecured, publicly searchable data store. Among them was a draft blog post describing an unreleased model called &lt;strong&gt;Mythos&lt;/strong&gt; (internally also referred to as Capybara). The draft described it as "by far the most powerful AI model we've ever developed," more capable than Opus 4.6 across coding, academic reasoning, and cybersecurity benchmarks. Anthropic confirmed the model exists and said the company is "being deliberate about how we release it."&lt;/p&gt;

&lt;p&gt;The second incident broke on March 31. Anthropic's official Claude Code npm package (@anthropic-ai/claude-code v2.1.88) shipped with an exposed source map file: roughly 57MB, mapping 512,000 lines of code across 1,900 files. The full Claude Code codebase was publicly readable for a window before Anthropic's takedown. Code analysis surfaced an unshipped feature roadmap with capabilities not yet announced, and corroborated the Capybara/Mythos tier from the prior leak.&lt;/p&gt;

&lt;h2&gt;Mythos: A Frontier Model Anthropic Is Holding Back&lt;/h2&gt;

&lt;p&gt;Multiple independent reviewers describe Mythos as a tier above Opus 4.6, with significant jumps on coding, reasoning, and cybersecurity benchmarks. Internal notes describe it as offering "a step change in cyber capabilities." &lt;a href="https://thezvi.substack.com/p/ai-162-visions-of-mythos" rel="noopener noreferrer"&gt;Zvi Mowshowitz's full writeup&lt;/a&gt; documents the evidence and the implications, citing several of those reviewers.&lt;/p&gt;

&lt;p&gt;That framing matters. This isn't a model that simply isn't ready, or a product still waiting to be packaged. It's a capability Anthropic built and then decided not to deploy because of its potential for misuse in cybersecurity contexts.&lt;/p&gt;

&lt;p&gt;Anthropic also disclosed that a Chinese state-sponsored group ran a coordinated campaign using Claude Code to infiltrate roughly 30 organizations before being detected. That's the dual-use evidence pattern that justifies holding the capability back: the same model that helps cybersecurity defenders also helps cybersecurity attackers, and the attacker side is now demonstrably real. This appears to be one of the first publicly documented cases of a frontier model deliberately withheld on safety grounds rather than readiness or commercial timing. OpenAI and Google DeepMind have both discussed withholding capabilities in the abstract; this is a concrete documented case.&lt;/p&gt;

&lt;h2&gt;The DMCA Overreach&lt;/h2&gt;

&lt;p&gt;Anthropic's response to the leak created a secondary incident. Their DMCA takedown effort, aimed at removing the leaked code from GitHub, accidentally removed legitimate public forks of an unrelated open-source repository before the error was caught and reversed. &lt;a href="https://arstechnica.com/ai/2026/04/anthropic-says-its-leak-focused-dmca-effort-unintentionally-hit-legit-github-forks/" rel="noopener noreferrer"&gt;Ars Technica documented the full timeline&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The overreach was reversed quickly, but the episode documented a large AI lab deploying automated DMCA tooling that can't distinguish a leak from a legitimate fork. That's worth noting for anyone running open-source projects.&lt;/p&gt;

&lt;h2&gt;The AMD Performance Complaint&lt;/h2&gt;

&lt;p&gt;The same week the leak broke, AMD's AI Director Stella Laurenzo filed a public GitHub ticket reporting measurable performance regression in Claude Code, stating the tool "cannot be trusted to perform complex engineering tasks" based on analysis of 6,852 sessions. Her data showed degradation beginning around March 8, specifically in reasoning depth and targeted editing behavior.&lt;/p&gt;

&lt;p&gt;She attributed the regression to the deployment of "thinking content redaction" in version 2.1.69, which strips thinking content from API responses. Her hypothesis: when thinking is shallow, the model defaults to cheaper actions (rewrite entire files, stop without completing). &lt;a href="https://www.theregister.com/2026/04/06/anthropic_claude_code_dumber_lazier_amd_ai_director/" rel="noopener noreferrer"&gt;The Register covered the full ticket&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A named enterprise director, with six thousand sessions of data, publishing publicly. That's a different category of complaint than anonymous forum posts.&lt;/p&gt;

&lt;h2&gt;The Source-Map Security Pattern&lt;/h2&gt;

&lt;p&gt;The leak itself surfaced a security practice worth checking: source maps were included in a published npm package. Source maps are invaluable for debugging, but when included in production packages, they expose the full source code of your compiled JavaScript to anyone who knows where to look.&lt;/p&gt;

&lt;p&gt;If your team publishes compiled JavaScript to npm and hasn't audited which files are included in the published package, this is worth checking. The &lt;code&gt;.npmignore&lt;/code&gt; file or the &lt;code&gt;files&lt;/code&gt; field in &lt;code&gt;package.json&lt;/code&gt; controls what ships. Source maps should be excluded from published packages or hosted separately with restricted access.&lt;/p&gt;
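&lt;p&gt;A quick pre-publish check can catch this. Here's a minimal sketch in Python that lists any source maps under a package directory; a real check should also honor the &lt;code&gt;files&lt;/code&gt; field and &lt;code&gt;.npmignore&lt;/code&gt; rules, which this deliberately ignores:&lt;/p&gt;

```python
from pathlib import Path

def shipped_source_maps(package_dir):
    """List .map files that would ship with an npm package.

    Sketch only: it scans the whole directory tree and does not parse
    the "files" field of package.json or any .npmignore entries.
    """
    return sorted(str(p.relative_to(package_dir))
                  for p in Path(package_dir).rglob("*.map"))
```

&lt;p&gt;Running this as a CI step before &lt;code&gt;npm publish&lt;/code&gt; (and failing the build on a non-empty result) is cheap insurance against exactly the exposure described above.&lt;/p&gt;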




&lt;p&gt;&lt;em&gt;This story is from &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Edge Briefing: AI&lt;/a&gt;, a weekly newsletter curating the signal from AI noise. &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Subscribe for free&lt;/a&gt; to get it every Tuesday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>news</category>
      <category>technology</category>
    </item>
    <item>
      <title>LiteLLM Was Backdoored: What the TeamPCP Supply Chain Attack Means for Python AI Projects</title>
      <dc:creator>Claudio Basckeira</dc:creator>
      <pubDate>Tue, 31 Mar 2026 14:21:46 +0000</pubDate>
      <link>https://forem.com/claudiobasckeira/litellm-was-backdoored-what-the-teampcp-supply-chain-attack-means-for-python-ai-projects-4h8c</link>
      <guid>https://forem.com/claudiobasckeira/litellm-was-backdoored-what-the-teampcp-supply-chain-attack-means-for-python-ai-projects-4h8c</guid>
      <description>&lt;p&gt;On March 24, 2026, threat actor TeamPCP published two compromised versions of LiteLLM to PyPI. If you work with Python AI tooling, this one is worth understanding in detail, because the attack technique will be reused.&lt;/p&gt;

&lt;h2&gt;What Happened&lt;/h2&gt;

&lt;p&gt;Versions 1.82.7 and 1.82.8 of LiteLLM contained malicious payloads after attackers obtained the maintainer's PyPI credentials. The credential theft wasn't a direct attack on LiteLLM. It was the third step in a cascade:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;March 19: TeamPCP compromised Trivy, an open-source security scanner&lt;/li&gt;
&lt;li&gt;March 21: Used the compromised Trivy action to steal credentials from Checkmarx's CI pipeline&lt;/li&gt;
&lt;li&gt;March 24: Used stolen credentials from LiteLLM's CI/CD pipeline (which ran Trivy) to publish malicious packages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The malicious versions executed in two different ways. Version 1.82.7 embedded a base64-encoded payload in &lt;code&gt;litellm/proxy/proxy_server.py&lt;/code&gt;; it fires when anything imports &lt;code&gt;litellm.proxy&lt;/code&gt;. Version 1.82.8 was more aggressive: it added a &lt;code&gt;litellm_init.pth&lt;/code&gt; file to site-packages, which runs on every Python interpreter startup regardless of whether LiteLLM is imported. That includes &lt;code&gt;pip install&lt;/code&gt;, your IDE's language server, and &lt;code&gt;python -c "anything"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Once triggered, the payload harvested SSH keys, cloud credentials, Kubernetes secrets, database configs, and .env files. On machines running Kubernetes, it attempted lateral movement by deploying privileged pods to every node and installed a persistent systemd backdoor that polls an attacker-controlled endpoint for additional binaries.&lt;/p&gt;

&lt;h2&gt;Why This Is Harder to Catch Than It Looks&lt;/h2&gt;

&lt;p&gt;Standard supply chain defenses focus on hash verification and suspicious package names. This attack bypassed both because the malicious content was published using the maintainer's actual credentials. The hash is correct. The package name is correct. There's nothing to flag.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.pth&lt;/code&gt; mechanism in version 1.82.8 is particularly worth understanding. It's a legitimate Python feature: files ending in &lt;code&gt;.pth&lt;/code&gt; in &lt;code&gt;site-packages&lt;/code&gt; are processed on every interpreter startup. Any line that starts with &lt;code&gt;import&lt;/code&gt; gets executed. This isn't a vulnerability; it's how Python works. Existing supply chain scanning tools mostly look at &lt;code&gt;setup.py&lt;/code&gt; and &lt;code&gt;__init__.py&lt;/code&gt;. They don't catch malicious &lt;code&gt;.pth&lt;/code&gt; files.&lt;/p&gt;
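&lt;p&gt;Auditing for the trigger is straightforward. A minimal sketch that flags &lt;code&gt;.pth&lt;/code&gt; lines &lt;code&gt;site.py&lt;/code&gt; would execute at interpreter startup; treat it as a heuristic, not a complete scanner:&lt;/p&gt;

```python
from pathlib import Path

def executable_pth_lines(site_packages):
    """Flag .pth lines that Python's site.py will exec() on startup.

    Per the .pth format, a line beginning with "import" is executed;
    every other non-comment line is treated as a path entry.
    """
    findings = []
    for pth in Path(site_packages).glob("*.pth"):
        for line in pth.read_text().splitlines():
            if line.startswith("import ") or line.startswith("import\t"):
                findings.append((pth.name, line.strip()))
    return findings
```

&lt;p&gt;Pointing this at each entry in &lt;code&gt;site.getsitepackages()&lt;/code&gt; gives a fast inventory; some legitimate packages (setuptools, coverage tooling) ship import-bearing &lt;code&gt;.pth&lt;/code&gt; files, so the output needs human review rather than automatic deletion.&lt;/p&gt;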

&lt;h2&gt;Who Was Affected&lt;/h2&gt;

&lt;p&gt;LiteLLM is downloaded 3.4 million times per day and is present in 36% of cloud environments as a transitive dependency. You might not have installed LiteLLM directly and still have been affected. Downstream packages that pull in LiteLLM include DSPy, MLflow, OpenHands, CrewAI, and Arize Phoenix.&lt;/p&gt;

&lt;p&gt;The malicious versions were live for approximately three hours before PyPI quarantined them. Detection was accidental; automated tooling didn't catch them.&lt;/p&gt;

&lt;h2&gt;What to Do&lt;/h2&gt;

&lt;p&gt;Check first: &lt;code&gt;pip show litellm | grep Version&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If you see 1.82.7 or 1.82.8:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Uninstall immediately and run &lt;code&gt;pip cache purge&lt;/code&gt; (or &lt;code&gt;rm -rf ~/.cache/uv&lt;/code&gt; if using uv) to prevent cached wheel re-use&lt;/li&gt;
&lt;li&gt;Rotate every credential accessible from that environment: API keys, SSH keys, cloud credentials, database passwords&lt;/li&gt;
&lt;li&gt;Check for persistence artifacts: &lt;code&gt;~/.config/sysmon/sysmon.py&lt;/code&gt;, a &lt;code&gt;sysmon.service&lt;/code&gt; systemd unit, files in &lt;code&gt;/tmp/pglog&lt;/code&gt; or &lt;code&gt;/tmp/.pg_state&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If Kubernetes was present: inspect &lt;code&gt;kube-system&lt;/code&gt; namespace for unauthorized pods, review cluster audit logs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The clean version is 1.82.6.&lt;/p&gt;
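&lt;p&gt;To sweep many environments, the same check can be scripted. A sketch using &lt;code&gt;importlib.metadata&lt;/code&gt;; the version list comes from the advisory above, everything else is plain stdlib:&lt;/p&gt;

```python
from importlib import metadata

COMPROMISED = {"1.82.7", "1.82.8"}

def classify(version):
    """Map a LiteLLM version string onto the advisory."""
    return "compromised" if version in COMPROMISED else "ok"

def litellm_status():
    """Report the status of the locally installed LiteLLM, if any."""
    try:
        return classify(metadata.version("litellm"))
    except metadata.PackageNotFoundError:
        return "not installed"
```

&lt;p&gt;Remember that a clean version report is necessary but not sufficient: if a compromised version ever ran, the credential rotation and persistence checks above still apply.&lt;/p&gt;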

&lt;h2&gt;The Broader Signal&lt;/h2&gt;

&lt;p&gt;This is part of a coordinated campaign. Three days later, the Telnyx package was hit with the same technique. TeamPCP is running systematic attacks across Python packages in the AI/ML tooling space.&lt;/p&gt;

&lt;p&gt;There's also one detail buried in the security post-mortems that deserves separate attention: the attackers used an AI agent called "openclaw" as part of their operational pipeline. It's the first confirmed case of an AI agent used operationally in a software supply chain attack. The full scope of what it automated isn't publicly documented, but its presence in the campaign means some coordination steps that previously required manual effort are now automated.&lt;/p&gt;

&lt;p&gt;For teams running Python AI tooling in production: pin your dependencies, monitor transitive package updates, and add &lt;code&gt;.pth&lt;/code&gt; file detection to your supply chain scanning. The gap between what automated tooling catches and what's actually exploitable just got a bit wider.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This story is from &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Edge Briefing: AI&lt;/a&gt;, a weekly newsletter curating the signal from AI noise. &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Subscribe for free&lt;/a&gt; to get it every Tuesday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>python</category>
      <category>devops</category>
    </item>
    <item>
      <title>An AI Agent Found 20 ML Improvements Karpathy Had Missed in 20 Years</title>
      <dc:creator>Claudio Basckeira</dc:creator>
      <pubDate>Sat, 28 Mar 2026 19:56:07 +0000</pubDate>
      <link>https://forem.com/claudiobasckeira/an-ai-agent-found-20-ml-improvements-karpathy-had-missed-in-20-years-2j41</link>
      <guid>https://forem.com/claudiobasckeira/an-ai-agent-found-20-ml-improvements-karpathy-had-missed-in-20-years-2j41</guid>
      <description>&lt;p&gt;Andrej Karpathy released &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;&lt;code&gt;autoresearch&lt;/code&gt;&lt;/a&gt; on GitHub last week, and the results are worth understanding carefully. Not because of the hype, but because of how the architecture actually works.&lt;/p&gt;

&lt;p&gt;The framework is 630 lines of Python. It runs an AI agent in a loop: read a training script, form a hypothesis, modify the code, run a short training job (five minutes), evaluate results against a scalar metric, repeat. On Karpathy's own ML training setup, the agent &lt;a href="https://the-decoder.com/andrej-karpathy-says-humans-are-now-the-bottleneck-in-ai-research-with-easy-to-measure-results/" rel="noopener noreferrer"&gt;ran 700 experiments over two days&lt;/a&gt; on a single GPU and found an 11% training speedup through 20 optimizations he says he hadn't discovered in 20 years of working on the same codebase.&lt;/p&gt;

&lt;p&gt;Then Shopify's CEO &lt;a href="https://fortune.com/2026/03/17/andrej-karpathy-loop-autonomous-ai-agents-future/" rel="noopener noreferrer"&gt;ran the same approach on internal data&lt;/a&gt;. 37 overnight experiments. 19% performance gain. Applied to their Liquid templating engine: 53% faster rendering, 61% fewer memory allocations, 93 automated commits, all 974 unit tests passing. The repo hit 42,000 GitHub stars in its first week.&lt;/p&gt;

&lt;h2&gt;The Architecture Is the Lesson&lt;/h2&gt;

&lt;p&gt;The design is deliberately minimal. The entire agent contract lives in one file: &lt;code&gt;program.md&lt;/code&gt;. That file carries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What to optimize&lt;/strong&gt; (the objective, stated in natural language)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraints&lt;/strong&gt; (what the agent must not do: break tests, increase memory footprint, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stopping criteria&lt;/strong&gt; (when to declare success or give up)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent reads &lt;code&gt;program.md&lt;/code&gt;, modifies the training script, runs the job, parses the metric from the output, logs the result, and loops. No external tool calls. No internet access. No vector database of prior experiments. Just: read, modify, train, evaluate, repeat.&lt;/p&gt;
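&lt;p&gt;That loop fits in a few lines. A minimal sketch, with everything hedged: the &lt;code&gt;val_loss=&lt;/code&gt; output format, the &lt;code&gt;propose&lt;/code&gt;/&lt;code&gt;run&lt;/code&gt; callables, and the revert-on-no-improvement policy are illustrative assumptions, not autoresearch's actual contract:&lt;/p&gt;

```python
import re

def parse_metric(stdout):
    """Extract the scalar objective from a training run's output.

    Assumes the script prints something like "val_loss=0.123"; the real
    contract lives in program.md, so this format is illustrative only.
    """
    match = re.search(r"val_loss=([0-9.]+)", stdout)
    return float(match.group(1)) if match else None

def optimize(propose, run, budget):
    """Minimal read-modify-train-evaluate loop (lower metric is better).

    propose(best) applies one code modification and returns an undo
    callable; run() executes a short training job and returns the metric.
    """
    best = run()
    for _ in range(budget):
        undo = propose(best)
        metric = run()
        if metric is None or min(best, metric) != metric or metric == best:
            undo()          # no strict improvement: revert the change
        else:
            best = metric   # keep the modification
    return best
```

&lt;p&gt;The design choice worth copying is the interface: the agent only needs one scalar back from each run to decide whether a modification stays.&lt;/p&gt;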

&lt;p&gt;Karpathy's phrase for this pattern is "program synthesis via experiment." The agent isn't writing the optimizer from scratch. It's running empirical search over the space of code modifications, guided by a metric signal.&lt;/p&gt;

&lt;h2&gt;The Constraint That Actually Matters&lt;/h2&gt;

&lt;p&gt;Here's where a lot of the coverage has been imprecise: autoresearch only works where quality is measurable with a single scalar value.&lt;/p&gt;

&lt;p&gt;Training loss, rendering time, memory allocations, test pass rate. These are scalar metrics. You can compare them across runs. An agent can know unambiguously whether run N+1 was better than run N.&lt;/p&gt;

&lt;p&gt;Natural language quality isn't scalar. Alignment properties aren't scalar. Whether a piece of code is readable isn't scalar. Whether a product decision is the right one isn't scalar.&lt;/p&gt;

&lt;p&gt;This constraint is the boundary condition for the entire framework. Karpathy acknowledges it: "It works best on problems where you have a clear eval." The framing in some coverage ("AI will now do all research autonomously") misses this. Autonomous research works for ML training, hyperparameter optimization, compiler tuning, and similar problems with quantifiable objectives. It doesn't yet work for the domains where human judgment is most irreplaceable.&lt;/p&gt;

&lt;p&gt;That said, Shopify's result is a useful demonstration that the "clear eval" bar isn't as narrow as it might seem. Rendering time for a templating engine is a straightforward metric, but deriving a 53% improvement from 37 overnight experiments against that metric is genuinely impressive.&lt;/p&gt;

&lt;h2&gt;What to Take From This&lt;/h2&gt;

&lt;p&gt;If you're doing any ML work that involves iterative training runs, autoresearch is now the default first step before manual hyperparameter search. The framework is &lt;a href="https://github.com/karpathy/autoresearch" rel="noopener noreferrer"&gt;on GitHub&lt;/a&gt;. Read &lt;code&gt;program.md&lt;/code&gt; specifically. The single-file design for agent instructions + constraints + stopping criteria is a pattern worth stealing for any iterative agent task, not just ML optimization.&lt;/p&gt;

&lt;p&gt;Karpathy's framing of the bigger picture: "Humans are now the bottleneck in AI research with easy-to-measure results." That's precise language. For the domains where measurement is hard, humans remain central. For the domains where it's easy, the leverage has shifted.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This story is from &lt;a href="https://edge-briefing-ai.beehiiv.com/p/karpathy-s-bot-ran-700-experiments-overnight-jensen-huang-says-agi-is-already-here" rel="noopener noreferrer"&gt;Edge Briefing: AI&lt;/a&gt;, a weekly newsletter curating the signal from AI noise. &lt;a href="https://edge-briefing-ai.beehiiv.com/p/karpathy-s-bot-ran-700-experiments-overnight-jensen-huang-says-agi-is-already-here" rel="noopener noreferrer"&gt;Subscribe for free&lt;/a&gt; to get it every Tuesday.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>productivity</category>
      <category>python</category>
    </item>
    <item>
      <title>An AI Agent Caused a Data Breach at Meta. Here's What Went Wrong.</title>
      <dc:creator>Claudio Basckeira</dc:creator>
      <pubDate>Sat, 21 Mar 2026 10:21:36 +0000</pubDate>
      <link>https://forem.com/claudiobasckeira/an-ai-agent-caused-a-data-breach-at-meta-heres-what-went-wrong-45hj</link>
      <guid>https://forem.com/claudiobasckeira/an-ai-agent-caused-a-data-breach-at-meta-heres-what-went-wrong-45hj</guid>
      <description>&lt;p&gt;Two AI agent security incidents hit production systems in the same week. One at Meta, one at Snowflake. Neither was theoretical. Both exposed real data.&lt;/p&gt;

&lt;p&gt;Here's what happened, and what it means if you're deploying agents.&lt;/p&gt;

&lt;h2&gt;The Meta Incident&lt;/h2&gt;

&lt;p&gt;An internal AI agent at Meta autonomously posted a response to an employee's question on an internal forum. Nobody invoked it. Nobody asked for its input. It saw a question, generated an answer, and posted it.&lt;/p&gt;

&lt;p&gt;Another engineer read the response, followed the agent's advice, and in doing so inadvertently widened access permissions on an internal system. The result: proprietary code, business strategies, and user-related datasets were exposed to engineers who shouldn't have had access. The exposure lasted about two hours before it was caught. Meta classified it as Sev 1.&lt;/p&gt;

&lt;p&gt;VentureBeat's analysis identified four specific IAM gaps that enabled the incident. The root cause is a pattern that security researchers have been warning about for years: the &lt;strong&gt;confused deputy problem&lt;/strong&gt;. The agent inherited the invoking engineer's permissions, but it acted autonomously and in contexts the engineer never intended. It had the &lt;em&gt;authority&lt;/em&gt; of a human but none of the &lt;em&gt;judgment&lt;/em&gt; about when to use it.&lt;/p&gt;

&lt;h2&gt;The Snowflake Incident&lt;/h2&gt;

&lt;p&gt;PromptArmor disclosed a prompt injection chain in Snowflake's Cortex Code CLI. The attack path: an attacker plants prompt injection instructions in a GitHub README file. When a developer uses the Cortex agent to review that repository, the agent reads the README, follows the injected instructions, downloads a malicious script, and executes it using the developer's Snowflake credentials.&lt;/p&gt;

&lt;p&gt;This is a supply chain attack that flows through an AI agent. The developer didn't run anything suspicious. They used their normal tooling to review a repo. The agent did the rest.&lt;/p&gt;

&lt;p&gt;Snowflake patched the vulnerability in CLI v1.0.25 (February 28, 2026).&lt;/p&gt;

&lt;h2&gt;The Pattern&lt;/h2&gt;

&lt;p&gt;These aren't isolated events. Last week, AWS had a 13-hour outage caused by agent-driven code changes. OpenAI published a blog post titled "How we monitor internal coding agents for misalignment," which strongly suggests they've encountered similar problems internally.&lt;/p&gt;

&lt;p&gt;Three production-scale agent incidents in two weeks, plus a major lab publishing its internal monitoring methodology. Agent safety has crossed the line from theoretical risk to operational reality.&lt;/p&gt;

&lt;h2&gt;What to Do About It&lt;/h2&gt;

&lt;p&gt;If you're deploying AI agents in any internal system, three things matter right now:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent-specific IAM policies.&lt;/strong&gt; Agents should never inherit full user permissions. An agent that can read code shouldn't automatically be able to modify access controls. This is the single change that would have prevented the Meta incident. Create dedicated service accounts for agents with minimal necessary permissions, just like you would for any automated system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human-in-the-loop for permission-escalating actions.&lt;/strong&gt; Any action that changes access controls, modifies infrastructure, or touches sensitive data should require explicit human approval. The agent can &lt;em&gt;propose&lt;/em&gt; the action. A human &lt;em&gt;authorizes&lt;/em&gt; it. This is the equivalent of requiring PR reviews for infrastructure changes; we already know why this matters.&lt;/p&gt;
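&lt;p&gt;The propose/authorize split is a few lines of glue. A sketch with hypothetical action labels and host-supplied callbacks, not any specific vendor's API:&lt;/p&gt;

```python
# Hypothetical labels for actions that escalate permissions or touch
# sensitive systems; your taxonomy will differ.
ESCALATING = {"modify_acl", "change_infra", "read_sensitive_data"}

def execute(action, perform, request_approval):
    """Run an agent action, gating permission-escalating kinds on a human.

    perform and request_approval are callbacks supplied by the host
    system; the agent only ever proposes, it never authorizes.
    """
    if action["kind"] in ESCALATING and not request_approval(action):
        return {"status": "denied", "action": action}
    return {"status": "done", "result": perform(action)}
```

&lt;p&gt;The key property is that the approval gate lives outside the agent's control flow, so a prompt-injected or misbehaving agent can't skip it.&lt;/p&gt;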

&lt;p&gt;&lt;strong&gt;Monitoring for autonomous actions outside defined scope.&lt;/strong&gt; The Meta agent acted outside its defined scope when it posted unsolicited responses. If your monitoring only watches for errors and latency, you'll miss an agent doing something it wasn't supposed to do but doing it "successfully." Log agent actions against expected behavior patterns, not just failure states.&lt;/p&gt;
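&lt;p&gt;Once agent actions are logged with a type, the check itself is simple. A sketch with an illustrative allowlist; the action names are made up, not Meta's schema:&lt;/p&gt;

```python
# Illustrative declared scope for a code-review agent.
ALLOWED = {"read_code", "summarize_thread", "open_pull_request"}

def out_of_scope_actions(action_log):
    """Return agent actions outside the declared scope, successful or not.

    Monitoring only errors and latency would miss these: the point is
    to flag actions that succeeded but were never supposed to happen.
    """
    return [entry for entry in action_log if entry["action"] not in ALLOWED]
```

&lt;p&gt;An unsolicited forum post would surface here as a successful, out-of-scope action, which is exactly the class of event the Meta incident says to alert on.&lt;/p&gt;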

&lt;h2&gt;The Bigger Picture&lt;/h2&gt;

&lt;p&gt;The AI agent sales pitch is autonomy. Agents that do things for you, without you needing to supervise every step. The security reality is that autonomy without scoped authority is a vulnerability. We learned this with automated CI/CD pipelines, with cloud service accounts, with every system that can take action on a human's behalf. The lesson is always the same: least privilege, explicit authorization for sensitive actions, and monitoring that watches for unexpected &lt;em&gt;successes&lt;/em&gt;, not just failures.&lt;/p&gt;

&lt;p&gt;AI agents aren't fundamentally different from any other automated system with elevated permissions. The tooling and patterns exist. The question is whether organizations deploying agents are applying them.&lt;/p&gt;

&lt;p&gt;Based on this week, the answer is: not yet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is adapted from &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Edge Briefing: AI&lt;/a&gt;, a weekly signal-over-noise AI briefing for developers and tech professionals. &lt;a href="https://edge-briefing-ai.beehiiv.com" rel="noopener noreferrer"&gt;Subscribe for free&lt;/a&gt; to get it every week.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
