<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Paul Martin</title>
    <description>The latest articles on Forem by Paul Martin (@paul_martin_1).</description>
    <link>https://forem.com/paul_martin_1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3684263%2F1ae387eb-3859-4ebe-ab6d-50b6de884d32.jpeg</url>
      <title>Forem: Paul Martin</title>
      <link>https://forem.com/paul_martin_1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/paul_martin_1"/>
    <language>en</language>
    <item>
      <title>Why AI Agents Fail in Production (And Why Prompting Harder Won’t Fix It)</title>
      <dc:creator>Paul Martin</dc:creator>
      <pubDate>Mon, 29 Dec 2025 12:15:37 +0000</pubDate>
      <link>https://forem.com/paul_martin_1/why-ai-agents-fail-in-production-and-why-prompting-harder-wont-fix-it-932</link>
      <guid>https://forem.com/paul_martin_1/why-ai-agents-fail-in-production-and-why-prompting-harder-wont-fix-it-932</guid>
      <description>&lt;p&gt;Most AI agent demos work beautifully.&lt;/p&gt;

&lt;p&gt;Then you ship them into a real system.&lt;/p&gt;

&lt;p&gt;And suddenly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents hallucinate confidently
&lt;/li&gt;
&lt;li&gt;steps get skipped
&lt;/li&gt;
&lt;li&gt;tools are called out of order
&lt;/li&gt;
&lt;li&gt;outputs drift from the original intent
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Teams usually respond by tweaking prompts.&lt;/p&gt;

&lt;p&gt;That almost never works.&lt;/p&gt;

&lt;p&gt;This post explains &lt;strong&gt;why AI agents fail in production&lt;/strong&gt;, what’s actually going wrong under the hood, and why the problem isn’t models — it’s missing planning.&lt;/p&gt;




&lt;h2&gt;The Core Misunderstanding About AI Agents&lt;/h2&gt;

&lt;p&gt;Most agent systems today rely on this assumption:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If we give the model enough context and a good prompt, it will behave correctly.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That assumption breaks the moment agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run across multiple steps
&lt;/li&gt;
&lt;li&gt;call real tools
&lt;/li&gt;
&lt;li&gt;modify state
&lt;/li&gt;
&lt;li&gt;interact with external systems
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At that point, &lt;strong&gt;implicit intent is not enough&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Failure Mode 1: Agents Predict, They Don’t Decide&lt;/h2&gt;

&lt;p&gt;Large language models don’t reason about truth or correctness.&lt;/p&gt;

&lt;p&gt;They:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;predict the most likely next token
&lt;/li&gt;
&lt;li&gt;optimize for plausibility, not accuracy
&lt;/li&gt;
&lt;li&gt;have no built-in concept of “this decision was already made”
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So when an agent is asked to act repeatedly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;earlier decisions are re-interpreted
&lt;/li&gt;
&lt;li&gt;assumptions silently change
&lt;/li&gt;
&lt;li&gt;the plan shifts without warning
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where hallucinations start.&lt;/p&gt;

&lt;p&gt;Not because the model is bad — but because &lt;strong&gt;nothing is anchoring decisions&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Failure Mode 2: Implicit Planning Collapses Under Execution&lt;/h2&gt;

&lt;p&gt;In many systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the “plan” exists only inside a prompt
&lt;/li&gt;
&lt;li&gt;constraints are implied, not enforced
&lt;/li&gt;
&lt;li&gt;nothing separates intent from execution
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This works in short conversations.&lt;/p&gt;

&lt;p&gt;It fails when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;tools are involved
&lt;/li&gt;
&lt;li&gt;workflows span minutes or hours
&lt;/li&gt;
&lt;li&gt;multiple agents collaborate
&lt;/li&gt;
&lt;li&gt;retries or partial failures occur
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When execution starts, the agent is forced to &lt;strong&gt;re-derive intent&lt;/strong&gt; from scattered context.&lt;/p&gt;

&lt;p&gt;That’s not planning.&lt;br&gt;&lt;br&gt;
That’s improvisation.&lt;/p&gt;




&lt;h2&gt;Failure Mode 3: Prompting Harder Makes Things Worse&lt;/h2&gt;

&lt;p&gt;When systems fail, teams often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;add more instructions
&lt;/li&gt;
&lt;li&gt;add longer prompts
&lt;/li&gt;
&lt;li&gt;add more examples
&lt;/li&gt;
&lt;li&gt;add more “DO NOT” rules
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This increases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;token usage
&lt;/li&gt;
&lt;li&gt;cognitive load
&lt;/li&gt;
&lt;li&gt;ambiguity
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But it does &lt;strong&gt;not&lt;/strong&gt; create determinism.&lt;/p&gt;

&lt;p&gt;You’re still asking the model to infer decisions at runtime — repeatedly.&lt;/p&gt;

&lt;p&gt;That’s why prompt-only agents drift.&lt;/p&gt;




&lt;h2&gt;The Missing Layer: Explicit Planning&lt;/h2&gt;

&lt;p&gt;Reliable agent systems separate &lt;strong&gt;three distinct phases&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Intent definition&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
What problem are we solving? What constraints apply?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Planning&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
What decisions are locked? What steps are allowed? What assumptions are fixed?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Execution&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Agents act &lt;em&gt;only&lt;/em&gt; within the bounds of the plan.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
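
&lt;p&gt;As a rough sketch (hypothetical names, not any specific library), the three phases above can be captured in one small, inspectable artifact:&lt;/p&gt;

```python
# Hypothetical sketch of an explicit plan artifact.
# Names (Plan, permits, allowed_steps) are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    intent: str            # phase 1: what problem are we solving?
    constraints: tuple     # fixed assumptions, locked up front
    allowed_steps: tuple   # phase 2: the only steps execution may take

    def permits(self, step: str) -> bool:
        # phase 3: execution checks the plan instead of re-deriving intent
        return step in self.allowed_steps


plan = Plan(
    intent="sync user records to the billing system",
    constraints=("read-only on the production DB", "no schema changes"),
    allowed_steps=("fetch_users", "diff_records", "push_updates"),
)

print(plan.permits("fetch_users"))   # True
print(plan.permits("drop_table"))    # False
```

&lt;p&gt;The key property: step 3 consults the plan instead of asking the model to re-infer it on every turn.&lt;/p&gt;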

&lt;p&gt;Most systems skip step 2.&lt;/p&gt;

&lt;p&gt;That’s the bug.&lt;/p&gt;




&lt;h2&gt;Why Planning Must Be Explicit (Not Prompted)&lt;/h2&gt;

&lt;p&gt;An explicit plan:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;exists outside the model’s hidden state
&lt;/li&gt;
&lt;li&gt;survives retries and failures
&lt;/li&gt;
&lt;li&gt;can be inspected, validated, and versioned
&lt;/li&gt;
&lt;li&gt;prevents agents from silently changing assumptions
&lt;/li&gt;
&lt;/ul&gt;
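
&lt;p&gt;One minimal way to make a plan inspectable and versioned, as a hedged sketch: serialize it canonically and fingerprint it, so any changed assumption changes the hash instead of slipping through silently.&lt;/p&gt;

```python
# Hypothetical sketch: version a plan by hashing its serialized form,
# so a silently changed assumption becomes a visible diff.
import hashlib
import json

plan = {
    "intent": "sync user records to the billing system",
    "constraints": ["read-only on the production DB"],
    "steps": ["fetch_users", "diff_records", "push_updates"],
}


def plan_fingerprint(p):
    # canonical JSON (sorted keys) keeps the hash stable across runs
    blob = json.dumps(p, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]


v1 = plan_fingerprint(plan)
plan["constraints"].append("no schema changes")   # an assumption changed
v2 = plan_fingerprint(plan)
print(v1 == v2)   # False: the change is detectable, not silent
```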

&lt;p&gt;Once decisions are written down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;agents stop hallucinating alternatives
&lt;/li&gt;
&lt;li&gt;execution becomes constrained
&lt;/li&gt;
&lt;li&gt;failures become debuggable
&lt;/li&gt;
&lt;/ul&gt;
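
&lt;p&gt;Constrained execution can be as simple as a thin guard in front of tool calls (a sketch with illustrative names): anything outside the plan fails loudly instead of improvising.&lt;/p&gt;

```python
# Hypothetical sketch: an execution guard that only runs tool calls
# the plan explicitly allows; everything else fails loudly.
ALLOWED_STEPS = {"fetch_users", "diff_records", "push_updates"}


def run_step(step, args, tools):
    if step not in ALLOWED_STEPS:
        # a constrained failure is debuggable; silent drift is not
        raise PermissionError(f"step {step!r} is not in the plan")
    return tools[step](args)


tools = {"fetch_users": lambda args: ["alice", "bob"]}
print(run_step("fetch_users", {}, tools))   # ['alice', 'bob']
```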

&lt;p&gt;This is how you reduce drift &lt;strong&gt;without&lt;/strong&gt; over-prompting.&lt;/p&gt;




&lt;h2&gt;Planning in Practice&lt;/h2&gt;

&lt;p&gt;Some teams are starting to introduce a thin planning layer &lt;em&gt;before&lt;/em&gt; agents execute — a place where intent, constraints, and decisions are made explicit and locked.&lt;/p&gt;

&lt;p&gt;That’s the direction tools like &lt;strong&gt;Superplan&lt;/strong&gt; take: treating planning as a first-class artifact instead of something inferred repeatedly at runtime.&lt;/p&gt;

&lt;p&gt;If you’re interested in that approach, you can see the idea here: &lt;a href="https://superplan.md" rel="noopener noreferrer"&gt;Superplan.md&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;What This Means for Production AI&lt;/h2&gt;

&lt;p&gt;If you’re building agents that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;call APIs
&lt;/li&gt;
&lt;li&gt;touch infrastructure
&lt;/li&gt;
&lt;li&gt;write data
&lt;/li&gt;
&lt;li&gt;trigger workflows
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you don’t have a “prompting problem”.&lt;/p&gt;

&lt;p&gt;You have a &lt;strong&gt;planning problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Until intent is explicit and decisions are locked &lt;em&gt;before&lt;/em&gt; execution, hallucinations are inevitable.&lt;/p&gt;




&lt;h2&gt;Closing Thought&lt;/h2&gt;

&lt;p&gt;Models will keep improving.&lt;br&gt;&lt;br&gt;
Tooling will keep evolving.&lt;/p&gt;

&lt;p&gt;But no amount of model quality fixes a system that &lt;strong&gt;never decided what it was doing in the first place&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Planning isn’t overhead.&lt;br&gt;&lt;br&gt;
It’s the difference between a demo and a production system.&lt;/p&gt;

</description>
      <category>superplanmd</category>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
<title>AI agents don’t fail because models are bad. They fail because planning is implicit. Superplan.md adds a thin planning layer before execution so agents act on decisions, not assumptions.</title>
      <dc:creator>Paul Martin</dc:creator>
      <pubDate>Mon, 29 Dec 2025 12:09:29 +0000</pubDate>
      <link>https://forem.com/paul_martin_1/ai-agents-dont-fail-because-models-are-bad-they-fail-because-planning-is-implicit-superplanmd-1ia7</link>
      <guid>https://forem.com/paul_martin_1/ai-agents-dont-fail-because-models-are-bad-they-fail-because-planning-is-implicit-superplanmd-1ia7</guid>
      <description>&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://blog.superplan.md/what-is-superplan/" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.superplan.md%2F_astro%2Fblog-2.l9Z4J4n6.jpeg" height="446" class="m-0" width="800"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://blog.superplan.md/what-is-superplan/" rel="noopener noreferrer" class="c-link"&gt;
            SuperPlan: The AI Planning Substrate That Prevents Agent Drift and Hallucinations
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            SuperPlan is a planning layer for AI agents that prevents hallucinations and drift by forcing structured decisions before execution. Learn how explicit planning fixes agent failures.
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fblog.superplan.md%2Ffavicon.svg" width="120" height="120"&gt;
          blog.superplan.md
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;


</description>
    </item>
  </channel>
</rss>
