<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: CPDForge</title>
    <description>The latest articles on Forem by CPDForge (@cpdforge).</description>
    <link>https://forem.com/cpdforge</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3842717%2F5ac02ff9-3b89-4e5d-aa3d-a3dcf501db52.png</url>
      <title>Forem: CPDForge</title>
      <link>https://forem.com/cpdforge</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cpdforge"/>
    <language>en</language>
    <item>
      <title>The Most Dangerous AI Output Isn’t Wrong — It’s “Almost Right”</title>
      <dc:creator>CPDForge</dc:creator>
      <pubDate>Sat, 04 Apr 2026 09:19:47 +0000</pubDate>
      <link>https://forem.com/cpdforge/-the-most-dangerous-ai-output-isnt-wrong-its-almost-right-kpn</link>
      <guid>https://forem.com/cpdforge/-the-most-dangerous-ai-output-isnt-wrong-its-almost-right-kpn</guid>
      <description>&lt;p&gt;Most people think the biggest risk with AI is hallucination.&lt;/p&gt;

&lt;p&gt;Completely wrong answers.&lt;br&gt;&lt;br&gt;
Obvious mistakes.&lt;br&gt;&lt;br&gt;
Stuff you can spot instantly.&lt;/p&gt;

&lt;p&gt;That’s not what caused problems for us.&lt;/p&gt;

&lt;p&gt;The real issue showed up later — once things &lt;em&gt;looked&lt;/em&gt; like they were working.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The outputs weren’t wrong.&lt;br&gt;&lt;br&gt;
They were almost right.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And that’s a much harder problem to deal with.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why “Almost Right” Is Worse Than Wrong
&lt;/h2&gt;

&lt;p&gt;If something is clearly wrong, you catch it.&lt;/p&gt;

&lt;p&gt;You fix it.&lt;br&gt;&lt;br&gt;
You move on.&lt;/p&gt;

&lt;p&gt;But when something is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;90% correct
&lt;/li&gt;
&lt;li&gt;Well structured
&lt;/li&gt;
&lt;li&gt;Confidently written
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…it passes through unnoticed.&lt;/p&gt;

&lt;p&gt;And that’s where systems start to break.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Looks Like in Practice
&lt;/h2&gt;

&lt;p&gt;These weren’t big failures.&lt;/p&gt;

&lt;p&gt;They were small, subtle ones:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A field slightly misclassified
&lt;/li&gt;
&lt;li&gt;A rule applied in the wrong context
&lt;/li&gt;
&lt;li&gt;A structure that looks valid but doesn’t align with the system
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individually, they don’t matter.&lt;/p&gt;

&lt;p&gt;At scale, they compound.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: AI Stabilises Its Own Mistakes
&lt;/h2&gt;

&lt;p&gt;Here’s what we realised:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;AI doesn’t just generate errors — it reinforces them.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once a slightly incorrect pattern appears, the model tends to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repeat it
&lt;/li&gt;
&lt;li&gt;Expand on it
&lt;/li&gt;
&lt;li&gt;Make it look more consistent over time
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So instead of random errors, you get:&lt;/p&gt;

&lt;p&gt;Clean, consistent, wrong outputs.&lt;/p&gt;

&lt;p&gt;Which are much harder to detect.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;AI isn’t reasoning in the way we expect.&lt;/p&gt;

&lt;p&gt;It’s optimising for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coherence
&lt;/li&gt;
&lt;li&gt;Pattern completion
&lt;/li&gt;
&lt;li&gt;Internal consistency
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not correctness.&lt;/p&gt;

&lt;p&gt;So if an early assumption is slightly off, the model will build a very convincing version of reality around it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Breaks Real Systems
&lt;/h2&gt;

&lt;p&gt;This becomes critical when AI is used for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured content generation
&lt;/li&gt;
&lt;li&gt;Compliance or policy outputs
&lt;/li&gt;
&lt;li&gt;Anything reused or scaled
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because now you don’t just have an error.&lt;/p&gt;

&lt;p&gt;You have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A repeatable error
&lt;/li&gt;
&lt;li&gt;A scalable error
&lt;/li&gt;
&lt;li&gt;A system-level error
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What We Changed
&lt;/h2&gt;

&lt;p&gt;We stopped trusting “good-looking outputs.”&lt;/p&gt;

&lt;p&gt;Instead, we built around one principle:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Every output is suspect until proven stable.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  1. Pattern Detection Over Single Output Review
&lt;/h3&gt;

&lt;p&gt;Instead of asking:&lt;br&gt;
“Is this output correct?”&lt;/p&gt;

&lt;p&gt;We ask:&lt;br&gt;
“Is this pattern consistently correct across outputs?”&lt;/p&gt;

&lt;p&gt;This exposes hidden drift fast.&lt;/p&gt;
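&lt;p&gt;As a minimal sketch of what checking a pattern across outputs (rather than one output) can look like — the function name &lt;code&gt;pattern_drift&lt;/code&gt; and the sample data are illustrative, not our actual code:&lt;/p&gt;

```python
from collections import Counter

def pattern_drift(outputs, field):
    """Count the variants of one field across a batch of outputs.

    A field that is consistently correct should collapse to one
    dominant variant; a tail of near-duplicates signals hidden drift.
    """
    variants = Counter(str(o.get(field, "")).strip().lower() for o in outputs)
    dominant, dominant_count = variants.most_common(1)[0]
    return {
        "dominant": dominant,
        "agreement": dominant_count / len(outputs),
        "variants": dict(variants),
    }

batch = [
    {"category": "Risk Assessment"},
    {"category": "risk assessment"},
    {"category": "Risk Analysis"},   # the "almost right" outlier
    {"category": "Risk Assessment"},
]
report = pattern_drift(batch, "category")
# agreement below 1.0 flags the field for review
```

&lt;p&gt;Any single output in that batch would pass review on its own; only the cross-output view exposes the outlier.&lt;/p&gt;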




&lt;h3&gt;
  
  
  2. Intent vs Output Validation
&lt;/h3&gt;

&lt;p&gt;We separate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the system is supposed to do
&lt;/li&gt;
&lt;li&gt;What the AI actually produced
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then compare them explicitly.&lt;/p&gt;

&lt;p&gt;If they don’t align, it fails — even if it looks right.&lt;/p&gt;
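&lt;p&gt;A rough sketch of that comparison — the intent spec and the helper name &lt;code&gt;validate_against_intent&lt;/code&gt; are hypothetical, shown only to make the shape concrete:&lt;/p&gt;

```python
def validate_against_intent(intent, output):
    """Compare what the system was supposed to produce against what
    the model actually produced. Failures are explicit, even when the
    output looks right."""
    failures = []
    for section in intent["required_sections"]:
        if section not in output.get("sections", {}):
            failures.append(f"missing section: {section}")
    for term in intent.get("forbidden_terms", []):
        if term in str(output).lower():
            failures.append(f"forbidden term used: {term}")
    return failures

intent = {
    "required_sections": ["summary", "actions"],
    "forbidden_terms": ["guarantee"],
}
output = {"sections": {"summary": "Looks polished."}}  # no "actions" section
failures = validate_against_intent(intent, output)
# failures == ["missing section: actions"]
```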




&lt;h3&gt;
  
  
  3. Breaking the Feedback Loop
&lt;/h3&gt;

&lt;p&gt;We avoid feeding AI its own outputs without checks.&lt;/p&gt;

&lt;p&gt;Because that’s how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Small errors become reinforced patterns
&lt;/li&gt;
&lt;li&gt;Reinforced patterns become system behaviour
&lt;/li&gt;
&lt;/ul&gt;
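&lt;p&gt;In practice this is just a gate in front of any reuse path — a sketch, with an assumed validator for illustration:&lt;/p&gt;

```python
def has_required_keys(output):
    """Illustrative validator: complain about missing keys."""
    required = ("title", "body")
    return [f"missing key: {k}" for k in required if k not in output]

def gate_for_reuse(output, validators):
    """An output may only be fed back to the model (as context,
    examples, or memory) after passing every validator; anything
    else is quarantined instead of becoming history."""
    errors = [msg for check in validators for msg in check(output)]
    return (not errors), errors

ok, errors = gate_for_reuse({"title": "Policy summary"}, [has_required_keys])
# ok is False: this output never re-enters the loop
```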




&lt;h2&gt;
  
  
  The Counterintuitive Bit
&lt;/h2&gt;

&lt;p&gt;Making outputs more polished made the problem worse.&lt;/p&gt;

&lt;p&gt;Cleaner language increases trust.&lt;br&gt;&lt;br&gt;
More trust reduces scrutiny.&lt;/p&gt;

&lt;p&gt;Which allows bad patterns to survive longer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters Right Now
&lt;/h2&gt;

&lt;p&gt;A lot of AI tooling is focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Making outputs better
&lt;/li&gt;
&lt;li&gt;Making them more human
&lt;/li&gt;
&lt;li&gt;Making them more polished
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that increases risk if you’re not validating underneath.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If your AI outputs look great but your system still feels unreliable:&lt;/p&gt;

&lt;p&gt;You’re probably dealing with “almost right” errors.&lt;/p&gt;

&lt;p&gt;And those are much harder to catch than obvious failures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Question for Anyone Building with AI
&lt;/h2&gt;

&lt;p&gt;If you’re using AI in production workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What breaks first when you scale?&lt;/li&gt;
&lt;li&gt;Do you validate outputs, or just trust them if they look good?&lt;/li&gt;
&lt;li&gt;Have you run into “clean but wrong” behaviour?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Genuinely curious how others are handling this.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>From Prompts to Systems: Fixing AI Agent Drift in Production</title>
      <dc:creator>CPDForge</dc:creator>
      <pubDate>Mon, 30 Mar 2026 13:50:55 +0000</pubDate>
      <link>https://forem.com/cpdforge/from-prompts-to-systems-fixing-ai-agent-drift-in-production-pcm</link>
      <guid>https://forem.com/cpdforge/from-prompts-to-systems-fixing-ai-agent-drift-in-production-pcm</guid>
      <description>&lt;h2&gt;
  
  
  Why My AI Agent Kept Getting Things Wrong (And What Actually Fixed It)
&lt;/h2&gt;

&lt;p&gt;At first, it worked.&lt;/p&gt;

&lt;p&gt;I gave the AI a clear prompt. It responded well. Structured, relevant, even a bit impressive.&lt;/p&gt;

&lt;p&gt;Then I tried again.&lt;/p&gt;

&lt;p&gt;Same prompt. Slightly different output.&lt;br&gt;&lt;br&gt;
Then again — and something felt off.&lt;br&gt;&lt;br&gt;
Not completely wrong… just inconsistent.&lt;/p&gt;

&lt;p&gt;That’s when it became a problem.&lt;/p&gt;

&lt;p&gt;Because I wasn’t building a demo. I was building a product.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Problem: “Almost Right” Is Not Good Enough
&lt;/h2&gt;

&lt;p&gt;When you’re working with LLMs in isolation, variability is fine. Even interesting.&lt;/p&gt;

&lt;p&gt;When you’re building something people rely on — it isn’t.&lt;/p&gt;

&lt;p&gt;I started seeing patterns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outputs drifting in structure
&lt;/li&gt;
&lt;li&gt;Key instructions being ignored
&lt;/li&gt;
&lt;li&gt;Tone and formatting changing between runs
&lt;/li&gt;
&lt;li&gt;Occasionally… details that were simply made up
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nothing catastrophic. Just unreliable.&lt;/p&gt;

&lt;p&gt;And that’s worse.&lt;/p&gt;

&lt;p&gt;Because you can’t trust it.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Context: This Wasn’t Just a Chatbot
&lt;/h2&gt;

&lt;p&gt;One important detail — this wasn’t an internal tool or a sandbox experiment.&lt;/p&gt;

&lt;p&gt;This was a &lt;strong&gt;user-facing AI agent&lt;/strong&gt;, interacting with both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;logged-in users (with context, data, and history)
&lt;/li&gt;
&lt;li&gt;prospective users (with no context at all)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which meant I effectively needed &lt;strong&gt;two behaviours&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one that could operate with structured internal data and constraints
&lt;/li&gt;
&lt;li&gt;one that could explain, guide, and respond more openly without access to that context
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trying to handle both with the same prompt quickly broke down.&lt;/p&gt;

&lt;p&gt;The agent would:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;assume context that didn’t exist
&lt;/li&gt;
&lt;li&gt;overreach when it should stay generic
&lt;/li&gt;
&lt;li&gt;or lose structure when switching between modes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s when it became clear the issue wasn’t just prompting — it was &lt;strong&gt;context control and behavioural separation&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Happens (and Why It’s Not a Bug)
&lt;/h2&gt;

&lt;p&gt;It took a bit of stepping back to realise:&lt;/p&gt;

&lt;p&gt;The model wasn’t failing — I was asking it to behave like something it isn’t.&lt;/p&gt;

&lt;p&gt;LLMs are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stateless (unless you force context)
&lt;/li&gt;
&lt;li&gt;Probabilistic (not deterministic)
&lt;/li&gt;
&lt;li&gt;Context-sensitive (and context degrades fast)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I was treating as “rules” were really just:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Suggestions with good intentions&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Even system prompts didn’t fully solve it.&lt;/p&gt;

&lt;p&gt;They help — but they don’t enforce behaviour.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Tried First (and Why It Didn’t Work)
&lt;/h2&gt;

&lt;p&gt;Like most people, I went through the usual iterations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Making prompts longer
&lt;/li&gt;
&lt;li&gt;Repeating instructions
&lt;/li&gt;
&lt;li&gt;Adding “IMPORTANT:” everywhere
&lt;/li&gt;
&lt;li&gt;Trying to be hyper-specific
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It improved things slightly… but not enough.&lt;/p&gt;

&lt;p&gt;The problem wasn’t clarity.&lt;/p&gt;

&lt;p&gt;The problem was &lt;strong&gt;control&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Shift: From Prompts to Systems
&lt;/h2&gt;

&lt;p&gt;The breakthrough came when I stopped thinking in terms of prompts and started thinking in terms of &lt;strong&gt;structure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Tell the model what to do”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I moved to:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Define how the model is allowed to behave”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s a completely different mindset.&lt;/p&gt;


&lt;h2&gt;
  
  
  What I Built: A Structured Instruction Layer
&lt;/h2&gt;

&lt;p&gt;I ended up creating what I originally called an “instruction bible”.&lt;/p&gt;

&lt;p&gt;In reality, it’s closer to a &lt;strong&gt;structured instruction system&lt;/strong&gt; layered on top of the model.&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Persistent rules (not buried in prompts)
&lt;/h3&gt;

&lt;p&gt;Instead of mixing everything into one prompt, I separated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Role definition
&lt;/li&gt;
&lt;li&gt;Behaviour rules
&lt;/li&gt;
&lt;li&gt;Output constraints
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"compliance_ai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rules"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Do not invent regulations"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Flag uncertainty explicitly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Prioritise clarity over completeness"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"output_format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"structured_sections"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This becomes the source of truth, not just part of the conversation.&lt;/p&gt;
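&lt;p&gt;One way a rules file like this can become the source of truth is to compile it into the system message on every request, rather than hand-editing prompts — a sketch, assuming the JSON structure above:&lt;/p&gt;

```python
import json

# The same rules document shown above, loaded once and reused everywhere.
RULES = json.loads("""
{
  "role": "compliance_ai",
  "rules": [
    "Do not invent regulations",
    "Flag uncertainty explicitly",
    "Prioritise clarity over completeness"
  ],
  "output_format": "structured_sections"
}
""")

def build_system_prompt(rules):
    """Render the persistent rules into a system message, so every
    call starts from the same constraints."""
    lines = [f"You are acting as: {rules['role']}.", "Non-negotiable rules:"]
    lines += [f"- {r}" for r in rules["rules"]]
    lines.append(f"Output format: {rules['output_format']}")
    return "\n".join(lines)

prompt = build_system_prompt(RULES)
```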




&lt;h3&gt;
  
  
  2. Modular instructions
&lt;/h3&gt;

&lt;p&gt;Different tasks = different instruction sets.&lt;/p&gt;

&lt;p&gt;Instead of one giant prompt, I used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generation mode
&lt;/li&gt;
&lt;li&gt;Review mode
&lt;/li&gt;
&lt;li&gt;Analysis mode
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each with its own constraints.&lt;/p&gt;

&lt;p&gt;This reduced cross-contamination between behaviours.&lt;/p&gt;
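&lt;p&gt;The mode separation can be as simple as a lookup that refuses to guess — the constraint values here are invented for illustration:&lt;/p&gt;

```python
INSTRUCTION_SETS = {
    "generation": {"max_words": 400, "may_introduce_content": True},
    "review":     {"max_words": 150, "may_introduce_content": False},
    "analysis":   {"max_words": 250, "may_introduce_content": False},
}

def instructions_for(mode):
    """Each task gets its own constraint set; an unknown mode fails
    loudly instead of falling back to a generic prompt."""
    if mode not in INSTRUCTION_SETS:
        raise ValueError(f"unknown mode: {mode}")
    return INSTRUCTION_SETS[mode]
```

&lt;p&gt;The point isn’t the dictionary — it’s that no request runs without an explicit, named behaviour.&lt;/p&gt;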




&lt;h3&gt;
  
  
  3. Controlled outputs
&lt;/h3&gt;

&lt;p&gt;I stopped accepting “natural” responses.&lt;/p&gt;

&lt;p&gt;Everything had to follow a structure.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sections must exist
&lt;/li&gt;
&lt;li&gt;Headings must match
&lt;/li&gt;
&lt;li&gt;Lists must be formatted consistently
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If the output didn’t comply, it was rejected or reprocessed.&lt;/p&gt;
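&lt;p&gt;A minimal version of that reject-or-reprocess loop, assuming markdown-style headings as the required structure:&lt;/p&gt;

```python
REQUIRED_HEADINGS = ("Overview", "Key Points", "Summary")

def missing_headings(text):
    """Return the required headings the draft failed to include."""
    return [h for h in REQUIRED_HEADINGS if f"## {h}" not in text]

def accept_or_reprocess(generate, max_attempts=3):
    """Reject non-compliant drafts and regenerate, up to a limit;
    a draft that never complies is an error, not an output."""
    for _ in range(max_attempts):
        draft = generate()
        if not missing_headings(draft):
            return draft
    raise RuntimeError("draft never matched the required structure")

compliant = "## Overview\n...\n## Key Points\n...\n## Summary\n..."
result = accept_or_reprocess(lambda: compliant)
```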




&lt;h3&gt;
  
  
  4. Reduced ambiguity
&lt;/h3&gt;

&lt;p&gt;I removed anything vague.&lt;/p&gt;

&lt;p&gt;No:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“be helpful”
&lt;/li&gt;
&lt;li&gt;“be clear”
&lt;/li&gt;
&lt;li&gt;“be concise”
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define structure
&lt;/li&gt;
&lt;li&gt;Define constraints
&lt;/li&gt;
&lt;li&gt;Define boundaries
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model performs much better when it has less room to interpret.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;Once this layer was in place, the difference was immediate.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outputs became consistent
&lt;/li&gt;
&lt;li&gt;Structure stabilised
&lt;/li&gt;
&lt;li&gt;Hallucination dropped significantly
&lt;/li&gt;
&lt;li&gt;Reuse became possible
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most importantly:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I could actually trust the output in a product setting&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not perfect — but predictable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Realisation
&lt;/h2&gt;

&lt;p&gt;The real lesson wasn’t about prompts.&lt;/p&gt;

&lt;p&gt;It was this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Prompt engineering doesn’t scale. Systems do.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can get good results with clever prompts.&lt;/p&gt;

&lt;p&gt;But if you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;reliability
&lt;/li&gt;
&lt;li&gt;repeatability
&lt;/li&gt;
&lt;li&gt;product-grade output
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need structure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Fits in the Bigger Picture
&lt;/h2&gt;

&lt;p&gt;This lines up with a broader shift happening right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From chatbots → agents
&lt;/li&gt;
&lt;li&gt;From prompts → orchestration
&lt;/li&gt;
&lt;li&gt;From “AI responses” → controlled systems
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re moving away from:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Ask the model something”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Toward:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Design how the model operates”&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;LLMs are powerful — but they’re not plug-and-play components.&lt;/p&gt;

&lt;p&gt;If you want to build something real with them, you have to accept:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You’re not just writing prompts
&lt;/li&gt;
&lt;li&gt;You’re designing behaviour
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once you start treating it that way, everything changes.&lt;/p&gt;




&lt;p&gt;If you’re building with AI and hitting similar issues, I’d be interested to hear how you’re handling it — especially where things break.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>We tried to generate a compliance course with AI. It didn’t go well.</title>
      <dc:creator>CPDForge</dc:creator>
      <pubDate>Wed, 25 Mar 2026 08:07:10 +0000</pubDate>
      <link>https://forem.com/cpdforge/we-tried-to-generate-a-compliance-course-with-ai-it-didnt-go-well-56n7</link>
      <guid>https://forem.com/cpdforge/we-tried-to-generate-a-compliance-course-with-ai-it-didnt-go-well-56n7</guid>
      <description>&lt;p&gt;&lt;strong&gt;We started off trying to build a compliance course.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We ended up building the system required to trust one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Turns out they’re not the same thing.&lt;/p&gt;

&lt;p&gt;That’s when everything changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 The First Version (Looked Fine… Until It Didn’t)
&lt;/h2&gt;

&lt;p&gt;The initial idea was simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use AI to generate a compliance training course.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pick a topic like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;risk assessment
&lt;/li&gt;
&lt;li&gt;workplace safety
&lt;/li&gt;
&lt;li&gt;ESG fundamentals
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Feed it into a model, get a structured course out.&lt;/p&gt;

&lt;p&gt;And technically — that worked.&lt;/p&gt;

&lt;p&gt;We got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;modules
&lt;/li&gt;
&lt;li&gt;lessons
&lt;/li&gt;
&lt;li&gt;headings
&lt;/li&gt;
&lt;li&gt;even quizzes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;On the surface, it looked decent.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;But once you actually read it properly…&lt;/p&gt;




&lt;h2&gt;
  
  
  ❌ What Was Broken
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Shallow Content&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It explained things, but didn’t really teach anything.&lt;br&gt;&lt;br&gt;
No depth. No real-world context. No edge cases.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Inconsistent Structure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Some lessons were detailed. Others felt like placeholders.&lt;br&gt;&lt;br&gt;
No consistency across the course.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;No Instructional Flow&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;It wasn’t designed — it was assembled.&lt;br&gt;&lt;br&gt;
Content chunks, not a learning journey.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;And the Big One: Reliability&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;In compliance training, “almost correct” isn’t acceptable.&lt;br&gt;&lt;br&gt;
It’s a risk.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚠️ The Realisation
&lt;/h2&gt;

&lt;p&gt;We assumed the problem was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How do we generate better content?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It wasn’t.&lt;/p&gt;

&lt;p&gt;The real problem was:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do we make that content consistent, reliable, and safe to use?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI was doing exactly what it’s good at:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;producing plausible output
&lt;/li&gt;
&lt;li&gt;filling gaps convincingly
&lt;/li&gt;
&lt;li&gt;sounding right
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But that’s not the same as being trustworthy.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔧 What Broke First
&lt;/h2&gt;

&lt;p&gt;Our original pipeline looked something like:&lt;/p&gt;

&lt;p&gt;Prompt → LLM → Output course&lt;/p&gt;

&lt;p&gt;And for a moment, that felt like enough.&lt;/p&gt;

&lt;p&gt;Until we started testing it properly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sections contradicted each other
&lt;/li&gt;
&lt;li&gt;Concepts repeated in different ways
&lt;/li&gt;
&lt;li&gt;Terminology drifted across lessons
&lt;/li&gt;
&lt;li&gt;Some parts were strong, others clearly weak
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;You could generate a course.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You just couldn’t rely on it.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 What We Had to Build Instead
&lt;/h2&gt;

&lt;p&gt;The moment things changed was when we stopped treating this as a generation problem.&lt;/p&gt;

&lt;p&gt;We started treating it as a system problem.&lt;/p&gt;

&lt;p&gt;The pipeline evolved into something more like:&lt;/p&gt;

&lt;p&gt;Input&lt;br&gt;
→ Structured Generation&lt;br&gt;
→ Validation Layer&lt;br&gt;
→ Targeted Rewriting&lt;br&gt;
→ Enrichment (quizzes, scenarios, examples)&lt;br&gt;
→ Compliance Checks&lt;br&gt;
→ Output&lt;/p&gt;

&lt;p&gt;Each layer existed for a reason.&lt;/p&gt;

&lt;p&gt;Because every time we skipped one — something failed.&lt;/p&gt;
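&lt;p&gt;The pipeline above can be sketched as a chain of stages where any stage is allowed to halt the run — the stage bodies here are toy stand-ins, not our real implementations:&lt;/p&gt;

```python
def run_pipeline(topic, stages):
    """Each stage takes the course-in-progress and either improves it
    or raises, so a failed layer can't silently pass through."""
    course = {"topic": topic, "log": []}
    for stage in stages:
        course = stage(course)
        course["log"].append(stage.__name__)
    return course

def structured_generation(course):
    course["lessons"] = [f"{course['topic']}: lesson {i}" for i in range(1, 4)]
    return course

def validation_layer(course):
    if not course.get("lessons"):
        raise ValueError("generation produced no lessons")
    return course

def enrichment(course):
    course["quizzes"] = [f"Quiz for {lesson}" for lesson in course["lessons"]]
    return course

result = run_pipeline(
    "risk assessment",
    [structured_generation, validation_layer, enrichment],
)
```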




&lt;h2&gt;
  
  
  🧩 The Hard Parts (That Don’t Show Up in Demos)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Structure Enforcement&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We had to stop the model from improvising.&lt;/p&gt;

&lt;p&gt;That meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fixed lesson frameworks
&lt;/li&gt;
&lt;li&gt;defined section types
&lt;/li&gt;
&lt;li&gt;controlled outputs
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Targeted Improvement (Not Regeneration)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Regenerating everything just moved the problem around.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;identify weak sections
&lt;/li&gt;
&lt;li&gt;rewrite only those
&lt;/li&gt;
&lt;li&gt;preserve what already works
&lt;/li&gt;
&lt;/ul&gt;
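&lt;p&gt;A sketch of that targeted pass — the weakness heuristic and rewrite step are placeholders for whatever checks and model calls you actually use:&lt;/p&gt;

```python
def rewrite_weak_sections(course, is_weak, rewrite):
    """Rewrite only the sections flagged as weak; strong sections are
    preserved byte-for-byte instead of being regenerated."""
    return {
        name: (rewrite(text) if is_weak(text) else text)
        for name, text in course.items()
    }

course = {"intro": "ok " * 60, "edge_cases": "TODO"}
fixed = rewrite_weak_sections(
    course,
    is_weak=lambda t: len(t.split()) < 20,   # toy heuristic: too short
    rewrite=lambda t: t + " [rewritten with more depth]",
)
# "intro" is untouched; only "edge_cases" is rewritten
```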




&lt;h3&gt;
  
  
  &lt;strong&gt;Cross-Course Consistency&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This was harder than expected.&lt;/p&gt;

&lt;p&gt;We needed to deal with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;duplicated concepts
&lt;/li&gt;
&lt;li&gt;mismatched terminology
&lt;/li&gt;
&lt;li&gt;uneven difficulty
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which meant introducing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;internal rules
&lt;/li&gt;
&lt;li&gt;pattern checks
&lt;/li&gt;
&lt;li&gt;consistency constraints
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Compliance Awareness&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;This is where most tools fall down.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;alignment with recognised frameworks
&lt;/li&gt;
&lt;li&gt;the ability to adapt as guidance evolves
&lt;/li&gt;
&lt;li&gt;detection of weak or risky content
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧠 The Shift
&lt;/h2&gt;

&lt;p&gt;At some point, we stopped thinking in prompts.&lt;/p&gt;

&lt;p&gt;We started thinking in systems.&lt;/p&gt;

&lt;p&gt;AI became one part of the process — not the solution.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ If You’re Building with AI
&lt;/h2&gt;

&lt;p&gt;It’s very easy to focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;better prompts
&lt;/li&gt;
&lt;li&gt;better outputs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;But the real leverage is in:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;constraints
&lt;/li&gt;
&lt;li&gt;validation
&lt;/li&gt;
&lt;li&gt;iteration
&lt;/li&gt;
&lt;li&gt;control
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Because generation is easy.&lt;/p&gt;

&lt;p&gt;Making it usable is not.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Where This Landed
&lt;/h2&gt;

&lt;p&gt;What started as “generate a course” became:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;structure
&lt;/li&gt;
&lt;li&gt;validation
&lt;/li&gt;
&lt;li&gt;rewriting
&lt;/li&gt;
&lt;li&gt;enrichment
&lt;/li&gt;
&lt;li&gt;compliance
&lt;/li&gt;
&lt;li&gt;delivery
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not because we wanted more features —&lt;br&gt;&lt;br&gt;
but because without them, none of it worked.&lt;/p&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;That was the real lesson.&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;AI doesn’t remove complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It just hides it — until it matters.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
