<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Paul Coles</title>
    <description>The latest articles on Forem by Paul Coles (@paul_coles_633f698b10fd6e).</description>
    <link>https://forem.com/paul_coles_633f698b10fd6e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1860071%2F61b8460b-57dc-4b65-9ec4-ba2330f63f3d.jpg</url>
      <title>Forem: Paul Coles</title>
      <link>https://forem.com/paul_coles_633f698b10fd6e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/paul_coles_633f698b10fd6e"/>
    <language>en</language>
    <item>
      <title>The Subtle Art of Herding Cats: How I Turned Chaos Into a Repeatable Test Process (Part 3 of 4)</title>
      <dc:creator>Paul Coles</dc:creator>
      <pubDate>Mon, 11 Aug 2025 07:43:37 +0000</pubDate>
      <link>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-how-i-turned-chaos-into-a-repeatable-test-process-part-3-of-4-5c2b</link>
      <guid>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-how-i-turned-chaos-into-a-repeatable-test-process-part-3-of-4-5c2b</guid>
      <description>&lt;h2&gt;
  
  
  Proof of Concept: Does This Actually Work?
&lt;/h2&gt;

&lt;p&gt;In Part 2, I found that gold standards are more effective than rulebooks, and that lazy loading helps prevent Context Rot. Part 3 shows how the approach works in practice, with illustrative (fictionalised) examples and an honest look at what happens when the cats meet reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Universal BDD Vision: Two Car Companies Principle
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Core Philosophy
&lt;/h3&gt;

&lt;p&gt;If two companies build the same thing, like BMW and Mercedes-Benz with their car configurators, you should be able to take a requirement from either and arrive at the same BDD scenario.&lt;/p&gt;

&lt;p&gt;The scenario shouldn't contain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Implementation details&lt;/strong&gt;: REST APIs, microservices, specific databases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System names&lt;/strong&gt;: ConfiguratorService v2.1, PricingEngine, ValidationAPI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical artefacts&lt;/strong&gt;: JSON responses, event handlers, component states&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead, it should focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User intent&lt;/strong&gt;: What does the person want to accomplish?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User actions&lt;/strong&gt;: What do they actually do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observable results&lt;/strong&gt;: What do they see happen?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The code behind them is different, but the human need is identical.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: Implementation-Contaminated Scenarios
&lt;/h3&gt;

&lt;p&gt;Here's what BDD scenarios look like when they're contaminated with implementation details:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="c"&gt;# BMW's contaminated approach&lt;/span&gt;
&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; BMW iDrive ConfiguratorService Integration [SPEC-BMW-123]
&lt;span class="kn"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nf"&gt;Given &lt;/span&gt;the BMW ConnectedDrive API is initialized
  &lt;span class="nf"&gt;And &lt;/span&gt;the user authenticates via BMW ID OAuth
  &lt;span class="nf"&gt;And &lt;/span&gt;the PricingEngine microservice is available

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; M Sport Package selection triggers pricing recalculation
  &lt;span class="nf"&gt;Given &lt;/span&gt;I have loaded the 3-series configurator via iDrive interface
  &lt;span class="nf"&gt;When &lt;/span&gt;I POST to /api/bmw/packages/m-sport with authentication headers
  &lt;span class="nf"&gt;Then &lt;/span&gt;the PricingCalculatorService should return updated totals
  &lt;span class="nf"&gt;And &lt;/span&gt;the frontend should display BMW-specific pricing components
  &lt;span class="nf"&gt;And &lt;/span&gt;the ConfiguratorState should persist to BMW backend systems

&lt;span class="c"&gt;# Mercedes contaminated approach&lt;/span&gt;
&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Mercedes MBUX Configurator Integration [SPEC-MB-456]
&lt;span class="kn"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="nf"&gt;Given &lt;/span&gt;the Mercedes me connect platform is active
  &lt;span class="nf"&gt;And &lt;/span&gt;MBUX infotainment system is responsive
  &lt;span class="nf"&gt;And &lt;/span&gt;the pricing validation service confirms availability

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; AMG package selection updates Mercedes pricing display
  &lt;span class="nf"&gt;Given &lt;/span&gt;I access the C-Class configurator through MBUX interface
  &lt;span class="nf"&gt;When &lt;/span&gt;the system processes AMG package selection via Mercedes API
  &lt;span class="nf"&gt;Then &lt;/span&gt;the integrated pricing module recalculates total cost
  &lt;span class="nf"&gt;And &lt;/span&gt;Mercedes-specific UI components reflect package changes
  &lt;span class="nf"&gt;And &lt;/span&gt;the selection persists in Mercedes customer profile system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: These scenarios test implementation details, not user behaviour. A tester needs different specialist knowledge to understand the BMW scenarios than the Mercedes ones, even though users are performing the same task in a slightly different context.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Universal Behavior Insight&lt;/strong&gt;: When configuring a BMW 3-series or a Mercedes C-Class, users want to choose packages, check pricing updates, and identify conflicts. The implementation shows significant differences, but the user experience remains fundamentally the same.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Solution: Universal, Human-Focused Scenarios
&lt;/h3&gt;

&lt;p&gt;Here's what the same functionality looks like when focused on universal human behaviour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="c"&gt;# Works for BMW, Mercedes, Audi, or any car configurator&lt;/span&gt;
&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Vehicle Package Configuration [SPEC-123]

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Premium package selection updates pricing
  &lt;span class="nf"&gt;Given &lt;/span&gt;I am on the vehicle configuration page
  &lt;span class="nf"&gt;When &lt;/span&gt;I select the premium package
  &lt;span class="nf"&gt;Then &lt;/span&gt;I should see the updated total price
  &lt;span class="nf"&gt;And &lt;/span&gt;the premium package should be marked as selected

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Package conflict prevention
  &lt;span class="nf"&gt;Given &lt;/span&gt;I have selected a premium package
  &lt;span class="nf"&gt;When &lt;/span&gt;I attempt to select a conflicting economy package
  &lt;span class="nf"&gt;Then &lt;/span&gt;I should see a conflict warning message
  &lt;span class="nf"&gt;And &lt;/span&gt;the economy package should remain unselected
  &lt;span class="nf"&gt;And &lt;/span&gt;my original premium selection should be preserved

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Package removal affects pricing
  &lt;span class="nf"&gt;Given &lt;/span&gt;I have selected multiple packages
  &lt;span class="nf"&gt;When &lt;/span&gt;I remove the premium package
  &lt;span class="nf"&gt;Then &lt;/span&gt;the total price should decrease
  &lt;span class="nf"&gt;And &lt;/span&gt;the premium package should no longer appear selected
  &lt;span class="nf"&gt;And &lt;/span&gt;any dependent options should be automatically removed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Domain Configuration Separation
&lt;/h3&gt;

&lt;p&gt;Behind the scenes, each company uses their specific domain configuration:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BMW Domain Config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"navigation_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://bmw.com/configurator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"premium_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M Sport Package"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"economy_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Efficiency Package"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"api_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BMW ConnectedDrive API"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pricing_currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EUR"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Mercedes Domain Config:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"navigation_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://mercedes-benz.com/configurator"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"premium_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AMG Line Package"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"economy_package"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Eco Package"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"api_endpoint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mercedes me connect API"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pricing_currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"EUR"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same universal scenarios have different domain implementations. Testers understand them easily because they focus on human behaviour, not on technical details.&lt;/p&gt;
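&lt;p&gt;The separation can be sketched as simple placeholder substitution: one universal scenario template plus a per-company domain config yields the concrete, brand-specific steps. This is a minimal illustrative sketch in Python; &lt;code&gt;render_scenario&lt;/code&gt; and the template placeholders are my assumptions, not the actual tooling.&lt;/p&gt;

```python
# Minimal sketch: render a universal BDD scenario against a domain config.
# The config keys mirror the JSON examples above; render_scenario is a
# hypothetical helper, not part of any real BDD framework.

UNIVERSAL_SCENARIO = [
    "Given I am on the vehicle configuration page at {navigation_url}",
    "When I select the {premium_package}",
    "Then I should see the updated total price in {pricing_currency}",
]

BMW_CONFIG = {
    "navigation_url": "https://bmw.com/configurator",
    "premium_package": "M Sport Package",
    "pricing_currency": "EUR",
}

MERCEDES_CONFIG = {
    "navigation_url": "https://mercedes-benz.com/configurator",
    "premium_package": "AMG Line Package",
    "pricing_currency": "EUR",
}

def render_scenario(steps, config):
    """Substitute domain-specific values into universal step templates."""
    return [step.format(**config) for step in steps]

bmw_steps = render_scenario(UNIVERSAL_SCENARIO, BMW_CONFIG)
mercedes_steps = render_scenario(UNIVERSAL_SCENARIO, MERCEDES_CONFIG)
```

&lt;p&gt;Swapping the config is the only change needed to retarget the same universal scenario at another brand.&lt;/p&gt;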

&lt;h2&gt;
  
  
  Garbage In, Garbage Out
&lt;/h2&gt;

&lt;p&gt;Something that became clear was that some of our tickets were not very good. They're nominally written in Given-When-Then format, but buried in tables and bullet points, blending syntaxes. A single ticket might have ten compound results, some of which contradict each other.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example of a "Bad Ticket":&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Given the user is on the config page 
When they add the M Sport Package 
Then
• The price updates
• The UI shows "M Sport"
• The PricingEngine service is called with SKU-123
• A confirmation modal appears (unless they are premium users)
• The total must not exceed credit limit in UserDB
• The economy package is disabled
• Loading spinner shows during calculation
• Error handling for network failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ticket mixes UI behaviour, API calls, business rules, and error handling, all in one "Then" clause. Some requirements clash, and they test implementation details rather than user behaviour.&lt;/p&gt;

&lt;p&gt;From this, I made the tool check each ticket before letting the AI try to generate scenarios from it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read it&lt;/li&gt;
&lt;li&gt;Assess it&lt;/li&gt;
&lt;li&gt;Apply the rules&lt;/li&gt;
&lt;li&gt;Let the user decide what to do:

&lt;ul&gt;
&lt;li&gt;Accept the badness&lt;/li&gt;
&lt;li&gt;See what the scenarios would look like&lt;/li&gt;
&lt;li&gt;Rewrite them using the single responsibility principle&lt;/li&gt;
&lt;li&gt;Stop&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
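&lt;p&gt;That gate can be sketched as a small pre-check that runs before any AI generation. Everything here is illustrative: the rule heuristics and the decision options model the workflow described above, not the production tool.&lt;/p&gt;

```python
# Illustrative sketch of the ticket pre-check gate. The quality rules and
# the user decision options are assumptions, not the real implementation.
from enum import Enum

class Decision(Enum):
    ACCEPT = "accept the badness"
    PREVIEW = "see what the scenarios would look like"
    REWRITE = "rewrite using the single responsibility principle"
    STOP = "stop"

def assess_ticket(text):
    """Apply simple quality rules; return the problems found."""
    problems = []
    then_clause = text.split("Then", 1)[-1]
    if then_clause.count("\u2022") > 3:  # bullet soup in a single Then
        problems.append("compound Then clause with too many outcomes")
    if any(word in text for word in ("API", "SKU", "UserDB")):
        problems.append("implementation details leaking into the ticket")
    return problems

def process(ticket, choose):
    """Assess first; only clean tickets skip the human decision."""
    problems = assess_ticket(ticket)
    if not problems:
        return Decision.PREVIEW
    return choose(problems)  # the human decides what to do

bad_ticket = ("Given ... When ... Then\n" + "\u2022 outcome\n" * 4
              + "\u2022 calls the PricingEngine API with SKU-123")
```

&lt;p&gt;The key design choice is that &lt;code&gt;process&lt;/code&gt; never silently "fixes" a bad ticket; it only surfaces the problems and hands the choice back to a person.&lt;/p&gt;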

&lt;h2&gt;
  
  
  The Complete Workflow: From Jira Ticket to Executable Tests
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Task 1: Context Extraction in Action
&lt;/h3&gt;

&lt;p&gt;When a Jira ticket arrives, the AI agent (loaded only with analysis rules and domain context) creates a structured conversation log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Requirements Analysis&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; REQ-001: User can select vehicle packages
&lt;span class="p"&gt;-&lt;/span&gt; REQ-002: Package selection updates total pricing
&lt;span class="p"&gt;-&lt;/span&gt; REQ-003: Conflicting packages show warning messages
&lt;span class="p"&gt;-&lt;/span&gt; REQ-004: Package removal updates pricing and dependencies

&lt;span class="gu"&gt;## Positive Test Scenarios Identified&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Premium package selection with pricing update
&lt;span class="p"&gt;-&lt;/span&gt; Multiple package selection and total calculation
&lt;span class="p"&gt;-&lt;/span&gt; Package upgrade scenarios

&lt;span class="gu"&gt;## Negative Test Scenarios Identified&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Conflicting package selection attempts
&lt;span class="p"&gt;-&lt;/span&gt; Invalid package combinations
&lt;span class="p"&gt;-&lt;/span&gt; Network error during selection

&lt;span class="gu"&gt;## Inferred Requirements (Agent Additions)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Loading states during price calculation
&lt;span class="p"&gt;-&lt;/span&gt; Confirmation for expensive package selections
&lt;span class="p"&gt;-&lt;/span&gt; Package dependency validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Task 2: BDD Generation with Pattern-Led Prompting
&lt;/h3&gt;

&lt;p&gt;The agent then loads BDD generation rules (and only those rules) plus the Task 1 output. It uses the gold standards approach to create clear scenarios that focus on business language and user actions.&lt;/p&gt;

&lt;p&gt;The key insight: &lt;strong&gt;The AI already knows BDD structure.&lt;/strong&gt; I didn't need to teach "Given-When-Then." I just needed to steer it toward business-focused language and user-observable outcomes.&lt;/p&gt;
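&lt;p&gt;The lazy loading behind these tasks can be sketched as assembling each prompt from only the rule files that task needs, plus the previous task's output. The file names and the &lt;code&gt;build_prompt&lt;/code&gt; helper are assumptions for illustration, not my real file layout.&lt;/p&gt;

```python
# Sketch of per-task lazy loading: each task gets only its own rule files
# plus the previous task's output, never the whole rulebook.
# The file names are illustrative assumptions.
TASK_RULES = {
    "analysis": ["analysis_rules.md", "domain_context.md"],
    "bdd_generation": ["bdd_rules.md", "gold_standards.md"],
    "assessment": ["assessment_criteria.md"],
    "taf_generation": ["technical_patterns.md"],
}

def build_prompt(task, previous_output, load=lambda name: f"[contents of {name}]"):
    """Concatenate only this task's rules, then the prior task's output."""
    rules = "\n".join(load(name) for name in TASK_RULES[task])
    return rules + "\n\n## Previous task output\n" + previous_output

prompt = build_prompt("bdd_generation", "REQ-001: User can select vehicle packages")
```

&lt;p&gt;Because the analysis rules never enter the BDD-generation prompt, the context stays small and the cat has fewer distractions.&lt;/p&gt;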

&lt;h3&gt;
  
  
  Task 3a: Behavioural Assessment - The Testing Filter
&lt;/h3&gt;

&lt;p&gt;This is where the &lt;strong&gt;Context Smartness&lt;/strong&gt; approach really shines. The agent loads only assessment criteria and applies strict behavioural filters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Include for Automation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step user workflows&lt;/li&gt;
&lt;li&gt;Cross-component integration tests&lt;/li&gt;
&lt;li&gt;Business process validation&lt;/li&gt;
&lt;li&gt;State persistence across actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Exclude from Automation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single component behaviour (unit test territory)&lt;/li&gt;
&lt;li&gt;Subjective UX validation&lt;/li&gt;
&lt;li&gt;Accessibility testing (specialised tools needed)&lt;/li&gt;
&lt;li&gt;Performance without specific metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Golden Rule&lt;/strong&gt;: Only test what you can control. Avoid putting product prices or names in automation since they change. Check that prices show correctly and names appear consistently. Focus on the user experience, not the system's internals. This aligns with "intent-based testing" principles, but I see it as common sense.&lt;/p&gt;
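&lt;p&gt;The include/exclude filter can be sketched as a simple triage. The keyword heuristics below are deliberately simplified assumptions; the real assessment criteria are richer, but the three-way outcome (include, exclude, ask a human) is the point.&lt;/p&gt;

```python
# Simplified sketch of the behavioural assessment filter (Task 3a).
# The keyword heuristics are assumptions; real criteria are richer.
INCLUDE_SIGNALS = ("workflow", "integration", "business process", "persist")
EXCLUDE_SIGNALS = ("unit", "accessibility", "looks good", "performance")

def assess(scenario):
    text = scenario.lower()
    if any(s in text for s in EXCLUDE_SIGNALS):
        return "exclude"            # wrong layer, or needs specialist tools
    if any(s in text for s in INCLUDE_SIGNALS):
        return "include"            # multi-step, user-observable behaviour
    return "needs human review"     # the human is the final arbiter
```

&lt;p&gt;Anything the filter can't place confidently falls through to a person rather than being silently automated.&lt;/p&gt;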

&lt;h3&gt;
  
  
  Task 3b: TAF Generation - From Human to Machine
&lt;/h3&gt;

&lt;p&gt;The final task loads technical patterns and turns approved behaviour scenarios into working automation code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated Test Automation Framework (TAF) Code:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Premium package selection updates pricing
  &lt;span class="nf"&gt;Given &lt;/span&gt;I navigate to the package configuration page
  &lt;span class="nf"&gt;When &lt;/span&gt;I select the premium package option
  &lt;span class="nf"&gt;Then &lt;/span&gt;the pricing display should show updated costs
  &lt;span class="nf"&gt;And &lt;/span&gt;the premium package should appear selected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Generated Infrastructure Report:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Required Page Objects&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; PackageConfigurationPage
&lt;span class="p"&gt;  -&lt;/span&gt; premiumPackageOption (data-testid="premium-package")
&lt;span class="p"&gt;  -&lt;/span&gt; pricingDisplay (data-testid="pricing-total")
&lt;span class="p"&gt;  -&lt;/span&gt; packageSelectionIndicator (data-testid="selected-indicator")

&lt;span class="gu"&gt;## Missing Step Definitions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; "I select the premium package option"
&lt;span class="p"&gt;-&lt;/span&gt; "the premium package should appear selected"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The agent gets you 80-90% of the way there, then humans add the final details.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Using State Diagrams So You and the AI Both Know What It Does
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: AI Doesn't Know Your Application States
&lt;/h3&gt;

&lt;p&gt;The LLM only knows what it knows. If you ask it to write API requests from a spec snippet, it will try. But the result often seems fine, even though it's completely wrong. &lt;strong&gt;It doesn't know the states of your application.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Solution: Plain English State Description
&lt;/h3&gt;

&lt;p&gt;I began explaining application states and process flows in plain English. I often used Figma designs since they show the actual state changes clearly.&lt;/p&gt;

&lt;p&gt;Then I asked the agent to create Mermaid state diagrams from the scenarios:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[Configuration Page] --&amp;gt; B[Premium Selected]
    B --&amp;gt; C[Pricing Updated]
    B --&amp;gt; D[Try Economy Selection]
    D --&amp;gt; E[Conflict Warning Displayed]
    E --&amp;gt;|Return| B
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;These diagrams showed missing state transitions&lt;/strong&gt; that weren't clear in Jira stories but were visible in Figma designs. The AI became better at identifying incomplete workflows and suggesting additional test scenarios.&lt;/p&gt;
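&lt;p&gt;The mapping from scenario to diagram can be sketched mechanically: each Given becomes a starting state, each When a transition label, each Then a resulting state. This is a toy sketch under that assumption; the real diagrams were generated by the agent from richer context, not by this function.&lt;/p&gt;

```python
# Toy sketch: derive a Mermaid edge from one Given/When/Then scenario.
# Given -> start state, When -> transition label, Then -> end state.
def scenario_to_mermaid(steps):
    given = next(s[len("Given "):] for s in steps if s.startswith("Given "))
    when = next(s[len("When "):] for s in steps if s.startswith("When "))
    then = next(s[len("Then "):] for s in steps if s.startswith("Then "))
    return f"graph TD\n    A[{given}] -->|{when}| B[{then}]"

diagram = scenario_to_mermaid([
    "Given Configuration Page",
    "When I select the premium package",
    "Then Pricing Updated",
])
```

&lt;p&gt;Even this naive version makes gaps visible: a scenario with no reachable Then state, or two scenarios whose states never connect, stands out immediately in the rendered diagram.&lt;/p&gt;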

&lt;h3&gt;
  
  
  Making the tasks self-documenting
&lt;/h3&gt;

&lt;p&gt;I was much like a university student writing their software plan after the code. I had this amazing system, but only I knew what it did, and I'd only remember this for a while. So, I asked the AI to produce flow charts using Mermaid again.&lt;/p&gt;

&lt;p&gt;This allows others to understand what it does without reading a pile of pseudo code, and it lets me follow the paths through when debugging problems. I quickly realised its value when I spotted that I kept loading the domain twice: I loaded it, made a decision, and then loaded it again for a more thorough check. 😒&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Assessment: What Actually Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It's all new to me
&lt;/h3&gt;

&lt;p&gt;I jumped into this without much preparation; I'll discuss that more in Part 4. The first version was a mess, but it worked and proved the idea was possible. When we needed to roll it out, though, I realised I couldn't, because it was tied to my own area of work.&lt;/p&gt;

&lt;p&gt;I learned what worked as I went along.&lt;/p&gt;

&lt;p&gt;We launched what I'll call version 2. This version is domain-aware and loads context, but it had some early bugs. One major issue was that ticket assessment would always fail, which meant it wasn't loading the domain context. It was definitely an "it works on my machine" problem.&lt;/p&gt;

&lt;p&gt;I tried to strengthen the wording, but I know this only helps so much. The AI doesn't read like us; it sees everything as one long sentence. It also gives more weight to recent information.&lt;/p&gt;

&lt;p&gt;I needed to change how I executed tasks. In V3, everything is pseudo code. I'm now considering whether it could all be real code, with just a basic markdown file for the AI.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 1: The Ripple Effect (F1 Car Analogy)
&lt;/h3&gt;

&lt;p&gt;I'm a big fan of F1, more for the design and off-track engineering than the races, which can be pretty dull. What is clear from F1 is that changing the front wing affects every other area of the car.&lt;/p&gt;

&lt;p&gt;Making the domain loading perfect introduced a new issue: the AI treated it as an override for garbage tickets. It would still allow poor tickets through because it believed the extra domain detail improved them. It didn't; it just meant they were nonsense with the correct names.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 2: You Can't Automate Quality Control
&lt;/h3&gt;

&lt;p&gt;The domain stuff is seasoning on your pasta; it enhances the bland scenarios to reflect your specific business, but it cannot remedy fundamentally bad ingredients.&lt;/p&gt;

&lt;p&gt;So, the assessment had to change to pseudo code so that it understood the rules. I did try putting rule 5 before rule 4 (changed the numbers and everything), but the AI ignored it!&lt;/p&gt;

&lt;p&gt;Enforcing this stopped ticket processing dead. In a perfect world that wouldn't happen, because everyone would create perfect scenarios, so I had to make the check optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson 3: The Human Must Be the Final Arbiter
&lt;/h3&gt;

&lt;p&gt;There's one thing the system has to adhere to: the human makes the decisions. It's why the test cases aren't limited; they're presented in priority order, but the system generates everything. When something goes wrong, people won't tell off the AI.&lt;/p&gt;

&lt;p&gt;So, that's where the options I mentioned earlier came from.&lt;/p&gt;

&lt;p&gt;After this, rather than mess around, we changed all the other tasks to pseudo code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Turning the AI on Itself: Unit Testing the Rules
&lt;/h3&gt;

&lt;p&gt;I tested all these changes manually, which was frustrating. Then, I got the AI to create some unit tests. It took good and bad examples, even tickets outside my domain. The AI generated expectations and made repeatable tests. Now, when the rules change, we can run these tests to check for any issues. We also recreate the mermaid diagrams, so we can see if the flow makes sense.&lt;/p&gt;
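&lt;p&gt;Those rule regression tests can be sketched as plain pinned examples: known good and bad tickets paired with expected verdicts, rerun whenever the rule files change. The &lt;code&gt;rule_verdict&lt;/code&gt; function below is a hypothetical stand-in for the real assessment task, shown only to make the shape of the tests concrete.&lt;/p&gt;

```python
# Sketch of "unit testing the rules": pin known good and bad example
# tickets to expected verdicts, rerun whenever the rules change.
# rule_verdict is a hypothetical stand-in for the real assessment task.
def rule_verdict(ticket):
    # Toy rule: more than three bulleted outcomes in one Then is a fail.
    outcomes = ticket.split("Then", 1)[-1].count("\u2022")
    return "fail" if outcomes > 3 else "pass"

CASES = [
    ("Given ... When ... Then the price updates", "pass"),
    ("Given ... When ... Then\n" + "\u2022 outcome\n" * 8, "fail"),
]

def run_rule_regression():
    for ticket, expected in CASES:
        result = rule_verdict(ticket)
        assert result == expected, f"{ticket!r}: got {result}"

run_rule_regression()
```

&lt;p&gt;The value is the fixture set, not the toy rule: once the examples are pinned, any rule rewrite that silently changes a verdict fails immediately.&lt;/p&gt;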

&lt;h3&gt;
  
  
  The Wins ✅
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: Generated scenarios follow the same patterns every time. No more confusion about why one tester says "Given I navigate to" and another says "Given the user accesses."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speed&lt;/strong&gt;: Minutes instead of hours for complex features. What used to take an afternoon of careful scenario writing now happens in the time it takes to make coffee.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creativity&lt;/strong&gt;: The agent spots edge cases that humans often overlook. It often detects state changes, error conditions, and user journey differences not included in the original requirements. When you focus on user behaviour instead of technical details, you naturally uncover more realistic test scenarios. This is what intent-based testing advocates have always claimed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: Creates the specifications that were missing. Generated BDD scenarios are often clearer than the original Jira tickets. They become the source of truth for what the feature does.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding&lt;/strong&gt;: New team members understand features faster. Universal, behaviour-focused scenarios are self-documenting in ways that implementation-specific tests aren't.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Ongoing Challenges ⚠️
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Domain Drift&lt;/strong&gt;: The AI loves to just go for it. Before you know it, there are domain-specific details creeping into what should be universal patterns. You have to watch what the AI does if you're changing rules.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edge Case Handling&lt;/strong&gt;: Still needs human review for unusual scenarios. The AI excels at common patterns but struggles with genuinely unique business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context Maintenance&lt;/strong&gt;: Domain configurations need regular updates. As products evolve, the mappings between universal patterns and specific implementations require ongoing care.&lt;/p&gt;

&lt;h3&gt;
  
  
  What It Doesn't Fix ❌
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Process Problems&lt;/strong&gt;: Technical solutions don't fix workflow issues. If your requirements are unclear, arrive late, or change constantly, AI won't solve those basic communication problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human Communication&lt;/strong&gt;: Still need clear specs and acceptance criteria. The AI amplifies the quality of your inputs - it doesn't create clarity from chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Domain Expertise&lt;/strong&gt;: Agent can't replace understanding your business. It can apply patterns consistently, but someone still needs to know whether the business logic makes sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Practical Implementation Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create Your Gold Standards
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Pick your best existing BDD scenarios&lt;/li&gt;
&lt;li&gt;Clean them to perfection&lt;/li&gt;
&lt;li&gt;Document why they're good&lt;/li&gt;
&lt;li&gt;Use these as training examples&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Build Task-Based Rules
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Extract minimal rules from gold standards&lt;/li&gt;
&lt;li&gt;Create focused rule sets per task&lt;/li&gt;
&lt;li&gt;Test with lazy loading approach&lt;/li&gt;
&lt;li&gt;Measure consistency improvements&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Implement the Full Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Task 1&lt;/strong&gt;: Context extraction and analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task 2&lt;/strong&gt;: Human-readable BDD generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task 3a&lt;/strong&gt;: Behavioural assessment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Task 3b&lt;/strong&gt;: Automation code generation&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Measure and Refine
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Compare generated vs manual scenarios&lt;/li&gt;
&lt;li&gt;Track consistency metrics&lt;/li&gt;
&lt;li&gt;Identify remaining edge cases&lt;/li&gt;
&lt;li&gt;Refine rules based on actual usage&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Coming in Part 4
&lt;/h2&gt;

&lt;p&gt;The framework works, the cats stay in formation, and the scenarios are consistent. But here's what really got me thinking: &lt;strong&gt;I accidentally solved fundamental AI problems that every developer faces.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's annoying talking to an AI when it takes off half-cocked and does something you don't want. Turns out, I wasn't alone in this frustration.&lt;/p&gt;

&lt;p&gt;In Part 4, I'll reveal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Context Rot discovery&lt;/strong&gt;: How I identified performance degradation months before it was documented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The market irony&lt;/strong&gt;: Why simple solutions to real problems get overlooked while flashy AI tools get all the funding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What I learned about AI reliability&lt;/strong&gt; that applies to any system trying to get consistent behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;From frustrated tester&lt;/strong&gt; to accidentally solving problems I didn't know had names&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The real breakthrough wasn't just herding cats - it was discovering that my specific frustrations were actually universal AI challenges.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Paul Coles is a software tester who proved that universal BDD patterns work across domains when separated from implementation details. In Part 3, he demonstrates the complete framework in action with real examples and honest assessment of what works and what doesn't. His cat now stays mostly in the designated areas.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🐾 Series Navigation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 1: Why AI Starts Making Stuff Up&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;The cat has opinions — and your postcode formatting rules aren't one of them.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.tolink-to-part-1"&gt;Read it →&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 2: Show, Don't Tell: Teaching AI with Better Examples&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Bribing the cat with gold standards and smaller piles of paper.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-show-dont-tell-teaching-ai-by-example-part-2-of-4-ing"&gt;Read it →&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 3: How I Made My AI Stop Guessing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Teaching the cat one trick at a time with task-focused training.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;(you are here)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 4: The More You Say, the Less It Learns&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;When you talk too much, the cat stops listening — and invents new requirements instead.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;(Coming soon)&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@birteliu?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Birte Liu&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/man-feeding-pigeons-G2p3VWUYG8o?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>testing</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>The Subtle Art of Herding Cats: Show, Don’t Tell: Teaching AI by Example (Part 2 of 4)</title>
      <dc:creator>Paul Coles</dc:creator>
      <pubDate>Wed, 06 Aug 2025 20:38:22 +0000</pubDate>
      <link>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-show-dont-tell-teaching-ai-by-example-part-2-of-4-ing</link>
      <guid>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-show-dont-tell-teaching-ai-by-example-part-2-of-4-ing</guid>
      <description>&lt;h2&gt;
  
  
  Quick Recap: The Problems We Discovered
&lt;/h2&gt;

&lt;p&gt;In Part 1, I learned the hard way that giving AI 47 rules is like trying to get a cat to do, well, much of anything. In Part 2, I'll show you the approach that finally made it behave: gold standards and lazy loading.&lt;/p&gt;

&lt;h2&gt;
  
  
  Show, Don't Tell
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Failed Approach: Death by Documentation
&lt;/h3&gt;

&lt;p&gt;My first idea was to write complete rules. Hundreds of detailed instructions covering every possible scenario, edge case, and formatting requirement. The AI nodded politely and ignored most of it.&lt;/p&gt;

&lt;p&gt;The conversations were exhausting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Me&lt;/strong&gt;: "Why didn't you follow the naming rules?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "Which ones? There were several different patterns mentioned."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me&lt;/strong&gt;: "The ones in section 4.2.1 about specification references!"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "I focused on the examples in section 6.3 instead."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sound familiar? I was trying to teach by explanation rather than showing examples.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Breakthrough Moment
&lt;/h3&gt;

&lt;p&gt;Instead of writing more rules, I tried something different. I created one perfect example of what I wanted:&lt;/p&gt;

&lt;p&gt;It didn't look exactly like this; this is just a tribute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gold Standard BDD Scenario:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Product Configuration [SPEC-123]

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Premium package selection shows correct pricing
  &lt;span class="nf"&gt;Given &lt;/span&gt;I am on the product configuration page
  &lt;span class="nf"&gt;When &lt;/span&gt;I select the premium package
  &lt;span class="nf"&gt;Then &lt;/span&gt;I should see the premium pricing displayed
  &lt;span class="nf"&gt;And &lt;/span&gt;the package should be marked as selected
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I asked the agent: &lt;strong&gt;"Look at this gold standard. What rules do you need to reproduce this quality?"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent looked through the rules and, instead of 300 lines of documentation, it distilled around ten key principles, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use clear, business-focused language&lt;/li&gt;
&lt;li&gt;Follow Given-When-Then structure
&lt;/li&gt;
&lt;li&gt;Include specification references&lt;/li&gt;
&lt;li&gt;Focus on user-observable outcomes&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Pattern-Led Prompting Discovery&lt;/strong&gt;: AI agents learn better from perfect examples than from detailed explanations. Show the destination, let them find the path.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  The Domain Separation Breakthrough
&lt;/h3&gt;

&lt;p&gt;The gold standard looked simple, but there was hidden cleverness. Some elements needed to come from domain configuration, not be hardcoded in the pattern:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Universal Pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Vehicle Package Configuration [SPEC-123]

&lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Premium package selection shows correct pricing
  &lt;span class="err"&gt;Given I am on the [DOMAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;configuration_page_url]&lt;/span&gt;
  &lt;span class="err"&gt;When I select the [DOMAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;premium_package_name]&lt;/span&gt;  
  &lt;span class="err"&gt;Then I should see the [DOMAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;pricing_display_format]&lt;/span&gt; &lt;span class="err"&gt;updated&lt;/span&gt;
  &lt;span class="err"&gt;And the package should show [DOMAIN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;selection_indicator_state]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Domain Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"configuration_page_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vehicle configuration page"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"premium_package_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M Sport Package"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pricing_display_format"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"total pricing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"selection_indicator_state"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"as selected"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an online sock store, the same pattern works with different domain values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Product categories: "athletic socks"&lt;/li&gt;
&lt;li&gt;Size options: "Size 8-10"
&lt;/li&gt;
&lt;li&gt;Error messages: "Sorry, out of stock in your size"&lt;/li&gt;
&lt;li&gt;UI elements: "add to basket button"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;A BDD pattern is universal; your domain makes it real.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These domain mappings improve the scenarios' accuracy and relevance to your specific context, making them immediately useful rather than generic templates. Other domains will have completely different mappings, allowing the AI to recognise universal patterns while adapting to your specific terminology.&lt;/p&gt;

&lt;p&gt;Previously, this context was scattered throughout the rules - a BMW configurator conversation would gradually contaminate generic BDD patterns with "M Sport Package" references. In the new approach, domain specifics live in dedicated files.&lt;/p&gt;
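&lt;p&gt;The placeholder substitution can be sketched in a few lines of Python. This is a hypothetical illustration: the &lt;code&gt;[DOMAIN: key]&lt;/code&gt; syntax follows the pattern above, but the function name and config shape are my own, not the framework's actual implementation.&lt;/p&gt;

```python
import re

# A universal step and a domain config, as in the examples above.
universal_step = "When I select the [DOMAIN: premium_package_name]"

domain_config = {
    "premium_package_name": "M Sport Package",
}

def apply_domain(step, config):
    # Replace each [DOMAIN: key] placeholder with its configured value.
    return re.sub(
        r"\[DOMAIN:\s*(\w+)\]",
        lambda m: config[m.group(1)],
        step,
    )

print(apply_domain(universal_step, domain_config))
# "When I select the M Sport Package"
```

&lt;p&gt;Swap in the sock-store values and the identical pattern produces sock-store scenarios.&lt;/p&gt;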

&lt;h3&gt;
  
  
  "Show, Don't Tell" in Practice
&lt;/h3&gt;

&lt;p&gt;This discovery matched a principle I'd written in my framework: &lt;strong&gt;"Show, don't tell."&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;Instead of explaining what makes good BDD, I showed the agent perfect examples and let it find the patterns. The AI understands what BDD is, I don't need to tell it that. What it does need to know is what &lt;em&gt;I&lt;/em&gt; want from it.&lt;/p&gt;

&lt;p&gt;But I learned there's a big difference between &lt;strong&gt;guidelines&lt;/strong&gt; (flexible suggestions) and &lt;strong&gt;rules&lt;/strong&gt; (mandatory requirements). For things that absolutely had to happen, I needed very clear language.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lazy Loading Context: The Architecture That Changed Everything
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Problem: Context Explosion
&lt;/h3&gt;

&lt;p&gt;Using 25% of Amazon Q's context window was like trying to have a conversation at a concert. Too many distractions, too much noise, too many competing priorities.&lt;/p&gt;

&lt;p&gt;The AI was drowning in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;✓ Generic BDD patterns         ✓ Domain-specific mappings
✓ Assessment criteria          ✓ Implementation details  
✓ Quality gates               ✓ Error handling
✓ Naming conventions          ✓ Edge case handling
✓ Reporting structures        ✓ 38 more categories...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Solution: Task-Based Context Loading
&lt;/h3&gt;

&lt;p&gt;I realised the agent needed &lt;strong&gt;focused context per task&lt;/strong&gt;, not everything at once. Here's the architecture that worked:&lt;/p&gt;

&lt;h4&gt;
  
  
  Base Context (Always Available)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Task execution framework&lt;/li&gt;
&lt;li&gt;Conversation logging patterns&lt;/li&gt;
&lt;li&gt;State management rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Dynamic Context (Loaded Per Task)
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Task 1: Context Extraction&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt;: Analysis rules + Relevant Domain context only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Extract requirements from Jira
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Structured conversation log&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked&lt;/strong&gt;: BDD patterns, automation rules, technical details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Task 2: BDD Generation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt;: BDD patterns + Task 1 output only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Create human-readable scenarios&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Feature files for manual testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked&lt;/strong&gt;: Automation assessment, technical implementation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Task 3a: Behavioural Assessment&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt;: Assessment criteria + Task 2 output only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Determine automation suitability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Automation assessment report&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked&lt;/strong&gt;: Code generation patterns, implementation details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Task 3b: Test Automation Generation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Load&lt;/strong&gt;: Technical patterns + approved scenarios only&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Goal&lt;/strong&gt;: Create executable automation code
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: Test Automation-compatible feature files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blocked&lt;/strong&gt;: Analysis rules, BDD guidelines&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Context Smartness Principle&lt;/strong&gt;: Each task gets exactly the context it needs - no more, no less. No competing priorities, no overwhelming rule sets.&lt;/p&gt;
&lt;/blockquote&gt;
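&lt;p&gt;As a rough sketch, the loading rules above reduce to a small lookup table. The file names here are invented for illustration; the point is the shape, not the specifics.&lt;/p&gt;

```python
# Each task maps to only the context files it is allowed to see.
TASK_CONTEXTS = {
    "extract":  ["analysis_rules.md", "domain_config.json"],
    "bdd":      ["bdd_patterns.md", "task1_output.md"],
    "assess":   ["assessment_criteria.md", "task2_output.md"],
    "automate": ["technical_patterns.md", "approved_scenarios.md"],
}

# Base context is always available, regardless of task.
BASE_CONTEXT = ["task_framework.md", "logging_patterns.md", "state_rules.md"]

def context_for(task):
    # Each task sees the base context plus only its own files.
    # Everything else stays out of the prompt entirely.
    return BASE_CONTEXT + TASK_CONTEXTS[task]

print(context_for("bdd"))
```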

&lt;h3&gt;
  
  
  The Performance Impact
&lt;/h3&gt;

&lt;p&gt;The difference was dramatic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Before&lt;/strong&gt;: 25% context usage, inconsistent results, mysterious failures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;After&lt;/strong&gt;: &amp;lt;5% context per task, reliable patterns, predictable behaviour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI went from confused and unreliable to focused and consistent. Each task could concentrate on its specific job without distraction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pseudocode Rules: When You Absolutely Must Be Obeyed
&lt;/h2&gt;

&lt;p&gt;If there is one thing you take from this, it's this: if you &lt;em&gt;really&lt;/em&gt; need the AI to follow rules, use pseudocode, not just words.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Guidelines vs Rules Problem
&lt;/h3&gt;

&lt;p&gt;Even with focused context, the AI would still treat critical requirements as optional suggestions. Natural language left too much room for creative interpretation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (Ignored):&lt;/strong&gt;&lt;br&gt;
"Please assess each scenario carefully, considering automation suitability and technical feasibility."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After (Followed Religiously):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FOR EACH scenario IN bdd_scenarios:
    IF scenario.type == "accessibility":
        EXCLUDE(scenario, reason="specialized_tools_required")
    ELIF scenario.complexity == "single_component": 
        EXCLUDE(scenario, reason="unit_test_territory")
    ELSE:
        ASSESS(scenario, gates=[0,1,2,3])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
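&lt;p&gt;For readers who want to try the idea, here is the same triage logic as runnable Python. The scenario fields and exclusion reasons come from the pseudocode above; the data shape is an assumption.&lt;/p&gt;

```python
def triage(scenarios):
    # Apply the pseudocode rules: exclude accessibility and
    # single-component scenarios, assess everything else.
    excluded, to_assess = [], []
    for s in scenarios:
        if s["type"] == "accessibility":
            excluded.append((s["name"], "specialized_tools_required"))
        elif s["complexity"] == "single_component":
            excluded.append((s["name"], "unit_test_territory"))
        else:
            to_assess.append(s["name"])  # goes through gates 0-3
    return excluded, to_assess

scenarios = [
    {"name": "screen reader flow", "type": "accessibility", "complexity": "multi"},
    {"name": "button colour", "type": "ui", "complexity": "single_component"},
    {"name": "premium pricing", "type": "ui", "complexity": "multi"},
]
print(triage(scenarios))
```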



&lt;h3&gt;
  
  
  Mandatory Language That Works
&lt;/h3&gt;

&lt;p&gt;For absolutely critical processes, I learned to use clear commanding language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;'MANDATORY'&lt;/strong&gt; - not "should" or "please"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;'ZERO TOLERANCE'&lt;/strong&gt; - not "try to avoid" &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;'AUTOMATIC EXCLUSIONS'&lt;/strong&gt; - not "generally not recommended"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The AI follows pseudocode and explicit commands while treating natural language as flexible guidance. But each instance of commanding language is backed up with pseudocode.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Conversation State Pattern
&lt;/h3&gt;

&lt;p&gt;The breakthrough was having the agent create a structured conversation log in Task 1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Requirements Analysis&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; REQ-001: User can select premium packages
&lt;span class="p"&gt;-&lt;/span&gt; REQ-002: Pricing updates when packages change  
&lt;span class="p"&gt;-&lt;/span&gt; REQ-003: Conflicts prevent invalid combinations

&lt;span class="gu"&gt;## Positive Test Scenarios&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Premium package selection
&lt;span class="p"&gt;-&lt;/span&gt; Price calculation accuracy
&lt;span class="p"&gt;-&lt;/span&gt; Package combination validation

&lt;span class="gu"&gt;## Negative Test Scenarios  &lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Invalid package combinations
&lt;span class="p"&gt;-&lt;/span&gt; Error handling for unavailable options

&lt;span class="gu"&gt;## Inferred Requirements (Agent Additions)&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Loading states during price calculation
&lt;span class="p"&gt;-&lt;/span&gt; Confirmation dialogs for expensive options
&lt;span class="p"&gt;-&lt;/span&gt; Network error handling
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
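&lt;p&gt;A structured log like this is easy to slice programmatically, which is what lets later tasks load only the section they need. This parser is a hypothetical sketch of mine, not part of the framework.&lt;/p&gt;

```python
def parse_log(text):
    # Split a markdown conversation log into named sections,
    # keyed by "## " headings, with one list entry per "- " bullet.
    sections, current = {}, None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith("- ") and current:
            sections[current].append(line[2:].strip())
    return sections

log = """## Requirements Analysis
- REQ-001: User can select premium packages

## Inferred Requirements (Agent Additions)
- Network error handling
"""
print(parse_log(log)["Inferred Requirements (Agent Additions)"])
```

&lt;p&gt;Task 2 could then pull just the requirements and scenario lists, leaving the inferred additions for human review.&lt;/p&gt;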



&lt;h2&gt;
  
  
  The "Made Up" Requirements Solution: Embracing AI Creativity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Unexpected Discovery
&lt;/h3&gt;

&lt;p&gt;Here's something that surprised me: the agent kept adding requirements that weren't in the original spec. My first instinct was to stop this behaviour.&lt;/p&gt;

&lt;p&gt;Instead, I asked it to &lt;strong&gt;share invented requirements in a separate section.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Value of AI Inference
&lt;/h3&gt;

&lt;p&gt;Sometimes these "made up" requirements were brilliant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What happens during price calculation loading?"&lt;/li&gt;
&lt;li&gt;"Should there be confirmation for expensive options?"
&lt;/li&gt;
&lt;li&gt;"How do we handle network errors?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent was thinking like a tester, finding gaps in specifications. I learned to embrace this creativity rather than suppress it - but to keep it clearly labelled so humans could check the suggestions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Don't Limit the AI (But Do Limit Its Authority)
&lt;/h2&gt;

&lt;p&gt;Something I decided early on: even though the AI can update Jira, I don't want it to. That's taking away too much control and will make people lazy. I need humans to decide whether other humans should do a test or not.&lt;/p&gt;

&lt;p&gt;Put it this way: if you make the AI limit tests to only those that are "important," and something goes wrong, it won't be the AI that gets told off.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI's role&lt;/strong&gt;: Prioritise and recommend&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Your role&lt;/strong&gt;: Make the final calls&lt;/p&gt;

&lt;p&gt;Yes, the AI puts a priority on its creations. They're in an order, but &lt;em&gt;you&lt;/em&gt; decide what actually gets done. The AI can be creative with requirements, suggest test scenarios, and even rate automation suitability, but humans retain control over the decisions that matter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The AI won't be the one getting told off&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Wins You Can Implement Today
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create Your Gold Standard
&lt;/h3&gt;

&lt;p&gt;Find your best existing BDD scenario and clean it to perfection. This becomes your teaching example.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Extract Minimal Rules
&lt;/h3&gt;

&lt;p&gt;Ask your AI: "What rules do you need to reproduce this quality?" You'll get 5-10 essential principles instead of 300 lines of documentation.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Separate Domain from Pattern
&lt;/h3&gt;

&lt;p&gt;Identify what's universal (user actions, observable results) vs domain-specific (product names, URLs, error messages). Put domain details in separate configuration.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Use Pseudocode for Critical Logic
&lt;/h3&gt;

&lt;p&gt;Replace "Please assess carefully" with explicit IF/THEN logic for anything that must happen without exception. You don't have to write the code, just write bullet points and ask it to make the code&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Coming in Part 3
&lt;/h2&gt;

&lt;p&gt;These solutions sound good in theory, but do they actually work in practice? In Part 3, I'll show you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real before/after examples&lt;/strong&gt;: Contaminated scenarios transformed into universal patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The framework in action&lt;/strong&gt;: Complete workflow from Jira ticket to executable tests
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Honest assessment&lt;/strong&gt;: What actually works, ongoing challenges, and what it doesn't fix&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What Not to Automate: Smarter Test Filtering&lt;/strong&gt;: How to decide what should be automated vs tested manually&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cats are starting to line up, but the real test is whether they stay in formation when facing real-world complexity.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Paul Coles is a software tester who discovered that AI agents respond better to examples than explanations. In Part 2, he reveals the specific techniques that transformed chaotic AI behavior into reliable, consistent output. His actual cat learned to use the litter tray but still ignores most other commands.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🐾 Series Navigation
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 1: Why AI Starts Making Stuff Up&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;The cat has opinions — and your postcode formatting rules aren't one of them.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;a href="https://dev.to/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-why-ai-agents-ignore-your-rules-part-1-of-4-5fhd"&gt;Read it →&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 2: Show, Don’t Tell: Teaching AI with Better Examples&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Bribing the cat with gold standards and smaller piles of paper.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;← You are here&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 3: How I Made My AI Stop Guessing&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Teaching the cat one trick at a time with task-focused training.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;(Coming soon)&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Part 4: The More You Say, the Less It Learns&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;When you talk too much, the cat stops listening — and invents new requirements instead.&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;(Coming soon)&lt;/em&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@jontyson?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Jon Tyson&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/teal-and-white-graffiti-wall-QgoNPoH1v4c?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>The Subtle Art of Herding Cats: Why AI Agents Ignore Your Rules (Part 1 of 4)</title>
      <dc:creator>Paul Coles</dc:creator>
      <pubDate>Mon, 28 Jul 2025 14:47:12 +0000</pubDate>
      <link>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-why-ai-agents-ignore-your-rules-part-1-of-4-5fhd</link>
      <guid>https://forem.com/paul_coles_633f698b10fd6e/the-subtle-art-of-herding-cats-why-ai-agents-ignore-your-rules-part-1-of-4-5fhd</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR My AI Training Hurdles
&lt;/h2&gt;

&lt;p&gt;I spent months training an AI to create BDD tests (and more). I discovered that AIs are like keen cats - they forget instructions when given too many commands. This is Part 1 of my journey from chaos to &lt;strong&gt;Context Smartness&lt;/strong&gt;. Parts 2-4 cover the solutions, framework, and market implications.&lt;/p&gt;

&lt;h2&gt;
  
  
Who This Is For: Test Automation Engineers, Test Leads, and AI Workflow Engineers
&lt;/h2&gt;

&lt;p&gt;Different readers will benefit in different ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Practitioners&lt;/strong&gt; will learn specific patterns for training AI systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leads&lt;/strong&gt; will understand why AI initiatives often fail and what makes them successful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engineers&lt;/strong&gt; will see systematic approaches to AI reliability and context management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hopes vs. The Reality: When Smart Cats Act Dumb
&lt;/h2&gt;

&lt;p&gt;I started this journey with a simple dream: get AI to read specs for me. I hadn't worked with multi-page specs for years. How hard could it be? The agent would understand context, follow rules, and produce perfect scenarios every time.&lt;/p&gt;

&lt;p&gt;Reality check: AI agents are like cats. They're very clever, but they have their own views on which rules count. They will ignore you when it suits them.&lt;/p&gt;

&lt;h2&gt;
  
  
  From First Attempts to Hard-Earned Lessons
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It's AI right? It's clever
&lt;/h3&gt;

&lt;p&gt;I began treating the AI like an equal - a brilliant colleague who needed proper instruction. I worked with it. I made clear rules, shared important context, and expected steady results.&lt;/p&gt;

&lt;p&gt;The conversations went like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Me&lt;/strong&gt;: "Here are 47 detailed rules for writing BDD scenarios"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "Got it! I understand perfectly"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: &lt;em&gt;Proceeds to lowercase postcodes for mysterious reasons&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Me&lt;/strong&gt;: "Why did you do that? There's nothing in the domain file about lowercase postcodes"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent&lt;/strong&gt;: "There are too many rules. I can't follow everything."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was like chatting with a polite cat. It nods along, but then it still knocks your coffee mug off the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Cat Rule Discovered&lt;/strong&gt;: The Cat Rule isn't about counting rules literally; it's about competing instructions. You may have 100 formatting guidelines, but the AI only needs to make about 10 types of decisions at once.&lt;/p&gt;

&lt;p&gt;I had thousands of BDD rules. They asked the AI to manage formatting, domain knowledge, quality checks, and technical implementation all at the same time. It's no surprise it made strange choices!&lt;/p&gt;

&lt;h2&gt;
  
  
  🐱 The Cat Rule: Maximum 10 Instructions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ What I Did Wrong (47 Rules)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Follow BDD patterns&lt;/li&gt;
&lt;li&gt;Use domain mappings
&lt;/li&gt;
&lt;li&gt;Apply quality gates&lt;/li&gt;
&lt;li&gt;Handle errors properly
...and 43 more rules! 🤯&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: AI ignored most rules, performed poorly&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ What Actually Works (8 Rules)
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Use clear business language&lt;/li&gt;
&lt;li&gt;Follow Given-When-Then&lt;/li&gt;
&lt;li&gt;Include spec references&lt;/li&gt;
&lt;li&gt;Focus on user outcomes&lt;/li&gt;
&lt;li&gt;Apply MANDATORY rules&lt;/li&gt;
&lt;li&gt;Use domain config&lt;/li&gt;
&lt;li&gt;Check quality gates
&lt;/li&gt;
&lt;li&gt;Generate clean scenarios&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Consistent, reliable AI behavior! 🌈🦄&lt;/p&gt;

&lt;h3&gt;
  
  
  The More You Say, the Less It Hears
&lt;/h3&gt;

&lt;p&gt;As I refined the rules over time, they grew and grew. Before long, I was using 25% of Amazon Q's available context window. The agent was drowning in information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Rules included:
✓ Generic BDD patterns         ✓ Domain-specific mappings
✓ Assessment criteria          ✓ Implementation details
✓ Quality gates               ✓ Error handling
✓ Naming conventions          ✓ Edge case handling
✓ Reporting structures        ✓ ... and 38 more categories
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem became clear: &lt;strong&gt;The agent forgets things when the context is too large.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AIs have extensive knowledge, but their application lacks consistency when overwhelmed. If I'm asking the AI about task 4, it still has to sort through the stuff for tasks 1 to 3 and all the other conventions. It's like asking a cat to follow 47 commands at once. They ignore most, and the noise makes them mess up the few they do hear.&lt;/p&gt;

&lt;p&gt;Context rot identified: Adding more input tokens can hurt AI performance. I noticed this months before I found out it had a name. For more details, visit &lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;https://research.trychroma.com/context-rot&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In my case, if you were to ask the AI why it did something, it could find the rule it should have applied. It's like the needle in a haystack test. But what it couldn't do was apply this rule at the right time with all the other stuff it needed to do.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Your AI Starts Making Stuff Up
&lt;/h3&gt;

&lt;p&gt;The breaking point came when I attempted to ensure the agent used domain context consistently. It kept making odd choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lowercase postcodes (nowhere in the rules)&lt;/li&gt;
&lt;li&gt;Technical error messages in human-readable scenarios&lt;/li&gt;
&lt;li&gt;Treating mandatory rules as optional guidelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But the real problem wasn't just using the wrong things at the wrong time. I had unknowingly created a perfect storm of conflicting information and massive context load that was literally making the AI dumber.&lt;/p&gt;

&lt;p&gt;Training the AI via conversations about 'why' things happened had contaminated what should be "generic" BDD patterns with car configurator specifics, package bundle terminology, and React SPA assumptions. But worse, the huge amount of context was hurting performance in ways I didn't understand then.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The tool was becoming domain-specific instead of universally applicable, AND performing worse as context expanded.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lightbulb Moment: It's More of a Guideline
&lt;/h2&gt;

&lt;p&gt;Then it struck me - the lightbulb turned on. &lt;strong&gt;I was treating the AI as an equal, but it isn't.&lt;/strong&gt; It's not as smart as I thought in the way I thought (my internal monologue about it was less charitable).&lt;/p&gt;

&lt;p&gt;The agent needed different training than a human colleague would. Even with focused context loading, the AI perceived key requirements as optional unless the language was clearly commanding; mandatory rules were treated as mere guidelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pattern-Led Prompting Principle&lt;/strong&gt;: AI agents respond better to examples than to explanations, and to tables and pseudocode better than to natural language for complex logic.&lt;/p&gt;

&lt;p&gt;This discovery would lead to what I now call &lt;strong&gt;Context Smartness&lt;/strong&gt; - providing exactly the right information, at the right time, in the right amount.&lt;/p&gt;

&lt;p&gt;This Pattern-Led Prompting Principle underpins the 'Show, Don't Tell' method. I will explain this in Part 2, with examples taking the place of lengthy documentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Discoveries from Part 1
&lt;/h2&gt;

&lt;p&gt;Through months of frustrating talks with my AI agent, I found several basic principles:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Cat Rule
&lt;/h3&gt;

&lt;p&gt;Never give AI more than 10 competing instructions. Beyond this point, performance drops as the agent struggles to work out what matters most.&lt;/p&gt;

&lt;h3&gt;
  
  
  Context Rot
&lt;/h3&gt;

&lt;p&gt;Adding more input context actually makes AIs perform worse - not better. I was using 25% of Amazon Q's context window and wondering why my "smart" agent was getting dumber. &lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;https://research.trychroma.com/context-rot&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Domain Contamination
&lt;/h3&gt;

&lt;p&gt;Generic rules slowly pick up domain-specific details through repeated conversations. This makes tools less reusable and adds to context bloat. You have to watch how the rules are formed and make sure they're generic.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Guidelines vs Rules Problem
&lt;/h3&gt;

&lt;p&gt;AI agents treat everything as flexible unless you use very clear commanding language. "Please assess carefully" becomes optional; "MANDATORY: ASSESS(scenario, gates=[0,1,2,3])" gets followed.&lt;/p&gt;
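&lt;p&gt;As a concrete illustration (the wording here is my own, not a fixed syntax), the difference in a rule file might look like this:&lt;/p&gt;

```text
# Guideline phrasing (often treated as optional):
Please assess each scenario carefully before writing steps.

# Rule phrasing (reliably followed):
MANDATORY: ASSESS(scenario, gates=[0,1,2,3])
ON FAILURE: STOP and report the failing gate.
```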

&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;In the remaining parts of this series, I'll show you exactly how I solved these problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Part 2&lt;/strong&gt;: "Show, Don’t Tell: Teaching AI by Example" - The breakthrough solutions that actually worked&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 3&lt;/strong&gt;: "How I Turned Chaos Into a Repeatable Test Process" - Real examples with BMW vs Mercedes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 4&lt;/strong&gt;: "Context Rot and the Billion Dollar Opportunity" - Why these solutions matter for the AI industry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each problem in Part 1 has a matching solution. The cat can be herded, but not the way you think.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Foundation for What's Next
&lt;/h2&gt;

&lt;p&gt;The problems I discovered - Context Rot, domain contamination, the guidelines vs rules confusion - aren't just BDD testing issues. They're fundamental AI reliability challenges that affect any system trying to get consistent behaviour from large language models.&lt;/p&gt;

&lt;p&gt;In Part 2, I'll show you the &lt;strong&gt;Context Smartness&lt;/strong&gt; approach that solved all of these problems: focused examples instead of comprehensive rules, task-based lazy loading, and the magic of "show, don't tell."&lt;/p&gt;

&lt;p&gt;The cats stayed in formation, but it took understanding their psychology first.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Paul Coles is a software tester who accidentally discovered several AI reliability patterns while trying to automate BDD scenario generation. In this 4-part series, he shares the systematic approach that transformed unpredictable AI behaviour into reliable, consistent output. His actual cat still ignores most commands.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>llm</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>AI-Assisted Testing: A Survival Guide to Implementing MCP with Atlassian Tools</title>
      <dc:creator>Paul Coles</dc:creator>
      <pubDate>Tue, 03 Jun 2025 19:06:40 +0000</pubDate>
      <link>https://forem.com/paul_coles_633f698b10fd6e/ai-assisted-testing-a-survival-guide-to-implementing-mcp-with-atlassian-tools-2gnm</link>
      <guid>https://forem.com/paul_coles_633f698b10fd6e/ai-assisted-testing-a-survival-guide-to-implementing-mcp-with-atlassian-tools-2gnm</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Still battling 'Wagile' processes, a skewed developer-to-tester ratio, and specs that could double as doorstops? You're not alone. This technical deep-dive builds on our previous discussion to show you exactly how to implement AI-assisted testing using Model Context Protocol (MCP) within your existing Atlassian ecosystem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What You'll Achieve
&lt;/h3&gt;

&lt;p&gt;By the end of this guide, you'll have a working AI assistant that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse Confluence specifications to identify test scenarios and acceptance criteria&lt;/li&gt;
&lt;li&gt;Read your Jira tickets and extract acceptance criteria&lt;/li&gt;
&lt;li&gt;Generate comprehensive test cases in your team's format&lt;/li&gt;
&lt;li&gt;Identify edge cases you might have missed&lt;/li&gt;
&lt;li&gt;Create test documentation that actually gets used&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why MCP Over Generic AI Tools
&lt;/h3&gt;

&lt;p&gt;You can paste specifications into ChatGPT or use Copilot, but MCP has key benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Direct Integration&lt;/strong&gt;: No copy-pasting between tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Awareness&lt;/strong&gt;: Understands your specific project structure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reduced Hallucination&lt;/strong&gt;: Works with actual data and rules, not assumptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pattern Learning&lt;/strong&gt;: Adapts to your team's test framework&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What This Guide Covers
&lt;/h3&gt;

&lt;p&gt;This isn't about fixing broken processes—that's a cultural challenge. This is about practical implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Setting up MCP to connect with your Atlassian stack&lt;/li&gt;
&lt;li&gt;Configuring secure connections through corporate networks&lt;/li&gt;
&lt;li&gt;Creating your first AI-generated test suite&lt;/li&gt;
&lt;li&gt;Measuring the impact on your team's velocity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. What You'll Need Before Starting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MCP Atlassian
&lt;/h3&gt;

&lt;p&gt;It's worth reading about the MCP server that will be used: &lt;a href="https://github.com/sooperset/mcp-atlassian" rel="noopener noreferrer"&gt;https://github.com/sooperset/mcp-atlassian&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's distributed as a Docker image, which means the setup is generally very simple. The only wrinkles may be corporate certificates, which we cover later.&lt;/p&gt;

&lt;h3&gt;
  
  
  Docker
&lt;/h3&gt;

&lt;p&gt;First up, you'll need Docker installed on your machine. Whether you're using Docker Desktop or Docker Engine doesn't matter—both work fine. If you don't have Docker, go to &lt;a href="https://docs.docker.com/get-docker" rel="noopener noreferrer"&gt;docs.docker.com/get-docker&lt;/a&gt;. Then, follow the installation guide for your operating system.&lt;/p&gt;

&lt;p&gt;Why use Docker? It keeps the MCP server running the same way on any setup. Plus, it manages all your dependencies without cluttering your system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Amazon Q CLI
&lt;/h3&gt;

&lt;p&gt;Next, you'll need the Amazon Q Command Line Interface. Installation instructions are here: [&lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html&lt;/a&gt;]. Once it's installed correctly, &lt;code&gt;q --version&lt;/code&gt; should run without errors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Atlassian Access
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;For Jira:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Jira account with API access (most regular accounts have this)&lt;/li&gt;
&lt;li&gt;A Personal Access Token (PAT)—not your password!&lt;/li&gt;
&lt;li&gt;The URL of your Jira instance (e.g., &lt;a href="https://yourcompany.atlassian.net" rel="noopener noreferrer"&gt;https://yourcompany.atlassian.net&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;For Confluence:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Like Jira: account, PAT, and URL&lt;/li&gt;
&lt;li&gt;Make sure you can access the spaces containing your specifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Don't know how to create a PAT? In Atlassian, go to your account settings, find "Security," and look for "API tokens." Create one and save it somewhere secure—you can't view it again!&lt;/p&gt;

&lt;h3&gt;
  
  
  Corporate Certificates
&lt;/h3&gt;

&lt;p&gt;Here's the fun part—if you're behind a corporate firewall, you'll need your company's SSL certificates. They allow the Docker container to trust your corporate proxy/firewall for outbound connections. You'll usually have set them up when you started, probably in a certs folder in your home directory. They typically come as &lt;code&gt;.crt&lt;/code&gt;, &lt;code&gt;.pem&lt;/code&gt; or &lt;code&gt;.cer&lt;/code&gt; files.&lt;/p&gt;
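&lt;p&gt;Not sure whether a certificate file is current? You can inspect it with OpenSSL before handing it to Docker. The snippet below generates a throwaway self-signed certificate purely so the inspection command has something to run against; point the second command at your real &lt;code&gt;.crt&lt;/code&gt; or &lt;code&gt;.pem&lt;/code&gt; instead.&lt;/p&gt;

```shell
# Create a throwaway self-signed cert so this example is self-contained
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 1 -subj "/CN=corp-proxy-demo" 2>/dev/null

# Inspect subject and expiry -- run this against your corporate certificate
openssl x509 -in /tmp/demo.crt -noout -subject -enddate
```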

&lt;h2&gt;
  
  
  3. Setting Up Your Environment
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating Your Configuration File
&lt;/h3&gt;

&lt;p&gt;In the project directory, you'll find a file called &lt;code&gt;.env-example&lt;/code&gt;. This is your template. Copy it to create your actual configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp&lt;/span&gt; .env-example .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adding Your Credentials
&lt;/h3&gt;

&lt;p&gt;Open the &lt;code&gt;.env&lt;/code&gt; file in your IDE. You'll see something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;JIRA_URL=https://yourcompany.atlassian.net
JIRA_PAT=your-jira-personal-access-token-here
CONFLUENCE_URL=https://yourcompany.atlassian.net/wiki
CONFLUENCE_PAT=your-confluence-personal-access-token-here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace the placeholder values with your actual credentials. A few tips:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don't include quotes around the values&lt;/li&gt;
&lt;li&gt;Make sure there are no trailing spaces&lt;/li&gt;
&lt;li&gt;Double-check your URLs—they should be the base URLs without any paths&lt;/li&gt;
&lt;/ul&gt;
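&lt;p&gt;A quick, rough check for the two most common &lt;code&gt;.env&lt;/code&gt; mistakes (quoted values and trailing whitespace); the demo file and its contents are purely illustrative:&lt;/p&gt;

```shell
# Build a demo .env containing both mistakes: a quoted value and a trailing space
printf 'JIRA_URL=https://yourcompany.atlassian.net\nJIRA_PAT="abc123"\nCONFLUENCE_URL=https://yourcompany.atlassian.net/wiki \n' > /tmp/demo.env

# Flag quoted values and trailing whitespace; a clean file prints nothing
grep -nE '="|[[:space:]]$' /tmp/demo.env
```

&lt;p&gt;Here lines 2 and 3 are flagged; line 1 passes.&lt;/p&gt;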

&lt;h3&gt;
  
  
  Understanding Configuration Options
&lt;/h3&gt;

&lt;p&gt;You'll also see some SSL-related settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SSL_VERIFY=true
CERT_PATH=/path/to/certificates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're behind a corporate firewall, you may need to set &lt;code&gt;SSL_VERIFY=false&lt;/code&gt; for testing. But for production use, keep it true and provide the correct certificate path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Warning&lt;/em&gt;&lt;/strong&gt;: Setting &lt;code&gt;SSL_VERIFY=false&lt;/code&gt; significantly reduces security and should never be used in production or with sensitive data. Use it only for temporary testing to isolate certificate issues. Always aim to provide the correct certificate path and keep &lt;code&gt;SSL_VERIFY=true&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Building and Running with Docker
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Preparing Certificates
&lt;/h3&gt;

&lt;p&gt;If you use corporate certificates such as Zscaler's, you'll need to build the Docker image yourself. Place the certificates in your project root so Docker can access them. If you already have certificates elsewhere (like ~/certs), you can modify the Dockerfile to copy from that location instead.&lt;/p&gt;

&lt;p&gt;The Docker build process will automatically add these certificates. This lets the container make secure connections through your corporate network.&lt;/p&gt;
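&lt;p&gt;A minimal sketch of what the certificate step can look like in a Dockerfile. The base image tag follows the project's published image; the &lt;code&gt;certs/&lt;/code&gt; path and the Debian-style &lt;code&gt;update-ca-certificates&lt;/code&gt; call are assumptions you should adapt to the actual Dockerfile in the repo:&lt;/p&gt;

```dockerfile
# Start from the published MCP Atlassian image (tag assumed)
FROM ghcr.io/sooperset/mcp-atlassian:latest

# Copy corporate certificates from the project root into the system trust store
COPY certs/*.crt /usr/local/share/ca-certificates/
USER root
RUN update-ca-certificates
```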

&lt;h3&gt;
  
  
  Building the Docker Image
&lt;/h3&gt;

&lt;p&gt;With your certificates in place, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; mcp-atlassian-with-zscaler &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command does several things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates a Python environment&lt;/li&gt;
&lt;li&gt;Installs all necessary dependencies&lt;/li&gt;
&lt;li&gt;Configures your certificates&lt;/li&gt;
&lt;li&gt;Sets up the MCP server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first build might take a few minutes.&lt;/p&gt;

&lt;p&gt;If the build fails, it's usually because of missing certificates or network issues. Check the error messages—they're surprisingly helpful.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Connecting to Amazon Q
&lt;/h2&gt;

&lt;p&gt;With your Docker container built, it's time to connect everything together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Starting Your First Chat Session
&lt;/h3&gt;

&lt;p&gt;Open your terminal and run the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;q chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this command, Amazon Q automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reads your MCP configuration from ./.amazonq/mcp.json.&lt;/li&gt;
&lt;li&gt;Establishes connections to your configured tools, including the MCP server.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example mcp.json file
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcp_atlassian"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"docker"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"run"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"-i"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--rm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"--env-file"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;".env"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mcp-atlassian-with-zscaler"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"disabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"The MCP server runs inside Docker and communicates via stdin/stdout, so no port configuration is needed unless you're customising the setup."&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the Initialisation
&lt;/h3&gt;

&lt;p&gt;You should see something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✓ mcp_atlassian loaded in 2.40 s
✓ 2 of 2 mcp servers initialized.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each checkmark means a successful connection. If you see any errors here, it usually means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your credentials are incorrect&lt;/li&gt;
&lt;li&gt;The URLs in your &lt;code&gt;.env&lt;/code&gt; file are wrong&lt;/li&gt;
&lt;li&gt;Network/firewall issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  If there are errors
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ask Amazon Q about it&lt;/li&gt;
&lt;li&gt;Open Docker Desktop, find your container, select it and click &lt;code&gt;logs&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Verifying Your Connections
&lt;/h3&gt;

&lt;p&gt;Once connected, try a simple query to verify everything works:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What are my open Jira tickets?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If Q responds with your actual tickets, congratulations! You're connected. If not, check your Jira PAT and URL in the &lt;code&gt;.env&lt;/code&gt; file.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At this point the MCP server will probably ask for permission to run a &lt;code&gt;tool&lt;/code&gt; (a function the server exposes for the AI to call).&lt;/li&gt;
&lt;li&gt;When asked, press &lt;code&gt;t&lt;/code&gt; to trust the tool for this session, or &lt;code&gt;y&lt;/code&gt; to allow just this request.&lt;/li&gt;
&lt;li&gt;You can configure tool permissions in Amazon Q from the chat [&lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-chat-tools.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-chat-tools.html&lt;/a&gt;].&lt;/li&gt;
&lt;/ul&gt;
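&lt;p&gt;From inside the chat you can review and adjust these permissions with the &lt;code&gt;/tools&lt;/code&gt; command (the tool name below is illustrative):&lt;/p&gt;

```text
/tools                        # list available tools and their permission status
/tools trust jira_search      # trust a single tool for the rest of the session
```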

&lt;h2&gt;
  
  
  6. Using AI for Test Generation
&lt;/h2&gt;

&lt;p&gt;Now for the fun part—actually using this setup to generate tests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example Queries That Work Well
&lt;/h3&gt;

&lt;p&gt;Here are some queries that deliver immediate value:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Basic Jira queries:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What are my open assigned tickets, list them in priority order"&lt;/li&gt;
&lt;li&gt;"Show me all tickets in the current sprint for project XYZ"&lt;/li&gt;
&lt;li&gt;"What tickets are blocked?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test generation from tickets:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"What tests should I do for &amp;lt;ticket ID&amp;gt;"&lt;/li&gt;
&lt;li&gt;"Generate test cases for the acceptance criteria in PROJ-12345"&lt;/li&gt;
&lt;li&gt;"What edge cases should I consider for ticket &amp;lt;ticket ID&amp;gt;?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Referencing Confluence Specifications
&lt;/h3&gt;

&lt;p&gt;When you have detailed specs in Confluence, you can get specific:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In the spec &amp;lt;confluence page id&amp;gt; in section 1.8.5, what test cases and edge cases should I be looking for?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fetch the specific Confluence page&lt;/li&gt;
&lt;li&gt;Find the exact section you mentioned&lt;/li&gt;
&lt;li&gt;Analyse the requirements&lt;/li&gt;
&lt;li&gt;Generate relevant test cases&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Combining Jira Tickets with Specifications
&lt;/h3&gt;

&lt;p&gt;This is where the real power shows up. You can cross-reference:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For &amp;lt;ticket ID&amp;gt; and the spec &amp;lt;confluence page id&amp;gt; in section 1.8.5, are there any requirements missing or unclear? What test cases should we write for this?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pull the Jira ticket details&lt;/li&gt;
&lt;li&gt;Fetch the Confluence specification&lt;/li&gt;
&lt;li&gt;Compare acceptance criteria with spec requirements&lt;/li&gt;
&lt;li&gt;Identify gaps or inconsistencies&lt;/li&gt;
&lt;li&gt;Suggest comprehensive test coverage&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Finding Gaps in Requirements
&lt;/h3&gt;

&lt;p&gt;One of the most valuable uses is identifying what's missing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Looking at ticket XYZ-123 and its linked specification, what acceptance criteria might be missing?"&lt;/li&gt;
&lt;li&gt;"Based on the UI mockups in CONF-789 and the requirements in JIRA-456, what scenarios aren't covered?"&lt;/li&gt;
&lt;li&gt;"What questions should I ask the product owner about this feature?"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  7. Working with Chat Sessions
&lt;/h2&gt;

&lt;p&gt;Amazon Q can keep context between sessions. This is very helpful for ongoing testing tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Resuming Previous Conversations
&lt;/h3&gt;

&lt;p&gt;When you're in a project directory, Q remembers your previous conversations. To continue where you left off:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;q chat &lt;span class="nt"&gt;--resume&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see a summary like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We discussed your assigned Jira tickets related to configuring cars then examined the rules in your .q folder that define test case formats and bundle selection conventions for your project.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Understanding Context Retention
&lt;/h3&gt;

&lt;p&gt;Q maintains context within a folder, which means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your test patterns are remembered&lt;/li&gt;
&lt;li&gt;Previous queries inform new responses&lt;/li&gt;
&lt;li&gt;You can build on earlier work without re-explaining&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Best Practices for Ongoing Projects
&lt;/h3&gt;

&lt;p&gt;To get the most from context retention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use project-specific folders&lt;/strong&gt;: Keep each project's chats separate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start sessions with context&lt;/strong&gt;: "We're testing the car configurator feature"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reference previous work&lt;/strong&gt;: "Using the test pattern we established yesterday..."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build incrementally&lt;/strong&gt;: Start with simple tests, then add complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  8. Troubleshooting Common Issues
&lt;/h2&gt;

&lt;p&gt;Things don't always work perfectly. Here's how to fix common problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Connection Problems
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: MCP servers fail to initialise&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect URLs in &lt;code&gt;.env&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;Network firewall blocking connections&lt;/li&gt;
&lt;li&gt;Docker not running&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Verify Docker is running: &lt;code&gt;docker ps&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Try with &lt;code&gt;SSL_VERIFY=false&lt;/code&gt; temporarily to isolate certificate issues&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Authentication Errors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: "401 Unauthorized" or "403 Forbidden" errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expired or incorrect PAT&lt;/li&gt;
&lt;li&gt;Insufficient permissions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate a new PAT in Atlassian&lt;/li&gt;
&lt;li&gt;Ensure the token has read access to projects/spaces you need&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;.env&lt;/code&gt; file with new token&lt;/li&gt;
&lt;li&gt;Rebuild Docker image&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Certificate Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptom&lt;/strong&gt;: SSL verification errors&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common causes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing corporate certificates&lt;/li&gt;
&lt;li&gt;Certificates in wrong location&lt;/li&gt;
&lt;li&gt;Expired certificates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get latest certificates from IT&lt;/li&gt;
&lt;li&gt;Place in project root directory&lt;/li&gt;
&lt;li&gt;Rebuild Docker image&lt;/li&gt;
&lt;li&gt;Ensure &lt;code&gt;CERT_PATH&lt;/code&gt; in &lt;code&gt;.env&lt;/code&gt; is correct&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  9. Next Steps
&lt;/h2&gt;

&lt;p&gt;You're up and running—now what?&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Wins to Try First
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generate tests for your most complex feature&lt;/strong&gt;: Pick that configurator or rules engine that everyone avoids testing manually&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document existing features&lt;/strong&gt;: Use AI to create test documentation for features that were "completed" without proper test cases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find test gaps&lt;/strong&gt;: Run your existing test suites through AI analysis to identify missing scenarios&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Building Evidence for Process Improvement
&lt;/h3&gt;

&lt;p&gt;Track these metrics from day one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to create test cases (before vs. after)&lt;/li&gt;
&lt;li&gt;Number of edge cases identified&lt;/li&gt;
&lt;li&gt;Defects found using AI-generated tests&lt;/li&gt;
&lt;li&gt;Time saved per sprint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use this data to show management the value of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Better developer-to-tester ratios&lt;/li&gt;
&lt;li&gt;Time for exploratory testing&lt;/li&gt;
&lt;li&gt;Investment in test automation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Expanding Usage Across Your Team
&lt;/h3&gt;

&lt;p&gt;Start small and grow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Begin with one complex feature&lt;/li&gt;
&lt;li&gt;Share success stories in team meetings&lt;/li&gt;
&lt;li&gt;Create team-specific prompt templates&lt;/li&gt;
&lt;li&gt;Schedule brown-bag sessions to train others&lt;/li&gt;
&lt;li&gt;Build a library of effective queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Remember: This tool doesn't fix broken processes, but it gives you breathing room to work on what really matters—building quality into your products from the start.&lt;/p&gt;

&lt;p&gt;The path forward isn't about replacing testers with AI. It's about amplifying what testers do best: thinking critically about quality, understanding user needs, and finding the problems that matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  10. Getting More Advanced
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Amazon Q, like most LLMs, thrives on rules.&lt;/li&gt;
&lt;li&gt;Rules are markdown files that can be either global or project/repo based.&lt;/li&gt;
&lt;li&gt;Rules are stored in the &lt;code&gt;./.amazonq/rules&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt;In a chat, Q will load the rules, but it may sometimes forget some of them, which is why your prompts matter.&lt;/li&gt;
&lt;/ul&gt;
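&lt;p&gt;To make this concrete, here is the shape a small rule file might take. The filename and conventions are illustrative, not a prescribed format:&lt;/p&gt;

```markdown
# test-case-format.md (lives in ./.amazonq/rules/)

- Every scenario follows Given/When/Then, one behaviour per scenario
- Steps reference page objects by name, e.g. the `continue` button on the `configuration` page
- Screenshots are taken at the initial state, after key actions, and at final verification
- Any missing page object element is flagged inline: // NEW ELEMENT NEEDED: [description]
```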

&lt;h3&gt;
  
  
  Building rules
&lt;/h3&gt;

&lt;p&gt;I would start small and describe the interactions of a feature in human-readable terms. We tend to have states and interactions described in Figma. Without these interactions, Q doesn't know what your system looks like, which means it may not write the cases the way you'd like.&lt;/p&gt;

&lt;p&gt;We have a framework that translates BDD steps into automated WebdriverIO UI tests.&lt;/p&gt;

&lt;p&gt;For example, your framework might have a step like: Then the user clicks the &lt;code&gt;continue&lt;/code&gt; button on the configuration page, where &lt;code&gt;continue&lt;/code&gt; refers to a specific UI element (a 'locator') within a 'page object' for the &lt;code&gt;configuration&lt;/code&gt; page. Your rules teach Q these conventions, allowing it to generate BDD steps that translate directly to your existing automation framework.&lt;/p&gt;

&lt;p&gt;With this, Q knows the layout of our page objects and the rules for how we write BDD tests.&lt;/p&gt;
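&lt;p&gt;A toy sketch of that convention (the names and file format are illustrative, not our real framework): a flat map stands in for page-object classes, and a lookup resolves the friendly name in a BDD step to its locator.&lt;/p&gt;

```shell
# Minimal "page object" map: page.element=locator
printf 'configuration.continue=button[data-test="configuration-continue"]\nconfiguration.back=button[data-test="configuration-back"]\n' > /tmp/page-objects.txt

# Resolve the locator behind: the user clicks the `continue` button on the `configuration` page
lookup() {
  grep "^$1[.]$2=" /tmp/page-objects.txt | cut -d= -f2-
}
lookup configuration continue
```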

&lt;p&gt;I taught it some of these rules through trial and error: it would produce something that looked correct but was missing a detail, perhaps in the way it verified a result. Using the chat interface, I would ask why it did something and work through the problem with the LLM. At the end, I would ask it to update the rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding the business
&lt;/h3&gt;

&lt;p&gt;One thing to bear in mind is that when the business views your test cases, they don't want to see detailed steps with &lt;code&gt;code&lt;/code&gt; in them; they want something human-readable. That's something the LLM can do using the knowledge of interactions and states you taught it from Figma, and its understanding of how you structure your tests.&lt;/p&gt;

&lt;p&gt;We can now ask the LLM to produce cases that drive automation as well as human-readable test cases. It can even produce both in the same request.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt Engineering
&lt;/h3&gt;

&lt;p&gt;The LLM doesn't always do what you ask, which is why prompts matter. If you've been working with the LLM within a chat, you can probably just say &lt;code&gt;make a test case about putting in a light bulb&lt;/code&gt; and it'll do it. But a well-structured prompt really helps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="err"&gt;Create&lt;/span&gt; &lt;span class="nf"&gt;a &lt;/span&gt;BDD test case for [TICKET_ID/valid car configurations] following our framework conventions.

&lt;span class="kn"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;- Framework&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;WebdriverIO&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;BDD&lt;/span&gt; &lt;span class="err"&gt;(Cucumber)&lt;/span&gt;
&lt;span class="err"&gt;- Domain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;Bundle&lt;/span&gt; &lt;span class="err"&gt;selection&lt;/span&gt; &lt;span class="err"&gt;testing&lt;/span&gt;
&lt;span class="err"&gt;- Scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;Up&lt;/span&gt; &lt;span class="err"&gt;to&lt;/span&gt; &lt;span class="err"&gt;and&lt;/span&gt; &lt;span class="err"&gt;including&lt;/span&gt; &lt;span class="err"&gt;finalising&lt;/span&gt; &lt;span class="err"&gt;configuration&lt;/span&gt;

&lt;span class="err"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;1.&lt;/span&gt; &lt;span class="err"&gt;Use&lt;/span&gt; &lt;span class="err"&gt;current&lt;/span&gt; &lt;span class="err"&gt;valid&lt;/span&gt; &lt;span class="err"&gt;car&lt;/span&gt; &lt;span class="err"&gt;configurations&lt;/span&gt; &lt;span class="err"&gt;from&lt;/span&gt; &lt;span class="err"&gt;our&lt;/span&gt; &lt;span class="err"&gt;business&lt;/span&gt; &lt;span class="err"&gt;rules&lt;/span&gt;
&lt;span class="err"&gt;2.&lt;/span&gt; &lt;span class="err"&gt;Follow&lt;/span&gt; &lt;span class="err"&gt;standard&lt;/span&gt; &lt;span class="err"&gt;Given/When/&lt;/span&gt;&lt;span class="nf"&gt;Then &lt;/span&gt;structure with proper indentation
&lt;span class="err"&gt;3.&lt;/span&gt; &lt;span class="err"&gt;Include&lt;/span&gt; &lt;span class="err"&gt;page&lt;/span&gt; &lt;span class="err"&gt;navigation&lt;/span&gt; &lt;span class="err"&gt;verification&lt;/span&gt; &lt;span class="err"&gt;at&lt;/span&gt; &lt;span class="err"&gt;each&lt;/span&gt; &lt;span class="err"&gt;step&lt;/span&gt;
&lt;span class="err"&gt;4. Add strategic screenshot steps for&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Initial&lt;/span&gt; &lt;span class="err"&gt;state&lt;/span&gt;
   &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;After&lt;/span&gt; &lt;span class="err"&gt;key&lt;/span&gt; &lt;span class="err"&gt;user&lt;/span&gt; &lt;span class="err"&gt;actions&lt;/span&gt;
   &lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Final&lt;/span&gt; &lt;span class="err"&gt;state&lt;/span&gt; &lt;span class="err"&gt;verification&lt;/span&gt;
&lt;span class="err"&gt;5. Clearly mark any new page object elements needed beneath the test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;// NEW ELEMENT NEEDED&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;[description]&lt;/span&gt;

&lt;span class="err"&gt;Mandatory inclusions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Initial&lt;/span&gt; &lt;span class="err"&gt;test&lt;/span&gt; &lt;span class="err"&gt;setup&lt;/span&gt; &lt;span class="err"&gt;steps&lt;/span&gt;
&lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Page&lt;/span&gt; &lt;span class="err"&gt;load&lt;/span&gt; &lt;span class="err"&gt;verification&lt;/span&gt;
&lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;Data &lt;/span&gt;validation steps
&lt;span class="err"&gt;-&lt;/span&gt; &lt;span class="err"&gt;Error&lt;/span&gt; &lt;span class="err"&gt;handling&lt;/span&gt; &lt;span class="err"&gt;considerations&lt;/span&gt;

&lt;span class="err"&gt;Output format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; [Feature name]
  &lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; [Scenario name]
    Given...
    When...
    Then...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>mcp</category>
      <category>testing</category>
      <category>automation</category>
      <category>bdd</category>
    </item>
    <item>
      <title>AI-Assisted Testing: A Lifeline in the Waterfall-Sprint Hybrid Chaos</title>
      <dc:creator>Paul Coles</dc:creator>
      <pubDate>Thu, 29 May 2025 22:32:06 +0000</pubDate>
      <link>https://forem.com/paul_coles_633f698b10fd6e/ai-assisted-testing-a-lifeline-in-the-waterfall-sprint-hybrid-chaos-406k</link>
      <guid>https://forem.com/paul_coles_633f698b10fd6e/ai-assisted-testing-a-lifeline-in-the-waterfall-sprint-hybrid-chaos-406k</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR for Busy Testers
&lt;/h2&gt;

&lt;p&gt;In 30 seconds: We're drowning in a 5:1 developer-to-tester ratio with massive specs and fake sprints. AI-assisted test generation isn't fixing our broken process, but it's buying us time to breathe, test what matters, and build evidence for real change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Test leads drowning in work with insufficient resources&lt;/li&gt;
&lt;li&gt;QA managers trying to justify better processes to leadership&lt;/li&gt;
&lt;li&gt;Teams stuck in "Wagile" looking for practical survival tactics&lt;/li&gt;
&lt;li&gt;Anyone who's been told to "just automate everything"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key Terms
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt;: Anthropic's system for connecting LLMs to your tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wagile&lt;/strong&gt;: Waterfall pretending to be Agile&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Slicing&lt;/strong&gt;: Building all of one layer before moving to the next&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TAF&lt;/strong&gt;: Test Automation Framework (your team's specific patterns)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Part 1: The Problem
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Reality Check: When Agile Isn't Really Agile
&lt;/h3&gt;

&lt;p&gt;You're calling it "Agile," but it seems more like waterfall dressed as a sprint. You're familiar with the symptoms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Massive specifications that could double as doorstops&lt;/li&gt;
&lt;li&gt;Sprint planning that ignores testing capacity&lt;/li&gt;
&lt;li&gt;Developers packing the sprint based on their own bandwidth&lt;/li&gt;
&lt;li&gt;Features arriving in testing two sprints late&lt;/li&gt;
&lt;li&gt;"Just automate it all!" echoing from above&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sound familiar?&lt;/p&gt;

&lt;h3&gt;
  
  
  The Tick-Tock Death March
&lt;/h3&gt;

&lt;p&gt;Here's how it plays out in this horizontal slice world:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sprint 1:&lt;/strong&gt; Developers implement the package logic. Everything looks great in isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sprint 2:&lt;/strong&gt; The team builds mutual exclusion rules. Still seems fine — packages conflict as expected.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sprint 4:&lt;/strong&gt; The basket logic finally arrives. Out of the blue, nothing works properly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; You can't test the configurator end-to-end because it's not a vertical slice. The packages were stubbed when the basket didn't exist. Now the basket exists, but the stubs are wrong. You're finding integration problems four sprints late. At the same time, you're testing the new features for Sprint 5.&lt;/p&gt;

&lt;p&gt;Meanwhile, the testing tick-tock continues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tick:&lt;/strong&gt; Developers grab work from the massive spec, packing the sprint. (After all, developers have to code; it can't be about flow, it has to be about being busy.) Code flies, and pull requests pile up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tock:&lt;/strong&gt; Testers rush to handle half the tasks from Sprint 1 while also working on Sprint 2. You might be examining the specs in detail for the first time. After all, it's hard to remember 20,000 words from a 60-minute walk-through. You're checking a Figma file: is it the right one? Is it up to date?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tick:&lt;/strong&gt; Developer work increases. The testing from the last sprint isn't done, and new features keep arriving.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tock:&lt;/strong&gt; The deadline looms, and you're seeing features for the first time, two sprints late. Is it correct? Is it good? Who knows — there's no time to find out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Case Study: The Car Configurator Complexity
&lt;/h3&gt;

&lt;p&gt;Imagine testing a system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can choose individual options (leather seats, sunroof, premium audio).&lt;/li&gt;
&lt;li&gt;OR you can choose packages (Sport Pack, Luxury Pack, Winter Pack).&lt;/li&gt;
&lt;li&gt;BUT packages have mutual exclusions (can't have Sport and Eco pack).&lt;/li&gt;
&lt;li&gt;AND some individual options conflict with packages.&lt;/li&gt;
&lt;li&gt;PLUS pricing changes based on combinations.&lt;/li&gt;
&lt;/ul&gt;
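&lt;p&gt;The rules above boil down to data plus a conflict check. As a minimal sketch only (the package and option names here are hypothetical, not from any real spec):&lt;/p&gt;

```python
# Minimal model of the configurator case study; all names are hypothetical.
PACKAGE_EXCLUSIONS = {
    ("Sport Pack", "Eco Pack"),
    ("Winter Pack", "Summer Pack"),
}

# Individual options that conflict with packages (illustrative).
OPTION_CONFLICTS = {
    "Performance Tyres": {"Eco Pack"},
}

def conflicts(selected_packages, selected_options):
    """Return human-readable conflicts for a given selection."""
    found = []
    for first, second in PACKAGE_EXCLUSIONS:
        if first in selected_packages and second in selected_packages:
            found.append(f"{first} cannot be combined with {second}")
    for option in selected_options:
        for pack in OPTION_CONFLICTS.get(option, set()):
            if pack in selected_packages:
                found.append(f"{option} conflicts with {pack}")
    return found
```

&lt;p&gt;Every entry in those two tables is a test case waiting to happen, which is exactly why the combinations pile up faster than a tester can write them by hand.&lt;/p&gt;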

&lt;h2&gt;
  
  
  Part 2: Why Traditional Solutions Fail
&lt;/h2&gt;

&lt;h3&gt;
  
  
  "Just Automate Everything!"
&lt;/h3&gt;

&lt;p&gt;As testers, we hear management say "automate everything!" But here's what they don't see...&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reading specifications for the first time as testing starts.&lt;/li&gt;
&lt;li&gt;Checking the implementation to ensure it matches the Figma designs.&lt;/li&gt;
&lt;li&gt;Writing test cases while executing them.&lt;/li&gt;
&lt;li&gt;Dealing with the backlog of "completed" work that we have never tested.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's like being told to build a ladder while you're falling off a cliff.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Hidden Multipliers: Why It's Even Worse Than You Think
&lt;/h3&gt;

&lt;p&gt;Beyond the automation dream, we face chaos that worsens the testing crisis:&lt;/p&gt;

&lt;h4&gt;
  
  
  The Revolving Door Problem
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Team members change all the time, but no one tells you who has left or joined.&lt;/li&gt;
&lt;li&gt;New faces pop up in stand-ups without any introductions.&lt;/li&gt;
&lt;li&gt;Knowledge walks away, taking important context with it.&lt;/li&gt;
&lt;li&gt;You’re left figuring out who does what by trial and error.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  "Agile" Theatre
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Detailed specs show up fully formed (hello, waterfall!).&lt;/li&gt;
&lt;li&gt;Stand-ups turn into 15-minute individual status reports.&lt;/li&gt;
&lt;li&gt;The term "collaboration" appears in slides but not in practice.&lt;/li&gt;
&lt;li&gt;You haven’t seen a retrospective in months (do they even happen?).&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Onboarding Black Hole
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;"Here's your laptop. Good luck!"&lt;/li&gt;
&lt;li&gt;No team directory, no architecture overview, no domain-knowledge handover; just videos about something, though nobody ever says what.&lt;/li&gt;
&lt;li&gt;You learn by osmosis and asking the same questions many times.&lt;/li&gt;
&lt;li&gt;Six weeks in, you’re still uncovering critical systems you should test.&lt;/li&gt;
&lt;li&gt;The automation framework has no README, the Word doc doesn't work, and you figure out which packages it needs through trial and error.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Process? What process?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;UI discrepancies pop up during testing (surprise!).&lt;/li&gt;
&lt;li&gt;Requirements live in Confluence, Jira, Slack, and someone's head.&lt;/li&gt;
&lt;li&gt;We design interactions in Figma, but which one, and is it up to date?

&lt;ul&gt;
&lt;li&gt;Every day you ask "Should it have this?"&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Feedback loops are so long they feel like feedback spirals.&lt;/li&gt;

&lt;li&gt;"We will improve the process after this release" (spoiler: they won't).&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Then management wonders why adding more testers doesn't help. A week later, the new hires are still catching up. They are reading specs, watching meeting recordings, and trying to grasp the odd "legacy service that sometimes fails."&lt;/p&gt;

&lt;p&gt;We can’t automate our way out – we can’t even communicate our way in.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 3: Enter AI-Assisted Testing
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Not a Silver Bullet, but a Pressure Release Valve
&lt;/h3&gt;

&lt;p&gt;AI-assisted test generation isn't about achieving testing nirvana; it's about survival.&lt;/p&gt;

&lt;h3&gt;
  
  
  "Can't I just use Copilot or a public Large Language Model (LLM) for this?"
&lt;/h3&gt;

&lt;p&gt;You absolutely can. Tools like Copilot or direct ChatGPT interaction are easy to access. They can read open pages or take pasted specifications, which makes them quick for generating simple test cases or brainstorming ideas. But they come with significant drawbacks in a production testing environment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High Hallucination Risk:&lt;/strong&gt; General-purpose LLMs don't have specific context about your Jira tickets, Confluence specs, or your team's test patterns. They can generate test cases that sound good but are often wrong, irrelevant, or made up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Integration:&lt;/strong&gt; They don't work directly with your project management or documentation tools. This leads to a lot of manual copy-pasting of context.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No Learning from Your Patterns:&lt;/strong&gt; They won't adapt to your team's test automation framework (TAF) step patterns. This leads to inconsistent tests that are harder to maintain.&lt;/p&gt;

&lt;p&gt;This is the specific area where MCP demonstrates its strengths. It aims to close that gap by offering a more reliable and context-aware solution.&lt;/p&gt;

&lt;h3&gt;
  
  
  From Spec to Test in Minutes, Not Days
&lt;/h3&gt;

&lt;p&gt;When you check the configurator section in the spec, you won't waste hours writing test cases. Instead, you extract rules from the spec:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Sport Pack includes premium audio."&lt;/li&gt;
&lt;li&gt;"Winter Pack excludes Summer Pack."&lt;/li&gt;
&lt;li&gt;"Electric engine can't have a Sport Pack."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From Figma, you get the process flow and state changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"If you pick wood trim and change to Summer Pack, then the system shows an 'are you sure' prompt."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI creates test permutations. This lets you focus on the key question: Does this look like what's in Figma? Also, does it make sense for users?&lt;/p&gt;
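&lt;p&gt;To make "AI creates test permutations" concrete, here is one mechanical way the combination space can be enumerated against exclusion rules. A hedged sketch only; the package names and exclusion pairs are made up, and in practice the rules would come out of the spec:&lt;/p&gt;

```python
from itertools import combinations

# Illustrative data; real rules would be extracted from the spec.
PACKAGES = ["Sport", "Eco", "Winter", "Summer", "Luxury"]
EXCLUSIONS = [{"Sport", "Eco"}, {"Winter", "Summer"}]

def valid_combinations(max_size=2):
    """Yield package combinations that break no exclusion rule."""
    for size in range(1, max_size + 1):
        for combo in combinations(PACKAGES, size):
            chosen = set(combo)
            if any(rule.issubset(chosen) for rule in EXCLUSIONS):
                continue  # drop combos containing a mutually exclusive pair
            yield combo

cases = list(valid_combinations())
```

&lt;p&gt;Even this toy version yields thirteen valid selections from five packages; scale that to real option lists and the case for generating rather than hand-writing permutations makes itself.&lt;/p&gt;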

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Catch-Up is Possible:&lt;/strong&gt; If you only see features two sprints late, AI-generated tests help you cover the backlog and evidence your testing quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation Comes from Chaos:&lt;/strong&gt; The AI-generated tests create the documentation that was missing. New team members (or you, three sprints later) can understand what the feature does.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Some Automation Is Better Than None:&lt;/strong&gt; You can't automate all tasks, but AI can ease the heavy load of configuration testing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Part 4: Implementation Reality
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Building Team Capability, Not Replacing It
&lt;/h3&gt;

&lt;p&gt;Janet Gregory and Lisa Crispin emphasise that quality is woven into the fabric of what we do. AI doesn't weave that fabric — teams do. This tool handles the mundane threading so we can focus on the patterns.&lt;/p&gt;

&lt;p&gt;Yes, our process is broken. But at least we have those large specs - we know what we're supposed to be doing. Our stories mostly have acceptance criteria (though when there are 300 changes and 2 ACs, something's wrong).&lt;/p&gt;

&lt;p&gt;The point is: AI gives us breathing room to work on the real problem - the cultural shift toward continuous quality.&lt;/p&gt;

&lt;h3&gt;
  
  
  How MCP Works (In Plain English)
&lt;/h3&gt;

&lt;p&gt;Think of MCP as a translator between your project tools and AI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It reads your Jira tickets and Confluence specs&lt;/li&gt;
&lt;li&gt;It understands your team's test patterns from examples&lt;/li&gt;
&lt;li&gt;It generates tests that match your style, not generic templates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technical detail: It runs in a Docker container and connects via API tokens. (See the appendix for setup details.)&lt;/p&gt;

&lt;p&gt;Here's a real example. The AI produced a car configurator component. This includes package conflicts and mutual exclusions based on the requirements.&lt;/p&gt;

&lt;p&gt;There's less risk of it hallucinating because the MCP keeps it on track with your specific project context. It may also spot things you haven't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="kd"&gt;Feature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Car Configurator Package Management

  &lt;span class="kn"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;Given &lt;/span&gt;I am on the car configurator page
    &lt;span class="nf"&gt;And &lt;/span&gt;I have selected a &lt;span class="s"&gt;"Hatch Back"&lt;/span&gt; model
    &lt;span class="nf"&gt;And &lt;/span&gt;the configurator displays available packages and options

  &lt;span class="kn"&gt;Scenario&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; Sport Package conflicts with Eco Package selection
    &lt;span class="err"&gt;Given I have selected the "Sport Package" containing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;Option&lt;/span&gt;            &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;Price&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt;
      &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Performance&lt;/span&gt; &lt;span class="n"&gt;Tyres&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;£800&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt;
      &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sport&lt;/span&gt; &lt;span class="n"&gt;Suspension&lt;/span&gt;  &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;£1,200&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt;
      &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sport&lt;/span&gt; &lt;span class="n"&gt;Exhaust&lt;/span&gt;     &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;£600&lt;/span&gt;   &lt;span class="p"&gt;|&lt;/span&gt;
    &lt;span class="nf"&gt;And &lt;/span&gt;the total package price is &lt;span class="s"&gt;"£2,600"&lt;/span&gt;
    &lt;span class="nf"&gt;When &lt;/span&gt;I attempt to select the &lt;span class="s"&gt;"Eco Package"&lt;/span&gt;
    &lt;span class="err"&gt;Then I should see a conflict warning stating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;"Eco&lt;/span&gt; &lt;span class="err"&gt;Package&lt;/span&gt; &lt;span class="err"&gt;cannot&lt;/span&gt; &lt;span class="err"&gt;be&lt;/span&gt; &lt;span class="err"&gt;combined&lt;/span&gt; &lt;span class="err"&gt;with&lt;/span&gt; &lt;span class="err"&gt;Sport&lt;/span&gt; &lt;span class="err"&gt;Package."&lt;/span&gt;
    &lt;span class="nf"&gt;And &lt;/span&gt;the &lt;span class="s"&gt;"Eco Package"&lt;/span&gt; option should be disabled
    &lt;span class="nf"&gt;And &lt;/span&gt;my current selection should remain &lt;span class="s"&gt;"Sport Package"&lt;/span&gt;
    &lt;span class="nf"&gt;And &lt;/span&gt;the basket total should remain &lt;span class="s"&gt;"£2,600"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Survival Guide: Making It Work
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Start Where You Are
&lt;/h3&gt;

&lt;p&gt;Don't wait for the perfect process. If you're overwhelmed by complexity, let AI create test cases. You can then check the critical paths yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use It for Comprehension, Not Just Coverage
&lt;/h3&gt;

&lt;p&gt;When you first see that spec section, use AI to quickly generate scenarios. This helps you understand the feature faster — what are all the combinations? What are the edge cases?&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Focus Your Human Effort
&lt;/h3&gt;

&lt;p&gt;With AI handling the combinatorial explosion, you can focus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Does this match the Figma designs?&lt;/li&gt;
&lt;li&gt;Do the mutual exclusions make sense to users?&lt;/li&gt;
&lt;li&gt;What happens when rules conflict?&lt;/li&gt;
&lt;li&gt;Is this actually valuable to customers?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start for Desperate Teams
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;First: Check if you're allowed to use AI tools&lt;/strong&gt; (Really. Ask security. Using unapproved tools with company data is a career-limiting move.)&lt;/li&gt;
&lt;li&gt;Pick your most complex feature with multiple rules.&lt;/li&gt;
&lt;li&gt;Set up MCP with your Jira/Confluence (see setup guide).&lt;/li&gt;
&lt;li&gt;Generate test cases for just that feature.&lt;/li&gt;
&lt;li&gt;Compare time spent vs. manual creation.&lt;/li&gt;
&lt;li&gt;Use the time saved for exploratory testing.

&lt;ul&gt;
&lt;li&gt;Document what you find!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Part 5: The Honest Path Forward
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What This Fixes (and What It Doesn't)
&lt;/h3&gt;

&lt;p&gt;Let's be real — AI won't fix your broken process. You still have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Too much WIP (work in progress).&lt;/li&gt;
&lt;li&gt;Artificial sprint boundaries destroying flow.&lt;/li&gt;
&lt;li&gt;A 5:1 developer-to-tester ratio that guarantees bottlenecks.&lt;/li&gt;
&lt;li&gt;Specifications that arrive fully formed rather than iteratively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But AI can make an unsustainable situation slightly more bearable. It buys you time to demonstrate the value of comprehensive testing and build the case for the process changes you really need.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build Evidence for Change
&lt;/h3&gt;

&lt;p&gt;Plan to measure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test case creation time (target: 70% reduction).&lt;/li&gt;
&lt;li&gt;Edge case coverage (how many combinations would you have missed?).&lt;/li&gt;
&lt;li&gt;Time freed for exploratory testing.&lt;/li&gt;
&lt;li&gt;Critical issues found with that freed time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This evidence will help justify the process improvements we desperately need.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Truth
&lt;/h2&gt;

&lt;p&gt;AI-assisted testing is a lifesaver, not the solution itself. It's like using a better bucket on a ship that’s taking on water. You can bail out faster, but it doesn’t solve the main issue.&lt;/p&gt;

&lt;p&gt;AI helps make the case for proper testing. It shows what happens when your testers aren't overwhelmed. And it makes it clear why you need to change up your process.&lt;/p&gt;

&lt;p&gt;You should fix the process first, but that is not how it works. In reality, AI-assisted testing is about staying afloat and keeping your sanity. It frees up time to focus on quality, rather than going through the motions.&lt;/p&gt;

&lt;p&gt;For instance, AI can create a huge test suite in a few minutes – that's a massive help.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 6: Getting Started (Yes, Really)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What You'll Need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Docker (or someone who can install it for you)&lt;/li&gt;
&lt;li&gt;Jira/Confluence access with API permissions&lt;/li&gt;
&lt;li&gt;About 30 minutes when no one's pinging you&lt;/li&gt;
&lt;li&gt;A complex feature to test (you've got plenty)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The 10-Minute Version
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Pull the MCP-Atlassian Docker image&lt;/li&gt;
&lt;li&gt;Create API tokens in Atlassian&lt;/li&gt;
&lt;li&gt;Set up your .env file with credentials&lt;/li&gt;
&lt;li&gt;Connect to Amazon Q&lt;/li&gt;
&lt;li&gt;Work with Amazon Q to add some rules to guide it&lt;/li&gt;
&lt;li&gt;Ask it about your most painful feature&lt;/li&gt;
&lt;/ol&gt;
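&lt;p&gt;Steps 1 to 3 look roughly like the following. This is a sketch under assumptions: the image name and environment-variable names are drawn from the open-source mcp-atlassian project and may differ in your version, so confirm everything against the full setup guide linked at the end of this post.&lt;/p&gt;

```shell
# Step 1: pull the image (name is an assumption; check the setup guide)
docker pull ghcr.io/sooperset/mcp-atlassian:latest

# Steps 2-3: put your Atlassian API tokens in a .env file
# (keep it out of version control); variable names are illustrative:
#   JIRA_URL=https://your-company.atlassian.net
#   JIRA_USERNAME=you@example.com
#   JIRA_API_TOKEN=your-token
#   CONFLUENCE_URL=https://your-company.atlassian.net/wiki
#   CONFLUENCE_USERNAME=you@example.com
#   CONFLUENCE_API_TOKEN=your-token

# Run the server with those credentials for your AI client to connect to
docker run --rm -i --env-file .env ghcr.io/sooperset/mcp-atlassian:latest
```

&lt;p&gt;From there, steps 4 to 6 are about wiring your AI client to the running server and teaching it your patterns.&lt;/p&gt;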

&lt;h3&gt;
  
  
  What Success Looks Like
&lt;/h3&gt;

&lt;p&gt;Within an hour, you should be generating test cases that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Actually match your testing patterns&lt;/li&gt;
&lt;li&gt;Cover edge cases you'd miss at 5 PM on a Friday&lt;/li&gt;
&lt;li&gt;Make sense to other team members (unlike that legacy automation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common Gotchas
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;If you're behind a corporate proxy, you might need to build the Docker image yourself, and make sure to include your certificates.&lt;/li&gt;
&lt;li&gt;API tokens expire. Set a calendar reminder.&lt;/li&gt;
&lt;li&gt;Start small. One feature. Prove the value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full setup guide: &lt;a href="https://dev.to/paul_coles_633f698b10fd6e/ai-assisted-testing-a-survival-guide-to-implementing-mcp-with-atlassian-tools-2gnm"&gt;https://dev.to/paul_coles_633f698b10fd6e/ai-assisted-testing-a-survival-guide-to-implementing-mcp-with-atlassian-tools-2gnm&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>testing</category>
      <category>agile</category>
    </item>
  </channel>
</rss>
