<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Gábor Mészáros</title>
    <description>The latest articles on Forem by Gábor Mészáros (@cleverhoods).</description>
    <link>https://forem.com/cleverhoods</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3647906%2F2ae4010e-7f1a-4906-9598-c259abb6e222.jpeg</url>
      <title>Forem: Gábor Mészáros</title>
      <link>https://forem.com/cleverhoods</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cleverhoods"/>
    <language>en</language>
    <item>
      <title>The Undiagnosed Input Problem</title>
      <dc:creator>Gábor Mészáros</dc:creator>
      <pubDate>Wed, 08 Apr 2026 11:51:12 +0000</pubDate>
      <link>https://forem.com/reporails/the-undiagnosed-input-problem-4pmc</link>
      <guid>https://forem.com/reporails/the-undiagnosed-input-problem-4pmc</guid>
      <description>&lt;p&gt;The AI agent ecosystem has built a serious industry around controlling outputs. Guardrails. Safety classifiers. Output validation. Monitoring. Retry systems. Human review.&lt;/p&gt;

&lt;p&gt;All of that matters, but there is a simpler upstream question that still goes mostly unmeasured:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are the instructions any good?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds obvious, &lt;strong&gt;yet it is not how the industry behaves.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When an agent fails to follow instructions, the usual explanations come fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Models are probabilistic&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Agents are inconsistent&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need stronger guardrails&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need better monitoring&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need retries&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;You need humans in the loop&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;… and while each of those explanations is right to a degree, together they have a side effect: &lt;strong&gt;they turn instruction quality into a blind spot.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The ecosystem has become extremely good at inspecting what comes out of the model, and surprisingly weak at inspecting what goes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The symptom
&lt;/h2&gt;

&lt;p&gt;Consider &lt;a href="https://sierra.ai/blog/benchmarking-ai-agents" rel="noopener noreferrer"&gt;τ-bench&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;It gives agents policy instructions and measures whether they follow them in realistic customer-service tasks. Airline and retail workflows. Real constraints. Real multi-step behavior.&lt;/p&gt;

&lt;p&gt;The benchmark result that gets repeated is the model result: even strong systems still fail a large share of tasks, and consistency across repeated attempts remains weak.&lt;/p&gt;

&lt;p&gt;The conclusion most people draw is straightforward: &lt;strong&gt;we need better models, better agents, better orchestration.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My take: &lt;strong&gt;&lt;em&gt;Maybe&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But there is another question sitting underneath the benchmark:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Were the instructions themselves well-formed and well-structured?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not just present. Not just long enough. Not just sincere.&lt;/p&gt;

&lt;p&gt;Well-formed. Well-structured. Well-organized.&lt;/p&gt;

&lt;p&gt;Specific enough to anchor behavior. Structured enough to survive context mixing. Non-conflicting across files. Positioned where the model can actually use them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Those questions rarely get asked.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The industry response
&lt;/h2&gt;

&lt;p&gt;I had a conversation recently where a lead solutions architect put the standard view plainly:&lt;/p&gt;

&lt;p&gt;“&lt;em&gt;The instruction merely influences the probability distribution over outputs. It doesn’t override it.&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;That is right about the mechanism, but wrong about what follows from it.&lt;/p&gt;

&lt;p&gt;Yes, instructions operate probabilistically. &lt;strong&gt;But that does not mean all instructions are weak in the same way.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The shape of the distribution is not fixed. It changes with the properties of the instruction itself. Specificity sharpens it. Structure sharpens it. Conflict flattens it. Vague abstractions flatten it. Bad formatting can suppress it almost entirely.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Across my earlier controlled experiments, small changes in wording and placement produced large changes in compliance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/do-not-think-of-a-pink-elephant-7d40a26cd072" rel="noopener noreferrer"&gt;Instruction&lt;/a&gt; ordering moved compliance by 25 percentage points with the same model and the same directive.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/instruction-best-practices-precision-beats-clarity-e1bcae806671" rel="noopener noreferrer"&gt;Specificity&lt;/a&gt; produced roughly a 10x compliance effect when the instruction named the exact construct instead of describing it abstractly.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cleverhoods.medium.com/claude-md-best-practices-7-formatting-rules-for-the-machine-a591afc3d9a9" rel="noopener noreferrer"&gt;Formatting&lt;/a&gt; changed whether the model reliably registered the instruction at all.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The problem is that most instruction systems are built without diagnostics.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;That is not an AI limitation. That is an engineering failure.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The folk system
&lt;/h2&gt;

&lt;p&gt;Right now, instruction practice spreads mostly through imitation.&lt;/p&gt;

&lt;p&gt;A popular repository posts “best practices” for Claude Code. Shared Cursor rules circulate as templates. People copy &lt;code&gt;AGENTS.md&lt;/code&gt; files between projects. Teams accumulate &lt;code&gt;CLAUDE.md&lt;/code&gt;, &lt;code&gt;.cursorrules&lt;/code&gt;, &lt;code&gt;copilot-instructions.md&lt;/code&gt;, and other project-specific rule files across multiple tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Copy, paste, hope, repeat.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some of that advice is useful. Almost none of it is tested in any controlled, reproducible way. That would be fine if instruction quality were self-evident. &lt;strong&gt;It is not.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A long instruction file can feel thorough while being internally contradictory. A highly opinionated ruleset can feel disciplined while producing almost no behavioral influence on the model.&lt;/p&gt;

&lt;p&gt;A sprawling multi-file setup can look sophisticated while making the system worse.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Without diagnostics, developers do not know which instructions are binding, which are noise, and which are actively interfering with each other.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The gap
&lt;/h2&gt;

&lt;p&gt;The tooling split is now pretty clear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output tooling&lt;/strong&gt; is mature. Guardrails AI validates structure. Lakera focuses on prompt injection and security. NeMo Guardrails enforces safety and conversational rails. Llama Guard classifies risky content. The output edge is crowded.&lt;/p&gt;

&lt;p&gt;Prompt testing is real. Promptfoo, Braintrust, and LangSmith can all help evaluate behavior. But they are primarily black-box systems: did the prompt produce the output you wanted?&lt;/p&gt;

&lt;p&gt;That is useful.&lt;/p&gt;

&lt;p&gt;It is not the same as measuring the instruction artifact itself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instruction-quality tooling&lt;/strong&gt; exists only in fragments. Some tools use LLM-as-judge. Some use deterministic local rules. But the category is still early, inconsistent, and mostly disconnected from measured behavioral outcomes.&lt;/p&gt;

&lt;p&gt;What is still largely missing is a deterministic way to inspect instruction files as engineered objects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how specific they are&lt;/li&gt;
&lt;li&gt;how directly they state intent&lt;/li&gt;
&lt;li&gt;whether they conflict across files&lt;/li&gt;
&lt;li&gt;whether they overuse headings&lt;/li&gt;
&lt;li&gt;whether they provide alternatives instead of bare prohibitions&lt;/li&gt;
&lt;li&gt;whether the system is getting denser while getting weaker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Code gets static analysis.&lt;/p&gt;

&lt;p&gt;Instruction systems usually get &lt;em&gt;vibes&lt;/em&gt;.&lt;/p&gt;
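&lt;p&gt;To make the point concrete, most of the properties in that list can be checked with purely local, deterministic rules, no model in the loop. Here is a minimal sketch; the verb lists, hedge list, and heuristics are my own illustrative choices, not the production analyzer's rules:&lt;/p&gt;

```python
# Sketch of deterministic instruction-file diagnostics.
# The word lists and heuristics below are illustrative
# assumptions, not the production analyzer's rules.
import re

HEDGES = ("try to", "where possible", "if you must", "when appropriate")
PROHIBITIONS = ("do not", "don't", "never", "avoid")
DIRECTIVE_STARTS = ("use ", "run ", "wrap ", "read ", "write ", "name ")

def split_sentences(text):
    return [s.strip() for s in re.split(r"[.!?]\s+", text) if s.strip()]

def diagnose(instruction_text):
    sentences = [s.lower() for s in split_sentences(instruction_text)]
    directives = [s for s in sentences if s.startswith(DIRECTIVE_STARTS)]
    hedged = [s for s in sentences if any(h in s for h in HEDGES)]
    # A "bare prohibition" forbids something without offering an alternative.
    bare = [
        s for s in sentences
        if any(p in s for p in PROHIBITIONS)
        and "instead" not in s
        and not s.startswith(DIRECTIVE_STARTS)
    ]
    total = max(len(sentences), 1)
    return {
        "directive_ratio": len(directives) / total,
        "hedge_count": len(hedged),
        "bare_prohibitions": len(bare),
    }

report = diagnose(
    "Take care about performance. Use parameterized queries with "
    "cursor.execute(). Try to avoid globals where possible."
)
```

&lt;p&gt;Nothing model-dependent, fully reproducible, and even heuristics this crude start to separate vague aspiration from enforceable directive.&lt;/p&gt;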

&lt;h2&gt;
  
  
  What we measured
&lt;/h2&gt;

&lt;p&gt;We built an analyzer that treats instruction files as structured objects with measurable properties. Deterministic. Reproducible. No LLM-as-judge.&lt;/p&gt;

&lt;p&gt;I am running it across a large live corpus of real repositories. The full run completes this week; what follows is what the partial sample already shows - stable enough to publish, not yet the full picture.&lt;/p&gt;

&lt;p&gt;Quality is reported on a 0-to-100 scale: &lt;code&gt;0&lt;/code&gt; means the file produces no measurable influence on model behavior, &lt;code&gt;100&lt;/code&gt; is the ceiling the framework can score.&lt;/p&gt;

&lt;p&gt;A fresh aggregation over &lt;strong&gt;12,076&lt;/strong&gt; completed instruction-file scans is virtually identical to an earlier &lt;strong&gt;9,582&lt;/strong&gt;-repo sample:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;bottom tier:&lt;/strong&gt; &lt;code&gt;40.3%&lt;/code&gt; vs &lt;code&gt;40.1%&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;top tier:&lt;/strong&gt; &lt;code&gt;12.1%&lt;/code&gt; vs &lt;code&gt;12.2%&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;mean quality score:&lt;/strong&gt; &lt;code&gt;27&lt;/code&gt; vs &lt;code&gt;27&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;directive content ratio:&lt;/strong&gt; &lt;code&gt;27.9%&lt;/code&gt; vs &lt;code&gt;27.9%&lt;/code&gt; - the share of instruction sentences that directly tell the model what to do&lt;/p&gt;

&lt;p&gt;That matters because it means the pattern is stable.&lt;/p&gt;

&lt;p&gt;This does not look like a small-sample artifact.&lt;/p&gt;

&lt;p&gt;And the strongest finding is not what I expected.&lt;/p&gt;
&lt;h2&gt;
  
  
  More rules, lower quality
&lt;/h2&gt;

&lt;p&gt;The common response to bad agent behavior is to add more rules.&lt;/p&gt;

&lt;p&gt;More files. More guidance. More scoping. More edge-case coverage.&lt;/p&gt;

&lt;p&gt;The corpus says that strategy tends to backfire.&lt;/p&gt;

&lt;p&gt;Across &lt;strong&gt;12,076&lt;/strong&gt; repositories, instruction quality falls as instruction-file count rises:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Files per repo     N      Mean score   Bottom tier %   Top tier %
1                  4681   28           46.3%           16.9%
2-5                4796   26           37.3%            9.5%
6-20               1972   26           36.0%            8.8%
21-50               438   25           31.3%            5.7%
51-500              186   25           33.3%            5.4%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key number is the top-tier share.&lt;/p&gt;

&lt;p&gt;It collapses from &lt;code&gt;16.9%&lt;/code&gt; in single-file setups to &lt;code&gt;5.4%&lt;/code&gt; in repositories with &lt;code&gt;51&lt;/code&gt; to &lt;code&gt;500&lt;/code&gt; instruction files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is a roughly 3x drop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The headline version of that finding is simple:&lt;/p&gt;

&lt;p&gt;Developers respond to bad agent behavior by adding more rules. In the corpus, that strategy correlates with a 3x collapse in the probability of landing in the top tier.&lt;/p&gt;

&lt;p&gt;That does not prove file count causes low quality by itself.&lt;/p&gt;

&lt;p&gt;But it does show that rule proliferation is not rescuing these systems. At scale, it is associated with weaker instruction quality, not stronger.&lt;/p&gt;

&lt;h2&gt;
  
  
  The sweet spot
&lt;/h2&gt;

&lt;p&gt;There is also a more subtle result in the partial sample. Instruction quality appears to be non-monotonic in directive density: more directives help at first, then stop helping, and past a point start to hurt.&lt;/p&gt;

&lt;p&gt;The full curve is in next week’s piece. The short version is that there is an optimal density range, after which additional directives stop strengthening the system.&lt;/p&gt;

&lt;p&gt;Enough force to bind behavior. Not so much that the system turns into an overpacked rules document.&lt;/p&gt;

&lt;h2&gt;
  
  
  A real example
&lt;/h2&gt;

&lt;p&gt;Here is the kind of instruction block the corpus is full of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Code should be clear, well documented, clear PHPDocs.

# Code must meet SOLID DRY KISS principles.

# Should be compatible with PSR standards when it need.

# Take care about performance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is not malicious. It is not absurd.&lt;/p&gt;

&lt;p&gt;It is just &lt;strong&gt;weak.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Everything is abstract. Nothing is anchored. Headings are doing the work prose should do. The agent can read it, represent it, and still walk past most of it.&lt;/p&gt;

&lt;p&gt;Now compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Never use &lt;span class="sb"&gt;`&lt;/span&gt;var_dump&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; or &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;dd&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; &lt;span class="k"&gt;in &lt;/span&gt;committed code. Use &lt;span class="sb"&gt;`&lt;/span&gt;Log::debug&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt; instead.
Run &lt;span class="sb"&gt;`&lt;/span&gt;./vendor/bin/phpstan analyse src/&lt;span class="sb"&gt;`&lt;/span&gt; before every commit. Level 6 minimum.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same general intent. Completely different binding strength.&lt;/p&gt;

&lt;p&gt;The second version names the construct, names the alternative, names the command, and names the threshold. &lt;strong&gt;It gives the model something concrete to hold onto.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is what diagnostics should make visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means
&lt;/h2&gt;

&lt;p&gt;Output guardrails still matter.&lt;/p&gt;

&lt;p&gt;Prompt evaluation still matters.&lt;/p&gt;

&lt;p&gt;Safety systems still matter.&lt;/p&gt;

&lt;p&gt;But they do not answer the upstream question: &lt;strong&gt;Are the instructions themselves well-formed?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If the answer is no, then a large class of downstream failures will keep showing up as mysterious agent unreliability when the real problem is earlier and simpler.&lt;/p&gt;

&lt;p&gt;The agent loaded the instruction and walked past it.&lt;/p&gt;

&lt;p&gt;That is often not a model problem.&lt;/p&gt;

&lt;p&gt;It is an input problem.&lt;/p&gt;

&lt;p&gt;And input quality is measurable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s next
&lt;/h2&gt;

&lt;p&gt;These are corpus-level findings from a partial sample, not universal laws.&lt;/p&gt;

&lt;p&gt;The sample is still in flight. The strongest claims here are about association, not proof of causality. Specific conflict-count case studies need source verification before publication. Popularity weighting is not yet applied, so “40% of repositories score in the bottom tier” is not the same claim as “40% of production agent work scores in the bottom tier.”&lt;/p&gt;

&lt;p&gt;The full corpus run completes this week. Next week I publish the end-of-run analysis across the full sample — the complete distribution, the cross-cuts the partial sample cannot yet support, and the specific case studies this article deliberately held back. If you want to know where your stack lands, that is the piece to come back for.&lt;/p&gt;

&lt;p&gt;For now, the central pattern is already stable enough to matter:&lt;/p&gt;

&lt;p&gt;The ecosystem keeps responding to weak agent behavior by adding more instructions, while the corpus shows that more instruction files are usually associated with lower measured quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That is the undiagnosed input problem.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Not that instructions do not matter.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;That they matter, measurably, and most teams still have no way to see whether theirs are helping or hurting.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;This is part of the Instruction Best Practices series. Previous: &lt;a href="https://cleverhoods.medium.com/do-not-think-of-a-pink-elephant-7d40a26cd072" rel="noopener noreferrer"&gt;Do NOT Think of a Pink Elephant&lt;/a&gt;, &lt;a href="https://cleverhoods.medium.com/instruction-best-practices-precision-beats-clarity-e1bcae806671" rel="noopener noreferrer"&gt;Precision Beats Clarity&lt;/a&gt;, &lt;a href="https://cleverhoods.medium.com/claude-md-best-practices-7-formatting-rules-for-the-machine-a591afc3d9a9" rel="noopener noreferrer"&gt;7 Formatting Rules for the Machine&lt;/a&gt;. I’m building instruction diagnostics for coding agents. Follow for the full corpus analysis.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>claude</category>
      <category>performance</category>
    </item>
    <item>
      <title>Do NOT Think of a Pink Elephant</title>
      <dc:creator>Gábor Mészáros</dc:creator>
      <pubDate>Tue, 31 Mar 2026 12:19:14 +0000</pubDate>
      <link>https://forem.com/cleverhoods/do-not-think-of-a-pink-elephant-383n</link>
      <guid>https://forem.com/cleverhoods/do-not-think-of-a-pink-elephant-383n</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;You thought of a pink elephant, didn't you?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Same goes for LLMs too. &lt;/p&gt;

&lt;p&gt;"&lt;em&gt;Do not use mocks in tests.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;Clear, direct, unambiguous instruction. The agent read it — I can see it in the trace. Then it wrote a test file with &lt;code&gt;unittest.mock&lt;/code&gt; on line 3. Thanks...&lt;/p&gt;

&lt;p&gt;I've seen this play out hundreds of times. A developer writes a rule, the agent loads it, and it does exactly what the rule said not to do. The natural conclusion: instructions are unreliable. The agent is probabilistic. You can't trust it.&lt;/p&gt;

&lt;p&gt;That's wrong. The instruction was the problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pink elephant
&lt;/h2&gt;

&lt;p&gt;There's a well-known effect in psychology called ironic process theory (Daniel Wegner, 1987). Tell someone "don't think of a pink elephant," and they immediately think of a pink elephant. The act of suppressing a thought requires activating it first.&lt;/p&gt;

&lt;p&gt;Something structurally similar happens with AI instructions.&lt;/p&gt;

&lt;p&gt;"Do not use mocks in tests" introduces the concept of mocking into the context. The tokens &lt;code&gt;mock&lt;/code&gt;, &lt;code&gt;tests&lt;/code&gt;, &lt;code&gt;use&lt;/code&gt; — these are exactly the tokens the model would produce when writing test code with mocks. You've put the thing you're banning right in the generation path.&lt;/p&gt;

&lt;p&gt;This doesn't mean restrictive instructions are useless. It means a bare restriction is incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  The anatomy of a complete instruction
&lt;/h2&gt;

&lt;p&gt;The instructions that work — reliably, across thousands of runs — have three components. But the order you write them in matters as much as whether they're there at all.&lt;/p&gt;

&lt;p&gt;Here's how most people write it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Human-natural ordering — constraint first&lt;/span&gt;
Do not use unittest.mock in tests.
Use real service clients from tests/fixtures/.
Mocked tests passed CI last quarter while the production
integration was broken — real clients catch this.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three components are present. Restriction, directive, context. But the restriction fires first — the model activates &lt;code&gt;{mock, unittest, tests}&lt;/code&gt; before it ever sees the alternative. You've front-loaded the pink elephant.&lt;/p&gt;

&lt;p&gt;Now flip it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Golden ordering — directive first&lt;/span&gt;
Use real service clients from tests/fixtures/.
Real integration tests catch deployment failures and configuration
errors that would otherwise reach production undetected.
Do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same three components. Different order. The directive establishes the desired pattern first. The reasoning reinforces it. The restriction fires last, when the positive frame is already dominant.&lt;/p&gt;

&lt;p&gt;In my experiments — 500 runs per condition, same model, same context — constraint-first produces violations 31% of the time. Directive-first with positive reasoning: 7%.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The pink elephant isn't just about missing components. It's about which concept the model sees first.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three layers, in this order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Directive&lt;/strong&gt; — what to do. This goes first. It establishes the pattern you want in the generation path before the prohibited concept appears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context&lt;/strong&gt; — why. Reasoning that reinforces the directive &lt;em&gt;without mentioning the prohibited concept&lt;/em&gt;. "Real integration tests catch deployment failures" adds mass to the positive pattern. Reasoning that mentions the prohibited concept doubles the violation rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restriction&lt;/strong&gt; — what not to do. This goes last. Negation provides weak suppression — but weak suppression is enough when the positive pattern is already dominant.&lt;/li&gt;
&lt;/ol&gt;
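&lt;p&gt;The ordering itself is mechanically checkable. A toy lint, where the keyword lists are my own illustrative assumptions rather than an established standard:&lt;/p&gt;

```python
# Toy ordering lint: flags instruction blocks where a restriction
# line appears before any directive line. The keyword lists are
# illustrative assumptions, not an established standard.
PROHIBITIONS = ("do not", "don't", "never", "avoid")
DIRECTIVES = ("use ", "run ", "prefer ", "wrap ", "read ")

def restriction_leads(block):
    """True when the restriction fires before the directive."""
    for line in (l.strip().lower() for l in block.splitlines()):
        if line.startswith(DIRECTIVES):
            return False  # directive first: the golden ordering
        if line.startswith(PROHIBITIONS):
            return True   # restriction first: a pink elephant
    return False          # no restriction at all, nothing to flag

bad = "Do not use unittest.mock in tests.\nUse real service clients."
good = "Use real service clients.\nDo not use unittest.mock."
```

&lt;p&gt;Run it over a rules file and every flagged block is a candidate for flipping.&lt;/p&gt;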

&lt;h2&gt;
  
  
  The part nobody expects
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me: &lt;strong&gt;the ordering effect is larger than any other variable I've measured.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Precise naming vs. vague categories? 28 percentage points. Exact scope vs. broad scope? 74 points across the range. But reordering — same words, same components, just flipped — accounts for 25 points on its own. And it compounds with everything else.&lt;/p&gt;

&lt;p&gt;Most developers write instructions the way they'd write them for a human: state the problem, then the solution. "Don't do X. Instead, do Y." It's natural. It's also the worst ordering for an LLM.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Never write "Don't use X. Instead, use Y." Write "Use Y. Here's why Y works. Don't use X."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Formatting helps too — structure is not decoration. I covered that in depth in &lt;a href="https://dev.to/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l"&gt;7 Formatting Rules for the Machine&lt;/a&gt;. But formatting on top of bad ordering is polishing the wrong end. Get the order right first.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this looks like in practice
&lt;/h2&gt;

&lt;p&gt;Here's a real instruction I see in the wild:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;When writing tests, avoid mocking external services. Try to
use real implementations where possible. This helps catch
integration issues early. If you must mock, keep mocks minimal
and focused.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Count the problems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Avoid" — hedged, not direct&lt;/li&gt;
&lt;li&gt;"external services" — category, not construct&lt;/li&gt;
&lt;li&gt;"Try to" — escape hatch built into the instruction&lt;/li&gt;
&lt;li&gt;"where possible" — another escape hatch&lt;/li&gt;
&lt;li&gt;"If you must mock" — reintroduces mocking as an option &lt;em&gt;within the instruction that prohibits it&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Constraint-first ordering — the prohibition leads, the alternative follows&lt;/li&gt;
&lt;li&gt;No structural separation — restriction, directive, hedge, and escape hatch all in one paragraph&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now rewrite it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gs"&gt;**Use the service clients**&lt;/span&gt; in &lt;span class="sb"&gt;`tests/fixtures/stripe.py`&lt;/span&gt; and
&lt;span class="sb"&gt;`tests/fixtures/redis.py`&lt;/span&gt;.
&lt;span class="gt"&gt;
&amp;gt; Real service clients caught a breaking Stripe API change&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; that went undetected for 3 weeks in payments - integration&lt;/span&gt;
&lt;span class="gt"&gt;&amp;gt; tests against live endpoints surface these immediately.&lt;/span&gt;

&lt;span class="ge"&gt;*Do not import*&lt;/span&gt; &lt;span class="sb"&gt;`unittest.mock`&lt;/span&gt; or &lt;span class="sb"&gt;`pytest.monkeypatch`&lt;/span&gt;.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Directive first — names the exact files. Context second — the specific incident, reinforcing &lt;em&gt;why the directive matters&lt;/em&gt; without mentioning the prohibited concept. Restriction last — names the exact imports, fires after the positive pattern is established. No hedging. No escape hatches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;For any instruction in your &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;CLAUDE.md&lt;/code&gt;, or &lt;code&gt;SKILLS.md&lt;/code&gt; files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the directive.&lt;/strong&gt; Name the file, the path, the pattern. Use backticks. If there's no alternative to lead with, you're writing a pink elephant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add the context.&lt;/strong&gt; One sentence. The specific incident or the specific reason the directive works. Do not mention the thing you're about to prohibit — reasoning that references the prohibited concept halves the benefit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;End with the restriction.&lt;/strong&gt; Name the construct — the import, the class, the function. Bold it. No "try to avoid" or "where possible."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Format each component distinctly.&lt;/strong&gt; The directive, context, and restriction should be visually and structurally separate. Don't merge them into one paragraph.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;If your instruction is just "don't do X" — you've told the model to think about X.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tell it what to think about instead. And tell it &lt;em&gt;first&lt;/em&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>agentskills</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Instruction Best Practices: Precision Beats Clarity</title>
      <dc:creator>Gábor Mészáros</dc:creator>
      <pubDate>Tue, 24 Mar 2026 13:12:30 +0000</pubDate>
      <link>https://forem.com/cleverhoods/instruction-best-practices-precision-beats-clarity-lod</link>
      <guid>https://forem.com/cleverhoods/instruction-best-practices-precision-beats-clarity-lod</guid>
      <description>&lt;p&gt;Two rules in the same file. Both say "don't mock."&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When working with external services, avoid using mock objects in tests.

When writing tests for src/payments/, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same intent. Same file. Same model. One gets followed. One gets ignored.&lt;/p&gt;

&lt;p&gt;I stared at the diff for a while, convinced something was broken. The model loaded the file. It read both rules. It followed one and walked past the other like it wasn't there.&lt;/p&gt;

&lt;p&gt;Nothing was broken. The words were wrong.&lt;/p&gt;

&lt;h1&gt;
  
  
  The experiment
&lt;/h1&gt;

&lt;p&gt;I ran controlled behavioral experiments: same model, same context window, same position in the file. One variable changed at a time. Over a thousand runs per finding, with statistically significant differences between conditions.&lt;/p&gt;

&lt;p&gt;Two findings stood out.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt; &lt;em&gt;(and the one that surprised me most)&lt;/em&gt;: when instructions have a conditional scope ("When doing X..."), precision matters enormously. &lt;strong&gt;A broad scope is worse than a wrong scope.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;: instructions that name the exact construct get followed roughly &lt;strong&gt;10 times more often&lt;/strong&gt; than instructions that describe the category. "&lt;code&gt;unittest.mock&lt;/code&gt;" vs "mock objects" — same rule, same meaning to a human. Not the same to the model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Scope it or drop it
&lt;/h1&gt;

&lt;p&gt;Most instructions I see in the wild look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;When working with external services, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That "When working with external services" is the scope — it tells the agent &lt;em&gt;when&lt;/em&gt; to apply the rule. Scopes are useful. But the wording matters more than you'd expect.&lt;/p&gt;

&lt;p&gt;I tested four scope wordings for the same instruction:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Exact scope — best compliance
When writing tests for src/payments/, do not use unittest.mock.

# Universal scope — nearly as good
When writing tests, do not use unittest.mock.

# Wrong domain — degraded
When working with databases, do not use unittest.mock.

# Broad category — worst compliance
When working with external services, do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Read that ranking again. &lt;strong&gt;Broad is worse than wrong.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;"When working with databases" has nothing to do with the test at hand. But it gives the agent something concrete - a specific domain to anchor on. The instruction is scoped to the wrong context, but it's still a clear, greppable constraint.&lt;/p&gt;

&lt;p&gt;"When working with external services" is technically correct. It even sounds more helpful. But it activates a cloud of associations - HTTP clients, API wrappers, service meshes, authentication, retries - and the instruction gets lost in the noise.&lt;/p&gt;

&lt;p&gt;The rule: &lt;strong&gt;if your scope wouldn't work as a grep pattern, rewrite it or drop it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An unconditional instruction beats a badly-scoped conditional:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Broad scope — fights itself
When working with external services, prefer real implementations
over mock objects in your test suite.

# No scope — just say it
Do not use unittest.mock.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The second version is blunter. It's also more effective. Universal scopes ("When writing tests") cost almost nothing — they frame the context without introducing noise. But broad category scopes actively hurt.&lt;/p&gt;

&lt;h1&gt;
  
  
  Name the thing
&lt;/h1&gt;

&lt;p&gt;Here's what the difference looks like across domains.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Describes the category — low compliance
Avoid using mock objects in tests.

# Names the construct — high compliance
Do not use unittest.mock.

# Category
Handle errors properly in API calls.

# Construct
Wrap calls to stripe.Customer.create() in try/except StripeError.

# Category
Don't use unsafe string formatting.

# Construct
Do not use f-strings in SQL queries. Use parameterized queries
with cursor.execute().

# Category
Avoid storing secrets in code.

# Construct
Do not hardcode values in os.environ[]. Read from .env
via python-dotenv.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The pattern: if the agent could tab-complete it, use that form. If it's something you'd type into an import statement, a grep, or a stack trace - that's the word the agent needs.&lt;/p&gt;

&lt;p&gt;Category names feel clearer to us humans. "Mock objects" is plain English. But the model matches against what it would actually generate, not against what the words mean in English. "&lt;code&gt;unittest.mock&lt;/code&gt;" matches the tokens the model would produce when writing test code. "Mock objects" matches everything and nothing.&lt;/p&gt;

&lt;p&gt;Think of it like search. A query for &lt;code&gt;unittest.mock&lt;/code&gt; returns one result. A query for "mocking libraries" returns a thousand. The agent faces the same problem: a vague instruction activates too many associations, and the signal drowns.&lt;/p&gt;

&lt;h1&gt;
  
  
  The compound effect
&lt;/h1&gt;

&lt;p&gt;When both parts of the instruction are vague - vague scope, vague body - the failures compound. When both are precise, the gains compound.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Before — vague everywhere
When working with external services, prefer using real implementations
over mock objects in your test suite.

# After — precise everywhere
When writing tests for `src/payments/`:
Do not import `unittest.mock`.
Use the sandbox client from `tests/fixtures/stripe.py`.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Same intent. The rewrite takes ten seconds. The difference is not incremental; it's categorical.&lt;/p&gt;

&lt;p&gt;Formatting gets the instruction &lt;em&gt;read&lt;/em&gt; - headers, code blocks, hierarchy make it scannable. Precision gets the instruction &lt;em&gt;followed&lt;/em&gt; - exact constructs and tight scopes make it actionable. They work together. A well-formatted vague instruction still gets ignored. A precise instruction buried in a wall of text still gets missed. You need both.&lt;/p&gt;

&lt;h1&gt;
  
  
  When to adopt this
&lt;/h1&gt;

&lt;p&gt;This matters most when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your instruction files mention categories more than constructs: "services," "libraries," "objects," "errors"&lt;/li&gt;
&lt;li&gt;You use broad conditional scopes: "when working with...," "for external...," "in general..."&lt;/li&gt;
&lt;li&gt;You have rules that are loaded and read but not followed&lt;/li&gt;
&lt;li&gt;You want to squeeze more compliance out of existing instructions without restructuring the file&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It matters less when your instructions are already construct-level ("do not call &lt;code&gt;eval()&lt;/code&gt;") or unconditional.&lt;/p&gt;

&lt;h1&gt;
  
  
  Try it
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Open your instruction files.&lt;/li&gt;
&lt;li&gt;Find every instruction that uses a category word: "services," "objects," "libraries," "errors," "dependencies."&lt;/li&gt;
&lt;li&gt;Replace it with the construct the agent would encounter at runtime - the import path, the class name, the file glob, the CLI flag.&lt;/li&gt;
&lt;li&gt;For conditional instructions: replace broad scopes with exact paths or file patterns. If you can't be exact, drop the condition entirely - unconditional is better than vague.&lt;/li&gt;
&lt;/ol&gt;
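&lt;p&gt;Step 2 is easy to automate. A minimal Python sketch that flags lines in an instruction file mentioning a category word (the word list is illustrative, not exhaustive):&lt;/p&gt;

```python
import re

# Category words that usually signal a vague instruction.
# Illustrative list - extend it for your own files.
CATEGORY_WORDS = {"services", "objects", "libraries", "errors", "dependencies"}

def flag_vague_lines(text):
    """Return (line_number, line) pairs that mention a category word."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        words = set(re.findall(r"[a-z]+", line.lower()))
        if words.intersection(CATEGORY_WORDS):
            flagged.append((number, line))
    return flagged

instructions = """\
Avoid using mock objects in tests.
Do not import unittest.mock.
Handle errors properly in API calls.
"""

for number, line in flag_vague_lines(instructions):
    print(f"line {number}: {line}")
```

&lt;p&gt;Lines 1 and 3 get flagged; line 2 already names the construct and passes.&lt;/p&gt;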

&lt;p&gt;Then run your agent on the same task that was failing. You'll see the difference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Formatting is the signal. Precision is the target.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>beginners</category>
      <category>performance</category>
      <category>agents</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: 7 formatting rules for the Machine</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 03 Mar 2026 13:06:00 +0000</pubDate>
      <link>https://forem.com/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l</link>
      <guid>https://forem.com/cleverhoods/-claudemd-best-practices-7-formatting-rules-for-the-machine-3d3l</guid>
      <description>&lt;p&gt;I watched an agent ignore a rule I wrote 2 hours earlier.&lt;/p&gt;

&lt;p&gt;Not a vague rule. A specific one. &lt;strong&gt;"run pytest before committing."&lt;/strong&gt; It was right there in the CLAUDE.md, paragraph two, between the project description and the linting setup. The agent read the file. I saw it in the context. It just... didn't follow it.&lt;/p&gt;

&lt;p&gt;I moved the same instruction under a &lt;code&gt;## Testing&lt;/code&gt; header, wrapped &lt;code&gt;pytest&lt;/code&gt; in backticks, and added a one-line rationale. Next run, the agent followed it to the letter.&lt;/p&gt;

&lt;p&gt;The instruction didn't change. The &lt;strong&gt;signal strength&lt;/strong&gt; did.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2"&gt;last post&lt;/a&gt;, we got the agent oriented — &lt;code&gt;/bootstrap&lt;/code&gt; loads the map, the workflows, the boundaries. But orientation and compliance are different things. You can hand someone a perfect briefing and still lose them if the briefing is a wall of text. Same with agents.&lt;/p&gt;

&lt;p&gt;The question isn't whether your instructions are loaded. It's whether the agent &lt;em&gt;follows&lt;/em&gt; them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The comparison
&lt;/h2&gt;

&lt;p&gt;Here's the same instruction, two ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Version A:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;When working on this project, always make sure to run the test suite
before committing any changes. The command to run tests is pytest and
you should run it from the project root. If tests fail, fix them before
committing. Also make sure to use ruff for formatting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Version B:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`pytest`&lt;/span&gt; — run from project root before every commit
&lt;span class="p"&gt;-&lt;/span&gt; Fix failures before committing

&lt;span class="gu"&gt;## Formatting&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`ruff check --fix &amp;amp;&amp;amp; ruff format`&lt;/span&gt; — run before committing
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same content. Version B gets followed. Version A gets buried.&lt;/p&gt;

&lt;p&gt;This isn't about aesthetics. Structural elements — headers, code fences, lists — create anchor points that agents latch onto. Prose paragraphs don't. The more structure you provide, the more reliably each instruction lands.&lt;/p&gt;

&lt;h2&gt;
  
  
  It's not just about length
&lt;/h2&gt;

&lt;p&gt;You already learned to keep your CLAUDE.md short. It's a good start but it's not sufficient. A 20-line prose paragraph gets lost just as easily as a 200-line one. The variable isn't word count. It's structure.&lt;/p&gt;

&lt;p&gt;A short file with no headers, no code blocks, and no rationale will underperform a longer file that's well-structured.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Length is the ceiling. Formatting is the signal.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Seven structural rules
&lt;/h2&gt;

&lt;p&gt;These aren't content guidelines. They're formatting choices that determine whether instructions survive the trip from file to agent behavior. I'll start with the three you won't find in other guides, then cover the four that everyone mentions but nobody explains &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  1. Include rationale
&lt;/h3&gt;

&lt;p&gt;"Never force push" is an instruction. "Never force push — rewrites shared history, unrecoverable for collaborators" is an instruction the agent &lt;em&gt;weighs&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Without rationale&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use &lt;span class="sb"&gt;`rm -rf`&lt;/span&gt; on the project root
&lt;span class="p"&gt;-&lt;/span&gt; Always run tests before committing
&lt;span class="p"&gt;-&lt;/span&gt; Don't modify package-lock.json manually

&lt;span class="gh"&gt;# With rationale&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Never use &lt;span class="sb"&gt;`rm -rf`&lt;/span&gt; on the project root — irrecoverable
&lt;span class="p"&gt;-&lt;/span&gt; Always run tests before committing — CI will reject untested code
&lt;span class="p"&gt;-&lt;/span&gt; Don't modify package-lock.json manually — causes merge conflicts
  and dependency resolution issues
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The rationale doesn't just explain — it gives the agent a way to generalize. An agent that understands &lt;em&gt;why&lt;/em&gt; force push is forbidden will also avoid &lt;code&gt;git reset --hard origin/main&lt;/code&gt; without being told. The "why" turns a single rule into a class of behaviors.&lt;/p&gt;

&lt;p&gt;This is the most undervalued formatting choice. Every prohibition should carry its reason.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Keep heading hierarchy shallow
&lt;/h3&gt;

&lt;p&gt;Three levels is enough. &lt;code&gt;h1&lt;/code&gt; for the file title, &lt;code&gt;h2&lt;/code&gt; for sections, &lt;code&gt;h3&lt;/code&gt; for subsections. That's it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before (5 levels deep)&lt;/span&gt;
&lt;span class="gh"&gt;# Project&lt;/span&gt;
&lt;span class="gu"&gt;## Development&lt;/span&gt;
&lt;span class="gu"&gt;### Testing&lt;/span&gt;
&lt;span class="gu"&gt;#### Unit Tests&lt;/span&gt;
&lt;span class="gu"&gt;##### Mocking Strategy&lt;/span&gt;

&lt;span class="gh"&gt;# After (3 levels max)&lt;/span&gt;
&lt;span class="gh"&gt;# Project&lt;/span&gt;
&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="gu"&gt;### Unit tests&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deep nesting dilutes attention. An &lt;code&gt;h5&lt;/code&gt; competes with every heading above it for the agent's focus. An instruction under it doesn't lose its &lt;code&gt;h2&lt;/code&gt;, but the hierarchy creates ambiguity about which level governs. Flat structures keep every instruction at the surface. &lt;strong&gt;If you need an &lt;code&gt;h4&lt;/code&gt;, you probably need a separate file.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Name files descriptively
&lt;/h3&gt;

&lt;p&gt;When an agent searches your project - browsing a directory listing, running a glob, deciding which file to read - the file name is the first filter. Before content, before headers, before anything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Before
docs/guide.md
docs/notes.md
scripts/setup.sh

# After
docs/api-authentication.md
docs/deployment-checklist.md
scripts/setup-local-dev.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent sees a directory listing and picks what to open. &lt;code&gt;api-authentication.md&lt;/code&gt; tells it whether the file might be relevant to the current task. &lt;code&gt;guide.md&lt;/code&gt; forces it to open and read before it can decide. Descriptive names save the agent a round trip. &lt;strong&gt;In a project with dozens of files, that adds up.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;This applies to any file the agent might discover: docs, scripts, configs.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Now the four you've heard before - but with a &lt;em&gt;why&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  4. Use headers
&lt;/h3&gt;

&lt;p&gt;Agents scan headers the way developers scan a README: as a table of contents. A header says "&lt;strong&gt;new topic, reset attention.&lt;/strong&gt;"&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
The project uses TypeScript with strict mode enabled. For testing we
use vitest. The CI pipeline runs on GitHub Actions.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="gu"&gt;## Language&lt;/span&gt;

TypeScript with strict mode enabled.

&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`npx vitest`&lt;/span&gt; — run from project root

&lt;span class="gu"&gt;## CI&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`.github/workflows/`&lt;/span&gt; — GitHub Actions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One topic per header. The agent navigates to the right section instead of parsing the whole paragraph. Without headers, every instruction competes with every other instruction for attention.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Put commands in code blocks
&lt;/h3&gt;

&lt;p&gt;Commands in prose get read as descriptions. Commands in code blocks get treated as executable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
You can run the linter by running npm run lint and the tests
by running npm test.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`npm run lint`&lt;/span&gt; — check for issues
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`npm test`&lt;/span&gt; — run test suite
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you do nothing else from this post, wrap your commands in backticks. It's the single highest-impact change - &lt;strong&gt;a command in a code fence is a command. A command in a sentence is a suggestion&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Use standard section names
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;## Testing&lt;/code&gt; gets recognized instantly. &lt;code&gt;## Quality Assurance Verification Process&lt;/code&gt; doesn't.&lt;/p&gt;

&lt;p&gt;Agents have been trained on millions of README files. They know what &lt;code&gt;## Testing&lt;/code&gt;, &lt;code&gt;## Commands&lt;/code&gt;, &lt;code&gt;## Structure&lt;/code&gt;, and &lt;code&gt;## Conventions&lt;/code&gt; mean. Those names carry built-in context.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instead of&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Quality Assurance&lt;/td&gt;
&lt;td&gt;Testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Development Guidelines&lt;/td&gt;
&lt;td&gt;Conventions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Operational Instructions&lt;/td&gt;
&lt;td&gt;Commands&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety and Compliance&lt;/td&gt;
&lt;td&gt;Boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Project Organization&lt;/td&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The familiar name is the signal. The creative name is noise.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Make instructions actionable
&lt;/h3&gt;

&lt;p&gt;"Follow best practices" is not an instruction. "&lt;em&gt;Use ruff for formatting, run before committing&lt;/em&gt;" is.&lt;/p&gt;

&lt;p&gt;The test: could an agent execute this instruction right now, without asking a clarifying question? If not, it's too vague.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Before&lt;/span&gt;
Make sure code quality is maintained and follows our standards.

&lt;span class="gh"&gt;# After&lt;/span&gt;
&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Format with &lt;span class="sb"&gt;`ruff format`&lt;/span&gt; before committing
&lt;span class="p"&gt;-&lt;/span&gt; Type annotations on all public functions
&lt;span class="p"&gt;-&lt;/span&gt; No &lt;span class="sb"&gt;`print()`&lt;/span&gt; in production code — use &lt;span class="sb"&gt;`logging`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every instruction should pass the "act on it immediately" test. If it can't be acted on, it's a wish, not an instruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compound effect
&lt;/h2&gt;

&lt;p&gt;Each rule alone is a small improvement. Together, they're multiplicative - not because the rules add up, but because they reinforce each other. Headers create sections. Sections hold code blocks. Code blocks contain actionable commands. Rationale explains why. Descriptive file names route attention to the right file. Shallow hierarchy keeps everything findable.&lt;/p&gt;

&lt;p&gt;Here's a realistic before/after applying all seven:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;This project is a Python CLI tool. We use pytest for testing and ruff
for linting. Make sure to run tests before you commit anything. The
source code is in src/myapp and tests are in tests/. Don't modify
anything in the dist/ folder because that's generated. Also we have
some rules about how to write tests — they should test behavior not
implementation details, and use parametrize instead of writing lots
of individual test functions that do the same thing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Testing&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`pytest`&lt;/span&gt; — run from project root before every commit
&lt;span class="p"&gt;-&lt;/span&gt; Test behavior, not implementation — assert on outcomes, not internal calls
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`@pytest.mark.parametrize`&lt;/span&gt; when cases share the same assertion shape

&lt;span class="gu"&gt;## Formatting&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`ruff check --fix &amp;amp;&amp;amp; ruff format`&lt;/span&gt;

&lt;span class="gu"&gt;## Structure&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Source: &lt;span class="sb"&gt;`src/myapp/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Tests: &lt;span class="sb"&gt;`tests/`&lt;/span&gt;

&lt;span class="gu"&gt;## Boundaries&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`dist/`&lt;/span&gt; — generated, do not modify
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same information. Half the words. Every instruction lands.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to reformat
&lt;/h2&gt;

&lt;p&gt;If you notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent apologizes for missing an instruction that's in your file&lt;/li&gt;
&lt;li&gt;The same rule gets violated in consecutive sessions&lt;/li&gt;
&lt;li&gt;You keep adding more words to an instruction hoping the agent will "get it"&lt;/li&gt;
&lt;li&gt;Your CLAUDE.md is one long section with no headers&lt;/li&gt;
&lt;li&gt;Commands appear in sentences instead of code blocks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Your instructions don't need more content. They need more structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The connection to /bootstrap
&lt;/h2&gt;

&lt;p&gt;In the previous posts we built the delivery system: &lt;code&gt;backbone.yml&lt;/code&gt; maps the project, Mermaid draws the workflows, &lt;code&gt;/bootstrap&lt;/code&gt; loads both in seconds. That's the &lt;em&gt;orientation&lt;/em&gt; layer - the agent knows where it is and how things work.&lt;/p&gt;

&lt;p&gt;This is about &lt;strong&gt;attention budget allocation&lt;/strong&gt;. The agent has a limited context window. What matters isn't just what's in it — it's how the agent decides what's relevant at each step. Structure is what makes your instructions win that competition.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Orientation without compliance means the agent knows your project but ignores your rules. Compliance without orientation means the agent follows instructions but works in the wrong place. You need both.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Open your CLAUDE.md (or whatever instruction file your agent reads)&lt;/li&gt;
&lt;li&gt;Find the longest prose paragraph&lt;/li&gt;
&lt;li&gt;Break it: one header per topic, one code block per command, one sentence of rationale per prohibition&lt;/li&gt;
&lt;li&gt;Run your agent on the same task you ran yesterday&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The instructions didn't change. The signal did.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;Don't just write more instructions. Format the ones you have.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>agents</category>
      <category>ai</category>
      <category>documentation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why /bootstrap should be the first Command in every Agent session</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 24 Feb 2026 12:39:23 +0000</pubDate>
      <link>https://forem.com/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2</link>
      <guid>https://forem.com/cleverhoods/why-bootstrap-should-be-the-first-command-in-every-agent-session-4jg2</guid>
      <description>&lt;p&gt;After a 2.5 hour session you accidentally close your coding agent terminal mid session. The output is there, the commits are there, but something important is gone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The synergy you spent hours building up.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You reopen the console and hope the two of you can start over, but it feels like you are strangers now. The agent is "&lt;em&gt;Somebody that you used to know.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;No, this is not the intro to a light romance novel - it's the everyday experience of working with coding agents. Coding agents are stateless by design, so each new session is a new beginning.&lt;/p&gt;

&lt;h2&gt;
  
  
  The resume illusion
&lt;/h2&gt;

&lt;p&gt;Some agents have &lt;code&gt;--resume&lt;/code&gt; functionality. Claude Code has it. Codex has it. Gemini CLI has it. It's useful, but it has limitations.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;--resume&lt;/code&gt; only &lt;strong&gt;replays&lt;/strong&gt; the conversation log. It doesn't restore the loaded and curated mental model - the understanding of your project's topology, constraints, and current state that the agent built up over those 2.5 hours.&lt;/p&gt;

&lt;p&gt;Resume gives you only the transcript. Not the understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two primitives I already had
&lt;/h2&gt;

&lt;p&gt;Over the last few weeks I wrote about two separate ideas:&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;The backbone.yml Pattern&lt;/a&gt;, I introduced a YAML manifest that maps your project's topology - agents, directories, configs, schemas. &lt;strong&gt;Information.&lt;/strong&gt; The agent reads it once and knows where everything is. No more exploration tax.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;Mermaid for Workflows&lt;/a&gt;, I showed how flowcharts give agents reliable step-by-step processes to follow. &lt;strong&gt;Process.&lt;/strong&gt; Structured syntax that sticks out in a context window full of prose, backed by research showing agents follow flowcharts more reliably than natural language.&lt;/p&gt;

&lt;p&gt;Backbone tells the agent &lt;em&gt;what exists&lt;/em&gt;. Workflows tell the agent &lt;em&gt;how to operate&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;But I was using them separately. I'd tell Claude "read the backbone" at session start, then invoke workflows as needed. Manual orchestration. Every session, same ritual. &lt;/p&gt;

&lt;p&gt;Why am I doing this separately? &lt;strong&gt;Isn't context just &lt;em&gt;Information&lt;/em&gt; + &lt;em&gt;Process&lt;/em&gt;?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Read the map. Follow the process. Produce a working mental model. Every session, one command.&lt;/p&gt;

&lt;p&gt;That's &lt;code&gt;/bootstrap&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What /bootstrap does
&lt;/h2&gt;

&lt;p&gt;One command. Two modes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First run&lt;/strong&gt; (no backbone exists): scans the project, detects agents and structure, generates a &lt;code&gt;backbone.yml&lt;/code&gt;, then synthesizes a context report.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every subsequent run&lt;/strong&gt; (backbone exists): reads the backbone, maps agents, loads constraints, checks project state, and produces a mental model.&lt;/p&gt;

&lt;p&gt;Both modes use the diagram + prose combo from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - flowcharts for the branching, prose for the reasoning behind each step.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lb18lctptwks9ktwug7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lb18lctptwks9ktwug7.png" alt="Bootstrap workflow" width="431" height="1291"&gt;&lt;/a&gt;&lt;/p&gt;
Bootstrap workflow



&lt;p&gt;The output looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bootstrap complete.

Project: my-app v1.2.0 (branch: feature/auth)
Agents: claude (CLAUDE.md), copilot (.github/copilot-instructions.md)
Structure: src/, tests/, docs/, config/

Navigation:
  Agent config → backbone.agents.{agent}
  Project dirs → backbone.paths.{key}
  Schemas      → backbone.schemas.{name}

Operations:
  Build  → npm run build
  Test   → npm test
  Deploy → ./scripts/deploy.sh

Constraints:
  - Never modify config/production.yml directly
  - Always run tests before committing

State: v1.2.0, 3 unreleased changes (auth module)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this, the agent knows where things are, how to operate, what's off limits, and what's in progress. No exploration. No guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Seed mode: the smart first run
&lt;/h2&gt;

&lt;p&gt;Most bootstrapping tools drop a blank template and say "fill this in." That's 0% useful on day one.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/bootstrap&lt;/code&gt; scans first, generates second. It detects agents across the ecosystem:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;CLAUDE.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;AGENTS.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.cursorrules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.windsurfrules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Windsurf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.clinerules&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cline&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.aider*&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Aider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.continue/config.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Continue&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;It maps directories, finds configs, detects build/test workflows from &lt;code&gt;package.json&lt;/code&gt;, &lt;code&gt;Makefile&lt;/code&gt;, CI configs. The generated backbone is 70-80% correct from the scan alone.&lt;/p&gt;

&lt;p&gt;The remaining 20-30% - semantic connections, domain concepts - gets marked with &lt;code&gt;# TODO: refine&lt;/code&gt; so you know exactly where to invest review time. Verified topology. Flagged guesses. One command.&lt;/p&gt;
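&lt;p&gt;The detection pass over that marker table can be sketched in a few lines of Python. This is an illustrative reimplementation of the idea, not the actual &lt;code&gt;/bootstrap&lt;/code&gt; code:&lt;/p&gt;

```python
import fnmatch
import os
import tempfile

# Marker-file patterns mapped to agents (mirrors the table above).
AGENT_MARKERS = {
    "CLAUDE.md": "claude",
    "AGENTS.md": "codex",
    ".github/copilot-instructions.md": "copilot",
    ".cursorrules": "cursor",
    ".windsurfrules": "windsurf",
    ".clinerules": "cline",
    ".aider*": "aider",
    ".continue/config.json": "continue",
}

def detect_agents(root):
    """Return agent names whose marker files exist under root."""
    found = []
    for pattern, agent in AGENT_MARKERS.items():
        directory, _, name = pattern.rpartition("/")
        search_dir = os.path.join(root, directory) if directory else root
        if not os.path.isdir(search_dir):
            continue
        if any(fnmatch.fnmatch(entry, name) for entry in os.listdir(search_dir)):
            found.append(agent)
    return found

# Demo against a throwaway project directory.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "CLAUDE.md"), "w").close()
    open(os.path.join(root, ".cursorrules"), "w").close()
    print(detect_agents(root))  # ['claude', 'cursor']
```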

&lt;h2&gt;
  
  
  The skill structure
&lt;/h2&gt;

&lt;p&gt;I built this as an &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;Agent Skill&lt;/a&gt; - the open standard for packaging reusable instructions across agents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bootstrap/
  SKILL.md              # Entry point - frontmatter + instructions
  workflows/
    seed.md             # Scan + generate (mermaid flowchart)
    bootstrap.md        # Read + synthesize (mermaid flowchart)
  templates/
    backbone.yml        # Starter backbone shape
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the two primitives? The &lt;code&gt;templates/backbone.yml&lt;/code&gt; is the information layer from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;backbone post&lt;/a&gt;. The &lt;code&gt;workflows/*.md&lt;/code&gt; files are the process layer from the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - complete with flowcharts, key decisions, and edge cases.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/bootstrap&lt;/code&gt; is their love child. One skill that reads both primitives and turns them into a loaded context.&lt;/p&gt;
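&lt;p&gt;For reference, a minimal sketch of the backbone shape the skill might generate. The top-level groups mirror the navigation paths in the bootstrap report above (&lt;code&gt;backbone.agents&lt;/code&gt;, &lt;code&gt;backbone.paths&lt;/code&gt;, &lt;code&gt;backbone.schemas&lt;/code&gt;); every concrete value is an illustrative assumption, not the actual template:&lt;/p&gt;

```yaml
# Illustrative shape only - concrete values are assumptions.
project:
  name: my-app
  version: 1.2.0

agents:
  claude: CLAUDE.md
  copilot: .github/copilot-instructions.md

paths:
  src: src/
  tests: tests/
  docs: docs/

schemas:
  backbone: templates/backbone.yml  # TODO: refine

constraints:
  - Never modify config/production.yml directly
  - Always run tests before committing
```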

&lt;h2&gt;
  
  
  Cross-agent by design
&lt;/h2&gt;

&lt;p&gt;The SKILL.md format is an open standard created by Anthropic and now adopted by OpenAI, Google, Cursor, and others. A skill authored once works across 30+ agents - the format is filesystem-based, not API-dependent.&lt;/p&gt;

&lt;p&gt;Drop the &lt;code&gt;bootstrap/&lt;/code&gt; folder into &lt;code&gt;.claude/skills/&lt;/code&gt; for Claude Code, &lt;code&gt;.agents/skills/&lt;/code&gt; for Codex CLI, or wherever your agent looks. Same skill, same result.&lt;/p&gt;

&lt;p&gt;This matters because the bootstrap concept isn't Claude-specific. Every coding agent is stateless. Every agent benefits from a loaded mental model at session start. The problem is universal, so the solution should be too.&lt;/p&gt;

&lt;h2&gt;
  
  
  What changes after bootstrap
&lt;/h2&gt;

&lt;p&gt;Before bootstrap, every session starts with the agent exploring. After bootstrap, every session starts with the agent &lt;em&gt;understanding&lt;/em&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No more &lt;code&gt;find&lt;/code&gt; / &lt;code&gt;ls&lt;/code&gt; / &lt;code&gt;grep&lt;/code&gt; loops&lt;/strong&gt; to discover what the backbone already maps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more wrong assumptions&lt;/strong&gt; about where configs live&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more repeated corrections&lt;/strong&gt; - "no, the tests are in &lt;code&gt;spec/&lt;/code&gt;, not &lt;code&gt;tests/&lt;/code&gt;"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No more context poisoning&lt;/strong&gt; from exploration artifacts cluttering the window&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The agent reads the backbone, follows the workflow, synthesizes the context, and starts working. Every session. In seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The progression
&lt;/h2&gt;

&lt;p&gt;Looking back at this series, the progression is clear:&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm"&gt;capability levels post&lt;/a&gt; - what maturity looks like for instruction files.&lt;br&gt;
In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;backbone.yml post&lt;/a&gt; - give the agent a map (information).&lt;br&gt;
In the &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;mermaid post&lt;/a&gt; - give the agent reliable processes (workflows).&lt;br&gt;
Now - combine both into a single command that loads a mental model.&lt;/p&gt;

&lt;p&gt;Map + Process = Understanding. That's the whole idea.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The bootstrap skill will be published as a cross-agent compatible Agent Skill in the &lt;a href="https://github.com/reporails/skills" rel="noopener noreferrer"&gt;Reporails skills repo&lt;/a&gt; this week.&lt;/p&gt;

&lt;p&gt;In the meantime, the pattern works even without the skill:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;code&gt;backbone.yml&lt;/code&gt; mapping your project (&lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;template here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Add a workflow with a mermaid flowchart for session initialization (&lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;approach here&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Start every session with: "Load the backbone, follow the bootstrap workflow, and tell me what you understand"&lt;/li&gt;
&lt;/ol&gt;
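&lt;p&gt;Step 3 can also live in the instruction file itself so every session picks it up automatically. A minimal sketch - the wording is illustrative, the file names come from the steps above:&lt;/p&gt;

```markdown
## Session Start

1. Read `backbone.yml` for the project map.
2. Follow the bootstrap workflow (mermaid flowchart) before any task.
3. Summarize what you understand before making changes.
```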

&lt;p&gt;That's manual bootstrap. The skill just makes it &lt;code&gt;/bootstrap&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't start a session. Bootstrap it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;This post is part of the &lt;a href="https://dev.to/cleverhoods/series/35305"&gt;Reporails series&lt;/a&gt;. Previous: &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb"&gt;Mermaid for Workflows&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>tutorial</category>
      <category>architecture</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: Mermaid for Workflows</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 17 Feb 2026 12:04:57 +0000</pubDate>
      <link>https://forem.com/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb</link>
      <guid>https://forem.com/cleverhoods/claudemd-best-practices-mermaid-for-workflows-khb</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;A picture says a thousand words. I wanted to see my system.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Not the code. I wanted to see the &lt;strong&gt;workflows&lt;/strong&gt;. What happens when a rule gets validated. What happens when a session starts. What happens when compaction triggers. Systems are workflows, and I couldn't see mine.&lt;/p&gt;

&lt;p&gt;I had them written down, of course. Prose paragraphs in CLAUDE.md/SKILL.md or RULES describing each process step by step. But past four or five steps with branching, the prose became unreadable. I'd write it, come back a week later, and need to re-parse the whole thing to understand what I'd written. Mental overload, every time.&lt;/p&gt;

&lt;p&gt;My coding agent had the same problem. Research calls it "&lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;lost in the middle&lt;/a&gt;" - LLMs perform best with information at the beginning and end of their context, and significantly worse with information buried in the middle. My prose workflows were exactly that: critical branching logic buried in paragraphs, sandwiched between other instructions. Claude would miss steps. Skip branches. Drift from the intended process.&lt;/p&gt;

&lt;p&gt;And the workflows themselves drifted too. I'd remove a pipeline phase and update one paragraph but miss another. Prose makes that invisible - three sentences can reference a removed step and nothing looks broken.&lt;/p&gt;

&lt;p&gt;So I rewrote my workflows as Mermaid diagrams. And three things happened at once:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;I could see the system.&lt;/strong&gt; Rendered Mermaid gives you a visual map of what's happening - for free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude followed them more reliably.&lt;/strong&gt; Structured syntax sticks out in a context window full of prose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They stopped rotting.&lt;/strong&gt; You can't leave a dangling arrow in a flowchart the way you can leave a stale sentence in a paragraph.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Turns out there's research backing all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;FlowBench&lt;/strong&gt; (&lt;a href="https://arxiv.org/abs/2406.14884" rel="noopener noreferrer"&gt;Xiao et al., EMNLP 2024&lt;/a&gt;) tested how LLM agents perform when given the same workflow knowledge in different formats - natural language, pseudo-code, and flowcharts. Across 51 scenarios on GPT-4o, GPT-4-Turbo, and GPT-3.5-Turbo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flowcharts achieved the best trade-off for agent performance&lt;/li&gt;
&lt;li&gt;Combining formats (text + code + flowcharts) outperformed any single format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Format matters. It measurably affects how well the agent follows your instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to convert
&lt;/h2&gt;

&lt;p&gt;Not everything benefits equally from a diagram. The rule:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If it has branches, it needs a diagram. If it has judgment, it also needs prose. Most real workflows need both.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Deterministic pipelines - CI/CD, deployment, validation, review workflows - are pure flowchart territory. Every step has a defined outcome, every branch has a condition.&lt;/p&gt;

&lt;p&gt;But most workflows aren't purely deterministic. They have branching &lt;em&gt;and&lt;/em&gt; judgment: "if the tests fail with a type error, fix inline; if it's a logic error, rethink the approach." The diagram captures the branch. The prose below it captures the judgment. Neither format alone carries both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Before and after
&lt;/h2&gt;

&lt;p&gt;Here's what my rule validation workflow looked like before - prose only, describing the same process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rule Validation&lt;/span&gt;

Run validation on all rules. For each rule, first validate the
schema (fields, types, format). If that passes, check the contract
(.md and .yml matching). If the contract is valid, resolve template
variables and run OpenGrep validation on pattern syntax. If OpenGrep
returns exit 2 or 7, report the error. If it returns 0 or 1,
the rule passes. After all rules are checked, output a summary.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here's what the Mermaid version looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TD
    START([/validate-rules options]) --&amp;gt; COLLECT[Collect rules from paths]
    COLLECT --&amp;gt; LOOP[For each rule]
    LOOP --&amp;gt; SCHEMA[1. Schema validation&amp;lt;br/&amp;gt;Fields, types, format]
    SCHEMA --&amp;gt;|fail| REPORT
    SCHEMA --&amp;gt;|pass| CONTRACT[2. Contract validation&amp;lt;br/&amp;gt;.md and .yml matching]
    CONTRACT --&amp;gt;|fail| REPORT
    CONTRACT --&amp;gt;|pass| RESOLVE[Resolve template variables]
    RESOLVE --&amp;gt; OPENGREP[3. OpenGrep validation&amp;lt;br/&amp;gt;Pattern syntax]
    OPENGREP --&amp;gt;|exit 2 or 7| REPORT
    OPENGREP --&amp;gt;|exit 0 or 1| REPORT[Report results]
    REPORT --&amp;gt; NEXT{More rules?}
    NEXT --&amp;gt;|yes| LOOP
    NEXT --&amp;gt;|no| SUMMARY[Summary output]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhocw4w3crfqdbwndsul7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhocw4w3crfqdbwndsul7.png" alt="Rendered Mermaid workflow from Reporails rule validation" width="800" height="1222"&gt;&lt;/a&gt;&lt;/p&gt;
Rendered Mermaid workflow from Reporails rule validation



&lt;p&gt;Same information. But the flowchart makes every branch explicit and every failure path visible. Claude can't accidentally skip a validation step or misinterpret which exit codes mean failure.&lt;/p&gt;

&lt;p&gt;But the diagram alone is still only half the answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The combo: diagram + prose
&lt;/h2&gt;

&lt;p&gt;FlowBench's strongest finding wasn't "use flowcharts" - it was "combine formats." Each format carries what it's best at.&lt;/p&gt;

&lt;p&gt;Here's what one of my actual workflows looks like after conversion - &lt;a href="https://github.com/reporails/rules/blob/main/.shared/workflows/rule-validation.md" rel="noopener noreferrer"&gt;rule-validation.md&lt;/a&gt; from Reporails:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Rule Validation Workflow&lt;/span&gt;

​mermaid
flowchart TD
    START([/validate-rules options]) --&amp;gt; COLLECT[Collect rules from paths]
    COLLECT --&amp;gt; LOOP[For each rule]
    LOOP --&amp;gt; SCHEMA[1. Schema validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;Fields, types, format]
    SCHEMA --&amp;gt;|fail| REPORT
    SCHEMA --&amp;gt;|pass| CONTRACT[2. Contract validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;.md and .yml matching]
    CONTRACT --&amp;gt;|fail| REPORT
    CONTRACT --&amp;gt;|pass| RESOLVE[Resolve template variables]
    RESOLVE --&amp;gt; OPENGREP[3. OpenGrep validation&lt;span class="nt"&gt;&amp;lt;br/&amp;gt;&lt;/span&gt;Pattern syntax]
    OPENGREP --&amp;gt;|exit 2 or 7| REPORT
    OPENGREP --&amp;gt;|exit 0 or 1| REPORT[Report results]
    REPORT --&amp;gt; NEXT{More rules?}
    NEXT --&amp;gt;|yes| LOOP
    NEXT --&amp;gt;|no| SUMMARY[Summary output]
​

&lt;span class="gu"&gt;## Why Three Layers in This Order&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Schema validation**&lt;/span&gt; catches structural errors (missing fields, wrong
   types) with zero external dependencies. Cheapest check - filters out
   rules that would cause confusing downstream failures.
&lt;span class="p"&gt;
2.&lt;/span&gt; &lt;span class="gs"&gt;**Contract validation**&lt;/span&gt; confirms that rule.md and rule.yml agree.
   Catches the class of bugs where one file was updated but the other
   wasn't. Requires both files to be schema-valid first.
&lt;span class="p"&gt;
3.&lt;/span&gt; &lt;span class="gs"&gt;**OpenGrep validation**&lt;/span&gt; runs actual patterns against the syntax
   checker. Most expensive step - requires template resolution, file I/O,
   agent config loading. Only runs on rules that are already structurally
   sound.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The diagram shows the three-step pipeline with its branches. The prose explains &lt;em&gt;why&lt;/em&gt; that ordering - cheapest first, most expensive last, each layer depending on the previous one being clean. Neither format alone carries both the flow and the reasoning.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to adopt this
&lt;/h2&gt;

&lt;p&gt;If your CLAUDE.md has any of these, you have a flowchart waiting to happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"First do X. If X passes, do Y. If Y fails, do Z."&lt;/li&gt;
&lt;li&gt;"Run A, then B, then C. If any step fails, stop."&lt;/li&gt;
&lt;li&gt;"Check for X. If found, do Y. Otherwise, do Z."&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sequential steps with conditions = flowchart. Convert those, leave everything else as prose.&lt;/p&gt;
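&lt;p&gt;For instance, the third bullet converts almost mechanically - a minimal sketch, with illustrative node names:&lt;/p&gt;

```
flowchart TD
    CHECK{X found?}
    CHECK -->|yes| Y[Do Y]
    CHECK -->|no| Z[Do Z]
```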

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Find a workflow in your CLAUDE.md that reads like a recipe with conditions&lt;/li&gt;
&lt;li&gt;Rewrite the control flow as Mermaid&lt;/li&gt;
&lt;li&gt;Keep the rationale and judgment calls as prose below the diagram&lt;/li&gt;
&lt;li&gt;Delete the original prose-only version&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One converted workflow. See if Claude follows it more reliably - and enjoy being able to &lt;em&gt;see&lt;/em&gt; your system for the first time.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't describe the path. Draw the map.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;The FlowBench paper is at &lt;a href="https://arxiv.org/abs/2406.14884" rel="noopener noreferrer"&gt;arxiv.org/abs/2406.14884&lt;/a&gt;. The "lost in the middle" paper is at &lt;a href="https://arxiv.org/abs/2307.03172" rel="noopener noreferrer"&gt;arxiv.org/abs/2307.03172&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm building instruction file governance at &lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; - this finding led to a new rule category (Context Quality) that I'll cover in the next post.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previous in series: &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi"&gt;The backbone.yml Pattern&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Reporails: Copilot adapter, built with copilot, for copilot.</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Mon, 16 Feb 2026 07:54:28 +0000</pubDate>
      <link>https://forem.com/cleverhoods/reporails-copilot-adapter-built-with-copilot-for-copilot-2gfo</link>
      <guid>https://forem.com/cleverhoods/reporails-copilot-adapter-built-with-copilot-for-copilot-2gfo</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; is a validator for AI agent instruction files: CLAUDE.md, AGENTS.md, copilot-instructions.md. It scores your files, tells you what's missing, and helps you fix it.&lt;/p&gt;

&lt;p&gt;The project already supported Claude Code and Codex. For this challenge, I added &lt;strong&gt;GitHub Copilot CLI as a first-class supported agent&lt;/strong&gt; - using Copilot CLI itself to build the adapter.&lt;/p&gt;

&lt;p&gt;The architecture was already multi-agent by design. A &lt;code&gt;.shared/&lt;/code&gt; directory holds agent-agnostic workflows and knowledge. Each agent gets its own adapter that wires into the shared content. Claude does it through &lt;code&gt;.claude/skills/&lt;/code&gt;, Copilot through &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Adding Copilot took &lt;strong&gt;113 lines&lt;/strong&gt;. Not because the work was trivial - but because the architecture was ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repos:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CLI: &lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;reporails/cli&lt;/a&gt; (v0.3.0)&lt;/li&gt;
&lt;li&gt;Rules: &lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;reporails/rules&lt;/a&gt; (v0.4.0)&lt;/li&gt;
&lt;li&gt;Recommended: &lt;a href="https://github.com/reporails/recommended" rel="noopener noreferrer"&gt;reporails/recommended&lt;/a&gt; (v0.2.0)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;After adding Copilot support, each agent gets its own rule set with no cross-contamination:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Rules&lt;/th&gt;
&lt;th&gt;Breakdown&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;30 CORE - 1 excluded + 0 COPILOT-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;30 CORE - 1 excluded + 10 CLAUDE-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;30 CORE + 7 CODEX-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Run it yourself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @reporails/cli check &lt;span class="nt"&gt;--agent&lt;/span&gt; copilot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;h3&gt;
  
  
  It understood the architecture immediately
&lt;/h3&gt;

&lt;p&gt;I explained the &lt;code&gt;.shared/&lt;/code&gt; folder — that it was created specifically so both Claude and Copilot (and other agents) can reference the same workflows and knowledge without duplication. Copilot got it on the first exchange:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoefznpb6t0hjh8l7old.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqoefznpb6t0hjh8l7old.png" alt="Copilot understanding .shared/ architecture" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;
Copilot understanding .shared/ architecture



&lt;p&gt;The key insight it surfaced: "The .shared/ content is already agent-agnostic. Both agents reference the same workflows. No duplication is needed - just different entry points."&lt;/p&gt;

&lt;p&gt;That's exactly right. Claude reaches shared workflows through &lt;code&gt;/generate-rule&lt;/code&gt; → &lt;code&gt;.claude/skills/&lt;/code&gt; → &lt;code&gt;.shared/workflows/rule-creation.md&lt;/code&gt;. Copilot reads instructions → &lt;code&gt;.shared/workflows/rule-creation.md&lt;/code&gt;. Same destination, different front doors.&lt;/p&gt;

&lt;h3&gt;
  
  
  What it built
&lt;/h3&gt;

&lt;p&gt;Copilot created the full adapter in three phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Foundation&lt;/strong&gt; - &lt;code&gt;.github/copilot-instructions.md&lt;/code&gt;, &lt;code&gt;agents/copilot/config.yml&lt;/code&gt;, updated &lt;code&gt;backbone.yml&lt;/code&gt;, verified test harness supports &lt;code&gt;--agent copilot&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Wiring&lt;/strong&gt; - entry points in copilot-instructions.md, context-specific conditional instructions, wired to &lt;code&gt;.shared/workflows/&lt;/code&gt; and &lt;code&gt;.shared/knowledge/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation&lt;/strong&gt; - updated README and CONTRIBUTING with agent-agnostic workflow guidance&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrcu4g25up7ii012xtx6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwrcu4g25up7ii012xtx6.png" alt="Copilot Contribution Parity Complete" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;
Copilot Contribution Parity Complete



&lt;h3&gt;
  
  
  The bug it found (well, helped find)
&lt;/h3&gt;

&lt;p&gt;While testing the Copilot adapter, I discovered that the test harness had a cross-contamination bug. When running &lt;code&gt;--agent copilot&lt;/code&gt;, it was testing CODEX rules too — because &lt;code&gt;_scan_root()&lt;/code&gt; scanned ALL &lt;code&gt;agents/*/rules/&lt;/code&gt; directories indiscriminately.&lt;/p&gt;

&lt;p&gt;The fix was three lines of Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# If agent is specified, only scan that agent's rules directory
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;agent_dir&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2wqbd0tqkpiet99eyvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz2wqbd0tqkpiet99eyvn.png" alt="Test Harness Agent Isolation Fix" width="800" height="383"&gt;&lt;/a&gt;Test Harness Agent Isolation Fix&lt;/p&gt;
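&lt;p&gt;Expanded into a runnable sketch - the &lt;code&gt;agents/*/rules/&lt;/code&gt; layout comes from the post, while the function and variable names are illustrative:&lt;/p&gt;

```python
import tempfile
from pathlib import Path

def collect_rule_dirs(root, agent=None):
    """Return each agents/NAME/rules directory, optionally one agent's."""
    found = []
    for agent_dir in sorted((root / "agents").iterdir()):
        rules = agent_dir / "rules"
        if not rules.is_dir():
            continue
        # The fix: when an agent is specified, skip every other
        # agent's rules directory instead of scanning them all.
        if agent and agent_dir.name != agent:
            continue
        found.append(rules)
    return found

# Demo on a throwaway tree with two agents.
root = Path(tempfile.mkdtemp())
for name in ("claude", "copilot"):
    (root / "agents" / name / "rules").mkdir(parents=True)

print([d.parent.name for d in collect_rule_dirs(root)])             # both agents
print([d.parent.name for d in collect_rule_dirs(root, "copilot")])  # copilot only
```

&lt;p&gt;With the guard in place, &lt;code&gt;--agent copilot&lt;/code&gt; can no longer pick up another agent's rules.&lt;/p&gt;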

&lt;h3&gt;
  
  
  The model selector surprise
&lt;/h3&gt;

&lt;p&gt;When I opened the Copilot CLI model selector, the default model was &lt;strong&gt;Claude Sonnet 4.5&lt;/strong&gt;. The irony of building a Copilot adapter using Copilot CLI running Claude was not lost on me.&lt;/p&gt;

&lt;h3&gt;
  
  
  What worked, honestly
&lt;/h3&gt;

&lt;p&gt;Copilot CLI understood multi-agent architecture without hand-holding. It generated correct config files matching existing adapter patterns. The co-author signature was properly included in all commits. It didn't try to duplicate content that was already shared - it just wired the entry points.&lt;/p&gt;

&lt;p&gt;The whole experience reinforced something I've been thinking about: the tool matters less than the architecture underneath. If your project is structured well, any competent agent can extend it. That's the whole point of reporails - making sure your instruction files are good enough that the agent can actually help you.&lt;/p&gt;

&lt;h3&gt;
  
  
  What also happened during this challenge
&lt;/h3&gt;

&lt;p&gt;While building the Copilot adapter, I also rebuilt the entire rules framework from scratch. Went from 47 rules (v0.3.1) to 35 rules (v0.4.0) - fewer rules, dramatically higher quality. Every rule is now distinct, detectable, and backed by evidence. But that's a story for another post.&lt;/p&gt;




&lt;p&gt;Try it: &lt;code&gt;npx @reporails/cli check&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="https://dev.to/cleverhoods"&gt;Previous posts&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>CLAUDE.md Best Practices: The backbone.yml Pattern</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 10 Feb 2026 12:31:44 +0000</pubDate>
      <link>https://forem.com/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi</link>
      <guid>https://forem.com/cleverhoods/claudemd-best-practices-the-backboneyml-pattern-30fi</guid>
      <description>&lt;p&gt;There's a Dutch scouting tradition called "dropping." Kids get driven to an unfamiliar forest at night - sometimes blindfolded - and have to find their way back to camp. It builds independence, problem-solving, resilience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;That's what most people do to their AI agents.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Drop them in a codebase. No orientation. Figure it out. (&lt;em&gt;Veel succes en heel gezellig&lt;/em&gt; - good luck and have fun - as the Dutch would say.)&lt;/p&gt;

&lt;p&gt;The difference is that, unlike people, an AI agent's memory reaches only as far as its context window allows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;find &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-name&lt;/span&gt; &lt;span class="s2"&gt;"*.yml"&lt;/span&gt; &lt;span class="nt"&gt;-type&lt;/span&gt; f
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"config"&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.md"&lt;/span&gt;
&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; .claude/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent explores. Makes wrong assumptions. Gets corrected. Tries again. Eventually finds what it needs, or doesn't and quietly poisons the context.&lt;/p&gt;

&lt;p&gt;I call this the &lt;strong&gt;exploration tax&lt;/strong&gt; - the &lt;strong&gt;tokens&lt;/strong&gt; and &lt;strong&gt;time&lt;/strong&gt; spent orienting instead of working.&lt;/p&gt;

&lt;h2&gt;
  
  
  Give the agent a map
&lt;/h2&gt;

&lt;p&gt;The fix is simple: one file that maps your project.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# backbone.yml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="na"&gt;structure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config/&lt;/span&gt;
  &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;src/&lt;/span&gt;
  &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tests/&lt;/span&gt;
  &lt;span class="na"&gt;docs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docs/&lt;/span&gt;

&lt;span class="na"&gt;conventions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test_pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.test.ts"&lt;/span&gt;
  &lt;span class="na"&gt;config_format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;yaml&lt;/span&gt;

&lt;span class="na"&gt;boundaries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;never_modify&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;migrations/&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;vendor/&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's enough to start. Claude reads this once and knows: config lives in &lt;code&gt;config/&lt;/code&gt;, tests are &lt;code&gt;*.test.ts&lt;/code&gt;, never touch &lt;code&gt;.env&lt;/code&gt; or &lt;code&gt;migrations/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;No more exploration loops. No more wrong guesses. No more "sorry, I thought the config was in the root directory."&lt;/p&gt;

&lt;h2&gt;
  
  
  Scaling up
&lt;/h2&gt;

&lt;p&gt;As your project grows, so can your backbone. Here's what mine looks like for &lt;a href="https://github.com/reporails/rules/blob/main/.reporails/backbone.yml" rel="noopener noreferrer"&gt;Reporails rules&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;claude&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;main_instruction_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CLAUDE.md&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/claude/config.yml&lt;/span&gt;
    &lt;span class="na"&gt;skills&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.claude/skills/&lt;/span&gt;
    &lt;span class="na"&gt;tasks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.claude/tasks/&lt;/span&gt;
  &lt;span class="na"&gt;codex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/codex/config.yml&lt;/span&gt;

&lt;span class="na"&gt;rules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;core&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/&lt;/span&gt;
  &lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;agents/&lt;/span&gt;
  &lt;span class="na"&gt;patterns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;rule_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{category}/{slug}/"&lt;/span&gt;
    &lt;span class="na"&gt;definition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rule.md"&lt;/span&gt;
    &lt;span class="na"&gt;test_pass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/pass/"&lt;/span&gt;
    &lt;span class="na"&gt;test_fail&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/fail/"&lt;/span&gt;
  &lt;span class="na"&gt;categories&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;structure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/structure/&lt;/span&gt;
    &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/content/&lt;/span&gt;
    &lt;span class="na"&gt;efficiency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/efficiency/&lt;/span&gt;
    &lt;span class="na"&gt;maintenance&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;core/maintenance/&lt;/span&gt;

&lt;span class="na"&gt;schemas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;rule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/rule.schema.yml&lt;/span&gt;
  &lt;span class="na"&gt;capability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/capability.schema.yml&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;schemas/agent.schema.yml&lt;/span&gt;

&lt;span class="na"&gt;registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/capabilities.yml&lt;/span&gt;
  &lt;span class="na"&gt;levels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/levels.yml&lt;/span&gt;
  &lt;span class="na"&gt;coordinate_map&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry/coordinate-map.yml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple agents, rule patterns, schemas, registries - all mapped. Claude can construct paths directly instead of exploring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wiring it up
&lt;/h2&gt;

&lt;p&gt;The backbone file alone isn't enough - you need to tell Claude to use it. Add this to your CLAUDE.md:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Initialization&lt;/span&gt;

Read these files before searching or modifying anything:
&lt;span class="p"&gt;
1.&lt;/span&gt; Read &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; for project structure and path resolution
&lt;span class="p"&gt;2.&lt;/span&gt; Read any registries or schemas referenced there as needed
&lt;span class="p"&gt;3.&lt;/span&gt; Read &lt;span class="sb"&gt;`.claude/rules/`&lt;/span&gt; for context-specific constraints

&lt;span class="gu"&gt;## Structure&lt;/span&gt;

Defined in &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; - the single source of truth for project topology.

&lt;span class="gs"&gt;**BEFORE**&lt;/span&gt; running &lt;span class="sb"&gt;`find`&lt;/span&gt;, &lt;span class="sb"&gt;`grep`&lt;/span&gt;, &lt;span class="sb"&gt;`ls`&lt;/span&gt;, or glob to locate project files, read &lt;span class="sb"&gt;`backbone.yml`&lt;/span&gt; first. All paths are mapped there. Do not use exploratory commands to discover paths that the backbone already provides.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the key: explicit instruction to read the map before exploring. Without it, Claude might still wander.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a separate file?
&lt;/h2&gt;

&lt;p&gt;You could put all of this directly in your CLAUDE.md. But there's a tradeoff.&lt;/p&gt;

&lt;p&gt;Everything in CLAUDE.md sits in the context window from the start - every session, every message, whether the agent needs it or not.&lt;/p&gt;

&lt;p&gt;backbone.yml is read on demand. Claude doesn't load it at session start - it reads it when it would otherwise start exploring. The map replaces discovery rather than adding to it.&lt;/p&gt;

&lt;p&gt;There are also things a directory structure can't express:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patterns.&lt;/strong&gt; &lt;code&gt;{category}/{slug}/rule.md&lt;/code&gt; isn't a folder - it's a convention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relationships.&lt;/strong&gt; Which agent owns which config? What schema validates what file?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boundaries.&lt;/strong&gt; What's off-limits? What's deprecated?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directories show what exists. backbone.yml shows how it fits together.&lt;/p&gt;
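
&lt;p&gt;&lt;em&gt;For illustration, those extra facts can live right alongside the path map. The keys below are hypothetical sketches, not part of any fixed schema:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;boundaries:
  do_not_modify:
    - migrations/        # applied migrations are immutable
  deprecated:
    - legacy/api_v1/     # kept for reference only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;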

&lt;h2&gt;
  
  
  The cost of exploration
&lt;/h2&gt;

&lt;p&gt;I tracked my Claude Code usage across 176 sessions. A significant chunk of friction came from wrong assumptions about project structure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Used the wrong YAML library (PyYAML instead of ruamel.yaml)&lt;/li&gt;
&lt;li&gt;Wrote changes to the wrong repo in a monorepo&lt;/li&gt;
&lt;li&gt;Assumed directories existed that didn't&lt;/li&gt;
&lt;li&gt;Missed config files that were right there&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each mistake costs tokens, time, and trust. The models are smart enough - the problem is orientation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this fits
&lt;/h2&gt;

&lt;p&gt;In my &lt;a href="https://dev.to/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm"&gt;previous post&lt;/a&gt;, I introduced capability levels for instruction files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;L1-L2&lt;/strong&gt;: CLAUDE.md exists, has basic constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L3&lt;/strong&gt;: External references, multiple files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L4&lt;/strong&gt;: Path-scoped rules that load conditionally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L5&lt;/strong&gt;: backbone.yml - maintained structure, active upkeep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;L6&lt;/strong&gt;: Dynamic context, skills, MCP integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most setups stop at L2 or L3. The jump to L5 isn't about adding more rules - it's about making your existing setup navigable. backbone.yml is how you get there.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to adopt this
&lt;/h2&gt;

&lt;p&gt;Not every project needs it. Weekend hack? Basic CLAUDE.md is fine.&lt;/p&gt;

&lt;p&gt;But if you notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude repeatedly exploring the same directories&lt;/li&gt;
&lt;li&gt;Wrong assumptions about project structure&lt;/li&gt;
&lt;li&gt;Corrections like "no, the config is in X, not Y"&lt;/li&gt;
&lt;li&gt;Monorepo confusion about which repo to modify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...you're paying the exploration tax. A backbone file pays for itself in the first session.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keep it accurate
&lt;/h2&gt;

&lt;p&gt;A backbone.yml only works if it's true. Paths that don't resolve, patterns that don't match reality - those are worse than no map at all.&lt;/p&gt;

&lt;p&gt;Structure that rots is worse than no structure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create &lt;code&gt;backbone.yml&lt;/code&gt; in your project root&lt;/li&gt;
&lt;li&gt;Map your directories, configs, conventions&lt;/li&gt;
&lt;li&gt;Add the initialization section to your CLAUDE.md&lt;/li&gt;
&lt;li&gt;Watch Claude stop guessing&lt;/li&gt;
&lt;/ol&gt;
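
&lt;p&gt;&lt;em&gt;A minimal starting point might look like this - the paths are illustrative placeholders, so map whatever your project actually has:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# backbone.yml - starter sketch, adapt to your project
configs:
  lint: .eslintrc.json
  ci: .github/workflows/ci.yml
source:
  api: src/api/
  components: src/components/
tests:
  unit: tests/unit/
conventions:
  test_file: "{module}.test.ts"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;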

&lt;p&gt;I use this with Claude Code daily. The pattern should work for any agent that reads instruction files - Codex, Copilot, Cursor - though I haven't tested all of them. If you try it, let me know how it goes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Don't drop your agent in the dark. Give it a map.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Reporails&lt;/a&gt; is where I'm building instruction file governance. The &lt;a href="https://github.com/reporails/rules/blob/main/.reporails/backbone.yml" rel="noopener noreferrer"&gt;backbone.yml example&lt;/a&gt; above is from there.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>CLAUDE.md best practices - From Basic to Adaptive</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 03 Feb 2026 12:15:28 +0000</pubDate>
      <link>https://forem.com/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm</link>
      <guid>https://forem.com/cleverhoods/claudemd-best-practices-from-basic-to-adaptive-9lm</guid>
      <description>&lt;blockquote&gt;
&lt;h2&gt;
  
  
  &lt;em&gt;How do you learn new things as a developer?&lt;/em&gt;
&lt;/h2&gt;
&lt;/blockquote&gt;

&lt;p&gt;My take on it is to find yourself an actual project (&lt;em&gt;not tutorials&lt;/em&gt;) and start &lt;strong&gt;iterating&lt;/strong&gt;. I wanted to learn LangGraph for my SageCompass project. SageCompass is a monorepo with LangGraph + Drupal (for RAG content management) and Gradio (for UI). &lt;/p&gt;

&lt;p&gt;I iterated ... &lt;em&gt;&lt;strong&gt;a LOT&lt;/strong&gt;&lt;/em&gt;. &lt;strong&gt;&lt;em&gt;A lot lot&lt;/em&gt;&lt;/strong&gt;. &lt;a href="https://dev.to/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp"&gt;A lot lot lot.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After two months of learning the principles of managing a Python project, and on top of that a LangGraph project, I felt ready to start using a coding agent (Codex at the time) to reduce refactoring time. As it turned out, coding agents work significantly more reliably when you have strong boundaries. I had my unit test structure hammered out, and my directives and contracts were clear and strongly defined.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;However&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;SageCompass is a monorepo. I needed a far more capable AGENTS.md setup to manage all of its components together. The LangGraph part? Tight. Contracts, test structure, clear boundaries - the agent barely needed hand-holding. The Drupal part? I've been working with Drupal for 17 years. I know what I need, but I hadn't written it down for an agent yet. The Gradio part? I was still learning it myself - &lt;em&gt;how do you write instructions for something you don't fully understand yet&lt;/em&gt;?&lt;/p&gt;

&lt;p&gt;I couldn't just have one big instruction file. Each component was at a different stage of readiness. Copy-pasting rules across them would have been worse than having no rules at all.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;That's when it hit me&lt;/strong&gt;: instruction setups have capability levels. And if they have levels, they can be measured. And if they can be measured, they can be improved systematically.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Emergence of capability levels
&lt;/h2&gt;

&lt;p&gt;When I tried to port my LangGraph rules to the Gradio component, I needed to figure out which ones were universal and which ones were specific to a well-established, contract-heavy setup. &lt;/p&gt;

&lt;p&gt;A rule like &lt;strong&gt;&lt;em&gt;'never commit .env files'&lt;/em&gt;&lt;/strong&gt; applies everywhere. A rule like &lt;strong&gt;&lt;em&gt;'implement nodes as make_node* factories'&lt;/em&gt;&lt;/strong&gt; is meaningless outside LangGraph.&lt;br&gt;
That forced me to categorize - not just by what rules do, but by what level of project capability they assume. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;A basic project needs different instructions than one with enforced contracts and navigation maps.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  What I found
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;&lt;strong&gt;A starting point and six levels: L1 to L6.&lt;/strong&gt;&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;


&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L0  Absent      → No instruction file (The starting point)
L1  Basic       → File exists, tracked
L2  Scoped      → Project-specific constraints  
L3  Structured  → External references, modular
L4  Abstracted  → Path-scoped loading
L5  Maintained  → Structural discipline
L6  Adaptive    → Dynamic context, skills, MCP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's what each one means in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  L0: Absent
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;No CLAUDE.md. No AGENTS.md. Nothing.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Claude works from its training data and whatever it can infer from your code. It'll guess your stack from package.json, maybe pick up patterns from existing files. But it has zero guidance about your preferences, constraints, or "never do this" rules.&lt;/p&gt;

&lt;p&gt;For quick scripts or throwaway experiments, this is fine. For anything you'll maintain, you're probably leaving value on the table.&lt;/p&gt;




&lt;h2&gt;
  
  
  L1: Basic
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
└── CLAUDE.md       ← exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;A file exists. It's tracked in git.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Content might be &lt;code&gt;/init&lt;/code&gt; boilerplate — the auto-generated stuff Claude Code produces. Might be a few lines you wrote yourself. The point is you've acknowledged that Claude needs context, and you've given it somewhere to live.&lt;/p&gt;

&lt;p&gt;This is the "I know this matters" stage. Most people get here quickly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude has &lt;em&gt;something&lt;/em&gt; project-specific. It knows this isn't just a random repo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Rules. Claude knows &lt;em&gt;about&lt;/em&gt; your project, but not your &lt;em&gt;constraints&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L2: Scoped
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## Project&lt;/span&gt;
E-commerce API, Node.js, PostgreSQL.

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; MUST use TypeScript strict mode
&lt;span class="p"&gt;-&lt;/span&gt; MUST NOT use &lt;span class="sb"&gt;`any`&lt;/span&gt; type  
&lt;span class="p"&gt;-&lt;/span&gt; MUST run tests before committing
&lt;span class="p"&gt;-&lt;/span&gt; NEVER modify migration files directly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Explicit constraints. &lt;a href="https://www.rfc-editor.org/rfc/rfc2119.html" rel="noopener noreferrer"&gt;MUSTs and MUST NOTs.&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where you stop describing and start prescribing. Not just "here's what the project is" but "here's what you can and cannot do."&lt;/p&gt;

&lt;p&gt;The language matters. "Prefer TypeScript" is a suggestion Claude might ignore. "MUST use TypeScript strict mode" is a rule it tends to follow.&lt;/p&gt;

&lt;p&gt;For small projects with simple conventions, this is often enough. You have your rules in one place. Claude follows them. Life is reasonable.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude follows &lt;em&gt;your&lt;/em&gt; rules, not just generic best practices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Scale. When the file gets long, important stuff gets lost in the noise.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L3: Structured
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

See @docs/architecture.md for system overview.
See @docs/api-conventions.md for API patterns.

&lt;span class="gu"&gt;## Constraints&lt;/span&gt;
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;External references. Multiple files. Content split by concern.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You've hit the point where one file isn't working anymore. So you break it up. Architecture in one place. API conventions in another. Your CLAUDE.md becomes a router pointing to the right context.&lt;/p&gt;

&lt;p&gt;This is also where team collaboration gets easier. Different people can own different files.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Separation of concerns. Easier to maintain. Each file has a job.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: All files load regardless of what you're working on. Editing tests? Claude still loads your API conventions. Noisy.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L4: Abstracted
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── CLAUDE.md
└── .claude/
    └── rules/
        ├── api-rules.md        # paths: src/api/**
        ├── frontend-rules.md   # paths: src/components/**
        └── test-rules.md       # paths: tests/**
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Path-scoped loading. Different rules for different parts of the codebase.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Edit &lt;code&gt;src/api/users.ts&lt;/code&gt;? Only API rules load. Edit &lt;code&gt;tests/user.test.ts&lt;/code&gt;? Only test rules load.&lt;/p&gt;

&lt;p&gt;This is where context efficiency gets real. You're not wasting tokens on irrelevant rules. Claude's attention stays on what matters for the task at hand.&lt;/p&gt;

&lt;p&gt;How you implement this depends on the tool. Claude Code uses &lt;code&gt;.claude/rules/&lt;/code&gt; with frontmatter. Cursor uses &lt;code&gt;.cursor/rules/&lt;/code&gt;. The concept is the same.&lt;/p&gt;
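
&lt;p&gt;&lt;em&gt;As a sketch, one of those rule files could look like this in Claude Code's convention - the glob and the rules are illustrative, and the exact frontmatter format is defined in the tool's docs:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
paths: src/api/**
---

# API rules

- MUST validate request bodies before use
- MUST NOT return raw database errors to clients
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;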

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude adapts to &lt;em&gt;what you're working on&lt;/em&gt;, not just what project you're in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Maintenance. Structures rot. Rules go stale.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L5: Maintained
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;L4 with discipline.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Same structure, but with habits to keep it current:&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A backbone file mapping the codebase, updated when things change&lt;/li&gt;
&lt;li&gt;Some way to track what's stale&lt;/li&gt;
&lt;li&gt;Regular reviews (however often makes sense for you)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference between L4 and L5 isn't features — it's upkeep. L4 is "I set this up." L5 is "I keep it working."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Reliability over time. The setup doesn't quietly rot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's still missing&lt;/strong&gt;: Dynamic capabilities. Claude follows instructions but can't extend itself.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  L6: Adaptive
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;your-project/
├── CLAUDE.md
├── .claude/
│   ├── rules/
│   └── skills/
│       ├── database-migrations/
│       │   └── SKILL.md
│       └── api-testing/
│           └── SKILL.md
└── mcp.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;Skills that load based on task. MCP servers for external integrations.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At this level, Claude doesn't just follow instructions — it loads capabilities. Working on migrations? The migration skill activates with its own context. Need to hit an external API? MCP handles it.&lt;/p&gt;
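
&lt;p&gt;&lt;em&gt;A sketch of what one skill file might contain - the frontmatter fields and the content are illustrative, so check the current skills docs, as the format is still evolving:&lt;/em&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: database-migrations
description: Use when creating or modifying database migration files
---

# Database migrations

1. Generate migrations with the project CLI, never by hand
2. Review the generated SQL before committing
3. NEVER edit an already-applied migration; add a follow-up instead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;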

&lt;p&gt;Very few setups are here yet. The tooling is new. The patterns are still emerging.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What changes&lt;/strong&gt;: Claude extends its abilities based on what it detects you're doing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Quick self-check
&lt;/h2&gt;

&lt;p&gt;Where do you land?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;If yes...&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you have any instruction file?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L1&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Does it have explicit constraints (MUST/MUST NOT)?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L2&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you use @imports or multiple files?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L3&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do different paths load different rules?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L4&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you actively maintain the structure?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;At least &lt;strong&gt;L5&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Do you use skills or MCP?&lt;/em&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;L6&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;From what I've seen, most setups are &lt;strong&gt;L1&lt;/strong&gt; &lt;em&gt;(Basic)&lt;/em&gt; or &lt;strong&gt;L2&lt;/strong&gt; &lt;em&gt;(Scoped)&lt;/em&gt;. Some reach &lt;strong&gt;L3&lt;/strong&gt; &lt;em&gt;(Structured)&lt;/em&gt;. &lt;strong&gt;L4&lt;/strong&gt; &lt;em&gt;(Abstracted)&lt;/em&gt; and above is rare - not because it's hard, but because the patterns aren't widely known yet.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why bother with levels?
&lt;/h2&gt;

&lt;p&gt;It's not about chasing a high score.&lt;/p&gt;

&lt;p&gt;It's about having words for things.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I'm at &lt;strong&gt;L2&lt;/strong&gt; &lt;em&gt;(Scoped)&lt;/em&gt; and wondering if &lt;strong&gt;L4&lt;/strong&gt; &lt;em&gt;(abstracted)&lt;/em&gt; is worth the effort" &lt;strong&gt;&lt;em&gt;is a conversation you can actually have.&lt;/em&gt;&lt;/strong&gt; "My CLAUDE.md is pretty good" isn't.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The right level depends on your project. A weekend hack doesn't need path scoping. A complex system with multiple domains probably does. The framework just helps you think about where you are and where you might want to go.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'm building
&lt;/h2&gt;

&lt;p&gt;I'm working on a validator that uses this framework: &lt;strong&gt;&lt;em&gt;it detects your level, checks structure, and scores your setup&lt;/em&gt;&lt;/strong&gt;. &lt;em&gt;(If you run it from Claude Code CLI, it helps you fix issues too.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's early. Like, really early. I'm still working through core level implementations. But if you want to poke at it and tell me what's broken, I'd appreciate it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reporails CLI:&lt;/strong&gt; &lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;github.com/reporails/cli&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or just use the levels as a mental model. &lt;strong&gt;&lt;em&gt;That's the real value anyway.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules/blob/main/docs/capability-levels.md" rel="noopener noreferrer"&gt;Capability levels docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;Rules repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>claudecode</category>
      <category>devtools</category>
      <category>ai</category>
      <category>programming</category>
    </item>
    <item>
      <title>CLAUDE.md: Check, Score, Improve &amp; Repeat</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 27 Jan 2026 08:54:55 +0000</pubDate>
      <link>https://forem.com/cleverhoods/claudemd-lint-score-improve-repeat-2om5</link>
      <guid>https://forem.com/cleverhoods/claudemd-lint-score-improve-repeat-2om5</guid>
      <description>&lt;p&gt;&lt;em&gt;&lt;strong&gt;The missing quality checker for AI instruction files.&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You asked for a small refactor. A &lt;em&gt;&lt;strong&gt;small(!)&lt;/strong&gt;&lt;/em&gt; refactor.&lt;br&gt;
Claude Code rewrote &lt;strong&gt;&lt;em&gt;half&lt;/em&gt;&lt;/strong&gt; the module.&lt;/p&gt;

&lt;p&gt;"&lt;em&gt;You're right, I apologize.&lt;/em&gt;" "&lt;em&gt;Let me fix that.&lt;/em&gt;" "&lt;em&gt;Sorry, I misunderstood.&lt;/em&gt;" — on repeat.&lt;/p&gt;

&lt;p&gt;So you open the &lt;strong&gt;CLAUDE.md&lt;/strong&gt;. Then the &lt;strong&gt;rules&lt;/strong&gt;. Then the &lt;strong&gt;SKILLS&lt;/strong&gt;. Each is at least 400 lines. 24 files total. &lt;br&gt;
You cross-reference the official docs, skim three "best practices" blog posts, dig through GitHub examples. &lt;/p&gt;

&lt;p&gt;Hours of trial and error later, you do what any reasonable person would: you ask Claude to figure it out.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ review my CLAUDE.md and rules. Tell me what is wrong.
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Burn ALL the tokens
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Claude Code obliges.&lt;/strong&gt; It reads all 24 files, their cross-referenced imports, and all the additional relevant documentation. It neatly summarizes them. It suggests improvements, you accept them, it rewrites a few sections, adds here, removes there. &lt;/p&gt;

&lt;p&gt;It burns tokens like kindling. &lt;/p&gt;

&lt;p&gt;Your &lt;strong&gt;CLAUDE.md&lt;/strong&gt;, &lt;strong&gt;rules&lt;/strong&gt;, &lt;strong&gt;SKILLS&lt;/strong&gt; got just a bit longer, but you're fine with that — at least it won't happen again... right? This is fine. Right?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowvqb5elm4m42hdjw2v4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fowvqb5elm4m42hdjw2v4.jpg" alt="This is fine" width="561" height="265"&gt;&lt;/a&gt;Everything is fine&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Forty minutes later, you have a slightly different mess and no idea if it's better. So you open &lt;em&gt;CLAUDE.md&lt;/em&gt; ...&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Lint the vibes!&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Your code needs &lt;strong&gt;structure&lt;/strong&gt;, &lt;strong&gt;types&lt;/strong&gt;, &lt;strong&gt;format&lt;/strong&gt;. It has &lt;strong&gt;tests&lt;/strong&gt;, &lt;strong&gt;type checks&lt;/strong&gt; and &lt;strong&gt;linters&lt;/strong&gt;. Your AI instructions? &lt;strong&gt;&lt;em&gt;Vibes&lt;/em&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reporails helps with that.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add reporails &lt;span class="nt"&gt;--&lt;/span&gt; uvx &lt;span class="nt"&gt;--from&lt;/span&gt; reporails-cli ails-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Then ask:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ what ails claude?
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Runs deterministic checks and semantic validations. Produces actionable fixes Claude can apply.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcugat31q1lj56jpqtdn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcugat31q1lj56jpqtdn.png" alt="What ails claude?" width="800" height="665"&gt;&lt;/a&gt;It will tell you what can be done better&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Just run&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;───────────────────────────────────────────────────────────────────
❯ fix the duplicate   
───────────────────────────────────────────────────────────────────
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;em&gt;... and observe:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● I&lt;span class="s1"&gt;'ll remove the duplicate from writing-rules.md since instruction-file-style.md already covers the "actionable content" requirement for .claude/rules/**.                  

● Update(.claude/rules/writing-rules.md)                                                                                                                                     
  ⎿  Removed 1 line                                                                                                                                                          
      26  - Use descriptive filenames (`api-validation.md` not `rules1.md`)                                                                                                  
      27  - Add `paths` frontmatter to reduce noise when not relevant
      28  - No paths = loads globally for all files
      29 -- Content MUST be actionable, not explanatory

● Done. Removed the duplicate "Content MUST be actionable, not explanatory" from writing-rules.md:29 — this is already covered by instruction-file-style.md:12 ("Every line
  MUST be actionable or informative") which applies to the same scope.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The result
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;No more vibes. No more gluing together &lt;em&gt;best practices&lt;/em&gt; blog posts.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;● Semantic Evaluation                                                                                                                                                        

  I already have context from the previous read. Evaluating the 9 candidates:

  M4 &lt;span class="o"&gt;(&lt;/span&gt;Contradictions&lt;span class="o"&gt;)&lt;/span&gt;: 2 candidates - all passed

  C6 &lt;span class="o"&gt;(&lt;/span&gt;Duplicates&lt;span class="o"&gt;)&lt;/span&gt;: 6 candidates - all passed &lt;span class="o"&gt;(&lt;/span&gt;the duplicate I fixed is no longer flagged&lt;span class="o"&gt;)&lt;/span&gt;

  G3 &lt;span class="o"&gt;(&lt;/span&gt;Security ownership&lt;span class="o"&gt;)&lt;/span&gt;: 1 candidate - passed &lt;span class="o"&gt;(&lt;/span&gt;organizational guidance, not a security rule&lt;span class="o"&gt;)&lt;/span&gt;

  &lt;span class="nt"&gt;---&lt;/span&gt;
  Final Score
  ┌────────────┬─────────────────────┐
  │   Metric   │        Value        │
  ├────────────┼─────────────────────┤
  │ Score      │ 8.3↑ / 10&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;complete&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; │
  ├────────────┼─────────────────────┤
  │ Capability │ Governed &lt;span class="o"&gt;(&lt;/span&gt;L5+&lt;span class="o"&gt;)&lt;/span&gt;      │
  ├────────────┼─────────────────────┤
  │ Violations │ 21 &lt;span class="o"&gt;(&lt;/span&gt;↓2 from 23&lt;span class="o"&gt;)&lt;/span&gt;     │
  └────────────┴─────────────────────┘
  The C6 duplicate between instruction-file-style.md and writing-rules.md is resolved.

✻ Cogitated &lt;span class="k"&gt;for &lt;/span&gt;32s

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Reporails v0.0.1
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;What took hours now takes minutes.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;42 rules&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/tree/main/docs" rel="noopener noreferrer"&gt;Documented&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/blob/main/LICENSE" rel="noopener noreferrer"&gt;Open source&lt;/a&gt;. &lt;a href="https://github.com/reporails/rules/blob/main/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Easy to extend.&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://pypi.org/project/reporails-cli/" rel="noopener noreferrer"&gt;PyPI: reporails-cli&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/cli" rel="noopener noreferrer"&gt;GitHub: CLI &amp;amp; MCP&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/reporails/rules" rel="noopener noreferrer"&gt;GitHub: Rules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md is just the start. More agents coming soon.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>claudecode</category>
      <category>productivity</category>
      <category>opensource</category>
      <category>devtool</category>
    </item>
    <item>
      <title>From Prompt to Platform: Architecture Rules I Use</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Tue, 20 Jan 2026 07:36:43 +0000</pubDate>
      <link>https://forem.com/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp</link>
      <guid>https://forem.com/cleverhoods/from-prompt-to-platform-architecture-rules-i-use-59gp</guid>
      <description>&lt;p&gt;The "&lt;em&gt;build -&amp;gt; &lt;strong&gt;surprise&lt;/strong&gt; -&amp;gt; restructure -&amp;gt; repeat&lt;/em&gt;" loop is amazing early on. However, after a while it's like two clowns trying to out-prank each other: it gets funnier and funnier, lots of laughs... until one of them pulls out a flamethrower for one last prank and the laughter gets a little awkward.&lt;/p&gt;

&lt;p&gt;This type of iteration is fun until it isn't. &lt;strong&gt;So I went looking for guidance.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Experiences With LangGraph Tutorials
&lt;/h2&gt;

&lt;p&gt;Most examples show you how to build a graph. Define some nodes. Wire them together. Ship it. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Great for prototyping.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They don't show you where to put things when you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 nodes&lt;/li&gt;
&lt;li&gt;3 agents&lt;/li&gt;
&lt;li&gt;5 tools&lt;/li&gt;
&lt;li&gt;Shared state across subgraphs&lt;/li&gt;
&lt;li&gt;Middleware for guardrails&lt;/li&gt;
&lt;li&gt;A platform layer that stays framework-independent&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I searched. Found bits and pieces, but no complete picture. So I built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Folder Structure That Scales
&lt;/h2&gt;

&lt;p&gt;Here's what my LangGraph component looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── agents/           # Agent factories (build_agent_*)
├── graphs/           # Graph definitions (main, subgraphs, phases)
├── nodes/            # Node factories (make_node_*)
├── states/           # Pydantic state models
├── tools/            # Tool definitions
├── middlewares/      # Cross-cutting concerns (guardrails, redaction)
└── platform/
    ├── core/         # Pure types, contracts, policies (no wiring)
    │   ├── contract/ # Validators: state, tools, prompts, phases
    │   ├── dto/      # Pure data transfer objects
    │   └── policy/   # Pure decision logic
    ├── adapters/     # Boundary translation (DTOs ↔ State)
    ├── runtime/      # Evidence hydration, state helpers
    ├── config/       # Environment, paths
    └── observability/# Logging
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this structure?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It mirrors LangGraph's mental model: agents are agents; nodes are nodes; graphs are graphs. In the orchestration layer, things are &lt;strong&gt;easy to find&lt;/strong&gt; and responsibilities stay separated.&lt;/p&gt;
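&lt;p&gt;To make the naming convention concrete, here's a minimal sketch of what a &lt;code&gt;nodes/&lt;/code&gt; factory could look like. The names (&lt;code&gt;make_node_problem_framing&lt;/code&gt;, the injected &lt;code&gt;llm&lt;/code&gt;) are illustrative, not the actual SageCompass code:&lt;/p&gt;

```python
# Hypothetical node factory following the make_node_* convention.
# "llm" is any object with an invoke() method; all names are illustrative.
def make_node_problem_framing(llm):
    """Build a node callable: the factory wires dependencies, the node stays thin."""
    def problem_framing_node(state):
        # Orchestration only: delegate the actual work to the injected dependency.
        result = llm.invoke(state["messages"])
        return {"messages": [result]}
    return problem_framing_node
```

&lt;p&gt;The factory pattern keeps construction (wiring) separate from execution (the node body), which is what makes the layout above predictable.&lt;/p&gt;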

&lt;p&gt;But the real insight is the &lt;code&gt;platform/&lt;/code&gt; layer.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Platform Layer: Why It Exists
&lt;/h2&gt;

&lt;p&gt;While separating the LangGraph components was easy, separating the wiring was hard. The structure didn't appear on day one. It emerged over a number of iterations - each cycle surfaced a different missing architectural rule, and each gap made refactors harder with every new component.&lt;/p&gt;

&lt;p&gt;Without architectural rules, everything gets spaghettified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WITHOUT PLATFORM LAYER - Everything mixed together
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;problem_framing_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SageState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Guardrail logic mixed with state management
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unsafe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gating&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;guardrail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GuardrailResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_safe&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;

    &lt;span class="c1"&gt;# Evidence hydration mixed with node orchestration  
&lt;/span&gt;    &lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_store&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;phase_entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;evidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;namespace&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# ... inline hydration logic
&lt;/span&gt;
    &lt;span class="c1"&gt;# Validation mixed with execution
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;phases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid state update!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ... good luck writing tests for it!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With the platform layer&lt;/strong&gt;, concerns are separated:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# WITH PLATFORM LAYER - Clean separation
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;problem_framing_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SageState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Use platform contracts for validation
&lt;/span&gt;    &lt;span class="nf"&gt;validate_state_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use platform runtime helpers for evidence
&lt;/span&gt;    &lt;span class="n"&gt;bundle&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;collect_phase_evidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;phase&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use platform policies for decisions
&lt;/span&gt;    &lt;span class="n"&gt;guardrail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;evaluate_guardrails&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Use adapters for state translation
&lt;/span&gt;    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;guardrail_to_gating&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;guardrail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Node only orchestrates - all logic in platform!
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The node becomes what it should be: &lt;strong&gt;orchestration only&lt;/strong&gt;. No domain logic. No direct store access. No inline validation.&lt;/p&gt;
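&lt;p&gt;For illustration, an adapter like &lt;code&gt;guardrail_to_gating&lt;/code&gt; can be sketched as a pure translation from a frozen DTO to a state fragment. This is a hypothetical shape assuming a &lt;code&gt;GuardrailResult&lt;/code&gt; DTO, not the exact SageCompass implementation:&lt;/p&gt;

```python
from dataclasses import dataclass

# Hypothetical DTO and adapter; GuardrailResult and guardrail_to_gating are
# assumed names sketching the DTO-to-state boundary translation.
@dataclass(frozen=True)
class GuardrailResult:
    is_safe: bool
    reason: str = ""

def guardrail_to_gating(guardrail, user_input):
    # Boundary translation: pure DTO in, state fragment out. No store access,
    # no framework imports - the adapter stays trivially testable.
    return {
        "gating": {
            "guardrail_passed": guardrail.is_safe,
            "reason": guardrail.reason,
            "input_preview": user_input[:80],
        }
    }
```

&lt;p&gt;Because the adapter is a pure function over a frozen dataclass, it can be unit-tested without spinning up a graph.&lt;/p&gt;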




&lt;h2&gt;
  
  
  The Hexagonal Split
&lt;/h2&gt;

&lt;p&gt;The pattern that solved it: &lt;a href="https://alistair.cockburn.us/hexagonal-architecture" rel="noopener noreferrer"&gt;hexagonal architecture&lt;/a&gt;. Core stays pure - no framework dependencies, no imports from the layers above. Everything else can depend on Core, but Core depends on nothing. This makes the boundaries testable and the rules enforceable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────┐
│                    APPLICATION LAYER                    │
│  (app/nodes, app/graphs, app/agents, app/middlewares)   │
│  - LangGraph orchestration                              │
│  - Calls platform services via contracts                │
└───────────────────────────┬─────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────┐
│                    PLATFORM LAYER                       │
│  ┌───────────┐ ┌───────────┐ ┌─────────┐ ┌───────────┐  │
│  │  Adapters │ │  Runtime  │ │ Config  │ │Observabil.│  │
│  │DTO&amp;lt;-&amp;gt;State│ │  helpers  │ │env/paths│ │  logging  │  │
│  └─────┬─────┘ └─────┬─────┘ └────┬────┘ └─────┬─────┘  │
│        │             │            │            │        │
│        └─────────────┴──────┬─────┴────────────┘        │
│                             ▼                           │
│  ┌────────────────────────────────────────────────────┐ │
│  │  Core (PURE - no framework dependencies)           │ │
│  │  - Contracts and validators                        │ │
│  │  - Policy evaluation (pure functions)              │ │
│  │  - DTOs (frozen dataclasses)                       │ │
│  └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The rule&lt;/strong&gt;: &lt;code&gt;core/&lt;/code&gt; has NO imports from anything above it - no app orchestration (agents, nodes, graphs, etc.), no wiring, no adapters. Dependencies point inward only.&lt;/p&gt;

&lt;p&gt;This isn't just a guideline. It's enforced.&lt;/p&gt;




&lt;h3&gt;
  
  
  How to enforce a guideline?
&lt;/h3&gt;

&lt;p&gt;Simple: write a test for it that would catch the violation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/unit/architecture/test_core_purity.py
&lt;/span&gt;
&lt;span class="n"&gt;FORBIDDEN_IMPORTS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.graphs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.nodes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app.agents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... all app orchestration and platform wiring
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_core_has_no_forbidden_imports&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Core layer must remain pure - no wiring dependencies.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;core_files&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;app/platform/core&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;rglob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;core_files&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;forbidden&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;FORBIDDEN_IMPORTS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;forbidden&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; imports &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;forbidden&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - core must stay pure&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you break the boundary, the test fails. &lt;strong&gt;No exceptions.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Beyond guidelines, you can also define &lt;a href="https://en.wikipedia.org/wiki/Design_by_contract" rel="noopener noreferrer"&gt;contracts&lt;/a&gt; that validate at runtime.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Contracts That Validate
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;core/contract/&lt;/code&gt; directory contains validators that enforce contract rules at runtime:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Contract&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_state_update()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Restricts mutations to authorized owners&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_structured_response()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Forces validation before persisting&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_phase_registry()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensures phase keys match declared schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;validate_allowlist_contains_schema()&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ensures tool allowlist correctness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These aren't optional - every node calls them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Every state update goes through the contract
&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;phase_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;phase_entry&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="nf"&gt;validate_state_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;problem_framing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Command&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;update&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;goto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;next_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The contracts themselves are also tested - validation logic, phase dependencies, invalidation cascades. See &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/langgraph/tests/unit/platform/core/contract/test_state.py" rel="noopener noreferrer"&gt;test_state.py&lt;/a&gt; for the full suite.&lt;/p&gt;
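&lt;p&gt;As a sketch of the idea - assuming a simple ownership map, which the real validator goes well beyond - &lt;code&gt;validate_state_update&lt;/code&gt; could look like this:&lt;/p&gt;

```python
# A minimal sketch of an ownership-checking contract. The ownership map and
# error message are assumptions; the real validate_state_update does more.
ALLOWED_KEYS_BY_OWNER = {
    "problem_framing": {"phases", "gating"},
}

def validate_state_update(update, owner):
    """Reject state keys the owning node is not allowed to write."""
    allowed = ALLOWED_KEYS_BY_OWNER.get(owner, set())
    illegal = set(update) - allowed
    if illegal:
        raise ValueError(f"{owner} may not write keys: {sorted(illegal)}")
```

&lt;p&gt;The point is the shape: a pure function with no framework imports, so it lives in &lt;code&gt;core/&lt;/code&gt; and is trivially unit-tested.&lt;/p&gt;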




&lt;h2&gt;
  
  
  Test Structure That Scales
&lt;/h2&gt;

&lt;p&gt;Tests are organized by type (unit, integration, e2e) and category (architecture, orchestration, platform). This makes coverage gaps obvious and lets you run targeted subsets.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tests/
├── unit/
│   ├── architecture/      # Boundary enforcement
│   │   ├── test_core_purity.py
│   │   ├── test_adapter_boundary.py
│   │   └── test_import_time_construction.py
│   ├── orchestration/     # Agents, nodes, graphs
│   └── platform/          # Core + adapters
├── integration/
│   ├── orchestration/
│   └── platform/
└── e2e/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With pytest markers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# pyproject.toml
# Test markers for categorizing tests by purpose and scope
&lt;/span&gt;&lt;span class="n"&gt;markers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="c1"&gt;# Test Type Markers (by scope)
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit: Fast, isolated tests with no external dependencies&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;integration: Tests crossing component boundaries (may use test fixtures)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;e2e: End-to-end workflow tests (full pipeline validation)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="c1"&gt;# Test Category Markers (organizational categories)
&lt;/span&gt;  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;architecture: Hexagonal architecture enforcement (import rules, layer boundaries)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orchestration: LangGraph orchestration components (agents, nodes, graphs, middlewares, tools)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform: Platform layer tests (hexagonal architecture - core, adapters, runtime)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run unit architecture tests alone: &lt;code&gt;uv run pytest -m "unit and architecture"&lt;/code&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The architecture is validated by 110 tests - 11 of which specifically enforce architecture boundaries.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What This Enables
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Here's where it gets interesting.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You might be thinking: &lt;em&gt;cool story, but...&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr582q6kjk9w0cqylz04m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr582q6kjk9w0cqylz04m.gif" alt="...but why?" width="480" height="258"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because when your architecture is predictable and enforceable, something curious happens: &lt;strong&gt;coding agents stop being a liability and start being useful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When every node follows the same pattern...&lt;br&gt;
When every state update goes through a validator...&lt;br&gt;
When every boundary is well-defined and tested...&lt;/p&gt;

&lt;p&gt;...an AI agent can't accidentally break your architecture without the tests catching it. It can't import forbidden modules. It can't skip validation. It can't bypass the contracts - not without failing the test suite.&lt;/p&gt;

&lt;p&gt;The rules become more than just documentation. They're guardrails for both humans and AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Want the Full Thing?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;46 architecture principles (tiered):&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/docs/langgraph-python-architecture-principles.md" rel="noopener noreferrer"&gt;Architecture principles&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Platform contracts README:&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/blob/main/langgraph/app/platform/core/contract/README.md" rel="noopener noreferrer"&gt;Platform contracts&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture tests:&lt;/strong&gt; &lt;a href="https://github.com/cleverhoods/sagecompass/tree/main/langgraph/tests/unit/architecture" rel="noopener noreferrer"&gt;Architecture tests&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next up
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What happens when you point Claude Code at an architecture it can't break.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The CLAUDE.md file isn't just a conglomeration of instructions - it's a contract that preserves context and enforces boundaries during development.&lt;/p&gt;

&lt;p&gt;I built a framework for it with measurable results.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Coming next: The CLAUDE.md Maturity Model.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of my "From Prompt to Platform" series documenting the SageCompass build. &lt;a href="https://dev.to/cleverhoods/from-zero-to-agentic-platform-building-the-sagecompass-origin-story-series-prologue-2g3i"&gt;Start from the prologue&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langgraph</category>
      <category>langchain</category>
      <category>python</category>
      <category>architecture</category>
    </item>
    <item>
      <title>From Zero to Agentic Platform Building: The SageCompass Origin Story (series prologue)</title>
      <dc:creator> Gábor Mészáros</dc:creator>
      <pubDate>Wed, 14 Jan 2026 10:40:01 +0000</pubDate>
      <link>https://forem.com/cleverhoods/from-zero-to-agentic-platform-building-the-sagecompass-origin-story-series-prologue-2g3i</link>
      <guid>https://forem.com/cleverhoods/from-zero-to-agentic-platform-building-the-sagecompass-origin-story-series-prologue-2g3i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The story of how a small prompt idea snowballed into a real agentic &lt;a href="https://www.langchain.com/langgraph" rel="noopener noreferrer"&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/a&gt; runtime, and I didn’t even speak &lt;strong&gt;Python&lt;/strong&gt; three months ago.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;This is the first post in the series, the "why/how did you even end up building this" part. The next posts will be technical deep dives into specific topics (project architecture, RAG with Drupal, contracts/guardrails, ambiguity detection, etc).&lt;/em&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  TL;DR
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;I attended &lt;a href="https://aws.amazon.com/certification/certified-machine-learning-engineer-associate/" rel="noopener noreferrer"&gt;AW-MLEA (Machine Learning Engineering on AWS)&lt;/a&gt; which ignited my professional enthusiasm.&lt;/li&gt;
&lt;li&gt;I needed a real project to learn. Luckily, the course gave an easy-looking idea: an "&lt;em&gt;&lt;strong&gt;ML success criteria framework&lt;/strong&gt;&lt;/em&gt;". I named it  &lt;strong&gt;SageCompass&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;What I thought would be "a small practice thing" quickly turned into something that taught me how to think with these technologies.&lt;/li&gt;
&lt;li&gt;After a few months I found myself rebuilding parts of the runtime again and again, mostly because the code kept exposing what was unclear or brittle.&lt;/li&gt;
&lt;li&gt;This series is me documenting the path and the architecture lessons as they formed (no prior training or experience in Python, ML, or agentic systems, so things weren't obvious from day one).&lt;/li&gt;
&lt;li&gt;Current implementation is here: &lt;a href="https://github.com/cleverhoods/sagecompass" rel="noopener noreferrer"&gt;https://github.com/cleverhoods/sagecompass&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h4&gt;
  
  
  It was late October, 2025.
&lt;/h4&gt;

&lt;p&gt;I was waiting excitedly for my long, 4-day weekend after a long, 6-day working week (... yeah, in Hungary you have to "&lt;em&gt;work off&lt;/em&gt;" the extra holiday). I was so looking forward to it that I completely forgot that I'd signed up for &lt;a href="https://aws.amazon.com/certification/certified-machine-learning-engineer-associate/" rel="noopener noreferrer"&gt;AW-MLEA (Machine Learning Engineering on AWS)&lt;/a&gt; training. The training started on Wednesday. My long weekend started on Thursday. The training was 3 days long.&lt;/p&gt;

&lt;p&gt;... great.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Little did I know that my previous weekend was my last free one for the forthcoming months.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The training kicked off, and just a few hours in, &lt;strong&gt;I was hooked&lt;/strong&gt;. &lt;/p&gt;

&lt;p&gt;Finally! High-grade educational information about Machine Learning: how models learn, what separates the different ML training approaches, what challenges ML solutions face, what a Model is, what types of Models exist and how they differ, what dimensions responsible AI encompasses, what kinds of bias exist, what an ML success criteria framework looks like, and so on.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;... and we haven't even arrived at lunch and already had 2 coffee breaks.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This went on for 2 (and a half) more days. With similar intensity and information density. It was only manageable thanks to the great learning material and the great guidance (a shout-out to &lt;a href="https://www.linkedin.com/in/thomasfruin/" rel="noopener noreferrer"&gt;Thomas Fruin&lt;/a&gt;, the exceptional presenter of this course).&lt;/p&gt;

&lt;p&gt;... and just like that, it was over. The course was completed and I was left with a burning enthusiasm to utilize the freshly learned concepts.&lt;/p&gt;

&lt;p&gt;I could've played around with sandbox environments (very limited instances of the &lt;a href="https://aws.amazon.com/sagemaker/ai/studio/" rel="noopener noreferrer"&gt;SageMaker Studio&lt;/a&gt; with &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter notebooks&lt;/a&gt;) but that wasn't what I was looking for. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;I'm adamant about learning new concepts: the easiest way to learn is by doing. Implement a real project. Don't just poke at sandboxes.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;All I needed was an example project, and I already had my eyes on something I was introduced to on the very first day: an &lt;em&gt;ML success criteria framework&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foygg1y8pzlzxz34ronuc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foygg1y8pzlzxz34ronuc.png" alt="an ML success criteria framework" width="800" height="320"&gt;&lt;/a&gt;an ML success criteria framework&lt;/p&gt;

&lt;p&gt;It got the name &lt;em&gt;SageCompass&lt;/em&gt; because it's supposed to help me determine whether the input is actually an ML problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  How SageCompass evolved
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The original idea was simple: take the ML success criteria framework, turn it into a guided intake flow, and provide a structured, detailed answer to the core question:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;"Is this even an ML problem?"&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And if it is, make sure the answer includes &lt;strong&gt;measurable business values&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business Goals&lt;/li&gt;
&lt;li&gt;KPIs&lt;/li&gt;
&lt;li&gt;Data baselines and readiness&lt;/li&gt;
&lt;li&gt;Risk assessment&lt;/li&gt;
&lt;li&gt;Minimal pilot implementation plan with kill criteria if things go in the wrong direction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At the time I thought: "I'll build a small example project, practice some concepts on the business values and move on."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoboy9f5o63v4yus4mw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhoboy9f5o63v4yus4mw5.png" alt="we do this not because it is easy, but because we thought it would be easy" width="800" height="573"&gt;&lt;/a&gt;Yep.&lt;/p&gt;

&lt;p&gt;Instead it turned into a loop: &lt;strong&gt;build -&amp;gt; surprise -&amp;gt; restructure -&amp;gt; repeat&lt;/strong&gt;. Over and over again.&lt;/p&gt;




&lt;h3&gt;
  
  
  v1-3: The Prompt era
&lt;/h3&gt;

&lt;p&gt;Originally I did not plan to build an agentic platform. Not even a system. Just one well-structured prompt, with deterministic rules and a structured output.&lt;/p&gt;

&lt;p&gt;The original prompt quickly turned into several specialized prompts as I kept hitting ChatGPT's prompt length limit.&lt;/p&gt;

&lt;p&gt;Behavioral expectations, process and task rules, policies, limitations, etc., all got into dedicated markdown files.&lt;/p&gt;

&lt;p&gt;A single prompt turned into a small prompt system.&lt;/p&gt;
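&lt;p&gt;A minimal sketch of what such a segment-based prompt system could look like (a plain-Python illustration; the file names here are hypothetical, not the actual ones):&lt;/p&gt;

```python
# Minimal sketch of a segment-based prompt loader. Each concern lives in
# its own markdown file and the final prompt is assembled in a fixed
# order, so a change stays localized to one segment file.
from pathlib import Path

SEGMENT_ORDER = [
    "behavior.md",       # behavioral expectations
    "process-rules.md",  # process and task rules
    "policies.md",       # policies and limitations
]

def assemble_prompt(segment_dir: Path) -> str:
    """Concatenate prompt segments in a deterministic order, skipping absent files."""
    parts = [
        (segment_dir / name).read_text().strip()
        for name in SEGMENT_ORDER
        if (segment_dir / name).exists()
    ]
    return "\n\n".join(parts)
```

&lt;p&gt;The fixed ordering keeps assembly deterministic: editing one concern means touching exactly one file.&lt;/p&gt;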

&lt;p&gt;Eventually I ended up with 9 different prompt segment files + the main instructions:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzoz4k90hxllia2aqe5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpzoz4k90hxllia2aqe5r.png" alt="Instruction library of SageCompass v3.3" width="800" height="368"&gt;&lt;/a&gt;Instruction library of SageCompass v3.3&lt;/p&gt;

&lt;p&gt;At this stage the project was only "&lt;strong&gt;&lt;em&gt;prompt engineering with ambition&lt;/em&gt;&lt;/strong&gt;" with a &lt;strong&gt;&lt;em&gt;wow&lt;/em&gt;&lt;/strong&gt; factor. While the prompts and the outputs reflected what I had just learned about models and how they process information, maintaining and extending them was a nightmare. Whenever I changed something, I had to make sure the change was carried through everywhere it applied. The solution was rough and definitely not production-useful.&lt;/p&gt;

&lt;p&gt;I needed some help validating the direction, so I asked our AI Director (&lt;a href="https://www.linkedin.com/in/svdenboer/" rel="noopener noreferrer"&gt;Sebastiaan den Boer&lt;/a&gt;) for a sanity check, and his first reaction was: "&lt;em&gt;This sounds like a &lt;strong&gt;LangChain&lt;/strong&gt; project.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;My first reaction to that was: "&lt;em&gt;Sounds like a what now?&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;So I set out to learn what &lt;a href="https://docs.langchain.com/oss/python/langchain/overview" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; is and how to implement my project with it.&lt;/p&gt;




&lt;h3&gt;
  
  
  v4: My first ever Python project
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://docs.langchain.com/oss/python/langchain/overview" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; offers 2 types of implementation: &lt;strong&gt;JavaScript&lt;/strong&gt; and &lt;strong&gt;Python&lt;/strong&gt;. I always wanted to learn Python and during the training we heavily relied on &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter notebooks&lt;/a&gt; so I felt that NOW is the right time to get this off my bucket list.&lt;/p&gt;

&lt;p&gt;For this phase I set one simple goal: a running Python environment with at least one Agent.&lt;/p&gt;

&lt;p&gt;My expectations for the project were the same as for any other project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;it should be encapsulated&lt;/li&gt;
&lt;li&gt;it should provide its own runtime
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the first iterations, I tried to implement the local Python development environment with &lt;a href="https://ddev.com/" rel="noopener noreferrer"&gt;DDEV&lt;/a&gt; (a great, project-level Docker environment manager/orchestrator that I've been using for years on Drupal projects).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;When you have a hammer, everything is a nail.&lt;/strong&gt; &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It worked well enough to create a super basic LangChain implementation that would read and validate LLM provider (OpenAI and Perplexity) configurations, log what's happening in the system, read my humongous prompt, and communicate with the &lt;a href="https://www.gradio.app/" rel="noopener noreferrer"&gt;Gradio UI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wow! That was easy!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I was able to quickly create a base Python class for all the future Agents and to create the very first agent: Problem Framing. It was the first phase in the &lt;em&gt;SageCompass Process Model&lt;/em&gt; (v3), from my &lt;code&gt;reasoning-flow.md&lt;/code&gt;. Its singular job was to reframe the user input into a rich business format so I could set expectations for the input data before I ran the whole shebang.&lt;/p&gt;
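&lt;p&gt;The pattern can be sketched roughly like this (illustrative names and a stubbed model call, not the actual project code):&lt;/p&gt;

```python
# Hypothetical sketch of the shared Agent base class pattern: each concrete
# agent supplies its own prompt and output key, while the base class owns
# the common run flow. Names are illustrative, not the actual code.
from abc import ABC, abstractmethod
from typing import Any

class BaseAgent(ABC):
    """Common lifecycle for all agents: build prompt, call model, store result."""

    # key under which the agent writes its result into shared state
    output_key: str = "result"

    @abstractmethod
    def build_prompt(self, state: dict[str, Any]) -> str:
        ...

    def call_model(self, prompt: str) -> str:
        # Stand-in for the real LLM call (LangChain in the original project).
        raise NotImplementedError

    def run(self, state: dict[str, Any]) -> dict[str, Any]:
        prompt = self.build_prompt(state)
        raw = self.call_model(prompt)
        return {**state, self.output_key: raw}

class ProblemFramingAgent(BaseAgent):
    """Reframes raw user input into a structured business brief."""
    output_key = "brief"

    def build_prompt(self, state: dict[str, Any]) -> str:
        return f"Reframe the following idea as a business brief:\n{state['user_input']}"

    def call_model(self, prompt: str) -> str:  # stubbed for illustration
        return f"[brief derived from] {prompt.splitlines()[-1]}"
```

&lt;p&gt;Keeping the lifecycle in the base class is what later made "every Agent should be built similarly" enforceable.&lt;/p&gt;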

&lt;p&gt;All dandy. I hit the phase goals, so I gave myself permission to ignore the mysteriously slow response time in the UI when I ran a query.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;At least until I add more agents and this thing starts fighting back.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  v5: Journey to a multi-agent workflow
&lt;/h3&gt;

&lt;p&gt;At the end of v4 I had exactly what I wanted: a working Python runtime and my first Agent. &lt;/p&gt;

&lt;p&gt;I also had something else: a UI that was oddly slow.&lt;/p&gt;

&lt;p&gt;So I did the obvious thing: ignored the problem and started drafting more Agents anyway.&lt;/p&gt;

&lt;p&gt;I took the rest of the &lt;em&gt;SageCompass Process Model&lt;/em&gt; (v3), from my &lt;code&gt;reasoning-flow.md&lt;/code&gt; and split it into Agents.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Problem Framing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reframe input into a structured brief&lt;/td&gt;
&lt;td&gt;&lt;code&gt;brief&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Business Goal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Produce SMART business goal(s)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;goals[]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;KPI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Define KPI set with thresholds&lt;/td&gt;
&lt;td&gt;&lt;code&gt;kpis[]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Eligibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decide AI/ML eligibility (yes/no + why)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;eligibility{decision, reasons}&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Solution Design&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Propose candidate approaches (high level)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;approaches[]&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost Estimation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Estimate cost/effort envelope&lt;/td&gt;
&lt;td&gt;&lt;code&gt;cost_envelope&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Decision&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Issue summary with final recommendation&lt;/td&gt;
&lt;td&gt;&lt;code&gt;recommendation&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
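&lt;p&gt;Stripped of the orchestration wiring, the table above boils down to a routed pipeline: run the agents in order, then let the Eligibility decision pick the branch. A plain-Python sketch under hypothetical names (the real project expresses this as a graph):&lt;/p&gt;

```python
# Plain-Python sketch of the pipeline routing: pre-gate agents run in
# sequence, then the Eligibility decision gates the flow. Ineligible ideas
# take the Non-AI branch instead of continuing to Solution Design.
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], State]

def run_pipeline(state: State, pre_gate: list[Step],
                 ml_branch: list[Step], non_ai_branch: list[Step]) -> State:
    # Problem Framing -> Business Goal -> KPI -> Eligibility
    for step in pre_gate:
        state = step(state)
    # Eligibility gate: route to the ML branch or the Non-AI branch
    eligible = state["eligibility"]["decision"] == "yes"
    for step in (ml_branch if eligible else non_ai_branch):
        state = step(state)
    return state
```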

&lt;p&gt;My approach was simple:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;every Agent should be responsible for one thing&lt;/li&gt;
&lt;li&gt;every Agent should have the same folder structure &lt;/li&gt;
&lt;li&gt;every Agent should be built similarly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was enabled by the base Python class implementation from v4. Each Agent was expected to extend that class.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Learned some Python, yay&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put it all together and this is what the workflow looked like.&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi4e1p7qvyez7xhpz0qp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi4e1p7qvyez7xhpz0qp.png" alt="workflow pipeline with eligibility gate and Non-AI branch" width="800" height="1127"&gt;&lt;/a&gt;workflow pipeline with eligibility gate and Non-AI branch&lt;/p&gt;

&lt;p&gt;Looks neat, doesn't it? Let's run it!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;drum rolls&lt;br&gt;
more drum rolls&lt;br&gt;
even more drum rolls&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;... wow. That took &lt;em&gt;7 minutes 32 seconds&lt;/em&gt; to run. Feature-first tunnel vision sure bites hard.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So I stopped pretending this was fine and started cleaning house. First move: I switched the project's packaging and workflow to &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt; and ditched the earlier setup. Then I found the real culprit: I was running another virtual Python environment inside Docker.&lt;/p&gt;

&lt;p&gt;After fixing that, runtime dropped to ~1 minute per run.&lt;/p&gt;

&lt;p&gt;Lesson learned: &lt;em&gt;don't stack Python environments.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;And while building the workflow, a sudden realization hit me:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;If I'm building an augmented decision system that checks whether an idea needs ML at all... wouldn't that naturally translate to whether an idea needs AI at all?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If that's true, then the original "ML success criteria" flow doesn't disappear.&lt;/p&gt;

&lt;p&gt;It becomes a reusable component I would have to build anyway.&lt;/p&gt;

&lt;p&gt;The difference is: I'd end up with something way more useful.&lt;/p&gt;

&lt;p&gt;So I took the longer road.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;And that's where the prompt stopped being "the thing" and everything around it started becoming the thing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  v6: In for a penny, in for a pound - creating an agentic platform
&lt;/h3&gt;

&lt;p&gt;I liked the idea.&lt;/p&gt;

&lt;p&gt;What would it roughly look like? Gonna need RAG for sure.&lt;br&gt;
&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nam74ml2lchyg4hkjq0b.png" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnam74ml2lchyg4hkjq0b.png" alt="supervisor graph orchestrating subgraphs + RAG node + outputs/reporting boundaries" width="800" height="603"&gt;&lt;/a&gt;supervisor graph orchestrating subgraphs + RAG node + outputs/reporting boundaries&lt;/p&gt;

&lt;p&gt;It would also be nice if the data came from a curated source, such as a Drupal CMS.&lt;/p&gt;

&lt;p&gt;And that's where SageCompass started forcing architectural decisions. &lt;/p&gt;

&lt;p&gt;Drupal needed its own place. LangGraph needed its own place. And if I wanted to do this properly, the UI had to be separated from LangGraph (Gradio UI was still living under the same Python project).&lt;/p&gt;

&lt;p&gt;So I reorganized the repo. &lt;/p&gt;

&lt;p&gt;And that's when a core issue immediately surfaced:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;There were no rules.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For example: there was no project-wide rule for referencing file locations, because in the early prototype I could take shortcuts and everything still worked.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;... Shortcuts are always the longer road in the long run.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And it wasn't just paths.&lt;/p&gt;

&lt;p&gt;There were no solid architectural rules for Agents, Nodes, middleware, or schemas.&lt;br&gt;
No rules for state management. Prompt management. Tool management.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;It was all still ... duct tape and hope.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It gave me back that late &lt;strong&gt;v3&lt;/strong&gt; feeling, where every change was a nightmare. Touch one thing, five other things break, and suddenly I'm debugging the project instead of building it.&lt;/p&gt;

&lt;p&gt;So I decided to slow down and take back control.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The goal of &lt;strong&gt;v6&lt;/strong&gt; became crystal clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;make the existing implementation smaller&lt;/li&gt;
&lt;li&gt;lock down the architecture so I can add LangGraph components without surprises&lt;/li&gt;
&lt;li&gt;add quality assurance so the system behaves deterministically, even though it's an LLM-based augmented decision system&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is where SageCompass v6 currently stands. It only has one of the v5 Agents (&lt;strong&gt;Problem Framing&lt;/strong&gt;), but it has the stuff that makes a platform a platform:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule/Contract&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enforced architectural contracts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Platform contract map + tests that fail if you break it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Typed models and state&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.pydantic.dev/" rel="noopener noreferrer"&gt;Pydantic&lt;/a&gt; everywhere, so the runtime has something solid to hold onto&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reusable subgraphs and nodes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Clear phase boundaries instead of “giant pipeline code”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Guardrails and deterministic tool policies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Allowlists, bounded behavior, no surprises&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ambiguity detection + bounded clarification loops&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scoped, retry-limited, no infinite “can you clarify?” spirals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Namespaced artifacts and context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Retrieval doesn’t just bloat the system prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bigger test surface&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Contract tests + integration coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Long-term memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Context can persist without turning prompts into novels&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;v6&lt;/strong&gt; is where platform behavior became enforceable: if you violate a boundary, tests fail; if a tool is not allowlisted, it cannot run; if ambiguity persists, clarification retries are capped.&lt;/p&gt;
&lt;/blockquote&gt;
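&lt;p&gt;Two of those contracts are easy to illustrate: a tool allowlist and a capped clarification loop. A simplified sketch (illustrative names, plain dataclasses standing in for the actual Pydantic models):&lt;/p&gt;

```python
# Sketch of two v6-style contracts (illustrative, not the actual code):
# tools must be explicitly allowlisted before they can run, and the
# clarification loop is capped so ambiguity can never spiral forever.
from dataclasses import dataclass

ALLOWED_TOOLS = {"search_docs", "fetch_cms_page"}  # hypothetical tool names

def run_tool(name: str) -> str:
    """Deterministic tool policy: non-allowlisted tools cannot run at all."""
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool '{name}' is not allowlisted")
    return f"ran {name}"

@dataclass
class ClarificationLoop:
    max_retries: int = 2  # hard cap on "can you clarify?" rounds
    attempts: int = 0

    def next_action(self, still_ambiguous: bool) -> str:
        """Return the next action: proceed, clarify, or abort."""
        if not still_ambiguous:
            return "proceed"
        if self.attempts >= self.max_retries:
            return "abort"  # bounded: never an infinite loop
        self.attempts += 1
        return "clarify"
```

&lt;p&gt;The point is not the code itself but that both behaviors are trivially testable, which is what turns "guardrails" from wishful thinking into contracts.&lt;/p&gt;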

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg1qo7jy36o5vmpd922.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmg1qo7jy36o5vmpd922.jpeg" alt="Iterative design, condensed into one picture" width="800" height="502"&gt;&lt;/a&gt;Same idea. Less wiring. Fewer surprises.&lt;/p&gt;




&lt;h2&gt;
  
  
  What exists in the repository today
&lt;/h2&gt;

&lt;p&gt;SageCompass repo: &lt;a href="https://github.com/cleverhoods/sagecompass" rel="noopener noreferrer"&gt;https://github.com/cleverhoods/sagecompass&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right now it contains:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ambiguity detection subgraph&lt;/li&gt;
&lt;li&gt;problem framing subgraph&lt;/li&gt;
&lt;li&gt;main supervisor graph&lt;/li&gt;
&lt;li&gt;a RAG writer graph that takes information from Drupal&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other pieces exist as roadmap items but are yet to be implemented. &lt;strong&gt;&lt;em&gt;I'd rather be explicit about that.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What to expect in this series
&lt;/h2&gt;

&lt;p&gt;Each upcoming post will take one architectural idea and tell its story: problem → messy attempts → final approach → lessons.&lt;/p&gt;

&lt;p&gt;The next four posts are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project architecture&lt;/strong&gt; — repo shape, boundaries, and why "where things live" becomes a runtime feature&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG with Drupal&lt;/strong&gt; — curated context without turning prompts into novels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contracts &amp;amp; guardrails&lt;/strong&gt; — making safety and behavior testable instead of wishful&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ambiguity detection&lt;/strong&gt; — when "unclear input" becomes an orchestration problem, not a UX problem&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final words
&lt;/h2&gt;

&lt;p&gt;If you're building agentic systems that need to behave reliably under ambiguity, or systems that integrate with real content sources like a CMS, follow along: the next post is &lt;strong&gt;Project architecture (repo boundaries as a runtime feature)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Special thanks to &lt;a href="https://www.linkedin.com/in/liudmyla-ravliuk-359198a2/" rel="noopener noreferrer"&gt;Liudmyla Ravliuk&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/erik-poolman-0b479a99/" rel="noopener noreferrer"&gt;Erik Poolman&lt;/a&gt; for proofreading and mental sparring.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you have fun with similar problems, find me on &lt;a href="https://www.linkedin.com/in/cleverhoods/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'm always happy to compare notes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Sources / credits
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;Header comic: &lt;a href="https://gunshowcomic.com/513" rel="noopener noreferrer"&gt;https://gunshowcomic.com/513&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;SpaceX Raptor image: &lt;a href="https://x.com/SpaceX/status/1819772716339339664/photo/1" rel="noopener noreferrer"&gt;https://x.com/SpaceX/status/1819772716339339664/photo/1&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>langgraph</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
