<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: ShipWithAI</title>
    <description>The latest articles on Forem by ShipWithAI (@shipwithaiio).</description>
    <link>https://forem.com/shipwithaiio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3878878%2Fd66b5c8e-e12a-4e3c-bf3b-b04ed48b4def.png</url>
      <title>Forem: ShipWithAI</title>
      <link>https://forem.com/shipwithaiio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shipwithaiio"/>
    <language>en</language>
    <item>
      <title>Beyond CLAUDE.md: 5 Layers Your AI Agent Harness Is Missing</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Wed, 22 Apr 2026 01:00:00 +0000</pubDate>
      <link>https://forem.com/shipwithaiio/beyond-claudemd-5-layers-your-ai-agent-harness-is-missing-475h</link>
      <guid>https://forem.com/shipwithaiio/beyond-claudemd-5-layers-your-ai-agent-harness-is-missing-475h</guid>
      <description>&lt;p&gt;Most developers stop at CLAUDE.md. That's layer 1. A production Claude Code harness needs 5 layers: memory, tools, permissions, hooks, and observability. Here's the full setup guide.&lt;/p&gt;

&lt;p&gt;A Claude Code harness has 5 layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory&lt;/strong&gt; — CLAUDE.md, MEMORY.md, .claude/commands/&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; — MCP servers (sweet spot: 2–3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Permissions&lt;/strong&gt; — settings.json allow/deny lists&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; — PreToolUse/PostToolUse verification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — Decision logging, cost tracking, anomaly detection&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most developers only have layer 1. &lt;strong&gt;Setup order: 1→4→2→3→5&lt;/strong&gt; (guardrails before capabilities).&lt;/p&gt;

&lt;p&gt;Why invest in the harness at all? LangChain gained 13.7 benchmark points from harness changes alone, going from 52.8% to 66.5% on the same model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 1: Memory (The Foundation)
&lt;/h2&gt;

&lt;p&gt;Your CLAUDE.md is the project rules file. Claude Code loads it into context at the start of every session, which makes it the baseline for everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What goes in memory:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md&lt;/strong&gt; — 40–60 lines max. Project context, conventions, constraints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MEMORY.md&lt;/strong&gt; — Long-term learning. "We discovered X fails without Y."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;.claude/commands/&lt;/strong&gt; — Reusable prompt templates as commands.&lt;/li&gt;
&lt;/ul&gt;
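&lt;p&gt;As a rough sketch of the last item: a command is a markdown file under &lt;code&gt;.claude/commands/&lt;/code&gt;, and &lt;code&gt;$ARGUMENTS&lt;/code&gt; stands in for whatever follows the command name. The file name &lt;code&gt;review-changes.md&lt;/code&gt; and its wording are made up for illustration.&lt;/p&gt;

```markdown
Review the changes in $ARGUMENTS.

1. List every modified function in /lib that has no test.
2. Flag breaking changes before proposing a fix.
3. Suggest a conventional commit message for the result.
```

&lt;p&gt;Saved as &lt;code&gt;.claude/commands/review-changes.md&lt;/code&gt;, this would run as &lt;code&gt;/review-changes src/lib&lt;/code&gt;.&lt;/p&gt;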

&lt;p&gt;&lt;strong&gt;The ETH Zurich finding:&lt;/strong&gt; Human-written context files such as CLAUDE.md improved agent success by only about 4%. The file is necessary but not sufficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer benchmark:&lt;/strong&gt; Teams keeping CLAUDE.md under 60 lines saw better compliance than those writing 200-line manifestos. Shorter = clearer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Example CLAUDE.md structure&lt;/span&gt;

&lt;span class="gu"&gt;## Project Identity&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Framework: Next.js 15 + TypeScript
&lt;span class="p"&gt;-&lt;/span&gt; Package manager: pnpm
&lt;span class="p"&gt;-&lt;/span&gt; Architecture: API routes + React components

&lt;span class="gu"&gt;## You Are&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; A full-stack developer shipping features
&lt;span class="p"&gt;-&lt;/span&gt; Opinionated about patterns: prefer hooks &amp;gt; HOCs
&lt;span class="p"&gt;-&lt;/span&gt; Balancing speed with maintainability

&lt;span class="gu"&gt;## Rules&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Always include tests when modifying /lib
&lt;span class="p"&gt;2.&lt;/span&gt; Use conventional commits for all commits
&lt;span class="p"&gt;3.&lt;/span&gt; If suggesting breaking changes, warn first
&lt;span class="p"&gt;4.&lt;/span&gt; Database migrations need rollback logic

&lt;span class="gu"&gt;## Code Conventions&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Folder structure: /pages, /components, /lib, /styles
&lt;span class="p"&gt;-&lt;/span&gt; Component naming: PascalCase for React files
&lt;span class="p"&gt;-&lt;/span&gt; API routes: camelCase for endpoint handlers

&lt;span class="gu"&gt;## What NOT to do&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Don't refactor without atomic commits
&lt;span class="p"&gt;-&lt;/span&gt; Don't add dependencies without checking bundle impact
&lt;span class="p"&gt;-&lt;/span&gt; Don't commit .env files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Layer 2: Tools (Adding Capability)
&lt;/h2&gt;

&lt;p&gt;Tools here means MCP servers. They extend Claude beyond its built-in file and shell access, for example to query databases or call external APIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer finding:&lt;/strong&gt; Too many tools cause agent confusion. Each tool is context overhead. Sweet spot: &lt;strong&gt;2–3 MCP servers per project&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Not 20. Not "all available servers."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Which 2–3 tools?&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem tool&lt;/strong&gt; — read/write/execute (almost always)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One domain-specific tool&lt;/strong&gt; — database, API, CLI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional: Observability tool&lt;/strong&gt; — logs, metrics&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example for a Next.js project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filesystem (built-in)&lt;/li&gt;
&lt;li&gt;PostgreSQL client (query → fix migrations)&lt;/li&gt;
&lt;li&gt;GitHub API (check PR status → adjust approach)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More tools = more tokens + more decision fatigue for Claude.&lt;/p&gt;
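&lt;p&gt;A minimal sketch of the Next.js example as a project-level &lt;code&gt;.mcp.json&lt;/code&gt;, assuming both servers are started with &lt;code&gt;npx&lt;/code&gt;. The package names and the connection string are placeholders, not real packages; substitute the servers you actually use.&lt;/p&gt;

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": ["-y", "your-postgres-mcp-server", "postgresql://localhost:5432/app_dev"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "your-github-mcp-server"]
    }
  }
}
```

&lt;p&gt;Two entries plus the built-in filesystem access keeps the project inside the 2–3 server range.&lt;/p&gt;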




&lt;h2&gt;
  
  
  Layer 3: Permissions (The Guardrails)
&lt;/h2&gt;

&lt;p&gt;Permissions live in &lt;code&gt;settings.json&lt;/code&gt;. Specify exactly what Claude is allowed to do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Allowlist over denylist.&lt;/strong&gt; It's safer to say "Claude can only modify these files" than "Claude cannot do X."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"permissions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"filesystem"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/src/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/public/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"*.config.js"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;".env.local"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/node_modules/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/.git/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"/build/**"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;".env"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"execution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"allow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"npm run test"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npm run build"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"deny"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"rm -rf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sudo *"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude won't accidentally delete node_modules (been there)&lt;/li&gt;
&lt;li&gt;Can't run destructive commands without review&lt;/li&gt;
&lt;li&gt;Enforced at runtime, not a suggestion&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Check settings.json into git.&lt;/strong&gt; This becomes part of your project's DNA.&lt;/p&gt;
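&lt;p&gt;For comparison, here is a sketch of similar rules written as &lt;code&gt;Tool(pattern)&lt;/code&gt; strings, which is how I understand Claude Code's own permission rules to be expressed. The specific patterns are assumptions; check them against the current settings documentation before relying on them.&lt;/p&gt;

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(npm run build)",
      "Edit(src/**)"
    ],
    "deny": [
      "Read(./.env)",
      "Bash(sudo:*)",
      "Bash(rm -rf:*)"
    ]
  }
}
```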




&lt;h2&gt;
  
  
  Layer 4: Hooks (Deterministic Enforcement)
&lt;/h2&gt;

&lt;p&gt;Hooks are the most powerful layer. They run &lt;em&gt;before&lt;/em&gt; and &lt;em&gt;after&lt;/em&gt; Claude uses tools.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;PreToolUse hook:&lt;/strong&gt; Intercept tool calls, validate them, reject bad ones.&lt;br&gt;
&lt;strong&gt;PostToolUse hook:&lt;/strong&gt; Inspect results, catch anomalies, trigger alerts.&lt;/p&gt;

&lt;p&gt;Boris Cherny of Anthropic calls verification "the most important thing" for quality. Hooks are that verification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Runs before every tool use&lt;/span&gt;

&lt;span class="nv"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;PARAMS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="nv"&gt;$TOOL&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
  &lt;span class="s2"&gt;"filesystem_write"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAMS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s2"&gt;"(node_modules|&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;git|&lt;/span&gt;&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="s2"&gt;env)"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"REJECTED: Protected path"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt;2
      &lt;span class="nb"&gt;exit &lt;/span&gt;2  &lt;span class="c"&gt;# 2 = block the tool call&lt;/span&gt;
    &lt;span class="k"&gt;fi&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
  &lt;span class="s2"&gt;"command_execute"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c"&gt;# Match "rm -rf" or the literal fork bomb; the metacharacters are escaped so a bare ":" no longer matches&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PARAMS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'rm -rf|:\(\)\{ :\|:'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"REJECTED: Dangerous command"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;amp;&lt;/span&gt;2
      &lt;span class="nb"&gt;exit &lt;/span&gt;2  &lt;span class="c"&gt;# 2 = block the tool call&lt;/span&gt;
    &lt;span class="k"&gt;fi&lt;/span&gt;
    &lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"APPROVED"&lt;/span&gt;
&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Runs after every tool use&lt;/span&gt;

&lt;span class="nv"&gt;TOOL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;
&lt;span class="nv"&gt;RESULT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;
&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$3&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; DURATION &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 30 &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  Slow tool: &lt;/span&gt;&lt;span class="nv"&gt;$TOOL&lt;/span&gt;&lt;span class="s2"&gt; took &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s"&lt;/span&gt;
&lt;span class="k"&gt;fi

if &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESULT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-qi&lt;/span&gt; &lt;span class="s2"&gt;"error&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;failed&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;undefined"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔴 Tool failed: &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$RESULT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 20&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Where to set hooks:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;.claude/hooks/pre-tool-use.sh&lt;/li&gt;
&lt;li&gt;.claude/hooks/post-tool-use.sh&lt;/li&gt;
&lt;/ul&gt;
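&lt;p&gt;The scripts only run once they are registered. A minimal sketch of that registration in &lt;code&gt;.claude/settings.json&lt;/code&gt;, assuming the two script paths listed above and a &lt;code&gt;*&lt;/code&gt; matcher to cover every tool:&lt;/p&gt;

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/pre-tool-use.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": ".claude/hooks/post-tool-use.sh" }
        ]
      }
    ]
  }
}
```

&lt;p&gt;As I understand the hook interface, Claude Code hands the tool call to the command as JSON on standard input, so scripts wired up this way would read the tool name and parameters from there.&lt;/p&gt;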

&lt;p&gt;Hooks are not advisory. They run as scripts outside the model's context, so a rule in a hook is enforced on every tool call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 5: Observability (Learning from Decisions)
&lt;/h2&gt;

&lt;p&gt;Observability means: logging decisions, tracking costs, detecting anomalies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What to log:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which tools Claude called and why&lt;/li&gt;
&lt;li&gt;Tokens used per session (cost tracking)&lt;/li&gt;
&lt;li&gt;Time spent on each decision&lt;/li&gt;
&lt;li&gt;Failures and retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The HumanLayer insight:&lt;/strong&gt; Surface only failures, not 4,000 lines of passing tests.&lt;/p&gt;

&lt;p&gt;Most developers log everything. Better: log strategically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# Log Claude's decisions&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%d %H:%M:%S'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; | Tool: &lt;/span&gt;&lt;span class="nv"&gt;$TOOL&lt;/span&gt;&lt;span class="s2"&gt; | Status: &lt;/span&gt;&lt;span class="nv"&gt;$STATUS&lt;/span&gt;&lt;span class="s2"&gt; | Tokens: &lt;/span&gt;&lt;span class="nv"&gt;$TOKENS&lt;/span&gt;&lt;span class="s2"&gt; | Duration: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DURATION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; .claude/logs/decisions.log

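&lt;span class="c"&gt;# Only look at today's entries so the "today" alerts mean what they say&lt;/span&gt;
&lt;span class="nv"&gt;TODAY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; &lt;span class="s1"&gt;'+%Y-%m-%d'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;USD_PER_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0.000003  &lt;span class="c"&gt;# placeholder rate; set to your model's actual price&lt;/span&gt;
&lt;span class="c"&gt;# Field 4 of each "|"-separated line is " Tokens: N "; sum N and convert to USD&lt;/span&gt;
&lt;span class="nv"&gt;TOTAL_COST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"^&lt;/span&gt;&lt;span class="nv"&gt;$TODAY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; .claude/logs/decisions.log | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;'|'&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="nv"&gt;rate&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$USD_PER_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s1"&gt;'{split($4, t, " "); sum += t[2]} END {printf "%.2f", sum * rate}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$TOTAL_COST&lt;/span&gt;&lt;span class="s2"&gt; &amp;gt; 5.00"&lt;/span&gt; | bc &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"💰 Cost alert: &lt;/span&gt;&lt;span class="nv"&gt;$TOTAL_COST&lt;/span&gt;&lt;span class="s2"&gt; USD today"&lt;/span&gt;
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;ERROR_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"^&lt;/span&gt;&lt;span class="nv"&gt;$TODAY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; .claude/logs/decisions.log | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"FAILED"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;((&lt;/span&gt; ERROR_COUNT &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; 5 &lt;span class="o"&gt;))&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🚨 High error rate detected: &lt;/span&gt;&lt;span class="nv"&gt;$ERROR_COUNT&lt;/span&gt;&lt;span class="s2"&gt; failures today"&lt;/span&gt;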
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Setup Order Matters: 1 → 4 → 2 → 3 → 5
&lt;/h2&gt;

&lt;p&gt;Why not 1 → 2 → 3 → 4 → 5?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong order: Capabilities before guardrails&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build CLAUDE.md ✅&lt;/li&gt;
&lt;li&gt;Add 10 MCP servers ⚠️&lt;/li&gt;
&lt;li&gt;Grant all permissions ⚠️&lt;/li&gt;
&lt;li&gt;No hooks (too late, broke things already)&lt;/li&gt;
&lt;li&gt;Now add observability (chaos already happened)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Right order: Guardrails first&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Build CLAUDE.md ✅ (memory/rules)&lt;/li&gt;
&lt;li&gt;Add hooks ✅ (enforcement before tools exist)&lt;/li&gt;
&lt;li&gt;Add 2–3 MCP servers ✅ (now hooks guard them)&lt;/li&gt;
&lt;li&gt;Restrict permissions ✅ (layered safety)&lt;/li&gt;
&lt;li&gt;Add observability ✅ (track what's working)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Adding hooks after tools is like adding seatbelts after the crash.&lt;/p&gt;




&lt;h2&gt;
  
  
  Production-Ready Harness: 10-Item Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] CLAUDE.md exists, 40–60 lines, checked into git&lt;/li&gt;
&lt;li&gt;[ ] MEMORY.md set up with "lessons learned"&lt;/li&gt;
&lt;li&gt;[ ] .claude/commands/ has 3+ reusable prompts&lt;/li&gt;
&lt;li&gt;[ ] Max 3 MCP servers chosen and documented&lt;/li&gt;
&lt;li&gt;[ ] settings.json has allowlist (filesystem, execution)&lt;/li&gt;
&lt;li&gt;[ ] .claude/hooks/pre-tool-use.sh validates calls&lt;/li&gt;
&lt;li&gt;[ ] .claude/hooks/post-tool-use.sh inspects results&lt;/li&gt;
&lt;li&gt;[ ] .claude/logs/ directory exists + observability hook running&lt;/li&gt;
&lt;li&gt;[ ] Cost tracking implemented (tokens/session)&lt;/li&gt;
&lt;li&gt;[ ] Team knows where each file lives + how to update it&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Which layer do I need first?&lt;/strong&gt;&lt;br&gt;
Layer 1 (CLAUDE.md). Everything depends on clear memory. Start there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does this harness slow down Claude Code?&lt;/strong&gt;&lt;br&gt;
Barely. Hooks add ~100–300ms per tool use, which is worth it for the safety. Observability has negligible cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What are the most important hooks?&lt;/strong&gt;&lt;br&gt;
PreToolUse (validation) and PostToolUse (anomaly detection). Those two prevent 80% of issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How many MCP servers is "too many"?&lt;/strong&gt;&lt;br&gt;
More than 5 becomes noise. More than 3 means you're probably adding tools you won't use. Start with 1–2, add more only when they solve a real workflow problem.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I skip permissions and just use hooks?&lt;/strong&gt;&lt;br&gt;
Technically yes, but you shouldn't. Permissions are defense-in-depth: they prevent mistakes, and hooks catch the ones that get through.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I update CLAUDE.md over time?&lt;/strong&gt;&lt;br&gt;
Document it in MEMORY.md. "We added this rule because X failed." Over time, CLAUDE.md stabilizes.&lt;/p&gt;




&lt;p&gt;Originally published on &lt;a href="https://shipwithai.io/blog/claude-code-harness-5-layers/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and building production systems with AI. Full blog + templates at shipwithai.io.&lt;/p&gt;

&lt;p&gt;What's your harness score? Drop it in the comments. Do you have all 5 layers, or are you still at layer 1?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>claudecode</category>
      <category>shipwithai</category>
    </item>
    <item>
      <title>Harness Engineering: Why the System Around AI Matters More Than the AI Itself</title>
      <dc:creator>ShipWithAI</dc:creator>
      <pubDate>Mon, 20 Apr 2026 12:03:07 +0000</pubDate>
      <link>https://forem.com/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</link>
      <guid>https://forem.com/shipwithaiio/harness-engineering-why-the-system-around-ai-matters-more-than-the-ai-itself-1o9i</guid>
      <description>&lt;p&gt;Harness engineering is everything around your AI agent except the model: memory, tools, permissions, hooks, observability. LangChain gained 13.7 benchmark points by changing only the harness (52.8% to 66.5%, same model). Most developers only have Layer 1 (CLAUDE.md). Production needs all 5.&lt;/p&gt;




&lt;p&gt;Two lines of config. Same AI model. Completely different reliability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# CLAUDE.md approach (can be ignored)&lt;/span&gt;
&lt;span class="s2"&gt;"Never delete production database tables."&lt;/span&gt;
&lt;span class="c"&gt;# Claude reads this, weighs it against 200K tokens of context, may ignore it.&lt;/span&gt;

&lt;span class="c"&gt;# Hook approach (always enforced)&lt;/span&gt;
&lt;span class="c"&gt;# PreToolUse hook: command contains "DROP TABLE" + env=production → exit 2 → BLOCKED.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first is advice. The second is enforcement.&lt;/p&gt;

&lt;p&gt;One lives in a markdown file that competes with thousands of other tokens for the model's attention. The other is a shell script that runs before every command and cannot be bypassed. The gap between these two approaches is the gap most teams don't know exists.&lt;/p&gt;

&lt;p&gt;That gap has a name now: &lt;strong&gt;harness engineering&lt;/strong&gt;.&lt;/p&gt;
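&lt;p&gt;The hook approach from the comparison above could be sketched roughly like this. It is a minimal illustration, not a complete hook: the function name &lt;code&gt;check_command&lt;/code&gt; and passing the environment as a second argument are assumptions made for the example.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch only: decides whether a shell command may run in a given environment.
# The function name and the way the environment is passed in are assumptions.

# Prints a reason and returns 2 when the command must be blocked, else returns 0.
check_command() {
  local cmd="$1"
  local environment="$2"
  local upper
  # Upper-case the command so "drop table" and "DROP TABLE" are both caught
  upper=$(printf '%s' "$cmd" | tr '[:lower:]' '[:upper:]')

  if [ "$environment" = "production" ]; then
    case "$upper" in
      *"DROP TABLE"*)
        echo "BLOCKED: DROP TABLE is not allowed in production"
        return 2
        ;;
    esac
  fi
  return 0
}
```

&lt;p&gt;A hook script would extract the command from whatever input it receives, call &lt;code&gt;check_command&lt;/code&gt;, send the printed reason to standard error, and exit with the returned status so that 2 blocks the call.&lt;/p&gt;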




&lt;h2&gt;
  
  
  What is harness engineering? (And why prompt engineering isn't enough)
&lt;/h2&gt;

&lt;p&gt;Harness engineering is the discipline of building constraints, tools, feedback loops, and observability around an AI agent to make it reliable in production. The formula, popularized by &lt;a href="https://blog.langchain.com/improving-deep-agents-with-harness-engineering/" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt; and refined on &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;Martin Fowler's site&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Agent = Model + Harness&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model is a commodity. The harness is your competitive advantage.&lt;/p&gt;

&lt;p&gt;Mitchell Hashimoto, creator of Terraform and Ghostty, defined the core idea: anytime you find an agent makes a mistake, you engineer a solution so the agent never makes that mistake again. In Ghostty's repository, each line in the AGENTS.md file corresponds to a specific past agent failure that's now prevented.&lt;/p&gt;

&lt;p&gt;The industry has moved through three distinct eras:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Era&lt;/th&gt;
&lt;th&gt;Years&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Key Question&lt;/th&gt;
&lt;th&gt;Limitation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Engineering&lt;/td&gt;
&lt;td&gt;2022-2024&lt;/td&gt;
&lt;td&gt;Crafting better instructions&lt;/td&gt;
&lt;td&gt;"How do I phrase this?"&lt;/td&gt;
&lt;td&gt;Instructions get diluted in long contexts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context Engineering&lt;/td&gt;
&lt;td&gt;2025&lt;/td&gt;
&lt;td&gt;Curating what the model sees&lt;/td&gt;
&lt;td&gt;"What information does it need?"&lt;/td&gt;
&lt;td&gt;Knowing isn't doing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Harness Engineering&lt;/td&gt;
&lt;td&gt;2026&lt;/td&gt;
&lt;td&gt;Building systems around the agent&lt;/td&gt;
&lt;td&gt;"What can it do, and what can't it?"&lt;/td&gt;
&lt;td&gt;Emerging discipline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Prompt engineering shapes what the agent &lt;em&gt;tries&lt;/em&gt;. Context engineering shapes what the agent &lt;em&gt;knows&lt;/em&gt;. Harness engineering shapes what the agent &lt;strong&gt;can and cannot do&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How did LangChain gain 13.7 benchmark points without changing the model?
&lt;/h2&gt;

&lt;p&gt;By improving three harness components, LangChain jumped from 52.8% to 66.5% on &lt;a href="https://www.tbench.ai/news/announcement-2-0" rel="noopener noreferrer"&gt;Terminal Bench 2.0&lt;/a&gt; (a benchmark of 89 real-world terminal tasks) while keeping the same model, gpt-5.2-codex. They went from Top 30 to Top 5. No fine-tuning. No model swap. Just harness changes.&lt;/p&gt;

&lt;p&gt;Here are the three changes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Context injection.&lt;/strong&gt; LangChain's &lt;code&gt;LocalContextMiddleware&lt;/code&gt; maps the environment upfront and injects it directly into the agent's context. Before this change, the agent wasted steps trying to understand its surroundings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Self-verification loops.&lt;/strong&gt; After each action, the agent verifies its output against task-specific criteria before moving on. Not just "run the tests." The agent checks whether the output matches what the task actually asked for.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Compute allocation.&lt;/strong&gt; This one is counterintuitive: running at maximum reasoning budget (xhigh) scored only 53.9%, while the high setting scored 63.6%. More compute caused timeouts that hurt overall performance.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Score&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Before harness changes&lt;/td&gt;
&lt;td&gt;52.8%&lt;/td&gt;
&lt;td&gt;Baseline, Top 30&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After harness changes&lt;/td&gt;
&lt;td&gt;66.5%&lt;/td&gt;
&lt;td&gt;Top 5, +13.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High reasoning&lt;/td&gt;
&lt;td&gt;63.6%&lt;/td&gt;
&lt;td&gt;Beats xhigh by 9.7pp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max reasoning (xhigh)&lt;/td&gt;
&lt;td&gt;53.9%&lt;/td&gt;
&lt;td&gt;Barely above baseline, timeouts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
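&lt;p&gt;To make the first change concrete, here is a small sketch of the idea behind upfront context injection: gather facts about the environment once and hand them to the agent as one block. This is not LangChain's implementation; the function name, the tool list, and the output layout are all assumptions.&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of upfront context injection, not LangChain's implementation:
# collect facts about the environment once and print them as one block.
# The function name, tool list, and output layout are assumptions.

describe_environment() {
  local dir="${1:-.}"
  echo "## Environment"
  echo "Working directory: $(cd "$dir" || exit; pwd)"
  echo "Top-level entries:"
  # Cap the listing so a huge directory does not flood the context
  ls -1 "$dir" | head -n 20 | sed 's/^/  - /'
  echo "Tools on PATH:"
  for tool in git node python3 make; do
    if [ -n "$(command -v "$tool")" ]; then
      echo "  - $tool"
    fi
  done
}
```

&lt;p&gt;The output can be prepended to the first prompt so the agent does not spend steps discovering the same facts itself.&lt;/p&gt;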

&lt;p&gt;If you're evaluating AI coding tools by comparing model benchmarks alone, you're measuring the wrong variable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are the 5 layers of an AI agent harness?
&lt;/h2&gt;

&lt;p&gt;A production harness has five layers. Most developers I talk to in the Claude Code community have Layer 1 and maybe part of Layer 2. That leaves three layers of reliability on the table.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What It Is&lt;/th&gt;
&lt;th&gt;Problem It Solves&lt;/th&gt;
&lt;th&gt;Claude Code Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1. Memory&lt;/td&gt;
&lt;td&gt;Persistent context across sessions&lt;/td&gt;
&lt;td&gt;Agent "forgets" your conventions every session&lt;/td&gt;
&lt;td&gt;CLAUDE.md, MEMORY.md, .claude/commands/&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2. Tools&lt;/td&gt;
&lt;td&gt;Extended capabilities beyond built-ins&lt;/td&gt;
&lt;td&gt;Agent can't access your APIs, databases, or services&lt;/td&gt;
&lt;td&gt;MCP servers, custom tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3. Permissions&lt;/td&gt;
&lt;td&gt;What the agent is allowed to do&lt;/td&gt;
&lt;td&gt;Agent edits sensitive files or runs dangerous commands&lt;/td&gt;
&lt;td&gt;settings.json allow/deny lists&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4. Hooks&lt;/td&gt;
&lt;td&gt;Automated enforcement at lifecycle points&lt;/td&gt;
&lt;td&gt;Instructions get ignored under context pressure&lt;/td&gt;
&lt;td&gt;PreToolUse/PostToolUse hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5. Observability&lt;/td&gt;
&lt;td&gt;Knowing what the agent actually did&lt;/td&gt;
&lt;td&gt;No visibility into agent decisions or cost&lt;/td&gt;
&lt;td&gt;Session logs, cost tracking, action audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Think of it like your CI/CD pipeline. You built that infrastructure once, and the whole team benefits on every push. A harness works the same way for AI agent sessions.&lt;/p&gt;

&lt;p&gt;OpenAI demonstrated this at scale. Their Codex team shipped roughly one million lines of production code, with zero lines written by human hands, over five months. Their harness included AGENTS.md files, reproducible dev environments, and mechanical invariants in CI. The work took roughly one-tenth of the time a human team would have needed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where is your harness right now?
&lt;/h2&gt;

&lt;p&gt;Run this checklist:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Question&lt;/th&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Do you have a CLAUDE.md with project conventions and constraints?&lt;/td&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;Do you have MCP servers connecting Claude Code to external tools?&lt;/td&gt;
&lt;td&gt;Tools&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Do you have settings.json with explicit allow/deny lists?&lt;/td&gt;
&lt;td&gt;Permissions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Do you have at least one PreToolUse hook that blocks dangerous actions?&lt;/td&gt;
&lt;td&gt;Hooks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Can you see what Claude did in each session and how much it cost?&lt;/td&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Your score:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;1/5&lt;/strong&gt;: You're in the majority. Most developers stop at CLAUDE.md.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2-3/5&lt;/strong&gt;: Ahead of most. You've started building real infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4-5/5&lt;/strong&gt;: Production-ready. You're doing harness engineering whether you knew the name or not.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Be honest about question 4. If the answer is no, your agent can still &lt;code&gt;rm -rf&lt;/code&gt; your project directory. CLAUDE.md says "don't do that." A hook actually prevents it.&lt;/p&gt;

&lt;p&gt;Here's why this matters: an ETH Zurich study (Feb 2026) tested context files across 138 real-world tasks from 12 Python repositories. Human-written context files improved agent success by only about 4%. LLM-generated ones actually &lt;em&gt;reduced&lt;/em&gt; success by ~3% while increasing inference costs by over 20%. Instructions alone aren't enough. You need enforcement layers.&lt;/p&gt;
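
&lt;p&gt;To make the &lt;code&gt;rm -rf&lt;/code&gt; case concrete, here is a minimal sketch of a PreToolUse hook written in Python. It assumes the hook receives the tool call as JSON on stdin with the shell command under &lt;code&gt;tool_input.command&lt;/code&gt;, and that exit code 2 blocks the call. The file name and the helpers &lt;code&gt;is_dangerous&lt;/code&gt; and &lt;code&gt;exit_code_for&lt;/code&gt; are hypothetical. The check is deliberately crude: it errs on the side of blocking and will not catch every destructive command.&lt;/p&gt;

```python
#!/usr/bin/env python3
# .claude/hooks/block_rm_rf.py  (hypothetical file name)
# Sketch of a PreToolUse hook that refuses recursive force-deletes.
import json
import os
import shlex
import sys


def is_dangerous(command):
    """Return True if an rm invocation carries both a recursive and a force flag."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: fall back to a plain whitespace split.
        tokens = command.split()
    for i, token in enumerate(tokens):
        # basename() lets /bin/rm match as well as plain rm.
        if os.path.basename(token) != "rm":
            continue
        flags = set()
        # Scans every later token, so it may over-block on chained commands.
        for arg in tokens[i + 1:]:
            if arg == "--recursive":
                flags.add("r")
            elif arg == "--force":
                flags.add("f")
            elif arg.startswith("-") and not arg.startswith("--"):
                flags.update(arg[1:].lower())  # -rf, -fR, -r -f ...
        if "r" in flags and "f" in flags:
            return True
    return False


def exit_code_for(raw_event):
    """Map the hook's stdin payload to an exit code (2 = block, 0 = allow)."""
    if not raw_event.strip():
        return 0
    event = json.loads(raw_event)
    # Assumed payload shape: {"tool_name": "Bash", "tool_input": {"command": "..."}}
    command = event.get("tool_input", {}).get("command", "")
    if is_dangerous(command):
        print("BLOCKED: recursive force-delete: " + command, file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__":
    code = exit_code_for(sys.stdin.read())
    if code:
        sys.exit(code)
```

&lt;p&gt;To try it, register the script under &lt;code&gt;PreToolUse&lt;/code&gt; with a &lt;code&gt;"matcher": "Bash"&lt;/code&gt; entry, in the same shape as the settings.json example in Quick Win 2.&lt;/p&gt;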




&lt;h2&gt;
  
  
  How do you start building a harness today?
&lt;/h2&gt;

&lt;p&gt;You don't need all 5 layers at once. Start with three high-impact changes that take about 30 minutes total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quick Win 1: Create a MEMORY.md (5 minutes)
&lt;/h3&gt;

&lt;p&gt;MEMORY.md is a lightweight index that points to where knowledge lives in your project. Unlike CLAUDE.md (which holds static rules), MEMORY.md tracks evolving state: recent decisions, architectural changes, active work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Auth&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/lib/auth/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Clerk, not NextAuth. Migrated March 2026.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;prisma/schema.prisma&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — PostgreSQL on Supabase. All queries via Prisma.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Deploy&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;docs/deploy.md&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vercel preview for PRs, production on main.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Testing&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;vitest.config.ts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Vitest unit, Playwright E2E. Min 80% coverage.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;API&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="sx"&gt;src/app/api/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; — Server Actions preferred over API routes for mutations.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
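
&lt;p&gt;An index like this only helps while its links stay valid. As a rough sketch, assuming every entry follows the &lt;code&gt;- [Label](path)&lt;/code&gt; shape shown above and that MEMORY.md sits at the project root, a small Python check can flag entries whose target no longer exists. The function name &lt;code&gt;stale_entries&lt;/code&gt; is made up for this example.&lt;/p&gt;

```python
import re
from pathlib import Path

# Matches list entries of the form "- [Label](relative/path)".
ENTRY = re.compile(r"^- \[([^\]]+)\]\(([^)]+)\)")


def stale_entries(memory_text, root):
    """Return (label, target) pairs whose target path is missing under root."""
    missing = []
    for line in memory_text.splitlines():
        match = ENTRY.match(line.strip())
        if match and not (Path(root) / match.group(2)).exists():
            missing.append((match.group(1), match.group(2)))
    return missing


if __name__ == "__main__":
    memory_file = Path("MEMORY.md")
    if memory_file.exists():
        text = memory_file.read_text(encoding="utf-8")
        for label, target in stale_entries(text, "."):
            print("stale entry: " + label + " points at " + target)
```

&lt;p&gt;Running it from the project root before a session is one way to catch an index that has drifted from the codebase.&lt;/p&gt;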



&lt;h3&gt;
  
  
  Quick Win 2: Add one PreToolUse guardrail hook (15 minutes)
&lt;/h3&gt;

&lt;p&gt;This hook blocks Claude Code from editing sensitive files. Copy-paste ready:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c"&gt;# .claude/hooks/block-sensitive-files.sh&lt;/span&gt;
&lt;span class="c"&gt;# Blocks Edit/Write calls on .env, credentials, secrets, and CI workflow files&lt;/span&gt;

&lt;span class="nv"&gt;INPUT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;FILE_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$INPUT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.tool_input.file_path // empty'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s1"&gt;'.env'&lt;/span&gt; &lt;span class="s1"&gt;'credentials'&lt;/span&gt; &lt;span class="s1"&gt;'.github/workflows'&lt;/span&gt; &lt;span class="s1"&gt;'secrets'&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;pattern &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;SENSITIVE&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$pattern&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BLOCKED: Cannot edit sensitive file: &lt;/span&gt;&lt;span class="nv"&gt;$FILE_PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&amp;amp;2
    &lt;span class="nb"&gt;exit &lt;/span&gt;2
  &lt;span class="k"&gt;fi
done

&lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register it in &lt;code&gt;.claude/settings.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"PreToolUse"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"matcher"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Edit|Write"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"hooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bash .claude/hooks/block-sensitive-files.sh"&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Quick Win 3: Enable cost awareness (10 minutes)
&lt;/h3&gt;

&lt;p&gt;Track what each session costs so you notice anomalies early. Cost tracking pairs with verification, which Boris Cherny, creator of Claude Code, calls "probably the most important thing" for quality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Give Claude a way to verify its work. If Claude has that feedback loop, it will 2-3x the quality of the final result."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Start simple: review &lt;code&gt;~/.claude/projects/&lt;/code&gt; after each session to check what Claude did and how much it cost.&lt;/p&gt;
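
&lt;p&gt;If reading transcripts by hand gets tedious, a short script can total the token counts for you. The sketch below rests on assumptions about the transcript format that may not hold for your version: that sessions are stored as &lt;code&gt;.jsonl&lt;/code&gt; files one directory below &lt;code&gt;~/.claude/projects/&lt;/code&gt;, and that assistant entries carry a &lt;code&gt;message.usage&lt;/code&gt; object with &lt;code&gt;input_tokens&lt;/code&gt; and &lt;code&gt;output_tokens&lt;/code&gt;. It reports tokens, not dollars; converting to cost depends on your model pricing, which is left out here. The name &lt;code&gt;usage_totals&lt;/code&gt; is hypothetical.&lt;/p&gt;

```python
import json
from collections import Counter
from pathlib import Path

# Assumed field names inside each entry's message.usage object.
TOKEN_FIELDS = ("input_tokens", "output_tokens")


def usage_totals(jsonl_text):
    """Sum token counts across every line that carries a message.usage object."""
    totals = Counter()
    for line in jsonl_text.splitlines():
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if not isinstance(entry, dict):
            continue
        message = entry.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        for field in TOKEN_FIELDS:
            value = usage.get(field)
            if isinstance(value, int):
                totals[field] += value
    return totals


if __name__ == "__main__":
    root = Path.home() / ".claude" / "projects"
    if root.is_dir():
        for transcript in sorted(root.glob("*/*.jsonl")):
            text = transcript.read_text(encoding="utf-8", errors="replace")
            print(transcript.name, dict(usage_totals(text)))
```

&lt;p&gt;A session whose totals are several times your usual numbers is a good candidate for a closer look at what the agent actually did.&lt;/p&gt;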




&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is the difference between harness engineering and prompt engineering?
&lt;/h3&gt;

&lt;p&gt;Prompt engineering shapes what the agent tries. Context engineering shapes what the agent knows. Harness engineering shapes what the agent can and cannot do. They're not replacements — they're layers. A production AI workflow uses all three, but harness engineering provides the strongest reliability guarantees because it uses enforcement (hooks, permissions) rather than suggestions (prompts, context).&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need harness engineering for Claude Code?
&lt;/h3&gt;

&lt;p&gt;Yes. Claude Code is itself a harness that Anthropic built around their model. But it's the &lt;em&gt;inner&lt;/em&gt; harness. You need an &lt;em&gt;outer&lt;/em&gt; harness tailored to your project: CLAUDE.md for conventions, hooks for guardrails, MCP servers for tools, permissions for safety boundaries, and observability for cost control.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is harness engineering only for Claude Code?
&lt;/h3&gt;

&lt;p&gt;No. The principles apply to any AI coding agent: Cursor, GitHub Copilot, OpenAI Codex, Windsurf, Cline. Claude Code happens to offer the most programmable harness surface (17 hook events, MCP protocol, skills system), which is why examples here use it. The concepts transfer directly to other tools.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Try it now:&lt;/strong&gt; Pick one quick win above and implement it before your next Claude Code session. Quick Win 2 is copy-paste ready and takes about 15 minutes.&lt;/p&gt;

&lt;p&gt;What's your harness score right now? Drop it in the comments — I'm curious how many devs have gone beyond Layer 1.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://shipwithai.io/blog/harness-engineering-claude-code/" rel="noopener noreferrer"&gt;ShipWithAI&lt;/a&gt;. I write about Claude Code workflows, AI-assisted development, and shipping software faster with structured AI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
