Forem: Mario

Claude Code Skills and CLAUDE.md — How They Fit Together

Mario — Mon, 27 Apr 2026 07:57:27 +0000

Skills are the part of Claude Code that most people discover late. Slash commands like /handoff, /review, or /wt look like simple shortcuts, but each one is backed by a markdown file that loads scoped behavior into the agent only when invoked. The relationship between skills and CLAUDE.md is consequential — it determines what rules the agent sees in every session, what rules it sees only when needed, and where you should put a new rule when you decide to write one. This is how the two fit together as of early 2026, and how to use the relationship to keep your CLAUDE.md short while still capturing rich workflows.

What a skill actually is

A skill is a markdown file with frontmatter, stored in its own directory under ~/.claude/skills/ (user-global) or .claude/skills/ (project-local), for example ~/.claude/skills/handoff/SKILL.md. The frontmatter declares the skill's name and description, and the description doubles as the triggering rule. The body is markdown prose that gets loaded into the agent's context when the skill is invoked.

Skills get invoked in two ways. The user can type /<skill-name> directly. Or the user can describe a task in natural language, and the harness will detect the match against the skill's description field and offer to invoke it. The second mode is the one most people miss — the description field is doing a lot of work, and a well-written description is what makes the skill actually trigger when it should.

The body of a skill can include workflow steps, format specifications, hard rules, examples, and pointers to other documents. When the skill is invoked, the entire body becomes part of the agent's working context for that workflow. This is the key architectural property: skills are scoped context, while CLAUDE.md is always-on context.

Why scoping matters

CLAUDE.md is loaded at every session start. Every line in it costs context budget every session, regardless of whether the rule is relevant to the current task. A 300-line CLAUDE.md costs you 300 lines of context for the rest of the session, even when you are debugging a one-file CSS issue and only three of those lines apply.

Skills are loaded only when invoked. A 500-line skill costs you nothing for sessions that don't invoke it. If you have a workflow with rich rules — handoff procedures, content review protocols, release checklists — putting them in a skill instead of CLAUDE.md keeps the always-on context small while still encoding the workflow precisely.

The 2026 design pattern, emerging from teams that run dozens of skills, is: CLAUDE.md captures the project-wide rules that apply to most sessions. Skills capture the workflow-specific rules that apply only when the workflow is active. The split is similar in spirit to global-vs-project CLAUDE.md, but operates on a different axis: always-applicable vs sometimes-applicable.
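The cost argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it counts lines as a stand-in for tokens, and the file sizes and skill names are made-up numbers, not measurements.

```python
def session_context_lines(claude_md_lines, skill_lines, invoked):
    """Lines of rule text loaded in one session.

    claude_md_lines: size of the always-on CLAUDE.md
    skill_lines: mapping of skill name -> size of its body
    invoked: names of the skills invoked in this session
    """
    return claude_md_lines + sum(skill_lines[name] for name in invoked)


# Hypothetical sizes: everything inlined vs. workflows moved into skills.
skills = {"handoff": 60, "release": 120, "review": 80}
monolithic = 40 + sum(skills.values())  # all rules live in CLAUDE.md

css_fix = session_context_lines(40, skills, invoked=[])
wrap_up = session_context_lines(40, skills, invoked=["handoff"])

print(monolithic, css_fix, wrap_up)  # 300 40 100
```

With these numbers the CSS-fix session carries 40 lines instead of 300, and the session that ends with a handoff carries 100.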

What belongs in a skill instead of CLAUDE.md

Five categories of rules tend to belong in a skill rather than in CLAUDE.md.

Workflow procedures. Multi-step processes that fire on specific user intent. Handoff procedures (/handoff), release procedures (/release), planning procedures (/gotoplan), review procedures (/review). The rules for "how do we hand off a session" do not apply when you are not handing off — they should not be in your context for those sessions.

Format specifications. When a workflow requires a specific output format (an experience note schema, a HANDOFF.md schema, a plan-file format), the format spec belongs in the skill that produces or consumes the format. CLAUDE.md can carry a one-liner pointing at the skill if the format is referenced often.

Tool-specific configurations. Rules about how to use a particular tool — when to invoke it, what flags to pass, how to interpret its output. Examples: rules about how to use a specific API client, rules about how to format git commits in a specific way for a specific tool. If the rule only matters when the tool is in use, the rule belongs near the tool's invocation, not in always-on CLAUDE.md.

One-off operational drills. "Run the daily forgotten-scan." "Run the weekly digest cleanup." These are scheduled or on-demand actions that have specific procedures but do not apply to most sessions.

Verbose reference material. If a rule needs five paragraphs of background to be applicable, the five paragraphs belong in a skill, not in CLAUDE.md. CLAUDE.md prefers terse, falsifiable rules; skills can afford longer prose because they only load when needed.

What still belongs in CLAUDE.md

Five categories do not migrate well to skills, because their rules apply in every session.

Project orientation. What the project is, what stack it runs, who uses it. The agent needs this in every session.

Hard process rules. "All changes go through PR." "Tests must pass before merge." These apply universally and need to be visible at session start so they constrain decisions before any tool is invoked.

Language and style commitments. What language, what frameworks, what naming. These apply whenever code is being written.

Operational facts that surface in many tasks. "This path is a symlink." "The CI matrix runs Linux only." If the fact applies to multiple workflows, it belongs in CLAUDE.md.

Principles. The rules that apply when no other rule applies. By construction, these need to be in every session.

Skill triggers and the description field

How a skill gets loaded deserves a brief look, because the trigger semantics are easy to misunderstand.

Each skill has a description field in its frontmatter. The Claude Code harness compares user intent against skill descriptions and offers to invoke a matching skill. This match is not a literal substring search — it is intent matching, which means the description should describe the intent the skill serves, not the procedure the skill performs.

The pattern that works:

description: |
  Use this skill when the user wants to wrap up a session and produce
  a brief for the next session. Triggers on '/handoff', '收尾', '总结一下',
  '交接', or 'wrap up'. Skip pure Q&A or mid-execution unstable state.

The pattern that does not work:

description: Handoff skill that generates a YAML brief in Schema A or B format.

The first describes the user intent (wrapping up, ending a session) and lists explicit invocation triggers. The second describes the implementation, which the matcher cannot tie back to user intent.

This is the rule of thumb: write skill descriptions for when to invoke, not what the skill does internally. The body of the skill is where the procedure lives.
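One rough way to catch implementation-style descriptions before they ship is a small lint. The check below is an illustrative heuristic, not how the harness matches intent: it only looks for intent phrasing such as "when the user" and reports wording about formats and schemas. Both cue lists are assumptions you would tune for your own skills.

```python
# Hypothetical cue lists; adjust to the vocabulary your skills use.
INTENT_CUES = ("use this skill when", "use when", "when the user", "triggers on")
IMPLEMENTATION_CUES = ("schema", "yaml", "format", "generates")


def describes_intent(description):
    """Return True if the description contains at least one intent cue."""
    # Collapse whitespace so cues still match across wrapped lines.
    text = " ".join(description.lower().split())
    return any(cue in text for cue in INTENT_CUES)


def implementation_words(description):
    """List the implementation-flavored cues found in the description."""
    text = description.lower()
    return [cue for cue in IMPLEMENTATION_CUES if cue in text]


good = (
    "Use this skill when the user wants to wrap up a session and produce\n"
    "a brief for the next session. Triggers on '/handoff' or 'wrap up'."
)
bad = "Handoff skill that generates a YAML brief in Schema A or B format."

print(describes_intent(good), describes_intent(bad))  # True False
```

A description that fails the first check and trips the second is a good candidate for a rewrite.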

A migration example

Here is a common situation. CLAUDE.md has grown a 60-line section called "How to write a HANDOFF file." The section has the schema, the format rules, the do/don't list, examples. It is a precise, useful section. It is also dead weight in every session that is not ending — which is most sessions.

Migration: extract the section into ~/.claude/skills/handoff/SKILL.md. Keep the frontmatter description tight ("Use when the user wants to wrap up a session..."). Keep the body — schema, rules, examples — exactly as it was in CLAUDE.md. In CLAUDE.md, replace the 60-line section with a single line:

Session ending: invoke /handoff. Schema and format rules in the skill.

That is the migration. CLAUDE.md is now 60 lines shorter for every session. The rules are still encoded — the agent reads them when /handoff is invoked. The relevant context is loaded when relevant; the dead weight is gone the rest of the time.
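The extraction itself is mechanical enough to script. This is a minimal sketch, assuming your CLAUDE.md delimits sections with "## " headings; the function names are hypothetical, and the frontmatter is limited to the name and description fields.

```python
def extract_section(claude_md, heading, pointer):
    """Split one '## ' section out of CLAUDE.md text.

    Returns (new_claude_md, section_body). The section runs from its
    heading line up to the next '## ' heading or the end of the file,
    and is replaced in place by the one-line pointer.
    """
    lines = claude_md.splitlines()
    start = lines.index(f"## {heading}")  # raises ValueError if absent
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    body = "\n".join(lines[start + 1:end]).strip()
    kept = lines[:start] + [pointer, ""] + lines[end:]
    return "\n".join(kept).rstrip() + "\n", body


def skill_file(name, description, body):
    """Render SKILL.md text: YAML frontmatter followed by the body.

    Descriptions containing ': ' would need YAML quoting, which this
    sketch does not attempt.
    """
    return f"---\nname: {name}\ndescription: {description}\n---\n\n{body}\n"
```

Call extract_section with the heading text and the one-line pointer, then write the result of skill_file to the skill's SKILL.md. Review the diff by hand before committing either file.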

The same pattern works for plan-management procedures, review checklists, release runbooks, content review protocols, and any other workflow with rich rules and infrequent activation.

When skills are the wrong choice

Skills are not always better. In three cases a rule should stay in CLAUDE.md.

The rule applies in every session. If the agent needs to know the rule before it can make any decision, the rule has to be in CLAUDE.md. Hard process rules and project orientation are the canonical examples.

The rule is short. Migrating a one-line rule to a skill costs more (in indirection, in maintenance) than it saves. Skills earn their keep when the procedure is long enough that scoping the load matters.

The rule depends on always-on enforcement. If the rule needs a hook to enforce, the hook needs to know the rule at session start. CLAUDE.md is the natural carrier for rules that are also wired into hooks.

Putting it together

The mental model: CLAUDE.md is the agent's persistent operating context — short, dense, always loaded. Skills are the agent's situational briefings — verbose, procedural, loaded on demand. Between them, you can encode rich behavioral rules without paying the context cost of always-on density.

Most projects that have been using Claude Code for a year are mid-migration: their CLAUDE.md is too long because it accumulated workflow-specific sections that should have been skills. The remediation path is to identify the workflow sections, extract each one into a skill, and replace it in CLAUDE.md with a one-line pointer. The shorter CLAUDE.md and the richer skills carry more total weight than the long CLAUDE.md ever did.

If you have not written a skill yet, the cheapest first one is the workflow you find yourself doing most often that has a recognizable shape. Session handoffs are the easy first target because every team does them and the format is well-defined. Plan-writing is another. Code review is another. Once you have one skill working, the second and third come quickly — and the migration of workflow rules out of CLAUDE.md becomes mechanical.

CLAUDE.md and skills are not competitors. They are the two halves of the agent's operating context, scoped differently for different purposes. Knowing which half a new rule belongs in is one of the cleanest design decisions you can make in your harness, and one of the highest-leverage refactors when you realize the existing distribution is wrong.

Originally posted on agentlint.app/blog/claude-code-skills-and-claude-md.

The 4 Lines Every CLAUDE.md Needs

Mario — Mon, 27 Apr 2026 07:56:52 +0000

If you only have time to write four lines of CLAUDE.md, write these four. They are the lines that, in audits across about a hundred projects, separated the CLAUDE.md files that visibly carried weight from the ones that drifted into the dead-letter pile. They are the lines whose absence reliably correlated with "the agent did something I didn't want," and whose presence reliably correlated with "the agent's first decision was the right one." This piece is about which four lines, why those four, and what they each do that no other line can.

Why four

The number is empirical, not aesthetic. We catalogued the rules that appeared in CLAUDE.md files we considered well-written, looked for the rules that appeared in most of them, and ranked by impact on observed agent behavior. Above four lines, the marginal value drops off fast — line five matters less than line four, line ten matters less than line five. Below four, the file is missing something that audits flagged in nearly every project that didn't have it. So four became the operational floor.

This is not the same as saying four lines is enough. Four is the minimum. Most working files are 100-250 lines. But if you ship a CLAUDE.md with only the four below, you have already captured most of the value that prose can add. Everything beyond four is refinement.

Line one: what this project is, in one sentence

This is a real-time collaboration tool for editing shared documents
in the browser. TypeScript, React, Yjs, Postgres, Redis. Deployed on Fly.io.

The agent needs to know what kind of project this is before it has read a single source file. The first sentence prevents an entire class of bad early decisions: the agent suggesting database migrations for a frontend-only project, proposing browser polyfills for a Node script, optimizing for mobile in a desktop-only tool. Without this line, the agent infers from package.json and src/ structure, which usually works but occasionally produces a confident misclassification that derails the session.

The line should answer three questions: what does it do, what is it built with, where does it run. Twenty-five words is enough. If you cannot describe your project in twenty-five words, the agent's confusion in the first session is your warning that the project itself lacks a clear identity.

Line two: how to make changes — the process rule

All changes go through PR. No direct pushes to main. Tests must pass before merge.

Process rules separate "the agent did something good" from "the agent shipped something to production at 3am." Without the process line, the agent's defaults take over, and the defaults are often "do whatever the user asks." If the user asks for a quick fix, the agent will push to main. If the user is in a hurry, the agent will skip the test run.

The process line tells the agent what is non-negotiable about the way changes happen. Three sub-rules cover most projects: branching policy, push policy, gating policy. Some projects add review policy, commit-message policy, or merge strategy. Four to six rules in this line is fine; the irreducible core is "where do changes start, where do they end up, what gates them."

This is also the line that needs an enforcement pair. CLAUDE.md describes; hooks enforce. The line "no direct pushes to main" is wishful thinking unless there is a pre-push hook or a branch protection rule actually blocking the push. Pair this line with ~/.claude/scripts/push-gate.sh or equivalent, and now the rule has teeth.

Line three: the language and style commitment

TypeScript with strict mode. No `any`. State via Jotai (Redux/Zustand
deprecated). Tests live next to source as `<name>.test.ts`.

The language and style line tells the agent what the project's technical choices already are, so it does not propose alternatives. Without this line, the agent will sometimes look at the project and propose migrations that the team explicitly chose not to do. "This codebase looks like it should use Zustand," except the team picked Jotai for specific reasons. "TypeScript would be cleaner with branded types here," except the team has a rule against branded types. The line preempts these proposals.

Three sub-rules in this line cover most cases: language pinning (TypeScript strict, Python with uv, etc.), framework pinning (which library for state, which for forms, which for routing), structural conventions the linter does not catch (where tests live, what naming uses kebab-case vs PascalCase). Five to ten rules in this line is typical.

The 2026 best practice is to capture the deprecated choices explicitly. "State via Jotai (Redux/Zustand deprecated)" is more useful than "State via Jotai" because it tells the agent which proposals to actively avoid. Agents reading the code can deduce that Jotai is in use; they cannot deduce that Redux was tried and rejected.

Line four: the principle that catches edge cases

Don't add error handling for cases that can't happen. Trust internal
code; validate at boundaries. Three identical lines is better than a
premature abstraction.

The principle line is the one that handles cases the other lines do not cover. Specific rules tell the agent what to do when the case is named; principles tell the agent how to reason when the case is not named. Without principles, the agent's defaults take over, and the defaults are usually "be defensive, abstract early, handle every case" — which is fine for some projects and exactly wrong for others.

Three to seven principles is the operational range. Each principle should have a one-sentence rationale, even if you don't write it down — the rationale lets the agent apply the principle to edge cases the rule does not explicitly cover. "Don't add error handling for cases that can't happen — because the handling code itself becomes a maintenance burden and obscures the real failure modes" applies cleanly to a case the rule's surface form does not name.

The hardest part of the principles line is keeping it short. Principles are seductive; once you start writing them, every team value wants to become one. The discipline is to ask: would removing this principle change a decision the agent would otherwise get wrong? If yes, keep it. If no, delete it.

What the four lines do together

Each line covers a different layer of the agent's decision space. Line one orients the agent ("what world am I in"). Line two constrains the agent's actions ("what can I not do"). Line three steers the agent's choices ("what defaults apply"). Line four guides the agent's edge-case reasoning ("what do I do when no rule applies").

Without line one, the agent operates on inferred context, which is right most of the time and wildly wrong occasionally. Without line two, the agent ships things you wouldn't have shipped. Without line three, the agent proposes migrations you don't want. Without line four, the agent's edge cases default to its training, which is averaged over millions of projects and probably not optimized for yours.

The four lines together produce an agent that, in the first session, makes decisions that look like decisions you would have made. That is the whole game.

What the four lines deliberately omit

It is worth saying what these four lines are not trying to do.

They do not constitute the whole CLAUDE.md. Operational notes (the symlink that catches everyone, the flaky CI step), communication style (English / Chinese, terse / verbose), and meta-rules (review cadence, override declarations) all belong in the file. They just do not belong in the irreducible four.

They do not replace enforcement. Each of these lines needs a corresponding mechanism: the process line needs hooks, the style line needs linter config, and the principles need code review or AgentLint sweeps. The CLAUDE.md prose is the briefing; the harness is the enforcement.

They do not replace AGENTS.md or .cursor/rules. If your project has multiple agent harness files, each needs its own four lines (or pointer to a shared source of truth). The four-line discipline applies file-by-file.

A challenge

If you have an existing CLAUDE.md, here is a useful exercise. Open it, find the four lines that map to "what this project is," "how changes happen," "language and style," and "principles." If your file does not have all four, add the missing ones. If your file has more than four lines competing for any one slot, consolidate.
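If your file uses section headings, a first pass at this exercise can be automated. The sketch below assumes "## " headings and a hypothetical mapping from each slot to heading keywords; a file that keeps the four lines as plain prose will need to be checked by eye instead.

```python
# Hypothetical mapping from each slot to heading keywords that satisfy it.
SLOTS = {
    "what this project is": ("project",),
    "how changes happen": ("process",),
    "language and style": ("style", "language", "architecture"),
    "principles": ("principles",),
}


def missing_slots(claude_md):
    """Return the slots with no matching '## ' heading in the text."""
    headings = [
        line[3:].strip().lower()
        for line in claude_md.splitlines()
        if line.startswith("## ")
    ]
    return [
        slot
        for slot, names in SLOTS.items()
        if not any(name in heading for heading in headings for name in names)
    ]


partial = "## Project\nA tool.\n\n## Process\nAll changes go through PR.\n"
print(missing_slots(partial))  # ['language and style', 'principles']
```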

If you do not have a CLAUDE.md yet, write the four lines first, ship them, and let the file grow only when you observe specific agent behavior that justifies a new rule. The first session with the four lines will be 80% as good as the first session with a 200-line file. The second session, after you have observed which rules are actually load-bearing, you will know what to add — and the additions will be better than anything you could have predicted before you saw the agent in action.

The four lines are not the destination. They are the launchpad. But shipping the launchpad is the part most projects skip, which is why so many CLAUDE.md files start at line one hundred without ever having been useful at line four.

Originally posted on agentlint.app/blog/the-4-lines-every-claude-md-needs.

CLAUDE.md: Global vs Project — What Goes Where

Mario — Mon, 27 Apr 2026 07:36:00 +0000

Claude Code reads CLAUDE.md at two levels. The user-global file at ~/CLAUDE.md (or ~/.claude/CLAUDE.md) loads in every session, on every project. The project-local CLAUDE.md at the repo root loads only when you are working in that repo. Both files are read; their contents stack. The decision of which content goes where is one of the most consequential choices in the harness, and one of the easiest to get wrong.

Why the split exists

The split is not cosmetic. It exists because some rules belong to you and some belong to the project, and conflating the two produces predictable failures.

Rules that belong to you are about how you work — your communication style, your tools, your operational quirks, the conventions you have across all your projects. These do not change when you switch projects. They are not negotiable with collaborators. They live in ~/CLAUDE.md, and every project you touch inherits them.

Rules that belong to the project are about what the project is and how the team has decided to operate. These do not change when you leave or someone else joins. They are decided by the team, encoded in the repo, and apply equally to every contributor and every agent that touches the code. They live in the project's CLAUDE.md, alongside the rest of the project state.

When you put project rules in ~/CLAUDE.md, the rules follow you instead of the project — which means a collaborator who clones the same repo gets a different agent experience. When you put personal preferences in the project file, they get shipped to everyone, including collaborators who do not share them. Both cases produce friction.

What goes in the global file

Eight categories tend to belong in ~/CLAUDE.md for most users.

Communication style. What language to talk to you in, how terse, what tone, what nicknames if any. The agent's communication preferences are personal — they should not change per project.

Universal hard rules you apply everywhere. "Never edit lockfiles by hand," "never push directly to main," "all changes through PR." If you apply these to every project, they belong global so you don't repeat them in fifteen project files.

Operational notes about your machine. Where your tools live, what shell you use, what virtualization platform, what code editor. The agent benefits from knowing your local environment; the project does not need to know.

Default tools and aliases. "Prefer rg over grep," "I have a shortcut kw that means kalami + worktree." These are personal ergonomic preferences.

Your knowledge bases. The paths to your Armory, your AIMD experience library, your wiki, your scratchpad. These are yours, not the project's.

Default behaviors when context is missing. "When unsure, ask one specific question, not three." "Don't create new files in the home directory." Personal defaults that should kick in regardless of project.

Your workflow conventions. "I review code before pushing," "I want a plan before any 3+ step task," "I expect commit messages in imperative mood." How you work, not how the project works.

Your meta-rules. "Tell me what you're about to do before each tool call." These shape the agent's interaction style with you specifically.

What goes in the project file

Six categories tend to belong in the project's CLAUDE.md, regardless of who is contributing.

Project description. What this project does. What it is for. What stack it runs. The first paragraph in every well-written project CLAUDE.md.

Team-decided process rules. Branching strategy, commit conventions, PR requirements, test requirements, release process. These are team contracts; every contributor must follow them.

Project-specific tooling. Build commands, test commands, dev environment setup, package manager. Different projects use different tools; the project file is where this gets pinned.

Architecture decisions. "Monolith, not microservices." "State via Jotai, not Zustand." "Use TanStack Query for server state." Decisions the project has made that the agent should not relitigate.

Operational facts about the codebase. "This path is a symlink." "The Playwright suite is flaky on Tuesdays." "The migration in 0042 is intentionally non-reversible." Project-specific knowledge that survives across contributors.

Project principles. Three to seven, with rationale. The principles that the team has agreed apply to this codebase. Different projects can have different principles — this is where they go.

The override mechanic

When a rule appears in both files, the project rule wins. This is the documented and expected behavior in Claude Code. It also produces a recurring source of confusion: a global rule "use bun" gets overridden by a project rule "use npm" without any warning. The agent obediently switches, but the global rule remains in your ~/CLAUDE.md looking like it should still apply.

The discipline is to make overrides explicit. When a project file overrides a global rule, the line should say so: Use npm — overrides ~/CLAUDE.md § Tools (project CI requires npm). This makes the override traceable and tells the agent that the contradiction was deliberate.

The opposite case — a project rule that contradicts a global rule unintentionally — is harder to catch. AgentLint v0.4 (planned) will check for global-vs-project contradictions and flag undeclared overrides. Until then, the manual sweep is to grep both files for the same nouns and check for contradictions.
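The manual sweep can be sketched as a short script. This is not AgentLint's check: the watch list of tool names is hypothetical, and the script only surfaces pairs of lines for a human to compare.

```python
import re

# Hypothetical watch list: tool names that tend to be pinned in both files.
WATCHED = ("npm", "bun", "pnpm", "yarn", "vitest", "jest")


def lines_mentioning(text, term):
    """Return the lines of text that mention term as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [line.strip() for line in text.splitlines() if pattern.search(line)]


def undeclared_overlaps(global_md, project_md, terms=WATCHED):
    """Find terms both files mention where the project line declares no override.

    Returns (term, global_line, project_line) tuples to review by hand.
    It does not decide whether the two lines actually contradict.
    """
    findings = []
    for term in terms:
        for g_line in lines_mentioning(global_md, term):
            for p_line in lines_mentioning(project_md, term):
                if "overrides ~/CLAUDE.md" not in p_line:
                    findings.append((term, g_line, p_line))
    return findings
```

Any tuple it returns is either a contradiction to resolve or an override to declare.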

Migration patterns

Two migration patterns come up often.

Promoting a project rule to global. You have written the same rule in three project CLAUDE.md files in a row. That is the signal it should be global. Move it once to ~/CLAUDE.md, delete it from the three project files, and run AgentLint to confirm none of them re-introduce it as an override.

Demoting a global rule to project. You have a global rule that does not actually apply to a new project. The temptation is to add an exception in the project file. Better path: check whether the global rule is correct as-stated for most of your projects. If yes, add an explicit override in the new project. If no, the global rule was wrong — move it down to the projects where it actually applies.

The principle is that ~/CLAUDE.md should encode rules that apply almost universally to your work. If you find yourself overriding a global rule in 30% of projects, the global rule is wrong.
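Both signals, the rule repeated in three project files and the global rule overridden in 30% of projects, are easy to compute once the rules are in hand. A minimal sketch, assuming you have already collected each project's rule lines into a dictionary (the function names are hypothetical):

```python
from collections import Counter


def promotion_candidates(project_rules, threshold=3):
    """Rules that appear in at least `threshold` project files.

    project_rules maps a project name to the list of rule lines in its
    CLAUDE.md. Matching is exact after whitespace and case normalization.
    """
    counts = Counter()
    for rules in project_rules.values():
        # A set, so a rule repeated inside one file is counted once.
        counts.update({" ".join(rule.lower().split()) for rule in rules})
    return sorted(rule for rule, n in counts.items() if n >= threshold)


def override_rate(overriding_projects, total_projects):
    """Fraction of projects that override a given global rule."""
    return overriding_projects / total_projects


rules = {
    "editor": ["Prefer rg over grep", "Use bun"],
    "site": ["prefer rg over grep"],
    "cli": ["Prefer  rg over grep", "Use npm"],
}
print(promotion_candidates(rules))  # ['prefer rg over grep']
```

Exact matching will miss rules that are reworded between projects, so treat the output as a starting list rather than a verdict.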

A worked example

Here is the same rule cluster, allocated correctly:

~/CLAUDE.md (excerpt):

## Communication
- Talk to me in spoken Chinese during chat. Code, comments,
  commits, docs in English.
- Be terse. No "Let me know if..." trailers.
- Default discussion mode unless I use action verbs.

## Universal rules
- All changes go through PR. No direct pushes to main.
- Don't edit lockfiles by hand. Use the package manager.
- No secrets in commits. .env* is gitignored globally.

## Operational
- I work on a Mac Studio M4 Max. Tools live at /opt/homebrew/bin.
- I prefer rg over grep, fd over find, bat over cat.
- My scratch dir is /tmp; my project dir is ~/Projects.

Project CLAUDE.md (excerpt, same conceptual cluster):

## Project
A real-time collaboration tool. TypeScript, React, Yjs,
Postgres, Redis, deployed on Fly.io.

## Process
- Branch from main, PR, squash-merge.
- Tests required: Vitest unit + Playwright E2E.
- Use bun, not npm — overrides ~/CLAUDE.md § Tools.

## Architecture
- Monolith. We tried microservices in Q3 2024 — do not propose.
- State: TanStack Query for server, Zustand for client. No Redux.
- CRDT via Yjs. Do not propose Automerge or Y.js wrappers.

## Operational
- Postgres 16 + Redis 7 via docker-compose; `bun dev:up`.
- Playwright requires `bunx playwright install` after clone.
- CI runs Linux only — Mac/Windows local issues do not fail CI.

Notice: the global file talks about how you work; the project file talks about how the project works. They share the abstraction "rules" but partition cleanly. The project file's "Use bun, not npm" line explicitly cites the global section it overrides (§ Tools, not shown in the excerpt above), so the contradiction is deliberate and traceable.

What to do this week

Two practical exercises. First, read your ~/CLAUDE.md and the CLAUDE.md of the project you are most active in. Identify any rule in ~/CLAUDE.md that is actually project-specific (mentions a tool only one project uses, references a path only one project has). Move those to the project file. Identify any rule in the project file that is actually personal preference (your communication style, your shortcuts). Move those to global. The reallocation usually shrinks both files.

Second, make sure every undeclared override is either declared or removed. Grep for common nouns (build command, package manager, test runner, branching strategy) in both files. Where they contradict, add the explicit "overrides ~/CLAUDE.md § X" tag in the project file. The agent will appreciate the disambiguation, and so will future contributors who read both files cold.

Originally posted on agentlint.app/blog/claude-md-global-vs-project.

How Claude Code Rules Actually Work

Mario — Mon, 27 Apr 2026 07:35:25 +0000

The phrase "Claude Code rules" gets used loosely. Sometimes it means CLAUDE.md. Sometimes it means slash command definitions. Sometimes it means hook configurations. Sometimes it means MCP server permissions or tool restrictions. The pieces interact in non-obvious ways, and a rule that works in one place can be silently overridden by a rule somewhere else. This piece traces the actual loading order, what each layer can and cannot do, and where the most common confusions come from.

What "rules" means inside Claude Code

Claude Code, as of early 2026, has at least six places where rules can live. Each layer affects the agent differently, has its own loading semantics, and interacts with the others in specific ways.

The first is CLAUDE.md, both the user-global one (~/.claude/CLAUDE.md or ~/CLAUDE.md) and the project-local one. These are markdown prose, loaded into the agent's context at session start. They establish the operating context and behavioral norms.

The second is settings.json, again both user-global and project-local. This is where hooks, permissions, environment variables, and plugin configuration live. Settings.json is config, not prose — the agent does not read it; the harness does, and the harness enforces what it says.

The third is hooks — shell scripts triggered by lifecycle events (SessionStart, PreToolUse, PostToolUse, PreCompact, Stop). Hooks are configured in settings.json and executed by the Claude Code runtime. They can inject text into the session, block tool calls, run validations, or send notifications.

The fourth is slash commands, which are backed by skills: markdown files under ~/.claude/skills/ or .claude/skills/ whose names are the invocation triggers. When you type /handoff or /review, you are invoking a skill. Skills can prescribe a workflow, restrict tool access, or require specific outputs.

The fifth is MCP server permissions — entries in settings.json that allow or deny specific tools from MCP servers. These are the finest-grained access controls.

The sixth is environment-level constraints — environment variables, permission modes (auto, plan, bypassPermissions), and runtime flags. These are usually set by the launching shell or by the Claude Code CLI itself.

The loading order

When a session starts, the harness loads in this order. Understanding the order is the difference between a rule that works and a rule that is silently overridden.

Step one: environment-level constraints load first. The permission mode is set, environment variables are bound, the working directory is established. These cannot be changed by anything inside the session.

Step two: settings.json is loaded. Both user-global and project-local. Project-local settings layer on top of user-global. Hooks are registered, MCP servers are connected, plugin lists are resolved.

Step three: SessionStart hooks fire. Each registered SessionStart hook runs as a shell command. Their stdout is injected into the session as system-reminder messages. This is the layer that can dynamically inject context — it is also the layer that introduces the most surprise, because hook output is invisible until the agent actually receives it.

Step four: CLAUDE.md is loaded. User-global first, then project-local. The contents are added to the session context. CLAUDE.md cannot override settings.json or hooks; it can only describe behavioral norms within the constraints those layers set.

Step five: the agent's first turn begins. The user's first message arrives. The agent's response is shaped by everything loaded above.

When something seems "wrong" — a rule isn't being followed, a tool isn't accessible, a hook isn't firing — the debug protocol is to walk back up the loading order and check each layer in turn.
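The walk-back can be sketched as a small checklist runner. This is only an illustration of the debugging order described above: the layer names come from the loading steps, and each check function is a placeholder you would supply yourself.

```python
from typing import Callable, Dict, List

# Layers in the order the steps above say they load.
LOADING_ORDER = [
    "environment constraints",
    "settings.json",
    "SessionStart hooks",
    "CLAUDE.md",
]

def failing_layers(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Walk from the last-loaded layer back to the first and collect layers whose check fails."""
    failing = []
    for layer in reversed(LOADING_ORDER):
        check = checks.get(layer)
        if check is not None and not check():
            failing.append(layer)
    return failing
```

The failing layer that loads earliest is usually the one to fix first, since everything loaded after it inherits the problem.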

What CLAUDE.md can and cannot do

CLAUDE.md is prose. It can describe norms, encode decisions, and explain reasoning. It cannot enforce.

It can say "All changes go through PR." It cannot prevent the agent from invoking git push directly to main. The push will happen if the harness allows it; only something outside the prose, such as a PreToolUse hook that inspects Bash commands for git push or server-side branch protection, can actually block it.

It can say "Run lint before commit." It cannot run lint. Only a pre-commit hook (Husky or similar) can run lint.

It can say "Use TypeScript strict mode." It cannot make tsc strict. Only tsconfig.json can.

This is the single biggest confusion. Authors write a CLAUDE.md rule, see the agent follow it for a week, and conclude the rule is enforced. Then the agent has a different session — different context, different attention pressure, different temperature — and the rule is silently violated. The lesson is: every rule that matters needs a layer below CLAUDE.md doing the actual enforcement.

What hooks can do that CLAUDE.md cannot

Hooks are where the actual enforcement lives. Three patterns are common.

Block tool invocations. A PreToolUse hook on Bash can examine the command being run and exit with code 2 to block it; in Claude Code's hook protocol, other non-zero exit codes are reported as errors but do not block the call. The push-gate hook in many setups blocks git push origin main unless a review step has run.

Inject context dynamically. A SessionStart hook can run a script that fetches the current branch, recent commits, open PRs, and inject a one-line summary into the session. This is how some teams give the agent a fresh briefing every session without bloating CLAUDE.md.

Validate post-action. A PostToolUse hook on Edit or Write can run a linter or typecheck on the changed file and inject the result. The agent sees the validation output as a system-reminder and can react in the same turn.
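As an illustration of the first pattern, here is a minimal push-gate sketch in Python. It assumes the hook receives a JSON payload on stdin with `tool_name` and `tool_input.command` fields and that exit code 2 is the blocking code; verify both against the hooks reference for your version. The regex is deliberately simple and will miss some ways of pushing to main.

```python
import json
import re
import sys

# Matches "git push" commands that mention main as a word (illustrative, not exhaustive).
PUSH_TO_MAIN = re.compile(r"\bgit\s+push\b.*\bmain\b")

def decide(hook_input_text: str) -> int:
    """Return the hook's exit code: 2 to block the tool call, 0 to allow it."""
    try:
        payload = json.loads(hook_input_text)
    except json.JSONDecodeError:
        return 0  # Unreadable input: fail open rather than block every tool call.
    if not isinstance(payload, dict) or payload.get("tool_name") != "Bash":
        return 0
    command = (payload.get("tool_input") or {}).get("command", "")
    if PUSH_TO_MAIN.search(command):
        print("Blocked: pushes to main go through a PR.", file=sys.stderr)
        return 2
    return 0

# Entry point when installed as a hook script:
#   sys.exit(decide(sys.stdin.read()))
```

Failing open on malformed input is a design choice; a stricter gate would return 2 there and accept the risk of blocking legitimate calls.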

The catch with hooks is that they run on the user's machine, in the user's shell, with the user's environment. A hook script that assumes a tool is on PATH, that a directory exists, or that an environment variable is set will silently fail under different conditions. The robust pattern is to guard hook scripts with explicit checks and fallbacks.
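A guard of that kind might look like the sketch below. The function name and the idea of returning a list of problems are my own choices, not part of any Claude Code API; the caller decides whether to skip the hook, fall back, or log.

```python
import os
import shutil
from pathlib import Path
from typing import List, Mapping, Optional

def preflight(tools: List[str], dirs: List[str], env_vars: List[str],
              env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means the hook can run."""
    env = os.environ if env is None else env
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:  # Tool is not on PATH.
            problems.append(f"tool not on PATH: {tool}")
    for d in dirs:
        if not Path(d).is_dir():
            problems.append(f"missing directory: {d}")
    for var in env_vars:
        if not env.get(var):  # Unset or empty.
            problems.append(f"unset environment variable: {var}")
    return problems
```

Calling `preflight(["git"], [".claude"], ["CI_TOKEN"])` at the top of a hook (the directory and variable names here are hypothetical) turns a silent failure into an explicit list you can print or log.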

Slash commands as scoped rule sets

Slash commands (skills) are where you scope rules to specific workflows. A skill is a markdown file with frontmatter and a body. The frontmatter declares triggers and constraints. The body is read into the agent's context when the skill is invoked.

This is how you say "for handoff workflows specifically, the file format must follow Schema A or B" without bloating CLAUDE.md with handoff-specific rules. The skill activates only when the workflow does. The rest of the time, the rules in the skill are dormant and do not consume context budget.

The 2026 trend is to move workflow-specific rules out of CLAUDE.md and into skills. CLAUDE.md becomes the always-loaded, project-wide rule set; skills become the workflow-loaded, scoped rule sets. This keeps CLAUDE.md short and lets specific workflows have richer rules without paying for them in every session.
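To show the shape of a skill file, here is a small sketch that splits one into frontmatter and body. The example skill content is hypothetical, and the parser only handles flat `key: value` lines, which is an assumption about how simple the frontmatter is; it is not how the harness itself parses skills.

```python
from typing import Dict, Tuple

def split_skill(text: str) -> Tuple[Dict[str, str], str]:
    """Split a skill file into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # Closing frontmatter delimiter.
    except ValueError:
        return {}, text
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body

# Hypothetical skill file for illustration.
EXAMPLE_SKILL = """---
name: handoff
description: Write a handoff note when the user ends a work session.
---
The handoff file must follow Schema A or Schema B.
"""
```

The split mirrors the scoping argument: the frontmatter is the small part used to decide when the skill applies, and the body is the part that enters context only on invocation.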

Where the confusion comes from

Three patterns produce most of the "rules don't work" complaints.

Confusing CLAUDE.md with enforcement. Already covered above. CLAUDE.md describes; the harness enforces. If you don't have a hook or CI check below the rule, the rule is advisory.

Layered overrides without clarity. Authors often assume that a rule in the user-global CLAUDE.md is overridden by a contradicting rule in the project-local CLAUDE.md, but nothing guarantees that. The agent reads both files; if they contradict, it may follow either one depending on phrasing. The fix is to make project-local rules say "overrides ~/.claude/CLAUDE.md § X" explicitly when overriding.

Hook scripts that fail silently. A SessionStart hook that errors does not crash the session — it just doesn't inject what it was supposed to inject. The session proceeds without the missing context. This produces "the agent didn't know X" bugs that are very hard to diagnose because nothing visible failed. The fix is to log hook failures somewhere persistent and check the log when behavior is surprising.
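One way to make hook failures visible is to wrap the briefing command so that every failure lands in a persistent log. This is a sketch under my own assumptions: the wrapper name, the 10-second timeout, and the log location are all illustrative.

```python
import datetime
import subprocess
from pathlib import Path
from typing import List

def run_logged(command: List[str], log_path: Path) -> str:
    """Run a briefing command; return its stdout, or '' after logging the failure."""
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=10)
        if result.returncode == 0:
            return result.stdout
        reason = f"exit {result.returncode}: {result.stderr.strip()}"
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Covers a missing executable as well as a hung command.
        reason = f"{type(exc).__name__}: {exc}"
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"{stamp} {' '.join(command)} failed ({reason})\n")
    return ""
```

A SessionStart script would print whatever `run_logged` returns, so a failure still injects nothing, but the log now records why.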

Putting it together

The mental model that keeps the layers straight: CLAUDE.md is the agent's reading material; settings.json + hooks are the agent's environment; skills are scoped briefings; environment constraints are the agent's hard limits. Rules live in the layer that has the right teeth. Process rules with hard consequences live in hooks. Workflow-specific rules live in skills. Behavioral norms live in CLAUDE.md. Hard access controls live in MCP permissions. Trying to enforce a rule from the wrong layer produces the silent failure mode that frustrates so many teams new to Claude Code.

If you are setting up a new project's rules, the order is: write ~/.claude/CLAUDE.md for global norms, write the project CLAUDE.md for project-specific decisions, write hooks to enforce process rules, write skills for workflow-scoped rule sets, and set MCP permissions for tool access. Run AgentLint to keep the CLAUDE.md files honest. The system feels intricate, but the layering is consistent, and once the layering clicks, "Claude Code rules" stops being a vague phrase and becomes a specific architecture you can debug.

Originally posted on agentlint.app/blog/how-claude-code-rules-actually-work.

CLAUDE.md Best Practices, 2026

Mario — Mon, 27 Apr 2026 07:29:43 +0000

CLAUDE.md is two years old as a widely-used artifact and the practices around it have shifted significantly in the last twelve months. The 2024 practice was to dump everything into one long file and hope. The 2025 practice was to split by section and hope slightly less. The 2026 practice — emerging from a few hundred public projects, the dispatches from Hashimoto / OpenAI / LangChain on agent harness design, and the operational lessons from teams running Claude Code, Cursor, and Codex side by side — is starting to converge on something more disciplined. This is what those practices look like as of early 2026.

What changed since 2024

Three things changed that reshaped the practice. First, the agents themselves got better at reading less and inferring more, so the volume of "convention documentation" needed in CLAUDE.md dropped. Second, the agents got worse at handling contradictions and stale rules — partly because they trust the file more, so a stale rule corrupts more decisions. Third, multi-agent workflows became common, with the same project being touched by Claude Code, Cursor, and Codex within a single sprint, which means CLAUDE.md / AGENTS.md / .cursor/rules all have to align rather than each be a separate kingdom.

The practical effect is that CLAUDE.md is shorter than it was, more strict than it was, and increasingly authored alongside parallel files (AGENTS.md, .cursor/rules, .codex/instructions) that have to stay consistent.
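A consistency check for those parallel files can be very small if one file is treated as canonical and the others are expected to match it exactly. The sketch below assumes that arrangement; which file is canonical and which paths count as mirrors are up to the project.

```python
from pathlib import Path
from typing import List

def out_of_sync(canonical: Path, mirrors: List[Path]) -> List[Path]:
    """Return mirrors that are missing or whose content differs from the canonical file."""
    expected = canonical.read_bytes()
    stale = []
    for mirror in mirrors:
        # read_bytes follows symlinks, so a mirror symlinked to the canonical file always matches.
        if not mirror.is_file() or mirror.read_bytes() != expected:
            stale.append(mirror)
    return stale
```

Run in CI, a non-empty result is a reason to fail the build before the files drift further apart.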

The current consensus, in eight rules

These are the practices that pass the "more than half of the well-run projects I've seen do this" bar. They are not universal — but if you are picking which practices to adopt, these are the ones with the best return.

Rule one: keep it under 250 lines. Files above this length get skim-read by both humans and agents. The cap is not arbitrary; it tracks what an agent can absorb at session start without burning more than 5% of its context budget. Most projects that started 2026 above the cap have rewritten down.

Rule two: every rule must be falsifiable. "Write good code" is not falsifiable. "All async functions must have a timeout" is. The 2025 practice of soft directives ("consider doing X") has fallen out of favor — the agents either followed the soft rules as if they were hard, or ignored them entirely, and the indeterminacy was worse than either option. Make every rule a hard rule or delete it.

Rule three: pair every rule with its enforcement. A rule that lives only in CLAUDE.md is a wish. The 2026 practice is to have an explicit enforcement layer for every rule — a hook, a CI check, a linter rule, or an explicit "advisory only" tag. The advisory tag is important: it tells the agent (and contributors) that the rule is a preference rather than a guard, which calibrates how strictly to apply it.

Rule four: capture decisions, not behaviors. "Use TypeScript strict mode" describes a current behavior of the project — and the agent can read that from tsconfig.json. "We use TypeScript strict because we got burned by an any cascade in Q3 2024" captures a decision and its reasoning, which the agent cannot read from the code. The 2026 file leans toward the latter.

Rule five: align CLAUDE.md with parallel files. If the project has AGENTS.md, the two files must not contradict each other. The same applies to .cursor/rules, .codex/instructions, and any other harness file. The recommended approach is to make CLAUDE.md the canonical source and have the others either symlink, generate from it, or include it. Manual cross-file maintenance is the failure mode that ends every multi-tool project.

Rule six: explicit communication style. Specify what language the agent should communicate in, what tone to use, and how verbose to be. "English, terse, no preamble before tool calls" is a fully specified directive. Without this, the agent will pick a default that may not match the team's preference, and the cost compounds session by session.

Rule seven: maintain an explicit principles section. Three to seven principles, each with a one-sentence rationale. Principles are the rules that apply when no other rule applies. Without them, the agent's defaults take over, which may or may not match the project. With them, edge cases get handled the way you'd handle them.

Rule eight: review quarterly. A reminder set on a calendar, not a vague intention. The file drifts even when you are not actively editing it because the project around it changes. Quarterly review catches the drift before it becomes an emergency.
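Rules one and two are mechanical enough to check with a few lines of Python. The sketch below is my own illustration, not AgentLint's implementation, and the list of soft phrases is an assumption about what an unfalsifiable rule tends to look like.

```python
import re
from typing import List

MAX_LINES = 250  # Rule one asks for fewer than this many lines.
# Hedging phrases that usually mark a soft directive (illustrative list).
SOFT_PHRASES = re.compile(r"\b(consider|try to|where possible|when appropriate)\b", re.IGNORECASE)

def check_rules_file(text: str) -> List[str]:
    """Flag a file at or over the line cap and any line containing a soft directive."""
    findings = []
    lines = text.splitlines()
    if len(lines) >= MAX_LINES:
        findings.append(f"file has {len(lines)} lines, rule one asks for under {MAX_LINES}")
    for number, line in enumerate(lines, start=1):
        if SOFT_PHRASES.search(line):
            findings.append(f"line {number}: soft directive: {line.strip()}")
    return findings
```

A phrase match is only a hint; a human still decides whether the flagged line should become a hard rule or be deleted.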

Anti-patterns the field has retired

Some practices that were common in 2024 are now considered actively harmful, and projects that still do them are fighting their own files.

The first is the kitchen-sink section — a "Coding Standards" section that runs 200 lines and tries to cover every conceivable pattern. These never get read in full and contain so many soft rules that the strict ones get diluted. The 2026 replacement is a 20-line "Language & Style" section that names only the rules a linter cannot enforce.

The second is the philosophy section — paragraphs about the team's values, mission, and design philosophy. These are pleasant to write and contribute nothing actionable. The 2026 practice is to omit them or move them to a separate values doc.

The third is the historical archive — sections explaining what the project used to do, what it was migrated from, what was tried and abandoned. This is useful for human onboarding and counter-productive for agents, who will sometimes treat the abandoned approach as a current option.

The fourth is the meta-section — rules about CLAUDE.md itself, like "keep this file under 200 lines" or "every rule should be falsifiable." These belong in your team's documentation about CLAUDE.md, not in the file the agent reads at session start. The agent does not benefit from being told the file's authoring rules.

The fifth is the duplicated rule — the same rule appearing in two sections because two people added it without seeing each other's edit. AgentLint catches these now, and most teams running it have removed the duplicates they had accumulated.
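The duplicated-rule case is easy to approximate by hand. The following is a rough sketch, not a description of how AgentLint detects duplicates: it normalizes each line and reports any rule text that appears more than once, so it only catches near-verbatim repeats.

```python
import re
from collections import defaultdict
from typing import Dict, List

def normalize(line: str) -> str:
    """Lowercase, drop list markers and punctuation, collapse whitespace."""
    line = re.sub(r"^\s*(?:[-*+]|\d+[.)])\s+", "", line.lower())
    line = re.sub(r"[^\w\s]", "", line)
    return " ".join(line.split())

def duplicate_rules(text: str) -> Dict[str, List[int]]:
    """Map each normalized rule that appears more than once to its line numbers."""
    seen = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        key = normalize(line)
        # Skip blank lines and headings; only rule lines are compared.
        if key and not line.lstrip().startswith("#"):
            seen[key].append(number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}
```

Rules that say the same thing in different words will slip through, which is where a dedicated linter or a human review still earns its keep.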

A 2026-shaped CLAUDE.md, in skeleton form

Here is the structure that has become the default for files written in 2026:

## Project (1 paragraph)
What it is. Stack. Audience.

## Non-negotiables (5-10 rules)
Hard process rules. Each falsifiable. Each enforced.

## Language & Style (5-15 rules)
Things the linter cannot enforce. Type policy. Imports.
Naming conventions the linter does not cover.

## Operational Notes (5-15 rules)
Surprising facts about the environment. Symlinks. Flaky CI steps.
Tokens that expire. Services that need warm-up.

## Communication (3-5 rules)
Language. Tone. Verbosity. Markdown vs plain.

## Principles (3-7 rules with rationale)
The rules that apply when no other rule applies.

## Enforcement (auto-generated or hand-maintained)
Mapping of rule → hook / CI step / "advisory only" tag.

This is the skeleton AgentLint's --init command emits. Project-specific rules go inside; the skeleton stays under 100 lines on its own.
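If you want to verify that a file still follows this skeleton, a short check is enough. The sketch below is hand-rolled for illustration and makes no claim about what AgentLint itself checks; it only looks for `## ` headings that begin with each section name.

```python
from typing import List

REQUIRED_SECTIONS = [
    "Project", "Non-negotiables", "Language & Style", "Operational Notes",
    "Communication", "Principles", "Enforcement",
]

def missing_sections(text: str) -> List[str]:
    """Return skeleton sections that have no matching '## ' heading."""
    headings = [line[3:].strip() for line in text.splitlines() if line.startswith("## ")]
    # startswith tolerates suffixes such as "(1 paragraph)" after the section name.
    return [name for name in REQUIRED_SECTIONS
            if not any(heading.startswith(name) for heading in headings)]
```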

Where 2027 might go

Two trends are visible enough to flag. The first is machine-readable CLAUDE.md — projects experimenting with structured frontmatter and YAML-tagged sections so the file can be parsed deterministically by tooling. AgentLint and similar linters benefit from this; agents may benefit from it eventually. The risk is that the file becomes a config file in markdown clothing, which loses the prose readability that made CLAUDE.md valuable in the first place.

The second is CLAUDE.md as a contract. Some projects are pinning their CLAUDE.md as part of the PR contract — every PR has to either pass against the current file or amend the file. This treats the file as a versioned commitment rather than a living document. Still early, mixed evidence on whether it is worth the friction.

For now, in 2026, the disciplined defaults are the ones above. Short. Falsifiable. Enforced. Aligned across tools. Reviewed quarterly. AgentLint as the last-mile sweep. The teams that do these things have CLAUDE.md files that visibly carry weight; the teams that do not end up with files that drift quietly into the dead-letter pile. Pick the side of the divide you want to be on.

Originally posted on agentlint.app/blog/claude-md-best-practices-2026.

"You Don't Need a CLAUDE.md" — A Rebuttal

Mario — Mon, 27 Apr 2026 07:29:08 +0000

There is a small but loud school of thought that says CLAUDE.md is unnecessary. The argument runs: if your code is well-written, the agent can figure out the conventions on its own. If your tooling is well-configured, the agent will pick up the rules from the linter and the CI. If your project structure is clear, the agent doesn't need a separate file telling it where things go. CLAUDE.md, in this view, is a workaround for badly-organized projects, and the right move is to fix the project rather than write a file describing how broken it is.

It is a real argument. It is also wrong, and the way it is wrong is worth understanding, because the correct response is not "you're stupid, write a CLAUDE.md." The correct response is to look at what CLAUDE.md actually does and notice that the alternative the rebuttal proposes does not exist.

What the rebuttal gets right

Start with what is true. A messy CLAUDE.md is often a symptom of a messy project. If your CLAUDE.md needs a 60-line section explaining how to run the dev server, your dev server setup is too complicated. If your CLAUDE.md has fifteen rules about how to format imports, your linter is not configured. If your CLAUDE.md describes a layered architecture you wish you had, that is fiction, not documentation.

The rebuttal correctly identifies that a lot of CLAUDE.md content is compensation for tooling gaps. Lint rules belong in the linter. Test conventions belong in the test runner config. Build commands belong in package.json scripts. If you find yourself writing prose where a config file would do, the prose is the wrong tool, and writing more of it just hides the problem.

The rebuttal also correctly notices that agents are getting better at picking up conventions from context. A 2024 agent needed to be told everything; a 2026 agent will read package.json, look at three files in src/, and infer most of the project conventions before you ask. So the marginal value of CLAUDE.md as "convention documentation" is genuinely lower than it used to be.

What the rebuttal gets wrong

But the rebuttal collapses CLAUDE.md to "convention documentation," and that is only one of three things the file does. It also encodes decisions the project has explicitly made that are not visible in the code, and it provides a stable boot sequence the agent reads at session start before it has touched the code at all. Neither of those is replaceable by good code or good tooling.

Decisions invisible in the code are the load-bearing case. The rule "we tried Redux, then Zustand, then settled on Jotai — do not propose moving to Valtio" is not in the code. The code uses Jotai. An agent reading the code learns that Jotai is in use. An agent reading CLAUDE.md learns that Jotai was a deliberate choice and that proposing to move state libraries again is off the table. Without the rule, the agent will eventually look at the project, decide Valtio is more idiomatic, and write a careful proposal for migration. The migration is not what you wanted. The proposal still cost you a session.

Similarly: "this codebase looks like a microservices project but is intentionally a monolith — do not propose splitting." Or: "we use feature flags from LaunchDarkly, not Statsig — both are present in old code, only the LaunchDarkly path is current." Or: "the audit log is required for every database write — do not optimize it away." None of these are deducible from the code. All of them save you sessions where the agent confidently proposes the wrong thing.

The session-boot case is even harder to replace. When an agent starts a new session, it has zero context about your project. It can read files, sure, but reading every file to figure out the conventions costs context budget that you wanted to spend on the actual task. CLAUDE.md is the cheap pre-session briefing. Replacing CLAUDE.md with "let the agent figure it out" means every session pays the cost of figuring it out, and the figuring-out happens in your context window.

The "but the agent can read the linter" line

A common counterargument: the linter encodes the conventions, so the agent can read the linter config instead of CLAUDE.md. Sometimes this works. Often it does not.

Linter configs encode a subset of conventions that fit linter ergonomics. A linter can enforce "no any in TypeScript." A linter cannot enforce "if you find yourself reaching for any, propose a discriminated union before adding the cast." The first is a rule; the second is the reasoning behind the rule. Agents need both — the rule to follow and the reasoning to apply when an edge case shows up.

Linter configs also have terrible signal-to-noise ratio for an agent reading them cold. A typical eslint config is hundreds of lines of plugin extensions, rule overrides, and ignore patterns. The agent has to parse the entire thing to find the three rules that matter for the current task. A line in CLAUDE.md saying "TypeScript strict, no any, no implicit JSX returns" is about ten times faster to read.

And linters cover a narrow slice. Process rules (PR policy, commit conventions, testing requirements) are not in the linter. Operational notes (this path is a symlink, this CI step is flaky on Tuesdays) are not in the linter. Principles (don't add error handling for cases that can't happen) are not in the linter. The "linter config is enough" claim only holds if your project's rules happen to fit cleanly into linter format, which most don't.

The "but my project is small" line

Another version of the rebuttal: "my project is small enough that the agent can figure it out without help." This is a real edge case — for a 200-file repo with three contributors, the marginal value of CLAUDE.md is genuinely low. But notice the conditions: small, recent, single-language, single-stack, single-team. Most projects break at least two of those conditions within their first year, and the moment they do, the agent's "figure it out from context" approach starts failing in ways that are subtle and hard to debug.

The right policy is not "skip CLAUDE.md until you regret it." It is "write a 50-line CLAUDE.md from day one and grow it as patterns emerge." A 50-line file costs nothing to maintain and pays back the first time someone joins the project or the first time the agent needs to make a decision the code doesn't reveal.

What a CLAUDE.md skeptic should still do

If you are convinced CLAUDE.md is unnecessary, here is the minimal version even you should ship: one paragraph naming the project, three to five hard rules (no direct pushes, lockfile editing policy, test requirement), and one section listing decisions the team has explicitly made that aren't visible in the code. That is a 30-line file. It is not zero, and it is meaningfully different from no file at all.

What it gets you, even at 30 lines, is a place where future explicit decisions can land. The first time someone says "we decided not to use feature X — make sure the agent knows," there is a file to add the rule to. Without the file, the decision lives in someone's head and stays there until the agent proposes feature X six months later and surprises everyone.

The honest assessment

CLAUDE.md, badly done, is worse than no CLAUDE.md. A bloated, contradictory, stale file actively trains the agent to ignore the file. The rebuttal is correctly observing that many CLAUDE.md files in the wild are bad enough to be net-negative. The correct response is not to delete the file — it is to make the file good. Short, correct, machine-checkable, load-bearing.

A good CLAUDE.md is not a workaround for a badly-organized project. It is the explicit memory of a well-organized project, captured in a place the agent reads first. The alternative is letting that memory live only in the team's collective mind, where it drifts faster than the code does and is invisible to every agent that joins the project mid-stream. That is a worse position to be in, and it is the position the rebuttal accidentally argues for.

Write the file. Keep it short. Run AgentLint against it so it stays correct. The argument that you don't need it is, in practice, an argument that you don't need explicit institutional memory — and projects that try to operate without explicit memory always pay for it eventually.

Originally posted on agentlint.app/blog/you-dont-need-a-claude-md-rebuttal.

I Rewrote My CLAUDE.md From Scratch

Mario — Mon, 27 Apr 2026 07:21:07 +0000

This is a first-person account of a CLAUDE.md rewrite. I had a 612-line file that I'd been adding to for fourteen months. The agent's behavior had been getting subtly worse for the last three of those — more clarifying questions, more ignored rules, more drift from the way I would have done things if I were doing them by hand. I decided the only fix was to delete the file and start over with a blank one. Here's what happened.

Why I didn't just edit it

The file had become a layered fossil. Rules from the project's first month, when we were still picking a stack. Rules from the migration to TypeScript. Rules from the post-incident reviews where someone had broken something and wanted a CLAUDE.md line to make sure nobody did it again. Rules from a refactor that never landed. Rules I had added in moments of frustration that I knew were rants more than directives.

I tried editing it twice. Both times I ended up making it worse — adding "exception" clauses that contradicted earlier rules, or rewording a vague rule into a slightly less vague rule that still said nothing falsifiable. The file was so deeply entangled with itself that pulling on any one thread loosened ten others. Editing it was strictly inferior to deleting it.

The blank file

rm CLAUDE.md && touch CLAUDE.md is satisfying in a way few shell commands are. The repo had a clean slate at session start. The agent (Claude Code, in this case) would now read an empty file and operate from its training-data defaults. That meant every decision the agent made during the observation period was a candidate signal: either the agent did something I didn't want, in which case I had a real rule to add, or the agent did something I wanted, in which case the rule was probably already implicit and didn't need to be written.

I gave myself one constraint: I would not draft CLAUDE.md up front from memory. I would only write it as a reaction to specific agent behaviors over the next two weeks. Every line in the rewritten file would have a corresponding agent action that justified it. No aspirational rules. No "best practice" rules. Only "the agent did X and I didn't want X" rules.

Two weeks of observed behavior

I kept a running log in a scratch file. Every time the agent did something I would have done differently, I noted it. By the end of week one I had thirty-two notes; by the end of week two, fifty-one. They clustered into roughly six categories.

Process violations — the agent pushed something to main directly, or skipped a hook, or amended a published commit. Eight notes. These became the first section: non-negotiables.

Stack-specific drift — the agent reached for a library or pattern that the project had explicitly migrated away from. Ten notes. These became language and style rules.

Operational unawareness — the agent assumed something about the dev environment that wasn't true (a path, a command, a service availability). Twelve notes. These became operational notes.

Investigation depth misjudgment — the agent moved to action when I would have wanted more investigation, or vice versa. Six notes. These became principles.

Communication style — the agent over-explained or under-explained, used formal English when I wanted terse, or hedged where I wanted directness. Nine notes. These mostly became one paragraph in the style section.

False or stale assumptions about other tooling — the agent referenced features of CI or hooks that didn't exist, or assumed a CI step was running that wasn't. Six notes. These became enforcement rules and exposed three actual harness gaps that I fixed before writing the rules.

Drafting from the log

After two weeks I sat down with the log and turned it into a CLAUDE.md. The process was nearly mechanical: each cluster of notes became 3-8 rules, each rule had to point at a specific note from the log, and each rule had to be falsifiable. The first draft came in at 89 lines.

I then ran AgentLint against it. The linter flagged six issues — three vague rules, two missing thresholds, one stale tooling reference (I had written pnpm out of muscle memory but the project uses bun). Twenty minutes of edits and the file passed.

The rewritten CLAUDE.md was 102 lines. The original had been 612. The new one carried more weight, not less, because every line was tied to a specific behavior I'd watched the agent get wrong without that line.

What changed in the agent's behavior

The first session after committing the new CLAUDE.md, the agent felt different in a way I hadn't expected. It asked fewer clarifying questions on the kinds of decisions the rewrite had covered (where to put a new test file, what naming to use, whether to push directly), and it asked more thoughtful questions on the kinds of decisions the rewrite had deliberately left out (what should this new feature actually do, what's the user expectation, what edge cases matter). The agent had stopped wasting its budget on already-decided questions and was using it on the questions that actually needed me.

Over the next two weeks, the rate of "the agent did X and I didn't want X" notes dropped from about four per day to under one per day. The remaining notes were either genuine new patterns (which became new rules, slowly) or one-off oddities that didn't justify a rule. The file stabilized at about 110 lines.

What I'd do differently next time

Three things, looking back.

I should have started the log sooner. I waited until I deleted the file. The two weeks of observation could have happened with the bloated file still in place; I would have learned the same things and the empty-file phase could have been a single weekend. The deletion is the cathartic part, but the learning happens in the observation, not in the deletion.

I should have stripped the operational notes back even more. I included a few rules in the operational section that I had encountered exactly once. With more weeks of observation, those would have been cut. Rule of thumb: a rule needs to come up at least twice in the log before it earns a line.

I should have explicitly tagged the principles section. When I rewrote, I treated principles as load-bearing rules. They are, but they degrade differently than process rules — a stale principle stays plausible long after a stale path command has obviously broken. I now mark principles with a "last reviewed" date and revisit them quarterly.
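The last two lessons translate into simple bookkeeping. Here is a sketch of both: a filter that keeps only candidate rules seen at least twice in the log, and a check for principles whose "last reviewed" date is more than roughly a quarter old. The log format (one short label per observation) and the 91-day threshold are my assumptions.

```python
import datetime
from collections import Counter
from typing import Dict, List

def rules_that_earn_a_line(log_entries: List[str], minimum: int = 2) -> List[str]:
    """Keep only candidate rules observed at least `minimum` times in the log."""
    counts = Counter(log_entries)
    return sorted(rule for rule, seen in counts.items() if seen >= minimum)

def principles_due_for_review(last_reviewed: Dict[str, datetime.date],
                              today: datetime.date, max_age_days: int = 91) -> List[str]:
    """Return principles whose 'last reviewed' date is older than about one quarter."""
    return sorted(name for name, reviewed in last_reviewed.items()
                  if (today - reviewed).days > max_age_days)
```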

The rewrite is the easy part

The deletion and rewrite were satisfying. The hard part is preventing the same fourteen-month accretion from happening again. My current discipline is: every CLAUDE.md edit must come with either a deletion of equal weight or a clear log entry justifying the addition. CLAUDE.md cannot grow during routine work. It can grow when a new pattern is observed, and it has to shrink when an old pattern stops mattering.

If your CLAUDE.md is over 300 lines and was last seriously rewritten more than six months ago, consider doing the same exercise. Two weeks of observation, one weekend of drafting, AgentLint as the final sweep. The shorter, sharper file is almost always better than the long, drifting one. The agent will tell you within a week whether the rewrite worked — and if it did, it will be one of the highest-leverage changes you made all quarter.

Originally posted on agentlint.app/blog/i-rewrote-my-claude-md-from-scratch.

Creating the Perfect CLAUDE.md

Mario — Mon, 27 Apr 2026 07:20:32 +0000

The phrase "perfect CLAUDE.md" is a trap. There is no Platonic CLAUDE.md hovering above your project waiting to be transcribed. There is only the CLAUDE.md that fits your codebase, your team, your tooling, and your tolerance for ceremony, and the only useful definition of "perfect" is the one where the agent reads it once at the start of every session and behaves the way you would have behaved on a good day. This piece is about how to write that file, starting from an empty file (touch CLAUDE.md) and ending with something that actually carries weight.

Why the blank-file version usually wins

When people ask "what should be in my CLAUDE.md," they often have a 400-line draft in front of them and want to know what to add. That is the wrong starting point. The drafts I've seen tend to bloat in two ways: copying lines from someone else's CLAUDE.md without checking whether they apply, or adding a rule every time something annoys the team without ever pruning. After a year, the file is half archeology and half wishful thinking.

Starting from a blank file forces a different question. Not "what could go in," but "what would actually break in this project without it?" In my experience that filter cuts a draft by roughly 70 percent on the first pass. The remaining 30 percent is usually closer to the real CLAUDE.md than any drafting-by-addition approach gets.

What "perfect" means operationally

A perfect CLAUDE.md, in practice, has three measurable properties. It is readable in one pass — somewhere between 80 and 250 lines, depending on project size, with no section over 50 lines. It is fully machine-checkable — every rule has either a check that runs in CI, a hook that runs locally, or a verifier that the agent can run on demand. And it is load-bearing — every rule, if removed, would change the agent's behavior in a way you'd notice within a week.

The last one is the strictest. If you remove a rule and nothing changes for two weeks, that rule was probably never doing anything. Either the behavior was already happening (rule redundant) or the agent was already ignoring it (rule advisory). Both cases are flags to delete.
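The first property, readable in one pass, is the only one of the three that can be measured directly from the file. A minimal sketch, assuming sections are marked with `## ` headings and using the 80 to 250 line range and 50-line section cap given above:

```python
from typing import Dict, List

def section_lengths(text: str) -> Dict[str, int]:
    """Count lines under each '## ' heading (heading line included)."""
    lengths: Dict[str, int] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            lengths[current] = 0
        if current is not None:
            lengths[current] += 1
    return lengths

def readability_findings(text: str, low: int = 80, high: int = 250,
                         section_cap: int = 50) -> List[str]:
    """Report a total length outside the range and any section over the cap."""
    findings = []
    total = len(text.splitlines())
    if not low <= total <= high:
        findings.append(f"total length {total} outside {low}-{high}")
    for name, length in section_lengths(text).items():
        if length > section_cap:
            findings.append(f"section '{name}' has {length} lines, cap is {section_cap}")
    return findings
```

The load-bearing property cannot be checked this way; it needs the remove-and-observe test described above.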

The seven-step build

Here is the sequence I use when bootstrapping CLAUDE.md for a new project. It takes about an afternoon for a small repo and two days for a complex one.

Step one: write the project intro paragraph. One paragraph, 100 words or fewer. What this project does, who uses it, what stack runs it. No marketing. The agent uses this to disambiguate similar-sounding repos.

Step two: list the non-negotiables. What does the agent absolutely never do — push to main without a PR, edit lockfiles by hand, commit secrets, modify migrations after they're applied. Six to ten rules. Each one is a falsifiable, machine-checkable directive.

Step three: encode the language and style decisions. Which language(s) the project uses, which frameworks are blessed and which are off-limits, what naming conventions hold. Be specific. "Use TypeScript" is too vague — say "TypeScript with strict mode, no any, no implicit JSX returns." Three to six rules.

Step four: capture the operational gotchas. The path that's a symlink. The CI matrix that fails on Windows. The credential that expires every 90 days. The integration test that needs a real database. This is the section that pays back biggest, because each rule represents a specific bug avoided.

Step five: pick three to seven principles. Not aphorisms. Specific design rules with reasoning. "Don't add error handling for cases that can't happen" with a one-sentence rationale. "Trust internal code; validate at boundaries." Hard cap at seven; if you have ten, you have none.

Step six: wire enforcement. For every rule from steps two through four, add the corresponding hook, CI check, or verification command. If a rule cannot be enforced mechanically, mark it explicitly as advisory in the file — don't pretend it's enforceable.

Step seven: run AgentLint and fix what it flags. Stale paths, contradictions, vague rules, missing thresholds. The linter is the last-mile sweep that catches the kind of drift you can't see when you're the one who wrote the file.
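Step six can be pictured as a small audit over the rule list. The following is a minimal Python sketch, assuming a hypothetical in-memory list of rules where each rule records either an enforcement mechanism or an explicit advisory flag. The `Rule` class and the `unenforced` function are illustrative names, not part of AgentLint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    text: str
    enforced_by: Optional[str] = None  # e.g. "pre-commit", "ci", "verifier"
    advisory: bool = False             # explicitly marked as not enforceable


def unenforced(rules: list[Rule]) -> list[Rule]:
    """Return rules that are neither enforced nor explicitly marked advisory."""
    return [r for r in rules if r.enforced_by is None and not r.advisory]


rules = [
    Rule("No direct pushes to main", enforced_by="ci"),
    Rule("Comments explain WHY, never WHAT", advisory=True),
    Rule("Migrations are append-only"),  # no enforcement, not marked advisory
]

for rule in unenforced(rules):
    print(f"needs a hook, CI check, or advisory marker: {rule.text}")
```

Anything the audit prints is a rule that still pretends to be enforceable. It needs either a hook or CI check, or an honest advisory marker.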

What to leave out, deliberately

The single biggest improvement to most CLAUDE.md files is what gets cut, not what gets added. Some categories that almost always belong elsewhere:

Architecture documentation. CLAUDE.md is not the place for ADRs (architecture decision records), system diagrams, or design history. Put those in docs/ and link from CLAUDE.md if needed. Architecture docs read once at session start are mostly wasted context budget.

Onboarding instructions. "Run npm install then npm run dev" belongs in README. CLAUDE.md is for the agent's session, not for a human contributor's first day. The onboarding doc and the agent rules diverge fast.

Aspirational values. "We value collaboration." This isn't actionable. Either turn it into a specific behavioral rule or move it to a values doc.

Background on past decisions. "We tried Redux, then Zustand, then settled on Jotai." Interesting context for humans, dead weight for agents. If the rule is "use Jotai," that's all the agent needs.

Marketing copy or external pitches. If you find yourself writing for an audience other than the agent, you are writing the wrong file.

A worked example

Here is a CLAUDE.md from a real project, before the seven-step rewrite. The original file was 380 lines:

## Overview
We are a passionate team building the next generation of...
[180 lines of mission, history, philosophy]

## Coding Standards
- Write good, clean code
- Follow best practices
- Be careful with errors
- Test thoroughly
[60 lines of similar rules]

## Architecture
[100 lines of system diagrams in ASCII art]

## Onboarding
[40 lines of git clone / npm install instructions]

Same project, after seven-step rewrite. 142 lines:

## Project
A real-time collaboration tool. Users edit shared documents from
their browsers; conflict resolution is CRDT-based. Stack: TypeScript,
React, Yjs, Postgres, Redis, deployed on Fly.io.

## Non-negotiables
- All changes through PR — no direct pushes to main.
- Tests must pass before merge — Vitest unit + Playwright E2E.
- No edits to pnpm-lock.yaml by hand. Use `pnpm add` or `pnpm remove`.
- No secrets in commits — `.env*` is gitignored, secrets via 1Password.
- Migrations are append-only — never modify a migration after it's applied.

## Language & Style
- TypeScript strict mode. No `any`. No implicit JSX returns.
- Components are functional with hooks. No class components.
- State: server state via TanStack Query, client state via Zustand.
- File naming: kebab-case for files, PascalCase for components.
- Tests live next to source as `<name>.test.ts`.

## Operational notes
- `apps/web/` builds with Vite, `apps/api/` builds with esbuild.
- Local dev needs Docker (Postgres + Redis) — see `pnpm dev:up`.
- Playwright requires Playwright browsers installed: `pnpm exec playwright install`.
- CI matrix runs on Linux only — local Windows/macOS issues won't fail CI.

## Principles
- Don't add error handling for cases that can't happen.
- Trust internal code; validate at boundaries.
- Three identical lines is better than a premature abstraction.
- If a rule isn't enforced by a hook or CI, it's advisory.
- Comments explain WHY, never WHAT.
- When stuck, write less code, not more.

## Enforcement
- Pre-commit (Husky): lint, typecheck, format on staged files.
- Pre-push: full unit test suite.
- CI: lint + typecheck + unit + Playwright on PR.
- Required status checks: ci-lint, ci-test, ci-e2e.
- AgentLint runs in CI on changes to CLAUDE.md.

Same project, same intent, roughly 60% smaller (380 lines down to 142), and every rule is specific enough to check. The agent reads this file in under a second and knows exactly what world it is operating in.

Why this is worth the afternoon

The cost of a bad CLAUDE.md is silent. The agent makes 80 decisions per session you never see, and each of them is biased by the file. A 380-line CLAUDE.md full of vague advice doesn't crash; it just produces a slightly worse PR than the 142-line version would have. Multiply that by every session for a year and the gap is large. The file is the boot sequence for an autonomous program. Spending an afternoon getting it right pays back the moment the agent starts a new session — which it does dozens of times a week.

If you want a starting point, AgentLint ships a --init mode that generates a 70-line skeleton matching the seven-step structure above. Run it, fill in the blanks, run AgentLint against the result, and commit. That's the perfect CLAUDE.md for your project — the one that exists, is correct, and the agent actually reads.

Originally posted on agentlint.app/blog/creating-the-perfect-claude-md.

The 33 Checks Every CLAUDE.md Should Pass

Mario — Mon, 27 Apr 2026 07:14:50 +0000

When we cataloged the failure modes of CLAUDE.md files at scale, we expected to find ten or fifteen. We ended up with thirty-three. They all share one property: each one is a specific, falsifiable problem the author probably didn't notice, and the agent silently routed around. This post walks through the thirty-three checks AgentLint runs, grouped by the kind of damage they cause, and explains why each one earned its place.

Why thirty-three and not five

The first version of the linter had nine checks. We thought that covered it. Then we ran it against the next thirty CLAUDE.md files we could find — open-source repos, our own internal projects, files contributors had emailed us — and watched the same patterns appear in shapes the original nine couldn't catch. Vague rules wearing the costume of specific rules. Specific rules contradicted by their own examples. Section headings that promised one thing and delivered another. By the time the catalog stabilized, it had thirty-three checks across five dimensions. Not because thirty-three is a magic number, but because that's where new checks stopped catching anything the old ones missed.

The five dimensions, in order of how much pain they cause when violated: correctness, specificity, structure, enforcement, and hygiene. Most CLAUDE.md files fail at specificity first. A few fail at correctness so badly that the agent ignores the file entirely.

Correctness checks (8 of 33)

These are the checks that ask: does the file say true things about the project as it exists right now?

The first is stale path references: every file path mentioned in CLAUDE.md must resolve. If the file says "see scripts/deploy.sh," scripts/deploy.sh has to exist. The check is a one-liner that greps every quoted path and shells out to test -e. It catches the most embarrassing class of drift: rules that reference files deleted six commits ago.

The second is stale command references: every shell command mentioned must be runnable. The check verifies syntax (does shellcheck parse it) and, where possible, runs --help to confirm the binary exists. Rules like "run bun test" silently break when the project migrates to npm.

The third is contradicting rules within the file: pairs of rules where one explicitly forbids what another explicitly requires. The checker uses a small NLI model on rule-shaped sentences and surfaces high-confidence contradictions. The most common pattern: an early rule says "all changes go through PR" and a later rule says "small fixes can go directly to main."

The fourth is CLAUDE.md vs README contradiction: the same project documented two ways in two files. CLAUDE.md says the build command is X, README says it's Y. One of them is wrong. The check picks both files apart, normalizes the claims, and flags conflicts.

The fifth is CLAUDE.md vs CI contradiction: rules whose enforcement is supposed to live in CI, but the CI file does not actually enforce. "Tests must pass before merge" — but the GitHub Actions workflow has no test step.

The sixth is language pinning: if the project uses TypeScript, the file should say so. If it uses Python with uv, the file should say uv not pip. The check parses package manager files (package.json, pyproject.toml, Cargo.toml) and cross-references against the rules.

The seventh is deprecated tooling: rules that mention tools the project has migrated away from. "Use lerna run" in a project that switched to bun workspaces. The check has a small registry of common migrations and surfaces likely cases.

The eighth is wrong default branch: rules referencing "master" when the repo's default is "main," or vice versa.
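The stale path check is the easiest of these to reproduce. Here is a minimal Python sketch, assuming paths appear in backticks and contain at least one slash. The regex and the name `find_stale_paths` are illustrative assumptions, not AgentLint's actual implementation.

```python
import re
from pathlib import Path

# Assumption: a "path" is a backticked token with no whitespace and at least one slash.
PATH_PATTERN = re.compile(r"`([^`\s]*/[^`\s]*)`")


def find_stale_paths(markdown: str, repo_root: Path) -> list[str]:
    """Return backticked paths in the text that do not exist under repo_root."""
    stale = []
    for match in PATH_PATTERN.finditer(markdown):
        candidate = match.group(1)
        if not (repo_root / candidate).exists():
            stale.append(candidate)
    return stale
```

Calling `find_stale_paths(Path("CLAUDE.md").read_text(), Path("."))` from the repository root would list every referenced path that no longer resolves. Bare filenames without a slash are deliberately skipped to keep false positives down.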

Specificity checks (10 of 33)

These check whether the rules are concrete enough that the agent can act on them.

Hedge-word density: the count of "should," "may," "consider," "if appropriate" per hundred words. Above a threshold, the file is advisory rather than directive. The fix is to rewrite hedges into specific rules.

Unfalsifiable claims: rules of the form "write good code" or "be careful." The check flags any imperative that hinges on a word too abstract to verify (good, careful, clean, maintainable, robust, proper).

Missing thresholds: rules that name a quality without quantifying it. "Tests should be fast" without a time budget. "Files shouldn't be too long" without a line limit.

Unverifiable claims: rules that refer to the agent's intentions ("understand the code") or feelings ("feel confident"). The agent cannot verify either.

Stack of synonyms: three or more near-synonyms in a single rule ("clear, concise, readable") which adds words but no specificity.

Unscoped "always" / "never": absolute rules without an exception clause when the codebase obviously has exceptions.

Missing examples: rules where a concrete example would compress the rule by half and double its compliance rate.

Vague references to "best practices": pointer-without-target rules. "Follow JavaScript best practices" — whose best practices, which year, which document.

Aspirational rules: rules that describe what the team wishes were true rather than what is true. "We always write tests first." If you don't, the rule is a lie.

Blanket prohibition without rationale: rules forbidding something without explaining why. The agent will follow the rule until it discovers an edge case that seems to violate the intent — at which point, with no rationale, the agent has to guess.
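Hedge-word density, the first check in this group, is simple enough to sketch. The Python below is a rough illustration that counts only the four hedges named above. The word list, the tokenizer, and the name `hedge_density` are assumptions, and the threshold is left to the caller because the text does not state one.

```python
import re

# Hedge terms named in the text above; illustrative, not exhaustive.
HEDGES = ("should", "may", "consider", "if appropriate")


def hedge_density(text: str) -> float:
    """Return hedge occurrences per hundred words (0.0 for empty text)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return 0.0
    # Re-join the words so multi-word hedges match across punctuation-free text.
    normalized = " ".join(words)
    hits = sum(
        len(re.findall(rf"\b{re.escape(hedge)}\b", normalized))
        for hedge in HEDGES
    )
    return 100.0 * hits / len(words)
```

A rule such as "Tests should be fast. Consider adding a cache if appropriate." contains three hedges in ten words, while "All TypeScript. No new JavaScript files." contains none.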

Structure checks (6 of 33)

These ask whether the file is shaped well enough that the agent can navigate it cheaply.

Section length cap: any single section over a configurable budget (default 80 lines). Long sections are read-skipped by both humans and agents.

Total length cap: the file as a whole. Default 300 lines. AgentLint warns at the cap and errors well above.

Required sections: configurable list of section headings the file must include. Defaults to the five sections described in "Writing a Good CLAUDE.md": project intro, how-we-work, language/style, operational notes, principles.

Heading hierarchy: H1 → H2 → H3 without skipping levels. Common drift: H1 then H4 because someone copy-pasted a snippet.

Duplicate headings: the same heading appearing twice usually means two people added the same section without seeing each other's edit.

Orphan paragraphs: text outside any section. The check flags paragraphs that appear before the first heading or after the last heading without belonging to either.
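Two of the structure checks, heading hierarchy and the section length cap, can be approximated with a single pass over the lines. This is a simplified Python sketch, assuming ATX-style `#` headings. It does not skip fenced code blocks, so a `#` comment inside a fence would be misread as a heading. The function names are illustrative, and the 80-line default comes from the text above.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def heading_skips(markdown: str) -> list[str]:
    """Return headings that jump more than one level deeper than the previous heading."""
    skips = []
    previous_level = 0  # a file whose first heading is not H1 is flagged too
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous_level + 1:
            skips.append(match.group(2))
        previous_level = level
    return skips


def long_sections(markdown: str, max_lines: int = 80) -> list[str]:
    """Return headings whose section body is longer than max_lines."""
    too_long = []
    current, count = None, 0
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if current is not None and count > max_lines:
                too_long.append(current)
            current, count = match.group(2), 0
        elif current is not None:
            count += 1
    if current is not None and count > max_lines:
        too_long.append(current)
    return too_long
```

Text before the first heading is not counted toward any section here. In the terms used above, that text is an orphan paragraph and belongs to a separate check.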

Enforcement checks (5 of 33)

These ask: for the rules that are supposed to be machine-enforced, does the enforcement actually exist?

Pre-commit hook coverage: rules that say "run lint before commit" need a .husky/pre-commit or equivalent that actually runs lint.

CI step coverage: rules that say "tests must pass" need a CI step that runs them, with required-status protection.

Lockfile rule coverage: if the file says "do not edit lockfiles by hand," there should be a CI guard or pre-commit hook that blocks lockfile edits that do not come from the package manager's own install command.

Secret-scanning rule coverage: if the file says "no secrets in commits," there should be a .gitignore covering .env* and a secret scanner in CI.

License-header rule coverage: if the file mandates license headers in source files, the check verifies a percentage of source files actually have them.
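The first and fourth enforcement checks reduce to looking for a file and a pattern. The Python below is a deliberately crude sketch, assuming a Husky-style `.husky/pre-commit` hook and a plain `.gitignore`. The substring match on "lint" and both function names are assumptions for illustration, and a real check would need to parse the hook rather than search it.

```python
from pathlib import Path


def precommit_runs_lint(repo_root: Path) -> bool:
    """True if .husky/pre-commit exists and mentions a lint command."""
    hook = repo_root / ".husky" / "pre-commit"
    return hook.is_file() and "lint" in hook.read_text(encoding="utf-8")


def gitignore_covers_env(repo_root: Path) -> bool:
    """True if .gitignore contains a pattern line that starts with .env."""
    gitignore = repo_root / ".gitignore"
    if not gitignore.is_file():
        return False
    lines = gitignore.read_text(encoding="utf-8").splitlines()
    return any(line.strip().startswith(".env") for line in lines)
```

Either function returning False while the matching rule is present in CLAUDE.md means the rule is prose only.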

Hygiene checks (4 of 33)

The smallest dimension and the one with the lowest false-positive rate.

TODO without owner or date: TODO comments inside CLAUDE.md without a name or date eventually become permanent.

Trailing whitespace and mixed line endings: small but corrosive.

Inconsistent fence style: triple-backtick code fences with inconsistent language tags or fence widths.

Stale "as of" dates: rules with a date stamp ("as of 2024-12") that haven't been touched in over six months.
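The stale date check can be approximated by comparing each stamp with the current date. This is a minimal Python sketch, assuming stamps follow the "as of YYYY-MM" shape shown above. It measures the age of the stamp itself rather than when the line was last touched, which would need version-control history. The name `stale_stamps` is illustrative.

```python
import re
from datetime import date

STAMP = re.compile(r"as of (\d{4})-(\d{2})", re.IGNORECASE)


def stale_stamps(text: str, today: date, max_age_months: int = 6) -> list[str]:
    """Return 'as of YYYY-MM' stamps more than max_age_months older than today."""
    stale = []
    for match in STAMP.finditer(text):
        year, month = int(match.group(1)), int(match.group(2))
        age = (today.year - year) * 12 + (today.month - month)
        if age > max_age_months:
            stale.append(match.group(0))
    return stale
```

Passing `today` explicitly instead of calling `date.today()` inside the function keeps the check deterministic in CI and in tests.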

A worked example: one file, eleven failures

We ran the linter against an open-source repo with a 240-line CLAUDE.md. The report came back with eleven distinct findings. Three were correctness (one stale path, two stale commands), four were specificity (three hedge clusters and one unfalsifiable claim), two were structure (one section over the length cap, one heading hierarchy skip), one was enforcement (a "tests must pass" rule with no CI step), and one was hygiene (a stale "as of 2024-09" stamp). Fixing all eleven took the maintainer about forty minutes and removed about sixty lines from the file. The agent's behavior on the project, measured by a small benchmark of agent-completed PRs, became visibly more consistent the next week.

The point of the linter is not to score CLAUDE.md files. It is to remove the most common failure modes before the agent has to navigate them. A file that passes all thirty-three checks is not necessarily a great CLAUDE.md — but it is unlikely to silently mislead an agent. That is the bar.

Where the catalog goes next

Thirty-three is the current number, but we add checks as new patterns emerge. The next two we are testing are around AGENTS.md (which is becoming the multi-tool successor to CLAUDE.md) and around .cursor/rules files (which have their own failure modes specific to Cursor's rule-loading semantics). If you find a pattern that AgentLint doesn't catch and you think it should, the contribution path is one issue, one rule definition, one test fixture. The catalog grows by encountering files in the wild — and the more files we run it against, the better it gets.

Originally posted on agentlint.app/blog/the-33-checks-every-claude-md-should-pass.

Writing a Good CLAUDE.md

Mario — Mon, 27 Apr 2026 07:14:14 +0000

If you have ever opened a CLAUDE.md written six months ago and immediately wished you hadn't, you already understand the problem. The file is supposed to be the durable memory of your project — the thing your AI agent reads at the start of every session before it touches a single line of code. In practice it tends to drift. Rules get added in moments of frustration, never reviewed. Sections grow into 300-line walls that nobody actually reads. Half the lines are aspirational, half are obsolete, and the agent quietly ignores the difference.

This piece is about what makes a CLAUDE.md actually work: how to structure it, what to put in, what to leave out, and why the shortest CLAUDE.md you can write that still encodes your real constraints will always outperform the long one you keep meaning to clean up.

What a CLAUDE.md is for

The framing that finally clicked for me came from a 2026 conversation where Mitchell Hashimoto, OpenAI, and the LangChain team converged on the same definition: an agent is a model plus a harness. The model is the LLM. The harness is everything that wraps it — tools, state, feedback loops, and the persistent rules it reads at session start. CLAUDE.md is one of the load-bearing pieces of that harness. So are AGENTS.md, .cursor/rules, your CI configuration, your pre-commit hooks, and your .gitignore. They all do the same thing: they tell the agent what world it is operating in, before the agent has to figure that out from scratch.

The reason this framing matters is that it changes what you optimize for. You are not writing a beautifully phrased manifesto. You are writing the boot sequence for an autonomous program. Every line is either load-bearing or it is dead weight. If a rule is not specific enough that the agent can act on it without asking a follow-up question, it will be ignored — and worse, the agent will quietly invent its own interpretation. Vague rules are not "safe defaults." They are silent licenses to drift.

What "good" looks like

A good CLAUDE.md has three properties, in this order: it is correct, it is readable in one session, and it is enforced by the harness around it.

Correctness comes first. If the file says "use bun" but your CI uses npm, the file is wrong. If the file says "all changes go through PR" but you push to main yourself when you are in a hurry, the file is wrong. Wrong rules are not just neutral — they actively train your agent to ignore the file, because the agent will see the contradiction with the actual project state and learn that CLAUDE.md is unreliable. The fastest way to make your agent stop following CLAUDE.md is to leave one obviously broken rule in it for two weeks.

Readability is the second constraint, and it is brutal. The agent reads CLAUDE.md at the start of every session, and the cost of reading is paid in context window. Every line you add to CLAUDE.md is a line that competes with the actual code, the actual diff, and the actual question you are trying to answer. A 600-line CLAUDE.md is a luxury you can rarely afford. A 150-line CLAUDE.md that encodes the same constraints is almost always achievable if you are willing to delete.

Enforcement is the third leg, and the one most people skip. A rule that lives only in CLAUDE.md is a wish. A rule that lives in CLAUDE.md and a Husky pre-commit hook, a GitHub Actions check, or a linter step in your CI pipeline is a fact. The agent will follow rules that are also enforced by the harness, because those rules have teeth — the agent learns very quickly that pushing code with console.log left in fails the pre-push check, and adjusts. Rules that exist only in prose are advisory. Use both layers.

The sections every CLAUDE.md needs

Across about a hundred CLAUDE.md files I've audited, the same five sections appear in every file that actually works. Names vary; the contents do not.

The first is what this project is — one paragraph, no marketing. What the codebase does, who it is for, what stack it runs on. The agent needs this to disambiguate similar-sounding files later.

The second is how we work — the small set of process rules that are non-negotiable. Branching strategy, commit conventions, PR policy, testing expectations. Six to ten lines. This is where the agent learns whether it can push directly, whether tests are mandatory, whether commits need to be atomic.

The third is language and style — what languages this project uses, what frameworks are off-limits, what naming conventions to follow. This is also where you encode language for communication: do you want the agent to talk to you in English, in Chinese, terse, verbose. Pick one and write it down.

The fourth is operational notes — the surprising facts about your environment. The path that looks normal but is actually a symlink. The CI check that fails on Windows. The token that expires every Tuesday. This is the section that saves you most often, because it captures the implicit knowledge a new contributor would otherwise have to discover by breaking something.

The fifth is principles — the small set of design rules you want the agent to internalize. Not "write good code." Specific principles like "don't add error handling for cases that can't happen" or "never modify lockfiles by hand." Three to seven principles, each two sentences max. If you have twenty, you have none.

What goes wrong

The most common failure I see is vagueness. A rule like "follow best practices" or "be careful with errors" is a wishful sticky note. The agent cannot act on it. Replace it with a specific, falsifiable rule: "wrap every external API call in a timeout of 5 seconds and retry once on transient errors." Now the agent knows exactly what to do, and you know how to verify it.
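To show that the rewritten rule is concrete enough to implement, here is one possible shape for it in Python. This is a sketch under assumptions: "transient" is taken to mean `TimeoutError` and `ConnectionError`, and the wrapped callable is assumed to accept the timeout and honor it itself. The name `call_external` is hypothetical.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: these exception types stand in for "transient errors".
TRANSIENT_ERRORS = (TimeoutError, ConnectionError)


def call_external(fn: Callable[[float], T], timeout: float = 5.0, retries: int = 1) -> T:
    """Call fn(timeout), retrying up to `retries` times on transient errors."""
    attempt = 0
    while True:
        try:
            return fn(timeout)
        except TRANSIENT_ERRORS:
            if attempt >= retries:
                raise  # retries exhausted: surface the original error
            attempt += 1
```

A caller might wrap an HTTP request as `call_external(lambda t: urllib.request.urlopen(url, timeout=t))`. The rule is verifiable because a reviewer or a linter can look for external calls that bypass the wrapper.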

The second is contradiction. Two rules that disagree, often added months apart, sit in the same file. "All commits go through PR" and "you can push directly to main for typos" cannot both be true; the agent will pick one and ignore the other. AgentLint treats this kind of contradiction as a hard error, because contradictions train the agent to treat the file as advisory.

The third is stale references. The file mentions a script that was deleted, a service that was renamed, a directory that was moved. Each stale reference is a small lie the agent has to discover and work around. Run a grep before every CLAUDE.md edit and confirm that every path, every command, and every URL still resolves.

The fourth is the unbounded section. Someone starts a "Principles" list with five items. Six months later it has thirty. Half are duplicates of each other. None are enforced. Hard cap your principle count and treat it as a permanent budget — to add a new one, retire an old one.

The fifth is language drift. The file mixes English process language with Chinese commit messages with copy-pasted Slack threads. Pick one language for code and rules. Pick another (or the same) for communication. Be consistent. The agent's parser does not care, but a human reviewer will, and the inconsistency is a tell that nobody is curating the file.

A worked example

Here is a before/after I use in workshops. The "before" is a real CLAUDE.md from a real project, lightly anonymized:

## How we work
- We try to write good code
- Tests are important
- Use TypeScript when possible
- Don't break things
- All changes should be reviewed
- Commit often

Six bullets, zero falsifiable rules. "Try" is not a commitment. "Important" is not a behavior. "When possible" is the agent's permission slip to skip TypeScript. "Don't break things" is unactionable. "Should" is advisory.

The "after," same project, same intent:

## How we work
- All TypeScript. No new JavaScript files. Existing JS files migrate when touched.
- Vitest for unit, Playwright for integration. Both must pass before merge.
- Branch from main, open PR, squash-merge. No direct pushes to main.
- One logical change per commit. If you can't summarize the diff in one sentence, split it.
- Pre-push hook runs lint + typecheck + unit tests. Don't disable it.

Five bullets instead of six. Now the agent can act on every one of them, and you can write a check that detects violations.
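As one example of such a check, the first rule ("No new JavaScript files") can be tested against the list of files a change adds. This is a minimal Python sketch, assuming the list of added paths comes from somewhere like `git diff --name-only --diff-filter=A`. The suffix set and the function name are illustrative choices.

```python
from pathlib import PurePosixPath

# Assumption: these suffixes count as "JavaScript files" for the rule.
JS_SUFFIXES = {".js", ".jsx", ".mjs", ".cjs"}


def new_javascript_files(added_paths: list[str]) -> list[str]:
    """Return added paths that violate the 'no new JavaScript files' rule."""
    return [p for p in added_paths if PurePosixPath(p).suffix in JS_SUFFIXES]
```

A project that keeps tool configuration in `.js` files would need an allowlist on top of this, which is the kind of exception worth writing into the rule itself.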

Why the harness needs the linter

The reason we built AgentLint is that even careful authors drift. You write a good CLAUDE.md on Monday, ship a feature on Tuesday, add three rules in a frustrated session on Wednesday, and by Friday the file has a contradiction you didn't notice. The 33 checks AgentLint ships are the failure modes I kept seeing across audits — vague rules, contradictions, stale references, unbounded sections, language drift, missing enforcement, unverifiable claims. Each check cites the primary source it is checking against, so when the linter flags something, you can trace exactly why. The file does not need to be perfect. It needs to be not silently broken, and that is a checkable property.

If you are starting a CLAUDE.md from scratch, write the five sections above, keep it under 200 lines, and run AgentLint against it. If you have an existing one that has drifted, run AgentLint first and fix what it flags. The point is not to get a clean report. The point is to make sure the most expensive file in the repository, the one your agent reads every session, is actually saying what you want.

Originally posted on agentlint.app/blog/writing-a-good-claude-md.

Self-Hosting AutoSearch for Deep Research

Mario — Sun, 26 Apr 2026 07:27:57 +0000

This article is about self-hosting AutoSearch for deep research. The intent is operational: teams want to run open-source deep research infrastructure on their own systems, connect it to agent hosts through MCP, and keep control over source workflows. AutoSearch is designed for that shape with 40 channels, 10+ Chinese sources, and an LLM-decoupled architecture.

Self-hosting is not necessary for every user. It becomes attractive when teams care about deployment boundaries, observability, repeatable workflows, or integrating research into internal agent systems.

Why self-host

Self-hosting gives you control over where the tool runs, how it is monitored, and how agent hosts connect to it. For engineering teams, that can make MCP tooling easier to standardize. For research teams, it can make recurring tasks more reproducible.

It also preserves flexibility. AutoSearch handles retrieval, while the host chooses the model. If your model strategy changes, your source workflow does not need to be rebuilt.

Deployment shape

Start small. Install AutoSearch by following the install page, connect one host, and verify a single research workflow. Then decide whether it should run on a developer machine, shared workstation, internal service, or managed environment.

The key is to keep the MCP boundary clear. The host calls AutoSearch. AutoSearch returns source material. The host synthesizes and acts.

Channel control

Self-hosted workflows should still route channels deliberately. The 40 channels include developer, academic, social, video, web, and Chinese sources such as Zhihu, WeChat, Xiaohongshu, Weibo, and Bilibili. Not every workflow needs all of them.

Define allowed source plans for common tasks: competitor scan, paper digest, Chinese product research, similar OSS discovery, and sentiment summary. This makes agent behavior easier to review.

Security notes

Do not print secrets into prompts, logs, or reports. Keep credentials in environment variables when a channel requires configuration. Treat source output as untrusted external content. The agent should summarize and cite it, not execute it.

LLM-decoupled architecture helps here because retrieval and reasoning remain separate. You can monitor tool calls independently from model output.
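The credential and logging rules above can be sketched in a few lines of Python. This is illustrative only: the variable name `EXAMPLE_CHANNEL_TOKEN` and both helper functions are hypothetical and not part of AutoSearch.

```python
import os


def load_credential(name: str) -> str:
    """Read a credential from the environment and fail loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing environment variable: {name}")
    return value


def redact(message: str, secrets: list[str]) -> str:
    """Replace any secret value in a log message with a fixed placeholder."""
    for secret in secrets:
        if secret:  # skip empty strings, which would match everywhere
            message = message.replace(secret, "[REDACTED]")
    return message
```

Passing every log line and report fragment through `redact` before it is written is a cheap way to keep a credential from leaking into a prompt or a stored report by accident.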

Install

Follow the MCP setup page after installation, then test with a narrow task from the examples page. A good first test is one that needs both English and Chinese sources, because it shows why self-hosting a broad research tool is different from adding a simple web query. Self-hosting AutoSearch gives teams practical control over deep research without tying that control to one LLM.

Before expanding usage, decide what logs and metrics matter. Useful signals include tool-call volume, channel mix, failed requests, latency, and which workflows produce accepted outputs. Avoid storing sensitive prompt content unless your policy allows it. The point is to understand whether the retrieval tier is helping agents make better decisions. With that visibility, self-hosting becomes an operational practice rather than just a deployment preference.
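The signals listed above can be computed from a simple log of tool calls. The Python below is a minimal sketch, assuming each call is recorded with a channel name, a latency, and a success flag. The `ToolCall` record and the `summarize` function are hypothetical and do not describe AutoSearch's own logging.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class ToolCall:
    channel: str
    latency_ms: float
    ok: bool


def summarize(calls: list[ToolCall]) -> dict:
    """Aggregate volume, channel mix, failures, and median latency."""
    return {
        "volume": len(calls),
        "channel_mix": dict(Counter(c.channel for c in calls)),
        "failed": sum(1 for c in calls if not c.ok),
        # median() raises on empty input, so guard the empty case
        "median_latency_ms": median(c.latency_ms for c in calls) if calls else None,
    }
```

The record stores no prompt content, which keeps the metrics useful without running into the storage concern mentioned above.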

Start narrow, measure honestly, and expand only where the research workflow proves useful. That path keeps infrastructure work tied to visible agent outcomes instead of abstract platform preference. It also makes later governance reviews easier because the team can show what changed.

Agent Host Retrieval Tier Pattern

Mario — Sun, 26 Apr 2026 07:27:21 +0000

The agent host retrieval tier pattern is a useful architecture for AI agents: keep retrieval as a clear tool boundary between the host and external sources. AutoSearch implements this pattern with open-source, MCP-native deep research across 40 channels, including 10+ Chinese sources, while the host remains responsible for model choice and task flow.

The retrieval tier should not be hidden inside a prompt. It should be something the agent can call, inspect, and verify.

Pattern overview

An agent host usually manages conversation, files, planning, permissions, and model calls. A retrieval tier manages source access. Separating those responsibilities prevents a single model prompt from becoming the place where every integration and behavior is buried.

AutoSearch fits as that retrieval tier. It can return source material from web, developer, academic, social, video, and Chinese channels. The host can then decide whether to summarize, ask follow-up questions, or change files.

Tool boundary

MCP gives the boundary a standard shape. Follow the MCP setup page, expose AutoSearch to the host, and write prompts that request evidence before conclusions. The host should know when it is using external sources and should show enough source context for review.

This boundary also keeps the stack LLM-decoupled. The retrieval tier does not care which model the host uses. The host does not need to implement every source connector itself.

Channel routing

Routing is where the pattern becomes useful. A technical question may need docs, GitHub, Hacker News, and papers. A China market question may need Zhihu, WeChat, Xiaohongshu, Weibo, and Bilibili. A product sentiment task may need Reddit, reviews, and official pages.

Use channels to make the source plan explicit. Querying everything creates noise. Querying the right channels creates evidence.
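An explicit source plan can be as small as a lookup table. The Python sketch below encodes the three routing examples from this section. The task-type keys and the lowercase channel identifiers are illustrative assumptions and are not AutoSearch's real channel names.

```python
# Source plans taken from the routing examples above; identifiers are illustrative.
SOURCE_PLANS = {
    "technical": ["docs", "github", "hacker_news", "papers"],
    "china_market": ["zhihu", "wechat", "xiaohongshu", "weibo", "bilibili"],
    "product_sentiment": ["reddit", "reviews", "official_pages"],
}


def channels_for(task_type: str) -> list[str]:
    """Return the explicit channel list for a task type, refusing unknown types."""
    try:
        return list(SOURCE_PLANS[task_type])
    except KeyError:
        raise ValueError(f"no source plan defined for task type: {task_type}") from None
```

Raising on an unknown task type, rather than falling back to every channel, keeps the "query everything" failure from happening silently.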

Failure modes

The main failure modes are overbroad prompts, weak source labeling, and synthesis that outruns evidence. Require the agent to name channels, preserve source types, and label uncertainty. If the answer affects code, run local verification. If it affects strategy, ask for missing evidence.

The examples page is useful for report structures that keep these checks visible.

Implementation

Start with the install page, connect AutoSearch, and test one agent task that needs outside context. The retrieval tier pattern works when source access is explicit, model reasoning is separate, and the human can inspect how the answer was built. AutoSearch gives that pattern a practical MCP-native implementation.

For production use, document the expected tool contract in plain language. The agent should know when to call retrieval, which source families are allowed for the task, how many results are enough, and what format evidence should return in. This small amount of structure avoids hidden coupling between prompts and source behavior. It also helps reviewers spot overreach. If an answer cites only fast social signals, it should not be accepted as product truth. If a code recommendation cites only community comments, it still needs docs or repository evidence before implementation.

That review rule is simple, but it prevents the most common agent failure: fluent synthesis from weak inputs.
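The review rule can be written down as a small gate. This Python sketch assumes each cited source carries a coarse source-family label. The label sets, the `review` function, and its return strings are all hypothetical.

```python
# Assumption: each cited source is tagged with one coarse source-family label.
SOCIAL = {"social", "community"}
AUTHORITATIVE_FOR_CODE = {"docs", "repository"}


def review(answer_kind: str, cited_families: set[str]) -> str:
    """Apply the review rule; returns 'accept' or a reason to hold the answer."""
    if not cited_families:
        return "hold: no evidence cited"
    if cited_families <= SOCIAL:
        return "hold: only social or community sources cited"
    if answer_kind == "code" and not (cited_families & AUTHORITATIVE_FOR_CODE):
        return "hold: code change needs docs or repository evidence"
    return "accept"
```

The gate does not judge the quality of the synthesis. It only refuses answers whose inputs are too weak for the kind of claim being made.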