<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: John Nichev</title>
    <description>The latest articles on Forem by John Nichev (@johnnichev).</description>
    <link>https://forem.com/johnnichev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3866641%2F8639e0cb-f404-407e-bc9f-e8e1b7212087.jpg</url>
      <title>Forem: John Nichev</title>
      <link>https://forem.com/johnnichev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/johnnichev"/>
    <language>en</language>
    <item>
      <title>I Spent Four Weeks Reading 200+ Sources on Context Engineering. Here's What I Built.</title>
      <dc:creator>John Nichev</dc:creator>
      <pubDate>Wed, 08 Apr 2026 22:13:17 +0000</pubDate>
      <link>https://forem.com/johnnichev/i-spent-four-weeks-reading-200-sources-on-context-engineering-heres-what-i-built-2fem</link>
      <guid>https://forem.com/johnnichev/i-spent-four-weeks-reading-200-sources-on-context-engineering-heres-what-i-built-2fem</guid>
      <description>&lt;p&gt;A launch post for &lt;a href="https://skills.nichevlabs.com" rel="noopener noreferrer"&gt;nv:context&lt;/a&gt;, a Claude Code skill that sets up context engineering for any repository in three minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The wall I kept hitting
&lt;/h2&gt;

&lt;p&gt;I build production Python services with AI coding agents. Claude Code, Cursor, Copilot, the whole rotation. And no matter how carefully I wrote my &lt;code&gt;CLAUDE.md&lt;/code&gt; files, I kept hitting the same wall: the agent would forget rules mid-session, run the wrong test command, or touch files it shouldn't.&lt;/p&gt;

&lt;p&gt;I did what most people do. I wrote longer &lt;code&gt;CLAUDE.md&lt;/code&gt; files. Added more "don't do X" instructions. Tried &lt;code&gt;/init&lt;/code&gt;. Nothing clicked.&lt;/p&gt;

&lt;p&gt;Eventually I sat down to figure out why. Four weeks later, I had read 200+ sources on what the research calls &lt;strong&gt;context engineering&lt;/strong&gt;. The picture was clearer than I expected, and uglier. Here's the punchline:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Bad context doesn't just not help. It actively hurts.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ETH Zurich&lt;/strong&gt; found that auto-generated agent config files &lt;em&gt;reduce&lt;/em&gt; success rates by 3% and increase costs by 20%+&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;METR&lt;/strong&gt; ran a controlled study on experienced developers and found they were &lt;strong&gt;19% slower&lt;/strong&gt; with AI tools when context was poorly managed, despite &lt;em&gt;feeling&lt;/em&gt; 24% faster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;FlowHunt / LongMemEval&lt;/strong&gt; showed that a focused 300-token context outperforms an unfocused 113K-token context on the same task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dex Horthy&lt;/strong&gt; has shown that using 40% of the context window outperforms using 90%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic and Manus&lt;/strong&gt; production data: below 60% context utilization is safe. At 70%, precision drops. At 85%, hallucinations begin.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;p&gt;The thing that shifted my thinking was Philipp Schmid's line:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Most agent failures are not model failures. They are context failures."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The 8 laws that came out of it
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Less is more.&lt;/strong&gt; Every line in your context competes with the actual task for attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Landmines, not maps.&lt;/strong&gt; Document what agents can't discover by reading the code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commands beat prose.&lt;/strong&gt; One snippet showing &lt;code&gt;npm run test -- --coverage --maxWorkers=2&lt;/code&gt; beats three paragraphs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context is finite.&lt;/strong&gt; Frontier LLMs follow roughly 150 to 200 instructions consistently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Progressive disclosure.&lt;/strong&gt; Layer it: root &lt;code&gt;CLAUDE.md&lt;/code&gt;, subdirectory &lt;code&gt;CLAUDE.md&lt;/code&gt;, skills, MCP tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks for determinism.&lt;/strong&gt; If a rule MUST be followed 100% of the time, use a hook.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Negative instructions backfire.&lt;/strong&gt; "Don't use moment.js" makes models more likely to use moment.js. Say "MUST use date-fns" instead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compact proactively.&lt;/strong&gt; Don't wait for Claude to compact at 95%. Update &lt;code&gt;HANDOFF.md&lt;/code&gt;, run &lt;code&gt;/clear&lt;/code&gt;, start fresh.&lt;/li&gt;
&lt;/ol&gt;
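&lt;p&gt;For the "hooks for determinism" law, here is the shape such a rule takes in practice. A minimal sketch of a &lt;code&gt;.claude/settings.json&lt;/code&gt; hook that runs a linter after every file edit; the event name and field layout follow Claude Code's hook configuration, but the matcher and command are illustrative assumptions, not part of nv:context:&lt;/p&gt;

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "ruff check ." }
        ]
      }
    ]
  }
}
```

&lt;p&gt;Because the hook fires on every matching tool call, compliance doesn't depend on what the model remembers mid-session.&lt;/p&gt;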

&lt;h2&gt;
  
  
  The hierarchy of leverage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Priority  Layer                    Compliance  Cost to set up
─────────────────────────────────────────────────────────────
   1      Verification             100%        Medium
   2      CLAUDE.md / AGENTS.md    90-95%      Low
   3      Hooks                    100%        Low
   4      Skills                   ~79%        Medium
   5      Subagent patterns        Variable    Medium
   6      Session management       Manual      Low
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most people optimize from the bottom up. The best engineers start at the top.&lt;/p&gt;

&lt;h2&gt;
  
  
  What nv:context does
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interviews you&lt;/strong&gt; about your tools, pain points, landmines, and workflow preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scans your codebase with parallel subagents&lt;/strong&gt; to find non-obvious patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scores your setup&lt;/strong&gt; on all six leverage layers (0-10 per layer, 0-60 overall)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generates tailored configs&lt;/strong&gt; for only the tools you actually use&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sets up hooks&lt;/strong&gt; for deterministic enforcement&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Creates session management infrastructure&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Installs compounding engineering&lt;/strong&gt; (optional GitHub Action)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Production proof
&lt;/h2&gt;

&lt;h3&gt;
  
  
  selectools (Python SDK, 4,612 tests)
&lt;/h3&gt;

&lt;p&gt;Starting state: L3 maturity, 49/60 leverage score, 440-line &lt;code&gt;CLAUDE.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After: L5-L6, 58/60. &lt;code&gt;CLAUDE.md&lt;/code&gt; went from 440 lines to 67 (-85%). Token budget dropped 53%.&lt;/p&gt;

&lt;h3&gt;
  
  
  nichevlabs (multi-product SaaS)
&lt;/h3&gt;

&lt;p&gt;Starting state: L4 maturity, 17/60 leverage score.&lt;/p&gt;

&lt;p&gt;The smoking gun: an 805-line &lt;code&gt;SESSION.md&lt;/code&gt; that got loaded on every session start. 17,000 tokens. On every conversation. nv:context's token budget report made it impossible to ignore.&lt;/p&gt;

&lt;p&gt;After: L6, 49/60 (up 32 points). &lt;code&gt;SESSION.md&lt;/code&gt; went from 805 lines to 59 (-93%). Saved 15,800 tokens per session. A parallel bug-hunt subagent surfaced 81 real bugs while it was analyzing the codebase.&lt;/p&gt;

&lt;h3&gt;
  
  
  sheriff (Python + TypeScript)
&lt;/h3&gt;

&lt;p&gt;Already-strong setup. L4 maturity, 36/60 leverage score going in.&lt;/p&gt;

&lt;p&gt;After: L5, 42/60 (+6 points). Smaller delta than the others. Incremental polish, not a rewrite.&lt;/p&gt;

&lt;p&gt;The through-line across all three: the skill is not a template generator. Same methodology, radically different outputs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it works with
&lt;/h2&gt;

&lt;p&gt;The generated &lt;code&gt;AGENTS.md&lt;/code&gt; is read by 25+ AI coding tools, including Claude Code, Cursor, GitHub Copilot, Aider, Codeium, Continue, Windsurf, Zed, Gemini CLI, and Cline. Tool-specific files only get generated for the tools you actually use.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add johnnichev/nv-context &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then open any project and run &lt;code&gt;/nv-context&lt;/code&gt;. Three-minute interview, thirty seconds of parallel analysis, done.&lt;/p&gt;

&lt;h2&gt;
  
  
  The research library
&lt;/h2&gt;

&lt;p&gt;Full research library: &lt;a href="https://skills.nichevlabs.com/research" rel="noopener noreferrer"&gt;https://skills.nichevlabs.com/research&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Full synthesis (10 laws, 4 operations, 7-component context stack): &lt;a href="https://skills.nichevlabs.com/synthesis" rel="noopener noreferrer"&gt;https://skills.nichevlabs.com/synthesis&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Primary sources include Anthropic engineering blog, Google DeepMind research, OpenAI Agents docs, ETH Zurich agent config paper, METR controlled developer study, JetBrains NeurIPS 2025 paper, Manus production data, GitHub's analysis of 2,500 public &lt;code&gt;AGENTS.md&lt;/code&gt; files, Boris Cherny, and Dex Horthy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Three production repos is a small sample.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~60% token overhead on the first run.&lt;/strong&gt; For context, the first-run benchmark scored a 100% pass rate vs a 45.8% baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research coverage is Python and JavaScript heavy.&lt;/strong&gt; Rust, Go, Kotlin, and Elixir are thinner.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The skill is opinionated.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  If you build AI coding agents for a living
&lt;/h2&gt;

&lt;p&gt;Context engineering is the discipline that separates AI tools that work in demos from AI tools that work in production. If you have been writing ever-longer &lt;code&gt;CLAUDE.md&lt;/code&gt; files and things still keep not quite working, try nv:context on your repo.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/johnnichev/nv-context" rel="noopener noreferrer"&gt;https://github.com/johnnichev/nv-context&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Landing: &lt;a href="https://skills.nichevlabs.com" rel="noopener noreferrer"&gt;https://skills.nichevlabs.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Install: &lt;code&gt;npx skills add johnnichev/nv-context -g -y&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two things launch alongside nv:context today. First, &lt;a href="https://github.com/johnnichev/selectools" rel="noopener noreferrer"&gt;selectools&lt;/a&gt;, the Python agent framework I built that taught me I needed a methodology. Second, the landing page for this methodology, built entirely with &lt;a href="https://github.com/johnnichev/nv-design" rel="noopener noreferrer"&gt;nv:design&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>claude</category>
      <category>agentskills</category>
    </item>
    <item>
      <title>SELECTOOLS: Multi-agent graphs, tool calling, RAG, 50 evaluators, PII redaction. All in one pip install.</title>
      <dc:creator>John Nichev</dc:creator>
      <pubDate>Tue, 07 Apr 2026 23:34:20 +0000</pubDate>
      <link>https://forem.com/johnnichev/selectools-multi-agent-graphs-tool-calling-rag-50-evaluators-pii-redaction-all-in-one-pip-bnm</link>
      <guid>https://forem.com/johnnichev/selectools-multi-agent-graphs-tool-calling-rag-50-evaluators-pii-redaction-all-in-one-pip-bnm</guid>
      <description>&lt;p&gt;Releasing v0.20.1 of &lt;a href="https://github.com/johnnichev/selectools" rel="noopener noreferrer"&gt;selectools&lt;/a&gt;, an open-source (Apache-2.0) Python framework for AI agent systems. Supports OpenAI, Anthropic, Gemini, and Ollama.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;selectools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The technical hook: how interrupts work after a human pause
&lt;/h3&gt;

&lt;p&gt;LangGraph's &lt;code&gt;interrupt()&lt;/code&gt; mechanism re-executes the entire node body on resume. This is by design and falls out of LangGraph's checkpoint-replay model. The official guidance is to make pre-interrupt side effects idempotent, place expensive work after the &lt;code&gt;interrupt()&lt;/code&gt; call, or split side effects into a separate downstream node. It works, but every node that needs human input has to be structured around the resume semantics. It's a leaky abstraction.&lt;/p&gt;

&lt;p&gt;Selectools uses Python generators instead. The node yields an &lt;code&gt;InterruptRequest&lt;/code&gt;. The graph resumes at the exact yield point via &lt;code&gt;generator.send()&lt;/code&gt;. Expensive work runs exactly once, with no idempotency contortions required.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expensive_llm_analysis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;draft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# runs once
&lt;/span&gt;    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;InterruptRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approve?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# resumes here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The v0.18.0 changelog documents the contrast directly: &lt;em&gt;"Resumes at the exact yield point (LangGraph restarts the whole node)."&lt;/em&gt;&lt;/p&gt;
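&lt;p&gt;The mechanism underneath is plain Python generator semantics, nothing framework-specific. A stripped-down sketch (illustrative, not the selectools API, and sync rather than async): the driver advances the generator to its first yield, collects the interrupt request, and later resumes at exactly that point with &lt;code&gt;generator.send()&lt;/code&gt;, so the expensive step runs once:&lt;/p&gt;

```python
# Plain-Python sketch of the pause/resume mechanism (illustrative, not the
# selectools API). The driver runs the node up to its yield, pauses, and
# later resumes at that exact point with generator.send().

calls = []

def expensive_analysis(draft):
    calls.append(draft)            # count invocations to prove single execution
    return draft.upper()

def review_node(state):
    analysis = expensive_analysis(state["draft"])       # runs once
    decision = yield {"prompt": "Approve?", "payload": analysis}
    state["approved"] = decision == "yes"               # resumes here

state = {"draft": "ship it"}
gen = review_node(state)
request = next(gen)                # run up to the yield: the pause point
# ... a human reviews request["payload"] and answers ...
try:
    gen.send("yes")                # resume at the yield with the answer
except StopIteration:
    pass                           # generator finished normally

assert state["approved"] is True
assert calls == ["ship it"]        # expensive_analysis ran only once
```

&lt;p&gt;The &lt;code&gt;try/except StopIteration&lt;/code&gt; is the whole resume protocol: normal completion of the generator marks the node as done.&lt;/p&gt;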

&lt;h3&gt;
  
  
  Multi-Agent Orchestration
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;AgentGraph&lt;/code&gt; is a directed graph executor for agent nodes. Routing is plain Python functions, no learned router, no DSL. This is deliberate: in production agent systems, you generally want deterministic control flow with LLMs doing the reasoning within nodes, not deciding the graph topology.&lt;/p&gt;

&lt;p&gt;Key design choices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ContextMode&lt;/strong&gt; controls what history flows between nodes: &lt;code&gt;LAST_MESSAGE&lt;/code&gt; (default), &lt;code&gt;LAST_N&lt;/code&gt;, &lt;code&gt;FULL&lt;/code&gt;, &lt;code&gt;SUMMARY&lt;/code&gt;, &lt;code&gt;CUSTOM&lt;/code&gt;. Prevents context explosion where downstream agents get drowned in irrelevant upstream conversation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel execution&lt;/strong&gt; with &lt;code&gt;MergePolicy&lt;/code&gt; (LAST_WINS, FIRST_WINS, APPEND) for fan-out/fan-in patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loop and stall detection&lt;/strong&gt; via state hashing. The graph tracks whether state is actually changing.&lt;/li&gt;
&lt;/ul&gt;
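&lt;p&gt;The state-hashing idea behind stall detection is easy to sketch in isolation (a generic illustration, not the selectools internals): digest the state after each step and stop once the digest stays identical for a few consecutive steps:&lt;/p&gt;

```python
# Generic illustration of loop/stall detection via state hashing (not the
# selectools internals): stop once the state digest has stayed identical
# for `patience` consecutive steps.
import hashlib
import json

def state_digest(state):
    # Stable digest of a JSON-serializable state dict
    blob = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def run_with_stall_detection(step, state, max_steps=20, patience=3):
    unchanged = 0
    for _ in range(max_steps):
        before = state_digest(state)
        state = step(state)
        if state_digest(state) == before:
            unchanged += 1
            if unchanged == patience:
                return state, "stalled"     # state stopped changing
        else:
            unchanged = 0
    return state, "max_steps"

# Toy step function: makes progress five times, then stops changing anything
def step(state):
    if state["n"] != 5:
        return {"n": state["n"] + 1}
    return state

final, reason = run_with_stall_detection(step, {"n": 0})
assert final == {"n": 5} and reason == "stalled"
```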

&lt;h3&gt;
  
  
  SupervisorAgent
&lt;/h3&gt;

&lt;p&gt;Four coordination strategies:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;plan_and_execute&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM generates a JSON plan, agents execute sequentially&lt;/td&gt;
&lt;td&gt;Structured tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;round_robin&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Agents take turns, supervisor checks completion each round&lt;/td&gt;
&lt;td&gt;Iterative refinement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;dynamic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM router selects best agent per step&lt;/td&gt;
&lt;td&gt;Heterogeneous tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;magentic&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Magentic-One: Task/Progress Ledgers + auto-replan&lt;/td&gt;
&lt;td&gt;Autonomous research&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;magentic&lt;/code&gt; strategy implements the Magentic-One pattern from Microsoft Research. &lt;code&gt;ModelSplit&lt;/code&gt; lets you use expensive models for planning and cheap models for execution (70-90% cost reduction).&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Eval Framework
&lt;/h3&gt;

&lt;p&gt;50 evaluators ship with the library (no paid service required): 30 deterministic + 20 LLM-as-judge. Plus A/B pairwise comparison, regression detection, JUnit XML for CI, and HTML reports.&lt;/p&gt;

&lt;h3&gt;
  
  
  Engineering Rigor: Autonomous Bug Hunts + Pre-Launch Security Audit
&lt;/h3&gt;

&lt;p&gt;The bug-hunting story is the part of this project I'm proudest of, and every claim below is in the public CHANGELOG.md.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;v0.19.1 Ralph Loop Bug Hunt.&lt;/strong&gt; Autonomous convergence system that runs 8 passes across all 7 modules until 3 consecutive clean passes. Result: ~90 bugs fixed and 254 new regression tests added (tests went from 2,664 to 2,918). Selected fixes from the changelog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;_tool_executor.py&lt;/code&gt;: ThreadPoolExecutor singleton for deadlock prevention&lt;/li&gt;
&lt;li&gt;&lt;code&gt;_provider_caller.py&lt;/code&gt;: async observer events on LLM cache hits&lt;/li&gt;
&lt;li&gt;&lt;code&gt;_openai_compat.py&lt;/code&gt;: tool call deltas flushed after stream end (Ollama compat)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fallback.py&lt;/code&gt;: mid-stream fallback corruption&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bm25.py&lt;/code&gt;: atomic snapshot under lock for concurrent clear/add safety&lt;/li&gt;
&lt;li&gt;&lt;code&gt;evals/llm_evaluators.py&lt;/code&gt;: prompt injection fencing on user-controlled fields with &lt;code&gt;&amp;lt;&amp;lt;&amp;lt;BEGIN_USER_CONTENT&amp;gt;&amp;gt;&amp;gt;&lt;/code&gt; delimiters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;v0.19.1 RAG Adversarial Bug Hunt.&lt;/strong&gt; Eight edge-case fixes including ChromaVectorStore &lt;code&gt;n_results&lt;/code&gt; clamping for empty collections, HybridSearcher &lt;code&gt;None&lt;/code&gt; handling for &lt;code&gt;vector_top_k&lt;/code&gt;/&lt;code&gt;keyword_top_k&lt;/code&gt;, ContextualChunker prompt template validation, PDFLoader &lt;code&gt;PdfReadError&lt;/code&gt; raised as &lt;code&gt;ValueError&lt;/code&gt; for encrypted PDFs, BM25 &lt;code&gt;top_k &amp;lt; 1&lt;/code&gt; immediate validation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pre-Launch 5-Agent Parallel Security Audit (v0.20.0).&lt;/strong&gt; 5 Claude subagents ran in parallel against the whole codebase, each focused on a different subsystem (concurrency, None guards, injection, path traversal, crash safety). 56 total findings, with 9 critical security fixes shipped, including: score injection in eval extractors, ReDoS in custom regex, path traversal in &lt;code&gt;ToolLoader&lt;/code&gt;, Anthropic multi-tool message merging, Redis session key collision, async output guardrails, and Redis/Supabase error handling. The full audit is published in &lt;code&gt;docs/SECURITY.md&lt;/code&gt; with every &lt;code&gt;# nosec&lt;/code&gt; annotation reviewed individually.&lt;/p&gt;

&lt;p&gt;Some of these patterns came from reading the LangChain, CrewAI, AutoGen, and LlamaIndex source while building the migration guides. The LangGraph HITL pattern, where the entire node restarts on resume, is the clearest example, and the generator-based resume described above is selectools' answer to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Agent Patterns
&lt;/h3&gt;

&lt;p&gt;Four high-level patterns ship in &lt;code&gt;selectools.patterns&lt;/code&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PlanAndExecuteAgent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;LLM generates a plan, executes subtasks sequentially&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ReflectiveAgent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Self-critique loop with configurable quality threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;DebateAgent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Two-agent adversarial debate + synthesis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TeamLeadAgent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lead agent coordinates specialists with load balancing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Enterprise Hardening
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stability markers:&lt;/strong&gt; &lt;code&gt;@stable&lt;/code&gt;, &lt;code&gt;@beta&lt;/code&gt;, &lt;code&gt;@deprecated(since, replacement)&lt;/code&gt; decorators for public API signalling. Introspect via &lt;code&gt;obj.__stability__&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trace HTML viewer:&lt;/strong&gt; &lt;code&gt;trace_to_html(trace)&lt;/code&gt; renders any &lt;code&gt;AgentTrace&lt;/code&gt; as a standalone waterfall HTML timeline. No JS framework, no external deps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SBOM:&lt;/strong&gt; &lt;code&gt;sbom.json&lt;/code&gt; (CycloneDX 1.6) with all core production dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compatibility matrix:&lt;/strong&gt; Python 3.9-3.13 × provider SDK × optional deps in &lt;code&gt;docs/COMPATIBILITY.md&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Serve &amp;amp; Deploy
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;selectools serve agent.yaml&lt;/code&gt; starts a Starlette ASGI server with a playground UI. Define agents in YAML, pick from 5 templates (customer_support, data_analyst, research_assistant, code_reviewer, rag_chatbot). Production additions: PostgresCheckpointStore, TraceStore (3 backends), &lt;code&gt;compose()&lt;/code&gt; for tool chaining, &lt;code&gt;retry()&lt;/code&gt; / &lt;code&gt;cache_step()&lt;/code&gt; pipeline wrappers, type-safe step contracts, and streaming composition.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tests + Coverage
&lt;/h3&gt;

&lt;p&gt;4,612 tests (95% coverage) across Python 3.9-3.13, with real-API evaluations against OpenAI, Anthropic, and Gemini. Includes 28 Hypothesis property-based tests, 15 thread-safety smoke tests (10 threads × 20 ops with synchronized start), and 16 production integration simulations.&lt;/p&gt;

&lt;h3&gt;
  
  
  Also new in v0.20.x
&lt;/h3&gt;

&lt;p&gt;An early visual agent graph builder at &lt;a href="https://selectools.dev/builder/" rel="noopener noreferrer"&gt;https://selectools.dev/builder/&lt;/a&gt; (49KB self-contained HTML, exports to YAML or Python). Works but I'm still polishing edges, so &lt;code&gt;pip install&lt;/code&gt; is the recommended path right now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/johnnichev/selectools" rel="noopener noreferrer"&gt;https://github.com/johnnichev/selectools&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Quickstart: &lt;a href="https://selectools.dev/QUICKSTART/" rel="noopener noreferrer"&gt;https://selectools.dev/QUICKSTART/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Changelog: &lt;a href="https://github.com/johnnichev/selectools/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;https://github.com/johnnichev/selectools/blob/main/CHANGELOG.md&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;PyPI: &lt;code&gt;pip install selectools&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>machinelearning</category>
      <category>agents</category>
    </item>
    <item>
      <title>Why I Built Selectools (and What I Learned Along the Way)</title>
      <dc:creator>John Nichev</dc:creator>
      <pubDate>Tue, 07 Apr 2026 23:22:56 +0000</pubDate>
      <link>https://forem.com/johnnichev/why-i-built-selectools-and-what-i-learned-along-the-way-59fd</link>
      <guid>https://forem.com/johnnichev/why-i-built-selectools-and-what-i-learned-along-the-way-59fd</guid>
      <description>&lt;p&gt;Every AI agent framework makes the same promise: "connect your LLM to tools and go." Then you start building.&lt;/p&gt;

&lt;p&gt;You discover that LangChain needs 5 packages to do what should take 1. That LCEL's &lt;code&gt;|&lt;/code&gt; operator hides a &lt;code&gt;Runnable&lt;/code&gt; protocol that breaks your debugger. That LangSmith costs money to see what your own code is doing. That when your agent graph pauses for human input, LangGraph restarts the entire node from scratch.&lt;/p&gt;

&lt;p&gt;I hit every one of these at work. We were building AI agents for real users, not demos, not prototypes, but production systems handling actual customer requests. The existing frameworks weren't built for this.&lt;/p&gt;

&lt;p&gt;So I built selectools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I actually needed
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tool calling that just works.&lt;/strong&gt; Define a function, the LLM calls it. No adapter layers, no schema gymnastics. Works the same across OpenAI, Anthropic, Gemini, and Ollama.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Traces without a SaaS.&lt;/strong&gt; Every &lt;code&gt;run()&lt;/code&gt; should tell me exactly what happened, which tools were called, why, how long each step took, what it cost. Not "sign up for our platform to see your own logs."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Guardrails that ship with the agent.&lt;/strong&gt; PII detection, injection defense, topic blocking, configured once, enforced everywhere. Not a separate package to evaluate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-agent orchestration in plain Python.&lt;/strong&gt; When I need 3 agents to collaborate, I want Python routing functions. Not a state graph DSL, not a compile step, not Pregel channels.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;One command to deploy.&lt;/strong&gt; &lt;code&gt;selectools serve agent.yaml&lt;/code&gt; gives me HTTP endpoints, SSE streaming, and a chat playground. Not "install FastAPI, create an app, add routes, configure CORS, handle SSE..."&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What selectools looks like today
&lt;/h2&gt;

&lt;p&gt;A 3-agent pipeline is one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AgentGraph&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;planner&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;writer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a blog post&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A composable pipeline uses &lt;code&gt;|&lt;/code&gt; on plain functions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;summarize&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;translate&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Long article text...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
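&lt;p&gt;The point of &lt;code&gt;|&lt;/code&gt; over plain functions is that it needs no heavyweight protocol underneath. A hypothetical sketch (the &lt;code&gt;Step&lt;/code&gt; class here is illustrative, not selectools' actual implementation): a thin wrapper whose &lt;code&gt;__or__&lt;/code&gt; appends the next stage and whose &lt;code&gt;run()&lt;/code&gt; threads a value through every stage in order:&lt;/p&gt;

```python
# Hypothetical sketch of "|" composition over plain functions (illustrative,
# not selectools' actual implementation).

class Step:
    def __init__(self, *fns):
        self.fns = list(fns)

    def __or__(self, other):
        # Accept either another Step or a bare function on the right-hand side
        other_fns = other.fns if isinstance(other, Step) else [other]
        return Step(*(self.fns + other_fns))

    def run(self, value):
        for fn in self.fns:
            value = fn(value)   # output of one stage feeds the next
        return value

summarize = Step(lambda text: text.split(".")[0])          # keep first sentence
translate = Step(lambda text: text.replace("hi", "hello"))
tidy = Step(str.strip)

pipeline = summarize | translate | tidy
result = pipeline.run("  hi there. extra detail omitted.")
assert result == "hello there"
```

&lt;p&gt;Errors inside any stage surface as ordinary Python tracebacks through &lt;code&gt;run()&lt;/code&gt;, which is exactly the debuggability argument above.&lt;/p&gt;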



&lt;p&gt;Human-in-the-loop pauses at the yield point and resumes there:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;analysis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;expensive_work&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# runs once, not twice
&lt;/span&gt;    &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;InterruptRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Approve?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;approved&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;yes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Deploy with one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;selectools serve agent.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;4,612 tests at 95% coverage across Python 3.9-3.13&lt;/li&gt;
&lt;li&gt;9 critical security bugs fixed in a pre-launch audit (5-agent parallel bug hunt, 56 total findings)&lt;/li&gt;
&lt;li&gt;44 interactive module docs with runnable examples, stability badges, and Copy Markdown buttons&lt;/li&gt;
&lt;li&gt;40 real-API evaluations against OpenAI, Anthropic, and Gemini&lt;/li&gt;
&lt;li&gt;76 runnable examples&lt;/li&gt;
&lt;li&gt;50 built-in evaluators (no paid service needed)&lt;/li&gt;
&lt;li&gt;152 model definitions with pricing data&lt;/li&gt;
&lt;li&gt;Apache-2.0 license&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The latest milestone: a visual agent builder
&lt;/h2&gt;

&lt;p&gt;The newest addition is a visual agent builder that runs entirely in your browser. Drag and drop nodes, wire up edges, configure models and tools, then export to YAML or Python. It's deployed on GitHub Pages at &lt;a href="https://selectools.dev/builder/" rel="noopener noreferrer"&gt;https://selectools.dev/builder/&lt;/a&gt; with zero install required. No paid desktop app, no subscription. Just open the URL and start building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd tell you honestly
&lt;/h2&gt;

&lt;p&gt;selectools is smaller than LangChain. The community is young. If you need 50 integrations and a managed platform today, LangChain is the safer bet.&lt;/p&gt;

&lt;p&gt;But if you want a library that stays out of your way, where routing is a Python function, errors are Python tracebacks, and you don't need a paid service to see what your agent did, give it a try.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;selectools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/johnnichev/selectools" rel="noopener noreferrer"&gt;https://github.com/johnnichev/selectools&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href="https://selectools.dev" rel="noopener noreferrer"&gt;https://selectools.dev&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Cookbook: &lt;a href="https://selectools.dev/COOKBOOK/" rel="noopener noreferrer"&gt;https://selectools.dev/COOKBOOK/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
  </channel>
</rss>
