<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kacper Włodarczyk</title>
    <description>The latest articles on Forem by Kacper Włodarczyk (@deenuu1).</description>
    <link>https://forem.com/deenuu1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F997289%2F50074490-7c28-44da-9a80-f389f20d3691.jpeg</url>
      <title>Forem: Kacper Włodarczyk</title>
      <link>https://forem.com/deenuu1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/deenuu1"/>
    <language>en</language>
    <item>
      <title>24 Claude Code Skills to Fix Your AI Stack: Introducing production-stack-skills and content-skills</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 20 Apr 2026 11:59:05 +0000</pubDate>
      <link>https://forem.com/deenuu1/24-claude-code-skills-to-fix-your-ai-stack-introducing-production-stack-skills-and-content-skills-80m</link>
      <guid>https://forem.com/deenuu1/24-claude-code-skills-to-fix-your-ai-stack-introducing-production-stack-skills-and-content-skills-80m</guid>
      <description>&lt;p&gt;78% of Fortune 500 companies are adopting AI coding assistants. 45% of that generated code ships with security vulnerabilities. On the content side, 76% of readers identify AI-written text within three seconds, and engagement drops around 47% when they do.&lt;/p&gt;

&lt;p&gt;Those numbers describe the same problem from two angles: &lt;strong&gt;AI outputs need guardrails, whether the output is code or writing.&lt;/strong&gt; Today we're shipping two Claude Code skill packs that act as those guardrails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Claude Code skill is a packaged slash command that augments your AI coding agent with specific expertise, defined in a &lt;code&gt;SKILL.md&lt;/code&gt; file and invocable from any AGENTS.md-compatible runtime.&lt;/strong&gt; The Skills Wave is two of those packs, released the same day because the failure modes on both sides of the AI workflow deserve the same fix.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt;, an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  production-stack-skills: 10 Claude Code Skills for Production-Ready AI Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;production-stack-skills is a 10-skill pack that audits AI-generated code across six weighted categories and hands back a 0 to 100 production-readiness score with a prioritized action plan.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The flagship command is &lt;code&gt;/production check&lt;/code&gt;. Point it at a repo; it reads the FastAPI routes, Postgres migrations, Dockerfiles, and config, then returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A headline score (e.g., "Production Readiness: 34/100")&lt;/li&gt;
&lt;li&gt;Six category scores: security, error handling, observability, deployment, data layer, code quality&lt;/li&gt;
&lt;li&gt;A Quick Wins section with point deltas&lt;/li&gt;
&lt;li&gt;An Action Plan sorted by weighted impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In early internal runs across client repos, applying the Quick Wins alone consistently moved the score by about 30 points. That's the number I'd quote if a CTO asked, "What does running this for a morning actually buy me?"&lt;/p&gt;

&lt;p&gt;The other nine skills split by surface area: &lt;code&gt;/production review&lt;/code&gt;, &lt;code&gt;/production planner&lt;/code&gt;, &lt;code&gt;/production fastapi&lt;/code&gt;, &lt;code&gt;/production postgres&lt;/code&gt;, &lt;code&gt;/production docker&lt;/code&gt;, &lt;code&gt;/production deploy&lt;/code&gt;, &lt;code&gt;/production monitoring&lt;/code&gt;, &lt;code&gt;/production security&lt;/code&gt;, &lt;code&gt;/production error-handling&lt;/code&gt;. Each is a focused slash command rather than a sub-mode of a monolithic agent.&lt;/p&gt;
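&lt;p&gt;As a sketch of how six weighted category scores could roll up into one headline number (the weights below are invented for illustration, not the ones the skill actually uses):&lt;/p&gt;

```python
# Illustrative only: hypothetical category weights, not the real ones
# production-stack-skills applies.
CATEGORY_WEIGHTS = {
    "security": 25,
    "error_handling": 20,
    "observability": 15,
    "deployment": 15,
    "data_layer": 15,
    "code_quality": 10,
}  # percentages, sum to 100

def readiness_score(category_scores: dict[str, int]) -> int:
    """Weight per-category scores (0-100 each) into a single 0-100 total."""
    total = sum(
        weight * category_scores.get(name, 0)
        for name, weight in CATEGORY_WEIGHTS.items()
    )
    return round(total / 100)

scores = {
    "security": 20, "error_handling": 30, "observability": 10,
    "deployment": 50, "data_layer": 60, "code_quality": 55,
}
print(readiness_score(scores))  # 34, i.e. "Production Readiness: 34/100"
```

&lt;p&gt;Under a scheme like this, fixing the heaviest categories first is what makes the Quick Wins deltas move the headline score fastest.&lt;/p&gt;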

&lt;h2&gt;
  
  
  content-skills: 14 Brand-First Skills That Kill AI Slop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;content-skills is a 14-skill pack with a &lt;code&gt;/brand/&lt;/code&gt; directory at its core: after a five-minute brand interview, every content skill reads your BRAND.md, VOICE.md, VISUAL.md, and voice samples before writing a word.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;/content setup&lt;/code&gt; once. Five questions. It writes &lt;code&gt;/brand/&lt;/code&gt;. From that moment, every content skill auto-reads that directory on every invocation.&lt;/p&gt;

&lt;p&gt;The exit point is &lt;code&gt;/content audit&lt;/code&gt;. Score any piece of content 0 to 100 on voice consistency, anti-slop markers, visual consistency, and brand alignment.&lt;/p&gt;

&lt;p&gt;Between setup and audit sit 12 production skills: strategy, calendar, blog, twitter, linkedin, reddit, hackernews, presentation, infographic, image, video, repurpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Install, Dual-CLI, Uninstall-Safe
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/production-stack-skills/main/install.sh | bash
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/content-skills/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each install mirrors the skills into both &lt;code&gt;~/.claude/&lt;/code&gt; (Claude Code) and &lt;code&gt;~/.agents/&lt;/code&gt; (Codex, Amp, and anything AGENTS.md-compatible). You don't pick the runtime up front. A skill written for Claude Code works identically in Codex.&lt;/p&gt;
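&lt;p&gt;The mirroring step can be pictured as a small copy loop. The layout below (a &lt;code&gt;skills/&lt;/code&gt; subdirectory holding each skill's files) is an assumption for the sketch, not the installer's actual file tree:&lt;/p&gt;

```python
# Sketch of dual-runtime mirroring: copy one skill directory into every
# target runtime directory. Paths and layout are illustrative assumptions.
import shutil
from pathlib import Path

def mirror_skill(source: Path, targets: list[Path]) -> list[Path]:
    """Copy a skill directory into each runtime dir (e.g. ~/.claude, ~/.agents)."""
    installed = []
    for target in targets:
        dest = target / "skills" / source.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, dest, dirs_exist_ok=True)
        installed.append(dest)
    return installed
```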

&lt;p&gt;Uninstall is boring by design. &lt;code&gt;/content-skills uninstall&lt;/code&gt; removes the skills. Your &lt;code&gt;/brand/&lt;/code&gt; stays.&lt;/p&gt;

&lt;p&gt;Both repos are MIT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Two Packs Shipped the Same Day
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In 30+ production AI agent deployments at Vstorm, the failures cluster into two shapes: code that passes the demo but fails the first prod incident, and content that sounds like the AI wrote it because the AI wrote it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One pack addresses the first. The other addresses the second. Both use the same architecture because the lesson that skills beat monolithic agents applies to both. You don't want "one AI that does everything". You want 24 small, composable, auditable slash commands that you can swap, tune, or remove when they stop pulling weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;production-stack-skills&lt;/strong&gt; ships 10 skills with a 0-100 scorer and Quick Wins section that typically moves scores +30 in under five minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;content-skills&lt;/strong&gt; ships 14 skills built around a &lt;code&gt;/brand/&lt;/code&gt; directory auto-read on every invocation.&lt;/li&gt;
&lt;li&gt;Both packs install via one &lt;code&gt;curl&lt;/code&gt; command, mirror into &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;, and work on Claude Code, Codex, and any AGENTS.md-compatible runtime.&lt;/li&gt;
&lt;li&gt;Skill-first architecture beats monolithic agents for auditability and local updates.&lt;/li&gt;
&lt;li&gt;Both repos are MIT.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are Claude Code skills and how do I install them?
&lt;/h3&gt;

&lt;p&gt;Claude Code skills are packaged slash commands backed by a &lt;code&gt;SKILL.md&lt;/code&gt; file that extends a coding agent with specific expertise. You install a skill pack with a single &lt;code&gt;curl&lt;/code&gt; command that mirrors files into &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;. After install, slash commands become available immediately without restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between production-stack-skills and content-skills?
&lt;/h3&gt;

&lt;p&gt;production-stack-skills audits and hardens AI-generated code across security, error handling, observability, deployment, data, and code quality, returning a 0-100 score. content-skills audits and produces on-brand content using a &lt;code&gt;/brand/&lt;/code&gt; directory you set up once, returning voice-consistency and anti-slop scores.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do these skills work with Codex or only Claude Code?
&lt;/h3&gt;

&lt;p&gt;Both. Each install script mirrors files into both &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;, so the same skill works in Claude Code, Codex, Amp, and any AGENTS.md-compatible runtime without modification.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does &lt;code&gt;/production check&lt;/code&gt; take on a real repo?
&lt;/h3&gt;

&lt;p&gt;On a typical FastAPI + Postgres repo of a few thousand lines, about a minute. The Quick Wins section is what you act on first, usually under five minutes to apply.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use a Claude Code skill instead of writing a full agent?
&lt;/h3&gt;

&lt;p&gt;Use a skill when the job is scoped expertise invoked explicitly (audit this, write a post in my voice). Use a full agent when the job is open-ended, multi-step, and requires planning across tools. Skills are composable building blocks; agents orchestrate them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;production-stack-skills:&lt;/strong&gt; &lt;a href="https://github.com/vstorm-co/production-stack-skills" rel="noopener noreferrer"&gt;github.com/vstorm-co/production-stack-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;content-skills:&lt;/strong&gt; &lt;a href="https://github.com/vstorm-co/content-skills" rel="noopener noreferrer"&gt;github.com/vstorm-co/content-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full writeup:&lt;/strong&gt; &lt;a href="https://oss.vstorm.co/blog/skills-wave-launch-claude-code-skills/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/skills-wave-launch-claude-code-skills&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for the follow-ups: Wednesday's production deep-dive, Thursday's content walkthrough, Friday's "8 Lessons from Shipping 24 Claude Code Skills".&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Full Observability in AI Agents: What We Added to the pydantic-deepagents TUI</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Sat, 18 Apr 2026 09:58:43 +0000</pubDate>
      <link>https://forem.com/deenuu1/full-observability-in-ai-agents-what-we-added-to-the-pydantic-deepagents-tui-l02</link>
      <guid>https://forem.com/deenuu1/full-observability-in-ai-agents-what-we-added-to-the-pydantic-deepagents-tui-l02</guid>
      <description>&lt;h1&gt;
  
  
  Full Observability in AI Agents: What We Added to the pydantic-deepagents TUI
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a cross-post. Canonical version: &lt;a href="https://oss.vstorm.co/blog/ai-agent-tui-observability-pydantic-deep/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/ai-agent-tui-observability-pydantic-deep/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This week I covered how pydantic-deepagents handles stuck loops, context window blindness, and frictionless installation. Today: what you actually see when all of that runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with invisible agents
&lt;/h2&gt;

&lt;p&gt;When an AI agent runs, a lot happens between your prompt and the response. The model reasons. It calls tools. Each action burns tokens and costs money. Without observability, you're flying blind — you can't debug, optimize, or trust what's happening.&lt;/p&gt;

&lt;p&gt;pydantic-deepagents v0.3.5 — the modular agent runtime for Python — reworks the TUI to surface everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Per-turn token usage
&lt;/h3&gt;

&lt;p&gt;Below every assistant response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;in:2.1K · out:412 · total:2.5K · reqs:3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;in&lt;/code&gt; = input tokens, &lt;code&gt;out&lt;/code&gt; = output tokens, &lt;code&gt;total&lt;/code&gt; = turn total, &lt;code&gt;reqs&lt;/code&gt; = API calls in this turn.&lt;/p&gt;
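&lt;p&gt;The abbreviated counts can be produced with a formatter along these lines (a guess at the rounding rules, not the TUI's actual code):&lt;/p&gt;

```python
# Sketch of the "2.1K"-style token formatting shown above. The exact
# rounding rules in pydantic-deepagents may differ; this is illustrative.
def fmt_tokens(n: int) -> str:
    """Abbreviate a token count the way the footer does: 412, 2.1K, 1.3M."""
    if n < 1000:
        return str(n)
    if n < 1_000_000:
        return f"{n / 1000:.1f}K".replace(".0K", "K")
    return f"{n / 1_000_000:.1f}M".replace(".0M", "M")

line = f"in:{fmt_tokens(2100)} · out:{fmt_tokens(412)} · total:{fmt_tokens(2512)} · reqs:3"
print(line)  # in:2.1K · out:412 · total:2.5K · reqs:3
```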

&lt;h3&gt;
  
  
  Cumulative cost in the header
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pydantic-deepagents  in:45K out:3K · $0.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updates after each response. You always know the running total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Thinking streamed live → collapsed
&lt;/h3&gt;

&lt;p&gt;Model reasoning appears as dimmed text while running. Collapses to a one-line summary when done. Watch the agent reason without drowning in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Side panel on startup
&lt;/h3&gt;

&lt;p&gt;Opens automatically when terminal ≥100 chars wide. Shows subagents before any task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subagents:
• planner (idle)
• research (idle)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Status updates as agents are delegated work.&lt;/p&gt;

&lt;h3&gt;
  
  
  All tool calls visible
&lt;/h3&gt;

&lt;p&gt;Todo tools (&lt;code&gt;read_todos&lt;/code&gt;, &lt;code&gt;write_todos&lt;/code&gt;, &lt;code&gt;add_todo&lt;/code&gt;, &lt;code&gt;update_todo_status&lt;/code&gt;, &lt;code&gt;remove_todo&lt;/code&gt;) were previously hidden. Now surfaced. Every agent action is visible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session saved on crash
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;_save_session()&lt;/code&gt; is now in a &lt;code&gt;finally&lt;/code&gt; block. Crash, exception, keyboard interrupt — &lt;code&gt;messages.json&lt;/code&gt; is always written. No more lost sessions.&lt;/p&gt;
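&lt;p&gt;The pattern is ordinary &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;finally&lt;/code&gt;. A stripped-down sketch, where the run and save bodies are stand-ins for the real ones:&lt;/p&gt;

```python
# Sketch of the crash-safe save described above: the write happens in a
# `finally` block, so messages.json lands on success, exception, or Ctrl-C.
import json
from pathlib import Path

def run_session(session_dir: Path, turn) -> None:
    messages = []
    try:
        messages.append(turn())  # may raise mid-run
    finally:
        # Always persist, even if turn() blew up.
        (session_dir / "messages.json").write_text(json.dumps(messages))
```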

&lt;h3&gt;
  
  
  Subagent logs: 20K chars (was 2K)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tool_log.jsonl&lt;/code&gt; now stores full subagent output. Critical for &lt;code&gt;/improve&lt;/code&gt; — the pipeline that extracts learnings from sessions (more on that tomorrow).&lt;/p&gt;




&lt;h2&gt;
  
  
  The full layout
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────┬──────────────────┐
│ pydantic-deepagents  in:45K out:3K · $0.12         │
├─────────────────────────────────┼──────────────────┤
│ [thinking... dimmed text]       │ Subagents:       │
│ [collapsed to summary]          │ • planner (idle) │
│                                 │ • research (idle)│
│ Agent response here...          │                  │
│ in:2.1K · out:412 · $0.04       │                  │
└─────────────────────────────────┴──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://oss.vstorm.co/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observability is how you debug, optimize, and trust your agent. A black box is a liability.&lt;/p&gt;

&lt;p&gt;What's the first metric you check when debugging an agent run?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Browser Automation + /improve: AI Agents That Browse the Web and Fix Themselves</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:20:41 +0000</pubDate>
      <link>https://forem.com/deenuu1/browser-automation-improve-ai-agents-that-browse-the-web-and-fix-themselves-57i8</link>
      <guid>https://forem.com/deenuu1/browser-automation-improve-ai-agents-that-browse-the-web-and-fix-themselves-57i8</guid>
      <description>&lt;p&gt;This week I shipped 5 versions of pydantic-deepagents — the modular agent runtime for Python. Today: the two features that close the loop — browser automation and session-based self-improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: BrowserCapability — 9 Playwright Tools
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'pydantic-deep[browser]'&lt;/span&gt;
playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrowserCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;BrowserCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;allowed_domains&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs.python.org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;auto_screenshot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 9 tools: &lt;code&gt;navigate&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;type_text&lt;/code&gt;, &lt;code&gt;get_text&lt;/code&gt;, &lt;code&gt;screenshot&lt;/code&gt;, &lt;code&gt;scroll&lt;/code&gt;, &lt;code&gt;go_back&lt;/code&gt;, &lt;code&gt;go_forward&lt;/code&gt;, &lt;code&gt;execute_js&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safety design:&lt;/strong&gt; Single-tab (predictable state), domain allowlist (agent can't navigate outside allowed domains), automatic popup interception, content truncation to prevent context overflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser lifecycle:&lt;/strong&gt; Chromium starts before the agent run, stops after — whether the run succeeds, fails, or is cancelled. No orphaned processes.&lt;/p&gt;
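&lt;p&gt;The allowlist check reduces to a hostname-suffix test. This is an illustration of the idea, not the actual &lt;code&gt;BrowserCapability&lt;/code&gt; code:&lt;/p&gt;

```python
# Illustrative sketch of a domain allowlist: permit navigation only when
# the host is an allowed domain or one of its subdomains.
from urllib.parse import urlparse

ALLOWED_DOMAINS = ["github.com", "docs.python.org"]

def is_allowed(url: str, allowed: list[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host matches (or is a subdomain of) an allowed domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)
```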

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pydantic-deep tui &lt;span class="nt"&gt;--browser&lt;/span&gt; &lt;span class="nt"&gt;--browser-headed&lt;/span&gt;   &lt;span class="c"&gt;# visible window&lt;/span&gt;
pydantic-deep run &lt;span class="s2"&gt;"research X on GitHub"&lt;/span&gt; &lt;span class="nt"&gt;--browser&lt;/span&gt; &lt;span class="nt"&gt;--sandbox&lt;/span&gt; docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bug fix:&lt;/strong&gt; Browser tools now force &lt;code&gt;kind='function'&lt;/code&gt; — they never trigger approval dialogs mid-task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: /improve — Session-Based Self-Improvement
&lt;/h2&gt;

&lt;p&gt;After each session, &lt;code&gt;/improve&lt;/code&gt; analyzes the full run and extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;UserFactInsight&lt;/code&gt;&lt;/strong&gt; — what the agent learned about you and your preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AgentLearningInsight&lt;/code&gt;&lt;/strong&gt; — strategies that worked, failure modes encountered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both write to MEMORY.md. Next session loads MEMORY.md automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key finding:&lt;/strong&gt; We tested summaries vs raw tool traces as input to the synthesis step. Raw traces performed significantly better — summaries compress away the signal that matters. &lt;code&gt;/improve&lt;/code&gt; reads from &lt;code&gt;tool_log.jsonl&lt;/code&gt; (written per session), not from a summary.&lt;/p&gt;

&lt;p&gt;The loop: agent runs → &lt;code&gt;/improve&lt;/code&gt; extracts insights → MEMORY.md grows → next run starts smarter.&lt;/p&gt;
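&lt;p&gt;The extraction step can be sketched as a read-filter-append pass over the trace. The log schema and insight heuristic below are assumptions for the sketch, not the real &lt;code&gt;/improve&lt;/code&gt; pipeline:&lt;/p&gt;

```python
# Sketch of the /improve loop: read the raw tool trace, pull out insight
# entries, append them to MEMORY.md. Schema and heuristics are assumed.
import json
from pathlib import Path

def improve(session_dir: Path) -> int:
    """Extract `insight` entries from tool_log.jsonl and append to MEMORY.md."""
    log = session_dir / "tool_log.jsonl"
    entries = [json.loads(line) for line in log.read_text().splitlines() if line.strip()]
    insights = [e["insight"] for e in entries if "insight" in e]
    with (session_dir / "MEMORY.md").open("a") as f:
        for item in insights:
            f.write(f"- {item}\n")
    return len(insights)
```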

&lt;h2&gt;
  
  
  This Week's Full Stack
&lt;/h2&gt;

&lt;p&gt;Monday: StuckLoopDetection | Tuesday: LimitWarnerCapability | Wednesday: curl install | Thursday: Docker sandbox | &lt;strong&gt;Today: browser + /improve&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An agent that detects loops, knows its context limits, installs in 30s, runs in Docker, browses the web, and learns from every session.&lt;/p&gt;

&lt;p&gt;Full breakdown: &lt;a href="https://oss.vstorm.co/blog/browser-automation-improve-ai-agents-pydantic-deep/" rel="noopener noreferrer"&gt;https://oss.vstorm.co/blog/browser-automation-improve-ai-agents-pydantic-deep/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>playwright</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>curl | bash for AI Agents: One-Command Install for pydantic-deep</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:57:47 +0000</pubDate>
      <link>https://forem.com/deenuu1/curl-bash-for-ai-agents-one-command-install-for-pydantic-deep-36jh</link>
      <guid>https://forem.com/deenuu1/curl-bash-for-ai-agents-one-command-install-for-pydantic-deep-36jh</guid>
      <description>&lt;p&gt;The standard Python AI tool install experience:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Python (which version?)&lt;/li&gt;
&lt;li&gt;Create a venv&lt;/li&gt;
&lt;li&gt;pip install&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ModuleNotFoundError: No module named 'textual'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;pip install again with correct extras&lt;/li&gt;
&lt;li&gt;Figure out PATH&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Six steps. Fifteen minutes. That's before you've even seen the tool.&lt;/p&gt;

&lt;p&gt;We fixed this for &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;pydantic-deep&lt;/a&gt; — the modular agent runtime for Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What install.sh does
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Detects whether &lt;code&gt;uv&lt;/code&gt; is installed&lt;/li&gt;
&lt;li&gt;If not: installs uv via the official Astral installer&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;uv tool install "pydantic-deep[cli]"&lt;/code&gt; — isolated environment, binary available globally&lt;/li&gt;
&lt;li&gt;Verifies the install&lt;/li&gt;
&lt;li&gt;Prints PATH fix instructions if needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No virtual environment management. No extras guessing. The &lt;code&gt;cli&lt;/code&gt; extras group pulls in everything, including &lt;code&gt;textual&lt;/code&gt; (whose absence from the base install was the original bug).&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-update
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pydantic-deep update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uses &lt;code&gt;uv tool upgrade&lt;/code&gt; if available and falls back to pip otherwise. One command to stay current.&lt;/p&gt;

&lt;h2&gt;
  
  
  Startup notifications
&lt;/h2&gt;

&lt;p&gt;Every invocation checks PyPI for updates silently (2-second timeout, 24-hour cache):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update available: v0.3.6 → v0.3.7  Run: pydantic-deep update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never blocks startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why uv?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;uv tool install&lt;/code&gt; is designed exactly for this use case — isolated tool environments, global binary access, no activation required. Fast, well-maintained, increasingly standard for Python CLI tooling.&lt;/p&gt;

&lt;p&gt;Alternatives considered: pipx (slower, needs separate install), Homebrew tap (maintenance overhead), native binary (too brittle for dynamic imports).&lt;/p&gt;




&lt;p&gt;Full write-up with implementation details: &lt;a href="https://oss.vstorm.co/blog/pydantic-deep-one-command-install-curl-bash/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/pydantic-deep-one-command-install-curl-bash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's the worst install experience you've had with an AI/ML tool?&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>devex</category>
      <category>cli</category>
    </item>
    <item>
      <title>Context Window Blindness: Why Your AI Agent Doesn't Know It's Running Out of Space</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 14 Apr 2026 11:05:25 +0000</pubDate>
      <link>https://forem.com/deenuu1/context-window-blindness-why-your-ai-agent-doesnt-know-its-running-out-of-space-4dji</link>
      <guid>https://forem.com/deenuu1/context-window-blindness-why-your-ai-agent-doesnt-know-its-running-out-of-space-4dji</guid>
      <description>&lt;p&gt;On Monday I showed how agents waste tokens by getting stuck in loops — repeating the same tool call dozens of times, burning money on nothing. Today — a quieter problem that costs just as much, and is far harder to spot.&lt;/p&gt;

&lt;p&gt;Your AI agent has been flying blind. It has no idea its context window is 90% full.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Two Different Realities
&lt;/h2&gt;

&lt;p&gt;When you run a long agent task, here's what you see: a status bar showing "Context: 87% used." It's right there in the TUI. You can see the agent is almost out of space.&lt;/p&gt;

&lt;p&gt;But the model can't see the status bar. It has no idea.&lt;/p&gt;

&lt;p&gt;From the model's perspective, every message it writes, every tool call it makes, every plan it sketches — all of that just continues normally. It has no signal that the conversation history is filling up. It keeps producing long responses, initiating multi-step plans, making tool calls that generate pages of output.&lt;/p&gt;

&lt;p&gt;Then at 90%: auto-compression kicks in. The model's working memory gets force-compressed. It loses the thread of what it did 40 messages ago. It starts contradicting its earlier decisions.&lt;/p&gt;

&lt;p&gt;This is context window blindness: the gap between what the user sees and what the model knows about its own situation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: LimitWarnerCapability
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;pydantic-deep v0.3.8&lt;/a&gt; — the modular agent runtime for Python — we added &lt;code&gt;LimitWarnerCapability&lt;/code&gt;. The solution: inject usage information directly into the conversation as a user message, at two thresholds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At ~70% usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are approaching the context limit. Begin wrapping up your current task. Avoid starting new complex subtasks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;At ~85% usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CRITICAL: Your context window is almost full. Use /compact NOW before continuing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are injected as user messages — not system prompt modifications. The model treats them as authoritative input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepAgent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# default — enables LimitWarnerCapability
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-enabled by default. No configuration needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  BM25 History Search
&lt;/h2&gt;

&lt;p&gt;We also rewrote &lt;code&gt;search_conversation_history&lt;/code&gt; from naive substring to BM25:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before v0.3.8
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# After v0.3.8 — BM25 ranked, zero external deps
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_conversation_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;explain the authentication flow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Rare terms rank higher. Multi-word queries tokenized properly.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pure Python. Zero external dependencies. The standard Okapi BM25 formula, the same ranking function Lucene uses.&lt;/p&gt;
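&lt;p&gt;As an illustration of the ranking (a toy sketch, not the library's actual code; &lt;code&gt;k1=1.5&lt;/code&gt; and &lt;code&gt;b=0.75&lt;/code&gt; are the usual textbook defaults):&lt;/p&gt;

```python
import math

# Toy BM25 sketch: score each document against a query.
# Not pydantic-deep's implementation; just the standard formula.
def bm25_scores(docs, query, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rare terms weigh more
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```

&lt;p&gt;The idf term is why a rare word like &lt;code&gt;authentication&lt;/code&gt; outranks filler words in a multi-word query, which plain substring matching cannot do.&lt;/p&gt;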

&lt;h2&gt;
  
  
  EvictionCapability
&lt;/h2&gt;

&lt;p&gt;Large tool outputs are intercepted via the &lt;code&gt;after_tool_execute&lt;/code&gt; hook &lt;strong&gt;before&lt;/strong&gt; they enter message history — not trimmed after. The difference matters on long tasks.&lt;/p&gt;
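&lt;p&gt;A conceptual sketch of that boundary (the hook name comes from the post; the threshold, file path, and helper function are hypothetical, chosen only to illustrate the idea that oversized outputs never reach history):&lt;/p&gt;

```python
# Conceptual sketch of eviction at the after_tool_execute boundary.
# The real capability hooks into pydantic-deep; this standalone
# version only shows the shape of the behavior.
EVICTION_LIMIT_TOKENS = 20_000  # assumed threshold for illustration

def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return len(text) // 4

def after_tool_execute(tool_name: str, output: str, saved_files: dict) -> str:
    if rough_token_count(output) <= EVICTION_LIMIT_TOKENS:
        return output  # small enough: enters message history as-is
    # Evict: persist the full output and hand the model a pointer instead.
    path = f"/tmp/{tool_name}_output.txt"
    saved_files[path] = output
    return f"[output evicted: {rough_token_count(output)} tokens saved to {path}]"
```

&lt;p&gt;Trimming after the fact still pays the cost of the output once; intercepting at this boundary means the tokens are never spent at all.&lt;/p&gt;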

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Models have no intrinsic awareness of context usage — that info lives in the orchestration layer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LimitWarnerCapability&lt;/code&gt; bridges that gap with runtime user message injection at 70%/85%&lt;/li&gt;
&lt;li&gt;BM25 replaces naive substring search for conversation history&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EvictionCapability&lt;/code&gt; prevents large outputs from entering history at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full write-up: &lt;a href="https://oss.vstorm.co/blog/context-window-blindness-ai-agents-limit-warner/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/context-window-blindness-ai-agents-limit-warner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you hit this? How did the hallucinations manifest?&lt;/p&gt;

</description>
      <category>python</category>
      <category>agents</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>StuckLoopDetection: How We Stopped an Agent Burning $12 on 47 Identical Calls</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:20:49 +0000</pubDate>
      <link>https://forem.com/deenuu1/stuckloopdetection-how-we-stopped-an-agent-burning-12-on-47-identical-calls-52ac</link>
      <guid>https://forem.com/deenuu1/stuckloopdetection-how-we-stopped-an-agent-burning-12-on-47-identical-calls-52ac</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Most agent loops aren't model failures — they're mechanical repetitions that the model itself doesn't recognize. pydantic-deep v0.3.8 introduces StuckLoopDetection, a capability that catches three loop patterns before they waste tokens.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;This is post 1/3 in the "Self-Aware Agents" series. &lt;a href="https://oss.vstorm.co/blog/pydantic-deep-two-weeks-five-versions/" rel="noopener noreferrer"&gt;Overview of all 5 releases here.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Here's the incident that made this necessary.&lt;/p&gt;

&lt;p&gt;A coding agent was working on a refactor task overnight. It hit a file with an unusual import pattern, couldn't parse the result, and defaulted to reading the file again.&lt;/p&gt;

&lt;p&gt;By morning: 47 calls to &lt;code&gt;read_file&lt;/code&gt; on the same path. $12 in API costs. Zero progress.&lt;/p&gt;

&lt;p&gt;The model wasn't broken. Each call looked locally reasonable. From outside: it was stuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Prompting Isn't Enough
&lt;/h2&gt;

&lt;p&gt;"Don't repeat tool calls" in a system prompt works sometimes. The problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model often doesn't recognize loops as loops — each repeated call looks locally justified&lt;/li&gt;
&lt;li&gt;Prompt compliance degrades under cognitive load (long tasks, many tools, complex context)&lt;/li&gt;
&lt;li&gt;You have to add the instruction to every agent separately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Detection at the capability level fixes all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Loop Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Repeated Identical Calls
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unknown_field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unknown_field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;same&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;result&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can't process the result, has no fallback, and tries again. Default threshold: 3 identical calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: A-B-A-B Alternating
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 8:  list_directory(path="src/")
Turn 9:  read_file(path="src/main.py")
Turn 10: list_directory(path="src/")
Turn 11: read_file(path="src/main.py")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tool A suggests Tool B, Tool B suggests Tool A. Looks like progress — it's not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: No-Op Loops
&lt;/h3&gt;

&lt;p&gt;The same call returns the same result, and the agent keeps going anyway. Common with writes, status checks, and verification calls.&lt;/p&gt;
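&lt;p&gt;The core idea behind catching pattern 1 fits in a few lines (a standalone toy, not the capability's actual code): fingerprint each tool call and count consecutive repeats.&lt;/p&gt;

```python
from collections import deque

# Toy detector for repeated identical calls (pattern 1).
# Fingerprint = tool name plus sorted arguments; a full window of
# identical fingerprints means the agent is stuck.
class RepeatDetector:
    def __init__(self, max_repeated: int = 3):
        self.max_repeated = max_repeated
        self.recent = deque(maxlen=max_repeated)

    def record(self, tool_name: str, args: dict) -> bool:
        """Record a call; return True when the repeat threshold is hit."""
        fingerprint = (tool_name, tuple(sorted(args.items())))
        self.recent.append(fingerprint)
        return (len(self.recent) == self.max_repeated
                and len(set(self.recent)) == 1)
```

&lt;p&gt;Detecting the A-B-A-B pattern works the same way, just over a window of alternating fingerprints instead of identical ones.&lt;/p&gt;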

&lt;h2&gt;
  
  
  The Implementation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_deep_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StuckLoopDetection&lt;/span&gt;

&lt;span class="c1"&gt;# Default: enabled with threshold=3
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stuck_loop_detection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Custom config
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;StuckLoopDetection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;max_repeated&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# "warn" = ModelRetry, "error" = StuckLoopError
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  action="warn" (default)
&lt;/h3&gt;

&lt;p&gt;Triggers &lt;code&gt;ModelRetry&lt;/code&gt;. The model gets a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have called read_file(path="src/config.json") 3 times with identical arguments
and received the same result. This indicates a stuck loop. Try a different approach.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of the time the model pivots. If it doesn't, the threshold simply triggers again.&lt;/p&gt;

&lt;h3&gt;
  
  
  action="error"
&lt;/h3&gt;

&lt;p&gt;Raises &lt;code&gt;StuckLoopError&lt;/code&gt;. Clean failure for automated pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StuckLoopDetection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StuckLoopError&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor the imports in src/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;StuckLoopError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent got stuck: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; pattern detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Per-Run Isolation
&lt;/h3&gt;

&lt;p&gt;Parallel &lt;code&gt;agent.run()&lt;/code&gt; calls don't share stuck-detection state. Each run is isolated via &lt;code&gt;for_run()&lt;/code&gt; — no leaked state between concurrent tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Safe to run concurrently with a shared agent instance
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze src/module_a.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze src/module_b.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Business Case
&lt;/h2&gt;

&lt;p&gt;A 47-call loop at Claude Opus pricing: ~$12. Same task with detection: ~$0.50 + one ModelRetry.&lt;/p&gt;

&lt;p&gt;Cost of &lt;code&gt;stuck_loop_detection=True&lt;/code&gt;: zero API calls, negligible latency, enabled by default.&lt;/p&gt;

&lt;p&gt;Even false positives are cheap: one ModelRetry message, then the model tries a different approach.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tomorrow:&lt;/strong&gt; LimitWarnerCapability — teaching agents to know their context window is almost full.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OSS portal: &lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;oss.vstorm.co&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Pydantic Deep Agents 0.3.3: ACP, Thinking, Lifecycle Hooks, and Opinionated Defaults</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:29:56 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-deep-agents-033-acp-thinking-lifecycle-hooks-and-opinionated-defaults-32g4</link>
      <guid>https://forem.com/deenuu1/pydantic-deep-agents-033-acp-thinking-lifecycle-hooks-and-opinionated-defaults-32g4</guid>
      <description>&lt;p&gt;We just released &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;pydantic-deep 0.3.3&lt;/a&gt; — and this is the biggest release since we open-sourced the project. ACP support, deep subagents by default, thinking, Anthropic caching, lifecycle hooks, skills as slash commands, and a provider setup wizard.&lt;/p&gt;

&lt;p&gt;Let's walk through the changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  ACP: Your Agent in Any Editor
&lt;/h2&gt;

&lt;p&gt;ACP (Agent Client Protocol) is a standardized protocol that lets AI agents run inside editors. Think of it like LSP, but for AI agents instead of language servers.&lt;/p&gt;

&lt;p&gt;Our new &lt;code&gt;apps/acp/&lt;/code&gt; adapter exposes any pydantic-deep agent as an ACP-compatible server. The same agent you run in your terminal now runs in Zed with zero code changes.&lt;/p&gt;

&lt;p&gt;What you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming text deltas&lt;/strong&gt; — real-time output, not waiting for completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call visibility&lt;/strong&gt; — see tool names, arguments, and results (not a black box)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model switching&lt;/strong&gt; — change from Claude to GPT mid-session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt; — conversation persistence across editor restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-detect provider&lt;/strong&gt; — reads your API keys, no manual config&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The adapter wraps the same &lt;code&gt;create_deep_agent()&lt;/code&gt; you already use. No new API to learn.&lt;/p&gt;

&lt;p&gt;Because ACP is a protocol (not a Zed-specific plugin), it'll work with any editor that adopts it. We expect more editors to support ACP in the coming months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subagents Are Now Deep Agents by Default
&lt;/h2&gt;

&lt;p&gt;This is the change that affects the most users.&lt;/p&gt;

&lt;p&gt;Previously, subagents were plain pydantic-ai Agents — lightweight, but limited. They couldn't read files, search the web, or remember things between runs.&lt;/p&gt;

&lt;p&gt;Now, every subagent (built-in and custom) is created via &lt;code&gt;create_deep_agent()&lt;/code&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filesystem access (read, write, edit, grep, glob)&lt;/li&gt;
&lt;li&gt;Web search and web fetch&lt;/li&gt;
&lt;li&gt;Persistent memory&lt;/li&gt;
&lt;li&gt;Large output eviction (auto-save to files when output exceeds 20K tokens)&lt;/li&gt;
&lt;li&gt;Orphaned tool call patching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your custom subagent doesn't specify &lt;code&gt;agent&lt;/code&gt; or &lt;code&gt;agent_factory&lt;/code&gt;, it automatically gets the full deep agent factory. You don't have to change anything — your subagents just got more capable.&lt;/p&gt;

&lt;p&gt;We also replaced &lt;code&gt;include_general_purpose_subagent&lt;/code&gt; with &lt;code&gt;include_builtin_subagents&lt;/code&gt;, which adds a "research" deep agent for codebase exploration and web research.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking Enabled by Default
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;thinking="high"&lt;/code&gt; is now the default. This enables model reasoning via pydantic-ai's &lt;code&gt;Thinking&lt;/code&gt; capability.&lt;/p&gt;

&lt;p&gt;We support 7 levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# default
&lt;/span&gt;    &lt;span class="c1"&gt;# Options: True, False, "minimal", "low", "medium", "high", "xhigh"
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For models that don't support thinking (like GPT-4.1), the parameter is silently ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Prompt Caching — On by Default
&lt;/h2&gt;

&lt;p&gt;Three new defaults enabled automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_instructions&lt;/code&gt; — cache system prompt&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_tool_definitions&lt;/code&gt; — cache tool schemas&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_messages&lt;/code&gt; — cache conversation history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly reduces token costs and latency for Anthropic models. For non-Anthropic models, these settings are silently ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 New Lifecycle Hooks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HookEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;BEFORE_RUN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;AFTER_RUN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;RUN_ERROR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;BEFORE_MODEL_REQUEST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_model_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;AFTER_MODEL_REQUEST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_model_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These map directly to pydantic-ai's lifecycle hooks. Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session tracking&lt;/strong&gt; — log when an agent run starts and ends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM call logging&lt;/strong&gt; — capture every model request for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error alerts&lt;/strong&gt; — get notified when a run fails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost monitoring&lt;/strong&gt; — track token usage per request&lt;/li&gt;
&lt;/ul&gt;
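&lt;p&gt;For example, a cost monitor might key handlers off these events. The dispatcher below is hypothetical (pydantic-deep's actual registration mechanism may differ); only the &lt;code&gt;HookEvent&lt;/code&gt; enum mirrors the one above.&lt;/p&gt;

```python
from enum import Enum

class HookEvent(Enum):  # mirrors the enum shown above
    BEFORE_RUN = "before_run"
    AFTER_RUN = "after_run"
    RUN_ERROR = "run_error"
    BEFORE_MODEL_REQUEST = "before_model_request"
    AFTER_MODEL_REQUEST = "after_model_request"

# Hypothetical event handler: tracks token usage per model request
# and surfaces failures. Illustrates the use cases, not the real API.
class CostMonitor:
    def __init__(self):
        self.total_tokens = 0

    def on_event(self, event: HookEvent, payload: dict) -> None:
        if event is HookEvent.AFTER_MODEL_REQUEST:
            self.total_tokens += payload.get("usage_tokens", 0)
        elif event is HookEvent.RUN_ERROR:
            print(f"run failed: {payload.get('error')}")
```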

&lt;h2&gt;
  
  
  Skills as Slash Commands
&lt;/h2&gt;

&lt;p&gt;Skills now work as slash commands in the CLI. Type &lt;code&gt;/code-review&lt;/code&gt; and the skill activates directly from the picker.&lt;/p&gt;

&lt;p&gt;Discovery follows a 3-tier hierarchy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Built-in&lt;/strong&gt; (&lt;code&gt;apps/cli/skills/&lt;/code&gt;) — ships with pydantic-deep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt; (&lt;code&gt;~/.pydantic-deep/skills/&lt;/code&gt;) — your personal skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project&lt;/strong&gt; (&lt;code&gt;.pydantic-deep/skills/&lt;/code&gt;) — project-specific skills&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Later sources override earlier ones by name, so you can customize built-in skills per project.&lt;/p&gt;
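&lt;p&gt;That override rule is effectively a dict merge in tier order. A sketch (the directory names come from the post; the merge function itself is illustrative):&lt;/p&gt;

```python
# Name-based override across the three skill tiers:
# built-in, then user, then project. Later tiers win on name clash.
def discover_skills(builtin: dict, user: dict, project: dict) -> dict:
    skills = {}
    for tier in (builtin, user, project):  # later tiers override earlier ones
        skills.update(tier)
    return skills
```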

&lt;h2&gt;
  
  
  Other Notable Changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;compact_conversation&lt;/code&gt; tool&lt;/strong&gt; — the agent can manually trigger context compression with an optional focus topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider setup wizard&lt;/strong&gt; — first-run auto-detects missing API keys and guides through provider selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/provider&lt;/code&gt; slash command&lt;/strong&gt; — switch AI provider and model mid-session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/config&lt;/code&gt; slash command&lt;/strong&gt; — view and change settings interactively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;approve_tools&lt;/code&gt; config&lt;/strong&gt; — choose which tools need user approval (default: &lt;code&gt;["execute"]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced BASE_PROMPT&lt;/strong&gt; — Claude Code-inspired sections for code quality, careful execution, and formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context files simplified&lt;/strong&gt; to &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SOUL.md&lt;/code&gt; only&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Opinionated Defaults
&lt;/h2&gt;

&lt;p&gt;The philosophy behind 0.3.3: &lt;strong&gt;make the powerful thing the default thing.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thinking&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"high"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on (Anthropic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subagent type&lt;/td&gt;
&lt;td&gt;plain Agent&lt;/td&gt;
&lt;td&gt;deep agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max nesting depth&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eviction limit&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;20K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Patch tool calls&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You get a capable, production-ready agent with &lt;code&gt;create_deep_agent("anthropic:claude-opus-4-6")&lt;/code&gt;. No configuration needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or try the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep-agents[cli]
pydantic-deep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full changelog: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;CHANGELOG.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We build open-source AI agent tooling at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt;. pydantic-deep is our framework — modular, type-safe, production-tested across 30+ deployments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Pydantic AI Capabilities, Hooks &amp; Agent Specs - What Changed and How Our Libraries Migrated</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 30 Mar 2026 14:38:10 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-ai-capabilities-hooks-agent-specs-what-changed-and-how-our-libraries-migrated-4i18</link>
      <guid>https://forem.com/deenuu1/pydantic-ai-capabilities-hooks-agent-specs-what-changed-and-how-our-libraries-migrated-4i18</guid>
      <description>&lt;p&gt;Pydantic AI just shipped the biggest API change since launch. Capabilities, hooks, and agent specs landed in v1.71+, and they fundamentally change how you extend agents.&lt;/p&gt;

&lt;p&gt;We maintain 5 open-source libraries built on top of Pydantic AI: pydantic-ai-shields (formerly pydantic-ai-middleware), pydantic-ai-subagents, pydantic-ai-summarization, pydantic-ai-backend, and the full-stack AI agent template. All five have been migrated. This article covers what changed, why it matters, and real before/after code from our repos.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Capabilities?
&lt;/h2&gt;

&lt;p&gt;Capabilities are reusable, composable units of agent behavior. Instead of threading multiple configuration arguments separately -- tools here, instructions there, model settings somewhere else -- a capability bundles everything into a single &lt;code&gt;capabilities&lt;/code&gt; parameter on the &lt;code&gt;Agent&lt;/code&gt; constructor.&lt;/p&gt;

&lt;p&gt;Each capability can provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; (via &lt;code&gt;get_toolset()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; (static strings or dynamic callables)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model settings&lt;/strong&gt; (per-step configuration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle hooks&lt;/strong&gt; (before/after/wrap patterns for runs, model requests, tool calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool preparation&lt;/strong&gt; (filter or modify tool definitions per step)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The base class is &lt;code&gt;AbstractCapability&lt;/code&gt;. You subclass it, override the methods you need, and pass instances to the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AbstractCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;MyCapability&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;AnotherCapability&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple capabilities compose automatically. Before-hooks fire in order (cap1 then cap2), after-hooks fire reversed (cap2 then cap1), and wrap-hooks nest as middleware layers. This is not something we had to build -- the framework handles it.&lt;/p&gt;
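&lt;p&gt;A toy demonstration of that ordering (not pydantic-ai's code; it only shows before-hooks firing in registration order and after-hooks reversed):&lt;/p&gt;

```python
# Toy composition: before-hooks fire cap1 then cap2, after-hooks
# fire cap2 then cap1, so each capability wraps the ones after it.
def compose(caps, core):
    calls = []

    def run():
        for cap in caps:                # before: cap1, then cap2
            calls.append(f"before:{cap}")
        result = core()
        for cap in reversed(caps):      # after: cap2, then cap1
            calls.append(f"after:{cap}")
        return result

    return run, calls
```

&lt;p&gt;Wrap-hooks follow the same nesting: the first capability's wrapper is outermost, like middleware layers.&lt;/p&gt;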

&lt;h2&gt;
  
  
  What Are Hooks?
&lt;/h2&gt;

&lt;p&gt;Hooks are the lifecycle interception points within capabilities. Pydantic AI provides hooks at four levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run hooks&lt;/strong&gt; -- &lt;code&gt;before_run&lt;/code&gt;, &lt;code&gt;wrap_run&lt;/code&gt;, &lt;code&gt;after_run&lt;/code&gt;, &lt;code&gt;on_run_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node hooks&lt;/strong&gt; -- &lt;code&gt;before_node_run&lt;/code&gt;, &lt;code&gt;wrap_node_run&lt;/code&gt;, &lt;code&gt;after_node_run&lt;/code&gt;, &lt;code&gt;on_node_run_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model request hooks&lt;/strong&gt; -- &lt;code&gt;before_model_request&lt;/code&gt;, &lt;code&gt;wrap_model_request&lt;/code&gt;, &lt;code&gt;after_model_request&lt;/code&gt;, &lt;code&gt;on_model_request_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool hooks&lt;/strong&gt; -- split into validation and execution phases, each with before/wrap/after/error variants&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus &lt;code&gt;prepare_tools&lt;/code&gt; for filtering tool visibility per step.&lt;/p&gt;

&lt;p&gt;That's roughly 20 hook points across 4 lifecycle levels. Error hooks use a neat pattern: &lt;strong&gt;raise to propagate, return to recover&lt;/strong&gt;. If your error handler raises the original exception, it propagates unchanged. Raise a different exception to transform the error. Return a result to suppress it entirely.&lt;/p&gt;
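&lt;p&gt;A minimal sketch of that pattern, with a hypothetical handler and result type standing in for the real signatures:&lt;/p&gt;

```python
# Sketch of the "raise to propagate, return to recover" error-hook pattern.
# The handler name and RunResult type are illustrative stand-ins, not the
# actual Pydantic AI signatures.
import asyncio
from dataclasses import dataclass

@dataclass
class RunResult:
    output: str

async def on_run_error(ctx, error):
    # Return a fallback result to recover from a known transient failure;
    # re-raise (or raise a different exception) to propagate or transform.
    if isinstance(error, TimeoutError):
        return RunResult(output="(model call timed out; no answer)")
    raise error

async def run_with_handler():
    try:
        raise TimeoutError("model call timed out")
    except Exception as exc:
        return await on_run_error(None, exc)

result = asyncio.run(run_with_handler())
print(result.output)
```

A `TimeoutError` is swallowed and replaced by a fallback result; any other exception propagates unchanged.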

&lt;p&gt;For simple use cases, the &lt;code&gt;Hooks&lt;/code&gt; capability gives you decorator-based registration without subclassing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Hooks&lt;/span&gt;

&lt;span class="n"&gt;hooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Hooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@hooks.on.before_model_request&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sending &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;request_context&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Are Agent Specs?
&lt;/h2&gt;

&lt;p&gt;Agent specs separate agent configuration from code entirely. You define your agent in YAML or JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;
&lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;
&lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WebSearch&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Thinking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then load it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capabilities that implement &lt;code&gt;get_serialization_name()&lt;/code&gt; and &lt;code&gt;from_spec()&lt;/code&gt; are automatically available. This means your custom capabilities can be YAML-driven too.&lt;/p&gt;
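&lt;p&gt;Conceptually, the loader only needs a name-to-class registry. The sketch below is a hypothetical illustration of that mechanism, not Pydantic AI's actual internals:&lt;/p&gt;

```python
# Hypothetical sketch of name-based spec loading: each capability class
# exposes get_serialization_name() and from_spec() (method names from the
# article), and a loader resolves YAML entries against a registry.
REGISTRY = {}

class Thinking:
    def __init__(self, effort="low"):
        self.effort = effort

    @classmethod
    def get_serialization_name(cls):
        return "Thinking"

    @classmethod
    def from_spec(cls, spec):
        return cls(**(spec or {}))

REGISTRY[Thinking.get_serialization_name()] = Thinking

def load_capability(entry):
    # A spec entry is either a bare name ("WebSearch") or a one-key mapping
    # with config ({"Thinking": {"effort": "high"}}).
    if isinstance(entry, str):
        return REGISTRY[entry].from_spec(None)
    (name, config), = entry.items()
    return REGISTRY[name].from_spec(config)

cap = load_capability({"Thinking": {"effort": "high"}})
print(cap.effort)
```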

&lt;h2&gt;
  
  
  How Our Libraries Migrated
&lt;/h2&gt;

&lt;h3&gt;
  
  
  pydantic-ai-middleware to pydantic-ai-shields
&lt;/h3&gt;

&lt;p&gt;This was the most dramatic change. Our middleware library had grown to include &lt;code&gt;MiddlewareAgent&lt;/code&gt;, &lt;code&gt;MiddlewareChain&lt;/code&gt;, &lt;code&gt;ParallelMiddleware&lt;/code&gt;, &lt;code&gt;ConditionalMiddleware&lt;/code&gt;, &lt;code&gt;PipelineSpec&lt;/code&gt;, config loaders, a compiler -- a whole parallel abstraction layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We deleted all of it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The v0.3.0 release renamed the package to &lt;code&gt;pydantic-ai-shields&lt;/code&gt; and rebuilt everything as capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (middleware era):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_middleware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MiddlewareAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CostTrackingMiddleware&lt;/span&gt;

&lt;span class="n"&gt;middleware_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MiddlewareAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;middlewares&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;CostTrackingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budget_limit_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;middleware_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (capabilities era):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_shields&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CostTracking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PromptInjection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PiiDetector&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;CostTracking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budget_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;PromptInjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sensitivity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;PiiDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credit_card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No wrapper agent. No middleware chain. The shields are just capabilities that hook into &lt;code&gt;before_run&lt;/code&gt;, &lt;code&gt;after_run&lt;/code&gt;, &lt;code&gt;prepare_tools&lt;/code&gt;, and &lt;code&gt;before_tool_execute&lt;/code&gt; as needed.&lt;/p&gt;

&lt;p&gt;The new package ships 10 capabilities: five infrastructure pieces (&lt;code&gt;CostTracking&lt;/code&gt;, &lt;code&gt;ToolGuard&lt;/code&gt;, &lt;code&gt;InputGuard&lt;/code&gt;, &lt;code&gt;OutputGuard&lt;/code&gt;, &lt;code&gt;AsyncGuardrail&lt;/code&gt;) and five zero-dependency content shields (&lt;code&gt;PromptInjection&lt;/code&gt;, &lt;code&gt;PiiDetector&lt;/code&gt;, &lt;code&gt;SecretRedaction&lt;/code&gt;, &lt;code&gt;BlockedKeywords&lt;/code&gt;, &lt;code&gt;NoRefusals&lt;/code&gt;).&lt;/p&gt;
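&lt;p&gt;To give a feel for what a content shield does, here is a toy regex-based version of the kind of check something like &lt;code&gt;PiiDetector&lt;/code&gt; might run before a prompt reaches the model. The real shield's detection logic is not shown in this post, so treat this as an illustration only:&lt;/p&gt;

```python
# Illustrative sketch (not the pydantic-ai-shields implementation) of a
# pre-model PII check: find and redact email addresses in the user prompt.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(prompt):
    return EMAIL.sub("[REDACTED_EMAIL]", prompt)

print(redact_emails("Contact jane.doe@example.com for details"))
# Contact [REDACTED_EMAIL] for details
```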

&lt;h3&gt;
  
  
  pydantic-ai-subagents
&lt;/h3&gt;

&lt;p&gt;The subagents library now exposes a &lt;code&gt;SubAgentCapability&lt;/code&gt; that bundles the subagent toolset with dynamic instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;subagents_pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SubAgentCapability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SubAgentConfig&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SubAgentCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;subagents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;SubAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researches topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a research assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The capability provides tools via &lt;code&gt;get_toolset()&lt;/code&gt; and injects instructions via &lt;code&gt;get_instructions()&lt;/code&gt;. It also supports agent spec serialization.&lt;/p&gt;
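&lt;p&gt;The shape is easy to picture with a toy stand-in. The two method names follow the article; everything else here is illustrative:&lt;/p&gt;

```python
# Hypothetical sketch of a capability that bundles a toolset (one delegation
# tool per configured subagent) with the instructions that teach the model
# to use it.
class DelegationCapability:
    def __init__(self, subagent_names):
        self.subagent_names = subagent_names

    def get_toolset(self):
        return [f"delegate_to_{name}" for name in self.subagent_names]

    def get_instructions(self):
        names = ", ".join(self.subagent_names)
        return f"You can delegate tasks to these subagents: {names}."

cap = DelegationCapability(["researcher"])
print(cap.get_toolset())
print(cap.get_instructions())
```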

&lt;h3&gt;
  
  
  pydantic-ai-summarization
&lt;/h3&gt;

&lt;p&gt;Four capabilities replace the old middleware-based context management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SummarizationCapability&lt;/code&gt;&lt;/strong&gt; -- triggers LLM summarization when thresholds are reached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SlidingWindowCapability&lt;/code&gt;&lt;/strong&gt; -- zero-cost alternative that discards oldest messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;LimitWarnerCapability&lt;/code&gt;&lt;/strong&gt; -- injects warnings when limits approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ContextManagerCapability&lt;/code&gt;&lt;/strong&gt; -- full package: token tracking, auto-compression, tool output truncation
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_summarization.capability&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ContextManagerCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ContextManagerCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  pydantic-ai-backend
&lt;/h3&gt;

&lt;p&gt;Our filesystem toolkit became &lt;code&gt;ConsoleCapability&lt;/code&gt; -- bundling tools, instructions, and permission enforcement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_backends&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConsoleCapability&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_backends.permissions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;READONLY_RULESET&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ConsoleCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;permissions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;READONLY_RULESET&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Delete your abstraction layers.&lt;/strong&gt; We removed thousands of lines of middleware code. The framework does it better now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Composition is free.&lt;/strong&gt; Multiple capabilities stack without you writing any merge logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Agent specs change deployment.&lt;/strong&gt; Define agent behavior in YAML, deploy by changing a config file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The migration is mechanical.&lt;/strong&gt; For each middleware hook, there's a direct capability equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Think in capabilities, not agents.&lt;/strong&gt; The old pattern: build a specialized agent. The new pattern: build a capability, attach it to any agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.pydantic.dev/capabilities/" rel="noopener noreferrer"&gt;Pydantic AI capabilities docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.pydantic.dev/hooks/" rel="noopener noreferrer"&gt;Pydantic AI hooks docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-middleware" rel="noopener noreferrer"&gt;pydantic-ai-shields&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-subagents" rel="noopener noreferrer"&gt;pydantic-ai-subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-summarization" rel="noopener noreferrer"&gt;pydantic-ai-summarization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-backend" rel="noopener noreferrer"&gt;pydantic-ai-backend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/full-stack-fastapi-nextjs-llm-template" rel="noopener noreferrer"&gt;full-stack AI agent template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Full RAG Pipeline: 4 Vector Stores, Hybrid Search, and Reranking in One Template</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Wed, 25 Mar 2026 01:25:12 +0000</pubDate>
      <link>https://forem.com/deenuu1/full-rag-pipeline-4-vector-stores-hybrid-search-and-reranking-in-one-template-1ef0</link>
      <guid>https://forem.com/deenuu1/full-rag-pipeline-4-vector-stores-hybrid-search-and-reranking-in-one-template-1ef0</guid>
      <description>&lt;h1&gt;
  
  
  We Added Full RAG to Our Open-Source AI Template: 4 Vector Stores, Hybrid Search, and Reranking
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;One template, every RAG decision already made — from vector store to reranking strategy.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You know the drill. You want to add RAG to your AI app. So you start: pick a vector database, write an embedding pipeline, figure out chunking, wire up retrieval, add it to your agent as a tool, build a frontend to manage documents...&lt;/p&gt;

&lt;p&gt;Three weeks later you have a working prototype. Then someone asks "can we try Qdrant instead of Milvus?" and you realize your vector store is hardcoded in 14 places.&lt;/p&gt;

&lt;p&gt;We just shipped v0.2.2 of our open-source full-stack AI template, and RAG was the biggest addition. Not a toy demo — a production pipeline with 4 vector stores, 4 embedding providers, hybrid search, reranking, document versioning, and a management dashboard. All configurable. All swappable.&lt;/p&gt;

&lt;p&gt;Here's what we built and why.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt; — an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;. Connect with me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: 5 Steps, Every One Configurable
&lt;/h2&gt;

&lt;p&gt;Every RAG system does the same thing: &lt;strong&gt;parse → chunk → embed → store → search&lt;/strong&gt;. The difference is how many decisions you have to make at each step.&lt;/p&gt;

&lt;p&gt;In our template, each step is a pluggable abstraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document Upload
  │
  ├── Parse: PyMuPDF (default) | LlamaParse (130+ formats) | python-docx
  │
  ├── Chunk: recursive (default) | markdown | fixed
  │     └── chunk_size=512, overlap=50 (configurable via env vars)
  │
  ├── Embed: OpenAI | Voyage | Gemini (multimodal) | SentenceTransformers (local)
  │     └── dimensions auto-derived from model name
  │
  ├── Store: Milvus | Qdrant | ChromaDB | pgvector
  │
  └── Search: vector | hybrid (BM25 + vector + RRF) | + reranking (Cohere | CrossEncoder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You pick your stack during project generation. The template wires everything up. No glue code.&lt;/p&gt;
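&lt;p&gt;To make the chunking step concrete, here is a simplified character-based sketch of overlapped chunking using the defaults from the diagram. The template's actual recursive splitter works on separators rather than raw characters, so this is an illustration only:&lt;/p&gt;

```python
# Simplified character-based illustration of chunking with overlap
# (chunk_size=512, overlap=50 mirror the configurable defaults above).
def chunk(text, chunk_size=512, overlap=50):
    step = chunk_size - overlap  # each chunk starts 462 chars after the last
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

pieces = chunk("x" * 1000)
print(len(pieces), len(pieces[0]))
```

A 1000-character input yields three chunks, and each chunk repeats the last 50 characters of the previous one.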

&lt;h2&gt;
  
  
  4 Vector Stores, 1 Interface
&lt;/h2&gt;

&lt;p&gt;The biggest design decision was making vector stores swappable. We implemented &lt;code&gt;BaseVectorStore&lt;/code&gt; with four backends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;insert_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_collection_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CollectionInfo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
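&lt;p&gt;As a toy illustration of that interface (not one of the template's four backends), an in-memory store might look like this, with a fake bag-of-words overlap standing in for real embedding similarity:&lt;/p&gt;

```python
# Minimal in-memory sketch of a backend implementing the interface above.
# Illustrative only: real backends manage schemas, indexes, and embeddings;
# here the "similarity" is just word overlap between query and document.
import asyncio
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    content: str

@dataclass
class SearchResult:
    content: str
    score: float

class InMemoryVectorStore:
    def __init__(self):
        self._collections = {}  # collection_name: {document_id: Document}

    async def insert_document(self, collection_name, document):
        self._collections.setdefault(collection_name, {})[document.id] = document

    async def search(self, collection_name, query, limit=4):
        docs = self._collections.get(collection_name, {}).values()
        query_terms = set(query.lower().split())
        scored = [
            SearchResult(d.content, len(query_terms.intersection(d.content.lower().split())))
            for d in docs
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:limit]

    async def delete_document(self, collection_name, document_id):
        self._collections.get(collection_name, {}).pop(document_id, None)

async def demo():
    store = InMemoryVectorStore()
    await store.insert_document("kb", Document("1", "building safety rules"))
    await store.insert_document("kb", Document("2", "holiday schedule"))
    hits = await store.search("kb", "safety in buildings", limit=1)
    return hits[0].content

print(asyncio.run(demo()))
```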



&lt;p&gt;&lt;strong&gt;Milvus&lt;/strong&gt; — production-grade, runs as 3 Docker services (etcd + MinIO + Milvus). Best for large-scale deployments. Cosine similarity with IVF_FLAT indexing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qdrant&lt;/strong&gt; — single Docker service, great balance of performance and simplicity. Our default recommendation for most teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChromaDB&lt;/strong&gt; — embedded mode, zero Docker required. Perfect for prototyping and local development. Just &lt;code&gt;pip install chromadb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pgvector&lt;/strong&gt; — uses your existing PostgreSQL. No new infrastructure. HNSW indexing. If you already have Postgres, this is the lowest-friction option.&lt;/p&gt;

&lt;p&gt;Switching between them? One environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your .env:&lt;/span&gt;
&lt;span class="nv"&gt;VECTOR_STORE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qdrant    &lt;span class="c"&gt;# or: milvus, chromadb, pgvector&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The template handles connection strings, Docker services, schema creation, and index configuration automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Search: Why Vector-Only Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Pure vector search works well for semantic queries ("documents about building safety"). It fails on exact matches ("find contract #2024-0847") because embeddings don't preserve exact strings.&lt;/p&gt;

&lt;p&gt;Our hybrid search combines both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Vector search (semantic)
&lt;/span&gt;    &lt;span class="n"&gt;raw_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fetch_multiplier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: BM25 keyword search
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_hybrid_enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bm25_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_bm25_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fetch_multiplier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;raw_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Rerank (optional)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;should_rerank&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rerank_service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rerank_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fusion uses &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; — a simple but effective algorithm that combines rankings from multiple sources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted_by_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable it with one env var: &lt;code&gt;RAG_HYBRID_SEARCH=true&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reranking: The Quality Multiplier
&lt;/h2&gt;

&lt;p&gt;Initial retrieval casts a wide net. Reranking narrows it down. We support two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cohere Reranker&lt;/strong&gt; (API) — the fastest way to improve retrieval quality. Send your results + query, get them re-scored by a model trained specifically for relevance ranking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-v3.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CrossEncoder&lt;/strong&gt; (local) — runs a SentenceTransformers cross-encoder model locally. No API calls, no data leaves your infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Runs locally on CPU/GPU
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline is: retrieve 3× more results than needed → rerank → return top-k. This consistently improves precision without touching your embeddings or vector store.&lt;/p&gt;
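
&lt;p&gt;As a minimal sketch of that over-retrieve-then-rerank loop (the &lt;code&gt;score&lt;/code&gt; function here is a hypothetical word-overlap stand-in, not the template's actual reranker):&lt;/p&gt;

```python
# Sketch of the over-retrieve -> rerank -> top-k pattern described above.
# `score` is a toy relevance function; a real pipeline would call a Cohere
# or cross-encoder reranker instead.

def score(query: str, doc: str) -> int:
    # Hypothetical scorer: count words shared with the query.
    return len(set(query.lower().split()).intersection(doc.lower().split()))

def search(query: str, candidates: list[str], limit: int = 2) -> list[str]:
    # Step 1: over-retrieve 3x more candidates than the caller asked for.
    raw = candidates[: limit * 3]
    # Step 2: rerank the wide net, then return only the top-k.
    reranked = sorted(raw, key=lambda d: score(query, d), reverse=True)
    return reranked[:limit]

docs = [
    "hybrid search combines bm25 and vectors",
    "reranking improves retrieval precision",
    "cooking pasta requires boiling water",
    "vector search uses embeddings",
]
top = search("vector search precision", docs, limit=2)
```

&lt;p&gt;Swapping the toy scorer for a real reranker keeps the same shape: widen the net, re-score, slice.&lt;/p&gt;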

&lt;h2&gt;
  
  
  Document Versioning: SHA256 Dedup
&lt;/h2&gt;

&lt;p&gt;Re-ingesting a document shouldn't create duplicates. Our pipeline uses content hashing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for existing version by source path or content hash
&lt;/span&gt;    &lt;span class="n"&gt;existing_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_find_existing_by_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;existing_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_find_existing_by_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace old chunks with new ones
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Google Drive sync? Same logic — changed files get re-embedded, unchanged files are skipped.&lt;/p&gt;
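
&lt;p&gt;The hashing step itself fits in a few lines. This is an illustrative sketch (the names &lt;code&gt;content_hash&lt;/code&gt;, &lt;code&gt;seen_hashes&lt;/code&gt;, and &lt;code&gt;should_reingest&lt;/code&gt; are assumptions, not the pipeline's actual API):&lt;/p&gt;

```python
import hashlib

# Minimal sketch of SHA256 content-hash dedup, assuming one hash is stored
# per ingested document. Helper names are illustrative, not the template's
# actual API.

def content_hash(text: str) -> str:
    # Identical content always produces the same digest, so a re-ingest of
    # an unchanged file can be detected and skipped.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

seen_hashes: set[str] = set()

def should_reingest(text: str) -> bool:
    digest = content_hash(text)
    if digest in seen_hashes:
        return False  # unchanged content: skip re-embedding
    seen_hashes.add(digest)
    return True

first = should_reingest("quarterly report v1")
repeat = should_reingest("quarterly report v1")
changed = should_reingest("quarterly report v2")
```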

&lt;h2&gt;
  
  
  4 Embedding Providers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;API Key?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;text-embedding-3-small&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voyage&lt;/td&gt;
&lt;td&gt;voyage-3&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;gemini-embedding-exp-03-07&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SentenceTransformers&lt;/td&gt;
&lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;No (local)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dimensions are auto-derived from the model name — no manual configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;EMBEDDING_DIMENSIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voyage-3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-embedding-exp-03-07&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
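
&lt;p&gt;A minimal sketch of that lookup (the fallback for unknown models is an assumption, not the template's documented behavior):&lt;/p&gt;

```python
# Derive embedding dimensions from the model name, mirroring the
# EMBEDDING_DIMENSIONS table above.

EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "voyage-3": 1024,
    "gemini-embedding-exp-03-07": 3072,
    "all-MiniLM-L6-v2": 384,
}

def embedding_dimension(model: str, default: int = 1536) -> int:
    # Unknown models fall back to a default instead of failing at startup;
    # the fallback value here is an assumption, not the template's behavior.
    return EMBEDDING_DIMENSIONS.get(model, default)

dim = embedding_dimension("voyage-3")
```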



&lt;p&gt;Gemini is the interesting one — it supports &lt;strong&gt;multimodal embeddings&lt;/strong&gt;. Text and images in the same vector space. We use it for image description extraction from PDFs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Integration
&lt;/h2&gt;

&lt;p&gt;RAG becomes an agent tool — &lt;code&gt;search_knowledge_base&lt;/code&gt; — available to all 5 AI frameworks (Pydantic AI, LangChain, LangGraph, CrewAI, DeepAgents):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Multi-collection search
&lt;/span&gt;    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search with automatic reranking &amp;amp; hybrid search if enabled.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results include source attribution: filename, page number, chunk number, and similarity score. The agent's system prompt instructs it to cite sources with &lt;code&gt;[1]&lt;/code&gt;, &lt;code&gt;[2]&lt;/code&gt; references.&lt;/p&gt;
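
&lt;p&gt;A sketch of how attributed results can be rendered into a citable context block; the exact result schema (&lt;code&gt;filename&lt;/code&gt;, &lt;code&gt;page&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt; fields) is assumed from the description above:&lt;/p&gt;

```python
from dataclasses import dataclass

# Illustrative sketch of source attribution: number each retrieved chunk so
# the agent can cite it as [1], [2]. The result fields are assumptions based
# on the attribution described above, not the template's actual schema.

@dataclass
class SearchResult:
    content: str
    filename: str
    page: int
    score: float

def format_context(results: list[SearchResult]) -> str:
    lines = []
    for i, r in enumerate(results, start=1):
        # Prefix each chunk with its citation index and source location.
        lines.append(f"[{i}] ({r.filename}, p.{r.page}) {r.content}")
    return "\n".join(lines)

ctx = format_context([
    SearchResult("RRF combines rankings.", "rag.pdf", 3, 0.91),
    SearchResult("Rerankers improve precision.", "rag.pdf", 7, 0.88),
])
```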

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG is a pipeline of 5 decisions&lt;/strong&gt; (parse, chunk, embed, store, search) — our template makes each one configurable without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector-only search misses exact matches&lt;/strong&gt; — hybrid (BM25 + vector + RRF) catches both semantic and keyword queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking is the cheapest quality improvement&lt;/strong&gt; — 3× over-retrieve + rerank consistently beats tuning embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document versioning prevents duplicate chunks&lt;/strong&gt; — SHA256 content hash + source path tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One env var switches everything&lt;/strong&gt; — &lt;code&gt;VECTOR_STORE=pgvector&lt;/code&gt;, &lt;code&gt;RAG_HYBRID_SEARCH=true&lt;/code&gt;, &lt;code&gt;EMBEDDING_MODEL=voyage-3&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — generates production-ready FastAPI + Next.js AI apps with full RAG pipeline&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Related:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co/tools/ai-agent-configurator/" rel="noopener noreferrer"&gt;AI Agent Configurator&lt;/a&gt; — configure 75+ options visually, download as ZIP&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co/guides/" rel="noopener noreferrer"&gt;Step-by-step guides&lt;/a&gt; — 50 tutorials across 5 frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;More from Vstorm's open-source ecosystem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;All our open-source projects&lt;/a&gt; — 13 packages for the Pydantic AI ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vstorm-co/awesome-pydantic-ai" rel="noopener noreferrer"&gt;awesome-pydantic-ai&lt;/a&gt; — curated list of Pydantic AI resources and tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;vstorm.co&lt;/a&gt; — our consultancy (30+ AI agent implementations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this was useful, follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for daily AI agent insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>It's been a while since my last post, but I'm back with new content for you.</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:38:26 +0000</pubDate>
      <link>https://forem.com/deenuu1/its-been-a-while-since-my-last-post-but-im-back-with-new-content-for-you-24bo</link>
      <guid>https://forem.com/deenuu1/its-been-a-while-since-my-last-post-but-im-back-with-new-content-for-you-24bo</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/deenuu1" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F997289%2F50074490-7c28-44da-9a80-f389f20d3691.jpeg" alt="deenuu1"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;From 0 to Production AI Agent in 30 Minutes — Full-Stack Template with 5 AI Frameworks&lt;/h2&gt;
      &lt;h3&gt;Kacper Włodarczyk ・ Mar 17&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#softwareengineering&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>From 0 to Production AI Agent in 30 Minutes — Full-Stack Template with 5 AI Frameworks</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:37:41 +0000</pubDate>
      <link>https://forem.com/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o</link>
      <guid>https://forem.com/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o</guid>
      <description>&lt;p&gt;Every AI project starts the same way.&lt;/p&gt;

&lt;p&gt;You need a FastAPI backend. Then authentication — JWT tokens, refresh logic, user management. Then a database — PostgreSQL, migrations, async connections. Then WebSocket streaming for real-time AI responses. Then a frontend — Next.js, state management, chat UI. Then Docker. Then CI/CD.&lt;/p&gt;

&lt;p&gt;Three days of boilerplate before you write a single line of AI code.&lt;/p&gt;

&lt;p&gt;I've set up this stack from scratch more times than I'd like to admit. After the third project where I copy-pasted the same auth middleware, the same WebSocket handler, the same Docker Compose config — I decided to build a generator that does all of it in one command.&lt;/p&gt;

&lt;p&gt;The result: &lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — an open-source full-stack template with 5 AI frameworks, 75+ configuration options, and a web configurator that generates your entire project in minutes.&lt;/p&gt;

&lt;p&gt;614 stars on GitHub. Used by teams at NVIDIA, Pfizer, TikTok, and others. And you can go from zero to a running production AI agent in about 30 minutes.&lt;/p&gt;

&lt;p&gt;Let me walk you through exactly how.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt; — an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;. Connect with me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Open the Web Configurator
&lt;/h2&gt;

&lt;p&gt;Go to &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;oss.vstorm.co/full-stack-ai-agent-template/configurator/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;No CLI installation needed. No pip. Just a browser.&lt;/p&gt;

&lt;p&gt;The configurator gives you a visual interface to pick every option for your project. Database, auth, AI framework, background tasks, observability, frontend — all of it. You see the full config before you generate anything.&lt;/p&gt;

&lt;p&gt;Alternatively, if you prefer the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches the interactive wizard that walks you through the same options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Pick a Preset (or Go Custom)
&lt;/h2&gt;

&lt;p&gt;The template ships with three presets that cover the most common use cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--minimal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bare FastAPI app — no database, no auth, no extras&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--preset ai-agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL + JWT auth + AI agent + WebSocket streaming + conversation persistence + Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--preset production&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full production setup — Redis, caching, rate limiting, Sentry, Prometheus, Kubernetes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For this walkthrough, I'll use the &lt;strong&gt;AI Agent&lt;/strong&gt; preset with Pydantic AI — the most common starting point for AI applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastapi-fullstack create my_ai_app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; pydantic_ai &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single command generates a full-stack project with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI backend with async PostgreSQL&lt;/li&gt;
&lt;li&gt;JWT authentication with user management&lt;/li&gt;
&lt;li&gt;Pydantic AI agent with WebSocket streaming&lt;/li&gt;
&lt;li&gt;Conversation persistence (chat history saved to DB)&lt;/li&gt;
&lt;li&gt;Redis for caching and sessions&lt;/li&gt;
&lt;li&gt;Next.js 15 frontend with React 19 and Tailwind CSS v4&lt;/li&gt;
&lt;li&gt;Docker Compose for the full stack&lt;/li&gt;
&lt;li&gt;GitHub Actions CI/CD&lt;/li&gt;
&lt;li&gt;Logfire observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Look at What You Got
&lt;/h2&gt;

&lt;p&gt;The generated project follows a clean layered architecture — Repository + Service pattern, inspired by real production codebases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_ai_app/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app with lifespan
│   │   ├── api/routes/v1/       # Versioned API endpoints
│   │   ├── core/                # Config, security, middleware
│   │   ├── db/models/           # SQLAlchemy models
│   │   ├── schemas/             # Pydantic schemas
│   │   ├── repositories/        # Data access layer
│   │   ├── services/            # Business logic
│   │   ├── agents/              # AI agents (this is where your code goes)
│   │   └── commands/            # Django-style CLI commands
│   ├── cli/                     # Project CLI
│   ├── tests/                   # pytest test suite
│   └── alembic/                 # Database migrations
├── frontend/
│   ├── src/
│   │   ├── app/                 # Next.js App Router
│   │   ├── components/          # React components (chat UI included)
│   │   ├── hooks/               # useChat, useWebSocket
│   │   └── stores/              # Zustand state management
├── docker-compose.yml
├── Makefile
├── CLAUDE.md                    # AI coding assistant context
└── AGENTS.md                    # Multi-agent project guide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;CLAUDE.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; files — the generated project is optimized for AI coding assistants like Claude Code, Cursor, and Copilot. It follows progressive disclosure best practices so your AI assistant understands the project structure immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Start Everything with Docker
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;my_ai_app
make docker-up        &lt;span class="c"&gt;# Backend + PostgreSQL + Redis&lt;/span&gt;
make docker-frontend  &lt;span class="c"&gt;# Next.js frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Two commands. The entire stack is running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Docs&lt;/strong&gt;: &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin Panel&lt;/strong&gt;: &lt;a href="http://localhost:8000/admin" rel="noopener noreferrer"&gt;http://localhost:8000/admin&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you prefer running without Docker, the template generates a Makefile with shortcuts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make &lt;span class="nb"&gt;install&lt;/span&gt;       &lt;span class="c"&gt;# Install Python + Node dependencies&lt;/span&gt;
make docker-db     &lt;span class="c"&gt;# Start just PostgreSQL&lt;/span&gt;
make db-migrate    &lt;span class="c"&gt;# Create initial migration&lt;/span&gt;
make db-upgrade    &lt;span class="c"&gt;# Apply migrations&lt;/span&gt;
make create-admin  &lt;span class="c"&gt;# Create admin user&lt;/span&gt;
make run           &lt;span class="c"&gt;# Start backend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; bun dev  &lt;span class="c"&gt;# Start frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Your AI Agent Is Already Working
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;http://localhost:3000&lt;/code&gt;, log in, and start chatting. The AI agent is already wired up — WebSocket streaming, conversation history, tool calls — all functional out of the box.&lt;/p&gt;

&lt;p&gt;Here's what the generated agent looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/agents/assistant.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the database for relevant information.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Access user context and database via ctx.deps
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type-safe. Dependency injection built in. Tool calling with full context access. This isn't a toy example — it's the same pattern we use in production at Vstorm.&lt;/p&gt;

&lt;p&gt;The WebSocket endpoint handles streaming automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_ws&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text_delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Customize the AI Layer
&lt;/h2&gt;

&lt;p&gt;Here's the key insight: &lt;strong&gt;everything except the AI agent is production-ready infrastructure that you don't need to touch&lt;/strong&gt;. Auth works. Database works. Streaming works. Frontend works.&lt;/p&gt;

&lt;p&gt;You modify one directory: &lt;code&gt;app/agents/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Want to change from OpenAI to Anthropic? Update the model string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want to add a tool? Add a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.weather.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want to switch to LangChain or CrewAI entirely? Regenerate the project with a different &lt;code&gt;--ai-framework&lt;/code&gt; flag. The rest of the stack stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 AI Frameworks, One Template
&lt;/h2&gt;

&lt;p&gt;The template supports five AI frameworks, all with the same backend infrastructure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Observability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pydantic AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Type-safe agents, dependency injection&lt;/td&gt;
&lt;td&gt;Logfire&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chains, existing LangChain tooling&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex multi-step workflows, ReAct agents&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-agent crews, role-based agents&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepAgents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Code-style agentic coding, HITL&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You pick the framework when generating the project. The WebSocket streaming, conversation persistence, auth, and frontend all work the same way regardless of which framework you choose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate with LangGraph&lt;/span&gt;
fastapi-fullstack create my_app &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; langgraph &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs

&lt;span class="c"&gt;# Generate with CrewAI&lt;/span&gt;
fastapi-fullstack create my_app &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; crewai &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  75+ Configuration Options
&lt;/h2&gt;

&lt;p&gt;Beyond AI frameworks, the template covers the full spectrum of production needs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Databases:&lt;/strong&gt; PostgreSQL (async), MongoDB (async), SQLite&lt;br&gt;
&lt;strong&gt;ORMs:&lt;/strong&gt; SQLAlchemy, SQLModel&lt;br&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; JWT + refresh tokens, API keys, Google OAuth&lt;br&gt;
&lt;strong&gt;Background tasks:&lt;/strong&gt; Celery, Taskiq, ARQ&lt;br&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Logfire, LangSmith, Sentry, Prometheus&lt;br&gt;
&lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker, Kubernetes, GitHub Actions, GitLab CI, Traefik, Nginx&lt;br&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 with React 19, TypeScript, Tailwind CSS v4, dark mode, i18n&lt;br&gt;
&lt;strong&gt;Extras:&lt;/strong&gt; Redis caching, rate limiting, SQLAdmin panel, webhooks, S3 file storage, RAG with Milvus&lt;/p&gt;

&lt;p&gt;Every option is a boolean flag. No Jinja template hacking. No post-generation cleanup. The generator produces clean code that only includes what you selected.&lt;/p&gt;
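&lt;p&gt;As an illustration, a single command combining several of these options might look like the sketch below. The flag names here are assumptions for the sake of example, so verify them against &lt;code&gt;fastapi-fullstack create --help&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical flag names -- check the CLI's --help output for the real ones.
fastapi-fullstack create my_app \
  --preset production \
  --db postgres \
  --auth jwt \
  --tasks celery \
  --frontend nextjs
```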
&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The web configurator at &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;oss.vstorm.co&lt;/a&gt; lets you visually configure and download a full-stack AI project&lt;/strong&gt; — no CLI needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three presets (minimal, ai-agent, production) cover 90% of use cases&lt;/strong&gt; — customize from there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 AI frameworks share the same infrastructure&lt;/strong&gt; — switch frameworks without rewriting your backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The generated code is production-grade, not a prototype&lt;/strong&gt; — layered architecture, async everywhere, type-safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You modify &lt;code&gt;app/agents/&lt;/code&gt; and nothing else&lt;/strong&gt; — auth, streaming, persistence, frontend are done.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — Production-ready full-stack AI agent template with 5 frameworks and 75+ options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;Web Configurator&lt;/a&gt; — no installation needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More from Vstorm's open-source ecosystem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;All our open-source projects&lt;/a&gt; — 13 packages for the Pydantic AI ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vstorm-co/awesome-pydantic-ai" rel="noopener noreferrer"&gt;awesome-pydantic-ai&lt;/a&gt; — curated list of Pydantic AI resources and tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;vstorm.co&lt;/a&gt; — our consultancy (30+ AI agent implementations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this was useful, follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for daily AI agent insights.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Pydantic-DeepAgents: A Lightweight, Production-Ready Framework for Building Autonomous AI Agents</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 22 Dec 2025 01:19:27 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-deepagents-a-lightweight-production-ready-framework-for-building-autonomous-ai-agents-2l3i</link>
      <guid>https://forem.com/deenuu1/pydantic-deepagents-a-lightweight-production-ready-framework-for-building-autonomous-ai-agents-2l3i</guid>
      <description>&lt;p&gt;&lt;em&gt;Inspired by LangChain deepagents — but simpler, type-safe, and with Docker sandboxing built-in&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In 2025, autonomous AI agents are no longer just research prototypes — they’re powering real-world automation, code generation tools, data pipelines, and intelligent assistants. However, many popular agent frameworks come with heavy dependencies, complex graphs, and a steep learning curve that makes production deployment challenging.&lt;/p&gt;

&lt;p&gt;That’s why we at &lt;strong&gt;Vstorm&lt;/strong&gt; built &lt;strong&gt;Pydantic-DeepAgents&lt;/strong&gt; — a minimal yet powerful open-source framework that extends &lt;strong&gt;Pydantic-AI&lt;/strong&gt; with everything you need to create reliable, production-grade agents.&lt;/p&gt;

&lt;p&gt;GitHub repository: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz36u3scp2j9q9veuk8o6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz36u3scp2j9q9veuk8o6.png" alt=" " width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes Pydantic-DeepAgents different?
&lt;/h3&gt;

&lt;p&gt;We were heavily inspired by LangChain’s excellent &lt;a href="https://github.com/langchain-ai/deepagents" rel="noopener noreferrer"&gt;deepagents&lt;/a&gt; project — a clean implementation of “deep agent” patterns including planning loops, tool calling, subagent delegation, and human-in-the-loop workflows.&lt;/p&gt;

&lt;p&gt;Instead of reinventing the wheel, we asked: &lt;em&gt;What if we built the same powerful patterns, but fully in the Pydantic-AI ecosystem?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The result is a framework that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps dependencies lightweight (no LangGraph, no massive ecosystem)&lt;/li&gt;
&lt;li&gt;Leverages Pydantic’s native type-safety and validation for structured outputs&lt;/li&gt;
&lt;li&gt;Adds production-focused features missing from many alternatives&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Planning &amp;amp; Reasoning&lt;/strong&gt; — TodoToolset for autonomous task breakdown and self-correction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem Access&lt;/strong&gt; — Full read/write operations with FilesystemToolset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagent Delegation&lt;/strong&gt; — Break complex tasks into specialized subagents (SubAgentToolset)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible Skills System&lt;/strong&gt; — Define new agent capabilities with simple Markdown prompts (perfect for rapid iteration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Backends&lt;/strong&gt; — In-memory, persistent filesystem, secure &lt;strong&gt;DockerSandbox&lt;/strong&gt; (isolated code execution), and CompositeBackend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Uploads&lt;/strong&gt; — Seamless processing of uploaded files via &lt;code&gt;run_with_files()&lt;/code&gt; or &lt;code&gt;deps.upload_file()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Management&lt;/strong&gt; — Automatic summarization for long-running conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-Loop&lt;/strong&gt; — Built-in confirmation workflows for critical actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Support&lt;/strong&gt; — Token-by-token responses for responsive UIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Outputs&lt;/strong&gt; — Type-safe Pydantic models via &lt;code&gt;output_type&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
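&lt;p&gt;The Markdown-defined skills are the quickest way to see the design: a skill is just a prompt file the agent loads as a capability. The file layout below is an assumption for illustration; see the repository's examples for the actual format:&lt;/p&gt;

```markdown
# summarize-csv  (hypothetical skill file -- structure is illustrative)

Summarize an uploaded CSV file.

## Instructions
1. Read the file with the filesystem toolset.
2. Report the row count, column names, and three notable observations.
3. Keep the summary under 150 words.
```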

&lt;h3&gt;
  
  
  See It in Action
&lt;/h3&gt;

&lt;p&gt;We’ve included a complete full-stack demo application (FastAPI backend + streaming web UI) that demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live agent reasoning traces&lt;/li&gt;
&lt;li&gt;File uploads and processing&lt;/li&gt;
&lt;li&gt;Human approval steps&lt;/li&gt;
&lt;li&gt;Streaming responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Demo app: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents/tree/main/examples/full_app" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents/tree/main/examples/full_app&lt;/a&gt;&lt;br&gt;
Quick video walkthrough: &lt;a href="https://drive.google.com/file/d/1hqgXkbAgUrsKOWpfWdF48cqaxRht-8od/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1hqgXkbAgUrsKOWpfWdF48cqaxRht-8od/view?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to choose Pydantic-DeepAgents?
&lt;/h3&gt;

&lt;p&gt;Choose it when you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clean, maintainable agent architecture without framework bloat&lt;/li&gt;
&lt;li&gt;Strong guarantees around data validation and structured responses&lt;/li&gt;
&lt;li&gt;Secure execution (Docker sandbox out of the box)&lt;/li&gt;
&lt;li&gt;Fast prototyping with Markdown-defined skills&lt;/li&gt;
&lt;li&gt;Easy deployment in production environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s particularly great if you’re already using Pydantic-AI, prefer minimalism, or need agents that interact safely with files and external tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Started Today
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the repository, star it if you find it useful, and feel free to open issues or PRs — we’d love contributions!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’re excited to see what you build with it.&lt;/p&gt;

&lt;p&gt;— Team at Vstorm (&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;https://vstorm.co&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>langchain</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
