<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kacper Włodarczyk</title>
    <description>The latest articles on Forem by Kacper Włodarczyk (@deenuu1).</description>
    <link>https://forem.com/deenuu1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F997289%2F50074490-7c28-44da-9a80-f389f20d3691.jpeg</url>
      <title>Forem: Kacper Włodarczyk</title>
      <link>https://forem.com/deenuu1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/deenuu1"/>
    <language>en</language>
    <item>
      <title>24 Claude Code Skills to Fix Your AI Stack: Introducing production-stack-skills and content-skills</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 20 Apr 2026 11:59:05 +0000</pubDate>
      <link>https://forem.com/deenuu1/24-claude-code-skills-to-fix-your-ai-stack-introducing-production-stack-skills-and-content-skills-80m</link>
      <guid>https://forem.com/deenuu1/24-claude-code-skills-to-fix-your-ai-stack-introducing-production-stack-skills-and-content-skills-80m</guid>
      <description>&lt;p&gt;78% of Fortune 500 companies are adopting AI coding assistants. 45% of that generated code ships with security vulnerabilities. On the content side, 76% of readers identify AI-written text within three seconds, and engagement drops around 47% when they do.&lt;/p&gt;

&lt;p&gt;Those numbers describe the same problem from two angles: &lt;strong&gt;AI outputs need guardrails, whether the output is code or writing.&lt;/strong&gt; Today we're shipping two Claude Code skill packs that act as those guardrails.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Claude Code skill is a packaged slash command that augments your AI coding agent with specific expertise, defined in a &lt;code&gt;SKILL.md&lt;/code&gt; file and invocable from any AGENTS.md-compatible runtime.&lt;/strong&gt; The Skills Wave is two of those packs, released the same day because the failure modes on both sides of the AI workflow deserve the same fix.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt;, an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  production-stack-skills: 10 Claude Code Skills for Production-Ready AI Code
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;production-stack-skills is a 10-skill pack that audits AI-generated code across six weighted categories and hands back a 0 to 100 production-readiness score with a prioritized action plan.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The flagship command is &lt;code&gt;/production check&lt;/code&gt;. Point it at a repo; it reads the FastAPI routes, Postgres migrations, Dockerfiles, and config, then returns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A headline score (e.g., "Production Readiness: 34/100")&lt;/li&gt;
&lt;li&gt;Six category scores: security, error handling, observability, deployment, data layer, code quality&lt;/li&gt;
&lt;li&gt;A Quick Wins section with point deltas&lt;/li&gt;
&lt;li&gt;An Action Plan sorted by weighted impact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In early internal runs across client repos, applying the Quick Wins alone consistently moved the score by about 30 points. That's the number I'd quote if a CTO asked, "What does running this for a morning actually buy me?"&lt;/p&gt;

&lt;p&gt;The other nine skills split by surface area: &lt;code&gt;/production review&lt;/code&gt;, &lt;code&gt;/production planner&lt;/code&gt;, &lt;code&gt;/production fastapi&lt;/code&gt;, &lt;code&gt;/production postgres&lt;/code&gt;, &lt;code&gt;/production docker&lt;/code&gt;, &lt;code&gt;/production deploy&lt;/code&gt;, &lt;code&gt;/production monitoring&lt;/code&gt;, &lt;code&gt;/production security&lt;/code&gt;, &lt;code&gt;/production error-handling&lt;/code&gt;. Each is a focused slash command rather than a sub-mode of a monolithic agent.&lt;/p&gt;
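&lt;p&gt;As a sketch of how six weighted category scores could roll up into one headline number (the weights below are invented for illustration, not the ones the skill actually uses):&lt;/p&gt;

```python
# Illustrative only: hypothetical category weights, not the real ones
# production-stack-skills applies.
CATEGORY_WEIGHTS = {
    "security": 25,
    "error_handling": 20,
    "observability": 15,
    "deployment": 15,
    "data_layer": 15,
    "code_quality": 10,
}  # percentages, sum to 100

def readiness_score(category_scores: dict[str, int]) -> int:
    """Weight per-category scores (0-100 each) into a single 0-100 total."""
    total = sum(
        weight * category_scores.get(name, 0)
        for name, weight in CATEGORY_WEIGHTS.items()
    )
    return round(total / 100)

scores = {
    "security": 20, "error_handling": 30, "observability": 10,
    "deployment": 50, "data_layer": 60, "code_quality": 55,
}
print(readiness_score(scores))  # 34, i.e. "Production Readiness: 34/100"
```

&lt;p&gt;Under a scheme like this, fixing the heaviest categories first is what makes the Quick Wins deltas move the headline score fastest.&lt;/p&gt;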

&lt;h2&gt;
  
  
  content-skills: 14 Brand-First Skills That Kill AI Slop
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;content-skills is a 14-skill pack with a &lt;code&gt;/brand/&lt;/code&gt; directory at its core: after a five-minute brand interview, every content skill reads your BRAND.md, VOICE.md, VISUAL.md, and voice samples before writing a word.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;/content setup&lt;/code&gt; once. Five questions. It writes &lt;code&gt;/brand/&lt;/code&gt;. From that moment, every content skill auto-reads that directory on every invocation.&lt;/p&gt;

&lt;p&gt;The exit point is &lt;code&gt;/content audit&lt;/code&gt;. Score any piece of content 0 to 100 on voice consistency, anti-slop markers, visual consistency, and brand alignment.&lt;/p&gt;

&lt;p&gt;Between setup and audit sit 12 production skills: strategy, calendar, blog, twitter, linkedin, reddit, hackernews, presentation, infographic, image, video, repurpose.&lt;/p&gt;

&lt;h2&gt;
  
  
  One Install, Dual-CLI, Uninstall-Safe
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/production-stack-skills/main/install.sh | bash
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/content-skills/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each install mirrors the skills into both &lt;code&gt;~/.claude/&lt;/code&gt; (Claude Code) and &lt;code&gt;~/.agents/&lt;/code&gt; (Codex, Amp, and anything AGENTS.md-compatible). You don't pick the runtime up front. A skill written for Claude Code works identically in Codex.&lt;/p&gt;
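&lt;p&gt;The mirroring step can be pictured as a small copy loop. The layout below (a &lt;code&gt;skills/&lt;/code&gt; subdirectory holding each skill's files) is an assumption for the sketch, not the installer's actual file tree:&lt;/p&gt;

```python
# Sketch of dual-runtime mirroring: copy one skill directory into every
# target runtime directory. Paths and layout are illustrative assumptions.
import shutil
from pathlib import Path

def mirror_skill(source: Path, targets: list[Path]) -> list[Path]:
    """Copy a skill directory into each runtime dir (e.g. ~/.claude, ~/.agents)."""
    installed = []
    for target in targets:
        dest = target / "skills" / source.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, dest, dirs_exist_ok=True)
        installed.append(dest)
    return installed
```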

&lt;p&gt;Uninstall is boring by design. &lt;code&gt;/content-skills uninstall&lt;/code&gt; removes the skills. Your &lt;code&gt;/brand/&lt;/code&gt; stays.&lt;/p&gt;

&lt;p&gt;Both repos are MIT.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Two Packs Shipped the Same Day
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;In 30+ production AI agent deployments at Vstorm, the failures cluster into two shapes: code that passes the demo but fails the first prod incident, and content that sounds like the AI wrote it because the AI wrote it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One pack addresses the first. The other addresses the second. Both use the same architecture because the lesson that skills beat monolithic agents applies to both. You don't want "one AI that does everything". You want 24 small, composable, auditable slash commands that you can swap, tune, or remove when they stop pulling weight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;production-stack-skills&lt;/strong&gt; ships 10 skills with a 0-100 scorer and Quick Wins section that typically moves scores +30 in under five minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;content-skills&lt;/strong&gt; ships 14 skills built around a &lt;code&gt;/brand/&lt;/code&gt; directory auto-read on every invocation.&lt;/li&gt;
&lt;li&gt;Both packs install via one &lt;code&gt;curl&lt;/code&gt; command, mirror into &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;, and work on Claude Code, Codex, and any AGENTS.md-compatible runtime.&lt;/li&gt;
&lt;li&gt;Skill-first architecture beats monolithic agents for auditability and local updates.&lt;/li&gt;
&lt;li&gt;Both repos are MIT.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What are Claude Code skills and how do I install them?
&lt;/h3&gt;

&lt;p&gt;Claude Code skills are packaged slash commands backed by a &lt;code&gt;SKILL.md&lt;/code&gt; file that extends a coding agent with specific expertise. You install a skill pack with a single &lt;code&gt;curl&lt;/code&gt; command that mirrors files into &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;. After install, slash commands become available immediately without restart.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's the difference between production-stack-skills and content-skills?
&lt;/h3&gt;

&lt;p&gt;production-stack-skills audits and hardens AI-generated code across security, error handling, observability, deployment, data, and code quality, returning a 0-100 score. content-skills audits and produces on-brand content using a &lt;code&gt;/brand/&lt;/code&gt; directory you set up once, returning voice-consistency and anti-slop scores.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do these skills work with Codex or only Claude Code?
&lt;/h3&gt;

&lt;p&gt;Both. Each install script mirrors files into both &lt;code&gt;~/.claude/&lt;/code&gt; and &lt;code&gt;~/.agents/&lt;/code&gt;, so the same skill works in Claude Code, Codex, Amp, and any AGENTS.md-compatible runtime without modification.&lt;/p&gt;

&lt;h3&gt;
  
  
  How long does &lt;code&gt;/production check&lt;/code&gt; take on a real repo?
&lt;/h3&gt;

&lt;p&gt;On a typical FastAPI + Postgres repo of a few thousand lines, about a minute. The Quick Wins section is what you act on first, usually under five minutes to apply.&lt;/p&gt;

&lt;h3&gt;
  
  
  When should I use a Claude Code skill instead of writing a full agent?
&lt;/h3&gt;

&lt;p&gt;Use a skill when the job is scoped expertise invoked explicitly (audit this, write a post in my voice). Use a full agent when the job is open-ended, multi-step, and requires planning across tools. Skills are composable building blocks; agents orchestrate them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;production-stack-skills:&lt;/strong&gt; &lt;a href="https://github.com/vstorm-co/production-stack-skills" rel="noopener noreferrer"&gt;github.com/vstorm-co/production-stack-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;content-skills:&lt;/strong&gt; &lt;a href="https://github.com/vstorm-co/content-skills" rel="noopener noreferrer"&gt;github.com/vstorm-co/content-skills&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full writeup:&lt;/strong&gt; &lt;a href="https://oss.vstorm.co/blog/skills-wave-launch-claude-code-skills/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/skills-wave-launch-claude-code-skills&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for the follow-ups: Wednesday's production deep-dive, Thursday's content walkthrough, Friday's "8 Lessons from Shipping 24 Claude Code Skills".&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Full Observability in AI Agents: What We Added to the pydantic-deepagents TUI</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Sat, 18 Apr 2026 09:58:43 +0000</pubDate>
      <link>https://forem.com/deenuu1/full-observability-in-ai-agents-what-we-added-to-the-pydantic-deepagents-tui-l02</link>
      <guid>https://forem.com/deenuu1/full-observability-in-ai-agents-what-we-added-to-the-pydantic-deepagents-tui-l02</guid>
      <description>&lt;h1&gt;
  
  
  Full Observability in AI Agents: What We Added to the pydantic-deepagents TUI
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a cross-post. Canonical version: &lt;a href="https://oss.vstorm.co/blog/ai-agent-tui-observability-pydantic-deep/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/ai-agent-tui-observability-pydantic-deep/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This week I covered how pydantic-deepagents handles stuck loops, context window blindness, and frictionless installation. Today: what you actually see when all of that runs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem with invisible agents
&lt;/h2&gt;

&lt;p&gt;When an AI agent runs, a lot happens between your prompt and the response. The model reasons. It calls tools. Each action burns tokens and costs money. Without observability, you're flying blind — you can't debug, optimize, or trust what's happening.&lt;/p&gt;

&lt;p&gt;pydantic-deepagents v0.3.5 — the modular agent runtime for Python — reworks the TUI to surface everything.&lt;/p&gt;




&lt;h2&gt;
  
  
  What changed
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Per-turn token usage
&lt;/h3&gt;

&lt;p&gt;Below every assistant response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;in:2.1K · out:412 · total:2.5K · reqs:3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;in&lt;/code&gt; = input tokens, &lt;code&gt;out&lt;/code&gt; = output tokens, &lt;code&gt;total&lt;/code&gt; = turn total, &lt;code&gt;reqs&lt;/code&gt; = API calls in this turn.&lt;/p&gt;
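&lt;p&gt;The abbreviated counts can be produced with a formatter along these lines (a guess at the rounding rules, not the TUI's actual code):&lt;/p&gt;

```python
# Sketch of the "2.1K"-style token formatting shown above. The exact
# rounding rules in pydantic-deepagents may differ; this is illustrative.
def fmt_tokens(n: int) -> str:
    """Abbreviate a token count the way the footer does: 412, 2.1K, 1.3M."""
    if n < 1000:
        return str(n)
    if n < 1_000_000:
        return f"{n / 1000:.1f}K".replace(".0K", "K")
    return f"{n / 1_000_000:.1f}M".replace(".0M", "M")

line = f"in:{fmt_tokens(2100)} · out:{fmt_tokens(412)} · total:{fmt_tokens(2512)} · reqs:3"
print(line)  # in:2.1K · out:412 · total:2.5K · reqs:3
```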

&lt;h3&gt;
  
  
  Cumulative cost in the header
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pydantic-deepagents  in:45K out:3K · $0.12
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Updates after each response. You always know the running total.&lt;/p&gt;

&lt;h3&gt;
  
  
  Thinking streamed live → collapsed
&lt;/h3&gt;

&lt;p&gt;Model reasoning appears as dimmed text while running. Collapses to a one-line summary when done. Watch the agent reason without drowning in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Side panel on startup
&lt;/h3&gt;

&lt;p&gt;Opens automatically when terminal ≥100 chars wide. Shows subagents before any task:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subagents:
• planner (idle)
• research (idle)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Status updates as agents are delegated work.&lt;/p&gt;

&lt;h3&gt;
  
  
  All tool calls visible
&lt;/h3&gt;

&lt;p&gt;Todo tools (&lt;code&gt;read_todos&lt;/code&gt;, &lt;code&gt;write_todos&lt;/code&gt;, &lt;code&gt;add_todo&lt;/code&gt;, &lt;code&gt;update_todo_status&lt;/code&gt;, &lt;code&gt;remove_todo&lt;/code&gt;) were previously hidden. Now surfaced. Every agent action is visible.&lt;/p&gt;

&lt;h3&gt;
  
  
  Session saved on crash
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;_save_session()&lt;/code&gt; is now in a &lt;code&gt;finally&lt;/code&gt; block. Crash, exception, keyboard interrupt — &lt;code&gt;messages.json&lt;/code&gt; is always written. No more lost sessions.&lt;/p&gt;
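&lt;p&gt;The pattern is ordinary &lt;code&gt;try&lt;/code&gt;/&lt;code&gt;finally&lt;/code&gt;. A stripped-down sketch, where the run and save bodies are stand-ins for the real ones:&lt;/p&gt;

```python
# Sketch of the crash-safe save described above: the write happens in a
# `finally` block, so messages.json lands on success, exception, or Ctrl-C.
import json
from pathlib import Path

def run_session(session_dir: Path, turn) -> None:
    messages = []
    try:
        messages.append(turn())  # may raise mid-run
    finally:
        # Always persist, even if turn() blew up.
        (session_dir / "messages.json").write_text(json.dumps(messages))
```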

&lt;h3&gt;
  
  
  Subagent logs: 20K chars (was 2K)
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;tool_log.jsonl&lt;/code&gt; now stores full subagent output. Critical for &lt;code&gt;/improve&lt;/code&gt; — the pipeline that extracts learnings from sessions (more on that tomorrow).&lt;/p&gt;




&lt;h2&gt;
  
  
  The full layout
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────┬──────────────────┐
│ pydantic-deepagents  in:45K out:3K · $0.12         │
├─────────────────────────────────┼──────────────────┤
│ [thinking... dimmed text]       │ Subagents:       │
│ [collapsed to summary]          │ • planner (idle) │
│                                 │ • research (idle)│
│ Agent response here...          │                  │
│ in:2.1K · out:412 · $0.04       │                  │
└─────────────────────────────────┴──────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://oss.vstorm.co/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Observability is how you debug, optimize, and trust your agent. A black box is a liability.&lt;/p&gt;

&lt;p&gt;What's the first metric you check when debugging an agent run?&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Browser Automation + /improve: AI Agents That Browse the Web and Fix Themselves</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:20:41 +0000</pubDate>
      <link>https://forem.com/deenuu1/browser-automation-improve-ai-agents-that-browse-the-web-and-fix-themselves-57i8</link>
      <guid>https://forem.com/deenuu1/browser-automation-improve-ai-agents-that-browse-the-web-and-fix-themselves-57i8</guid>
      <description>&lt;p&gt;This week I shipped 5 versions of pydantic-deepagents — the modular agent runtime for Python. Today: the two features that close the loop — browser automation and session-based self-improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: BrowserCapability — 9 Playwright Tools
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'pydantic-deep[browser]'&lt;/span&gt;
playwright &lt;span class="nb"&gt;install &lt;/span&gt;chromium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BrowserCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;extra_capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;BrowserCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;allowed_domains&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs.python.org&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;auto_screenshot&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 9 tools: &lt;code&gt;navigate&lt;/code&gt;, &lt;code&gt;click&lt;/code&gt;, &lt;code&gt;type_text&lt;/code&gt;, &lt;code&gt;get_text&lt;/code&gt;, &lt;code&gt;screenshot&lt;/code&gt;, &lt;code&gt;scroll&lt;/code&gt;, &lt;code&gt;go_back&lt;/code&gt;, &lt;code&gt;go_forward&lt;/code&gt;, &lt;code&gt;execute_js&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Safety design:&lt;/strong&gt; Single-tab (predictable state), domain allowlist (agent can't navigate outside allowed domains), automatic popup interception, content truncation to prevent context overflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Browser lifecycle:&lt;/strong&gt; Chromium starts before the agent run, stops after — whether the run succeeds, fails, or is cancelled. No orphaned processes.&lt;/p&gt;
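&lt;p&gt;The allowlist check reduces to a hostname-suffix test. This is an illustration of the idea, not the actual &lt;code&gt;BrowserCapability&lt;/code&gt; code:&lt;/p&gt;

```python
# Illustrative sketch of a domain allowlist: permit navigation only when
# the host is an allowed domain or one of its subdomains.
from urllib.parse import urlparse

ALLOWED_DOMAINS = ["github.com", "docs.python.org"]

def is_allowed(url: str, allowed: list[str] = ALLOWED_DOMAINS) -> bool:
    """True if the URL's host matches (or is a subdomain of) an allowed domain."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed)
```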

&lt;p&gt;&lt;strong&gt;CLI:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pydantic-deep tui &lt;span class="nt"&gt;--browser&lt;/span&gt; &lt;span class="nt"&gt;--browser-headed&lt;/span&gt;   &lt;span class="c"&gt;# visible window&lt;/span&gt;
pydantic-deep run &lt;span class="s2"&gt;"research X on GitHub"&lt;/span&gt; &lt;span class="nt"&gt;--browser&lt;/span&gt; &lt;span class="nt"&gt;--sandbox&lt;/span&gt; docker
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bug fix:&lt;/strong&gt; Browser tools now force &lt;code&gt;kind='function'&lt;/code&gt; — they never trigger approval dialogs mid-task.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 2: /improve — Session-Based Self-Improvement
&lt;/h2&gt;

&lt;p&gt;After each session, &lt;code&gt;/improve&lt;/code&gt; analyzes the full run and extracts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;UserFactInsight&lt;/code&gt;&lt;/strong&gt; — what the agent learned about you and your preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AgentLearningInsight&lt;/code&gt;&lt;/strong&gt; — strategies that worked, failure modes encountered&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both write to MEMORY.md. Next session loads MEMORY.md automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key finding:&lt;/strong&gt; We tested summaries vs raw tool traces as input to the synthesis step. Raw traces performed significantly better — summaries compress away the signal that matters. &lt;code&gt;/improve&lt;/code&gt; reads from &lt;code&gt;tool_log.jsonl&lt;/code&gt; (written per session), not from a summary.&lt;/p&gt;

&lt;p&gt;The loop: agent runs → &lt;code&gt;/improve&lt;/code&gt; extracts insights → MEMORY.md grows → next run starts smarter.&lt;/p&gt;
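&lt;p&gt;The extraction step can be sketched as a read-filter-append pass over the trace. The log schema and insight heuristic below are assumptions for the sketch, not the real &lt;code&gt;/improve&lt;/code&gt; pipeline:&lt;/p&gt;

```python
# Sketch of the /improve loop: read the raw tool trace, pull out insight
# entries, append them to MEMORY.md. Schema and heuristics are assumed.
import json
from pathlib import Path

def improve(session_dir: Path) -> int:
    """Extract `insight` entries from tool_log.jsonl and append to MEMORY.md."""
    log = session_dir / "tool_log.jsonl"
    entries = [json.loads(line) for line in log.read_text().splitlines() if line.strip()]
    insights = [e["insight"] for e in entries if "insight" in e]
    with (session_dir / "MEMORY.md").open("a") as f:
        for item in insights:
            f.write(f"- {item}\n")
    return len(insights)
```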

&lt;h2&gt;
  
  
  This Week's Full Stack
&lt;/h2&gt;

&lt;p&gt;Monday: StuckLoopDetection | Tuesday: LimitWarnerCapability | Wednesday: curl install | Thursday: Docker sandbox | &lt;strong&gt;Today: browser + /improve&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An agent that detects loops, knows its context limits, installs in 30s, runs in Docker, browses the web, and learns from every session.&lt;/p&gt;

&lt;p&gt;Full breakdown: &lt;a href="https://oss.vstorm.co/blog/browser-automation-improve-ai-agents-pydantic-deep/" rel="noopener noreferrer"&gt;https://oss.vstorm.co/blog/browser-automation-improve-ai-agents-pydantic-deep/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>playwright</category>
      <category>agents</category>
      <category>opensource</category>
    </item>
    <item>
      <title>curl | bash for AI Agents: One-Command Install for pydantic-deep</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Wed, 15 Apr 2026 11:57:47 +0000</pubDate>
      <link>https://forem.com/deenuu1/curl-bash-for-ai-agents-one-command-install-for-pydantic-deep-36jh</link>
      <guid>https://forem.com/deenuu1/curl-bash-for-ai-agents-one-command-install-for-pydantic-deep-36jh</guid>
      <description>&lt;p&gt;The standard Python AI tool install experience:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Python (which version?)&lt;/li&gt;
&lt;li&gt;Create a venv&lt;/li&gt;
&lt;li&gt;pip install&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ModuleNotFoundError: No module named 'textual'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;pip install again with correct extras&lt;/li&gt;
&lt;li&gt;Figure out PATH&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Six steps. Fifteen minutes. That's before you've even seen the tool.&lt;/p&gt;

&lt;p&gt;We fixed this for &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;pydantic-deep&lt;/a&gt; — the modular agent runtime for Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What install.sh does
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Detects whether &lt;code&gt;uv&lt;/code&gt; is installed&lt;/li&gt;
&lt;li&gt;If not: installs uv via the official Astral installer&lt;/li&gt;
&lt;li&gt;Runs &lt;code&gt;uv tool install "pydantic-deep[cli]"&lt;/code&gt; — isolated environment, binary available globally&lt;/li&gt;
&lt;li&gt;Verifies the install&lt;/li&gt;
&lt;li&gt;Prints PATH fix instructions if needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No virtual environment management. No extras guessing. The &lt;code&gt;cli&lt;/code&gt; extras group pulls in everything, including &lt;code&gt;textual&lt;/code&gt; (whose absence from the base install was the original bug).&lt;/p&gt;

&lt;h2&gt;
  
  
  Self-update
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pydantic-deep update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Uses &lt;code&gt;uv tool upgrade&lt;/code&gt; if available and falls back to pip otherwise. One command to stay current.&lt;/p&gt;

&lt;h2&gt;
  
  
  Startup notifications
&lt;/h2&gt;

&lt;p&gt;Every invocation checks PyPI for updates silently (2-second timeout, 24-hour cache):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update available: v0.3.6 → v0.3.7  Run: pydantic-deep update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Never blocks startup.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why uv?
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;uv tool install&lt;/code&gt; is designed exactly for this use case — isolated tool environments, global binary access, no activation required. Fast, well-maintained, increasingly standard for Python CLI tooling.&lt;/p&gt;

&lt;p&gt;Alternatives considered: pipx (slower, needs separate install), Homebrew tap (maintenance overhead), native binary (too brittle for dynamic imports).&lt;/p&gt;




&lt;p&gt;Full write-up with implementation details: &lt;a href="https://oss.vstorm.co/blog/pydantic-deep-one-command-install-curl-bash/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/pydantic-deep-one-command-install-curl-bash&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What's the worst install experience you've had with an AI/ML tool?&lt;/p&gt;

</description>
      <category>python</category>
      <category>opensource</category>
      <category>devex</category>
      <category>cli</category>
    </item>
    <item>
      <title>Context Window Blindness: Why Your AI Agent Doesn't Know It's Running Out of Space</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 14 Apr 2026 11:05:25 +0000</pubDate>
      <link>https://forem.com/deenuu1/context-window-blindness-why-your-ai-agent-doesnt-know-its-running-out-of-space-4dji</link>
      <guid>https://forem.com/deenuu1/context-window-blindness-why-your-ai-agent-doesnt-know-its-running-out-of-space-4dji</guid>
      <description>&lt;p&gt;On Monday I showed how agents waste tokens by getting stuck in loops — repeating the same tool call dozens of times, burning money on nothing. Today — a quieter problem that costs just as much, and is far harder to spot.&lt;/p&gt;

&lt;p&gt;Your AI agent has been flying blind. It has no idea its context window is 90% full.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Two Different Realities
&lt;/h2&gt;

&lt;p&gt;When you run a long agent task, here's what you see: a status bar showing "Context: 87% used." It's right there in the TUI. You can see the agent is almost out of space.&lt;/p&gt;

&lt;p&gt;But the model can't see the status bar. It has no idea.&lt;/p&gt;

&lt;p&gt;From the model's perspective, every message it writes, every tool call it makes, every plan it sketches — all of that just continues normally. It has no signal that the conversation history is filling up. It keeps producing long responses, initiating multi-step plans, making tool calls that generate pages of output.&lt;/p&gt;

&lt;p&gt;Then at 90%: auto-compression kicks in. The model's working memory gets force-compressed. It loses the thread of what it did 40 messages ago. It starts contradicting its earlier decisions.&lt;/p&gt;

&lt;p&gt;This is context window blindness: the gap between what the user sees and what the model knows about its own situation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: LimitWarnerCapability
&lt;/h2&gt;

&lt;p&gt;In &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;pydantic-deep v0.3.8&lt;/a&gt; — the modular agent runtime for Python — we added &lt;code&gt;LimitWarnerCapability&lt;/code&gt;. The solution: inject usage information directly into the conversation as a user message, at two thresholds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;At ~70% usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are approaching the context limit. Begin wrapping up your current task. Avoid starting new complex subtasks.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;At ~85% usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CRITICAL: Your context window is almost full. Use /compact NOW before continuing.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These are injected as user messages — not system prompt modifications. The model treats them as authoritative input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepAgent&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_manager&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# default — enables LimitWarnerCapability
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Auto-enabled by default. No configuration needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  BM25 History Search
&lt;/h2&gt;

&lt;p&gt;We also rewrote &lt;code&gt;search_conversation_history&lt;/code&gt; from naive substring to BM25:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Before v0.3.8
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# After v0.3.8 — BM25 ranked, zero external deps
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;search_conversation_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;explain the authentication flow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Rare terms rank higher. Multi-word queries tokenized properly.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pure Python. Zero external dependencies. The standard Okapi BM25 formula, the same ranking function Lucene uses.&lt;/p&gt;
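&lt;p&gt;As an illustration of the ranking (a toy sketch, not the library's actual code; &lt;code&gt;k1=1.5&lt;/code&gt; and &lt;code&gt;b=0.75&lt;/code&gt; are the usual textbook defaults):&lt;/p&gt;

```python
import math

# Toy BM25 sketch: score each document against a query.
# Not pydantic-deep's implementation; just the standard formula.
def bm25_scores(docs, query, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(1 for d in tokenized if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # rare terms weigh more
            tf = doc.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores
```

&lt;p&gt;The idf term is why a rare word like &lt;code&gt;authentication&lt;/code&gt; outranks filler words in a multi-word query, which plain substring matching cannot do.&lt;/p&gt;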

&lt;h2&gt;
  
  
  EvictionCapability
&lt;/h2&gt;

&lt;p&gt;Large tool outputs are intercepted via the &lt;code&gt;after_tool_execute&lt;/code&gt; hook &lt;strong&gt;before&lt;/strong&gt; they enter message history — not trimmed after. The difference matters on long tasks.&lt;/p&gt;
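&lt;p&gt;A conceptual sketch of that boundary (the hook name comes from the post; the threshold, file path, and helper function are hypothetical, chosen only to illustrate the idea that oversized outputs never reach history):&lt;/p&gt;

```python
# Conceptual sketch of eviction at the after_tool_execute boundary.
# The real capability hooks into pydantic-deep; this standalone
# version only shows the shape of the behavior.
EVICTION_LIMIT_TOKENS = 20_000  # assumed threshold for illustration

def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return len(text) // 4

def after_tool_execute(tool_name: str, output: str, saved_files: dict) -> str:
    if rough_token_count(output) <= EVICTION_LIMIT_TOKENS:
        return output  # small enough: enters message history as-is
    # Evict: persist the full output and hand the model a pointer instead.
    path = f"/tmp/{tool_name}_output.txt"
    saved_files[path] = output
    return f"[output evicted: {rough_token_count(output)} tokens saved to {path}]"
```

&lt;p&gt;Trimming after the fact still pays the cost of the output once; intercepting at this boundary means the tokens are never spent at all.&lt;/p&gt;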

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Models have no intrinsic awareness of context usage — that info lives in the orchestration layer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;LimitWarnerCapability&lt;/code&gt; bridges that gap with runtime user message injection at 70%/85%&lt;/li&gt;
&lt;li&gt;BM25 replaces naive substring search for conversation history&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;EvictionCapability&lt;/code&gt; prevents large outputs from entering history at all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full write-up: &lt;a href="https://oss.vstorm.co/blog/context-window-blindness-ai-agents-limit-warner/" rel="noopener noreferrer"&gt;oss.vstorm.co/blog/context-window-blindness-ai-agents-limit-warner&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you hit this? How did the hallucinations manifest?&lt;/p&gt;

</description>
      <category>python</category>
      <category>agents</category>
      <category>llm</category>
      <category>opensource</category>
    </item>
    <item>
      <title>StuckLoopDetection: How We Stopped an Agent Burning $12 on 47 Identical Calls</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 13 Apr 2026 12:20:49 +0000</pubDate>
      <link>https://forem.com/deenuu1/stuckloopdetection-how-we-stopped-an-agent-burning-12-on-47-identical-calls-52ac</link>
      <guid>https://forem.com/deenuu1/stuckloopdetection-how-we-stopped-an-agent-burning-12-on-47-identical-calls-52ac</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Most agent loops aren't model failures — they're mechanical repetitions that the model itself doesn't recognize. pydantic-deep v0.3.8 introduces StuckLoopDetection, a capability that catches three loop patterns before they waste tokens.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;This is post 1/3 in the "Self-Aware Agents" series. &lt;a href="https://oss.vstorm.co/blog/pydantic-deep-two-weeks-five-versions/" rel="noopener noreferrer"&gt;Overview of all 5 releases here.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Here's the incident that made this necessary.&lt;/p&gt;

&lt;p&gt;A coding agent was working on a refactor task overnight. It hit a file with an unusual import pattern, couldn't parse the result, and defaulted to reading the file again.&lt;/p&gt;

&lt;p&gt;By morning: 47 calls to &lt;code&gt;read_file&lt;/code&gt; on the same path. $12 in API costs. Zero progress.&lt;/p&gt;

&lt;p&gt;The model wasn't broken. Each call looked locally reasonable. From outside: it was stuck.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Prompting Isn't Enough
&lt;/h2&gt;

&lt;p&gt;"Don't repeat tool calls" in a system prompt works sometimes. The problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The model often doesn't recognize loops as loops — each repeated call looks locally justified&lt;/li&gt;
&lt;li&gt;Prompt compliance degrades under cognitive load (long tasks, many tools, complex context)&lt;/li&gt;
&lt;li&gt;You have to add the instruction to every agent separately&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Detection at the capability level fixes all three.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Loop Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pattern 1: Repeated Identical Calls
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unknown_field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"imports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"unknown_field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Turn&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;read_file(path=&lt;/span&gt;&lt;span class="s2"&gt;"src/config.json"&lt;/span&gt;&lt;span class="err"&gt;)&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;same&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;result&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can't process the result, has no fallback, and tries again. Default threshold: 3 identical calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 2: A-B-A-B Alternating
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Turn 8:  list_directory(path="src/")
Turn 9:  read_file(path="src/main.py")
Turn 10: list_directory(path="src/")
Turn 11: read_file(path="src/main.py")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tool A suggests Tool B, Tool B suggests Tool A. Looks like progress — it's not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 3: No-Op Loops
&lt;/h3&gt;

&lt;p&gt;The same call returns the same result, and the agent keeps going anyway. Common with writes, status checks, and verification calls.&lt;/p&gt;
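&lt;p&gt;The core idea behind catching pattern 1 fits in a few lines (a standalone toy, not the capability's actual code): fingerprint each tool call and count consecutive repeats.&lt;/p&gt;

```python
from collections import deque

# Toy detector for repeated identical calls (pattern 1).
# Fingerprint = tool name plus sorted arguments; a full window of
# identical fingerprints means the agent is stuck.
class RepeatDetector:
    def __init__(self, max_repeated: int = 3):
        self.max_repeated = max_repeated
        self.recent = deque(maxlen=max_repeated)

    def record(self, tool_name: str, args: dict) -> bool:
        """Record a call; return True when the repeat threshold is hit."""
        fingerprint = (tool_name, tuple(sorted(args.items())))
        self.recent.append(fingerprint)
        return (len(self.recent) == self.max_repeated
                and len(set(self.recent)) == 1)
```

&lt;p&gt;Detecting the A-B-A-B pattern works the same way, just over a window of alternating fingerprints instead of identical ones.&lt;/p&gt;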

&lt;h2&gt;
  
  
  The Implementation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_deep_agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StuckLoopDetection&lt;/span&gt;

&lt;span class="c1"&gt;# Default: enabled with threshold=3
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stuck_loop_detection&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Custom config
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;StuckLoopDetection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;max_repeated&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;action&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# "warn" = ModelRetry, "error" = StuckLoopError
&lt;/span&gt;        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  action="warn" (default)
&lt;/h3&gt;

&lt;p&gt;Triggers &lt;code&gt;ModelRetry&lt;/code&gt;. The model gets a message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You have called read_file(path="src/config.json") 3 times with identical arguments
and received the same result. This indicates a stuck loop. Try a different approach.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most of the time the model pivots. If it doesn't, the threshold simply triggers again.&lt;/p&gt;

&lt;h3&gt;
  
  
  action="error"
&lt;/h3&gt;

&lt;p&gt;Raises &lt;code&gt;StuckLoopError&lt;/code&gt;. Clean failure for automated pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_deep.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StuckLoopDetection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StuckLoopError&lt;/span&gt;

&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor the imports in src/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;StuckLoopError&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Agent got stuck: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; pattern detected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Per-Run Isolation
&lt;/h3&gt;

&lt;p&gt;Parallel &lt;code&gt;agent.run()&lt;/code&gt; calls don't share stuck-detection state. Each run is isolated via &lt;code&gt;for_run()&lt;/code&gt; — no leaked state between concurrent tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Safe to run concurrently with a shared agent instance
&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze src/module_a.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;analyze src/module_b.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Business Case
&lt;/h2&gt;

&lt;p&gt;A 47-call loop at Claude Opus pricing: ~$12. Same task with detection: ~$0.50 + one ModelRetry.&lt;/p&gt;

&lt;p&gt;Cost of &lt;code&gt;stuck_loop_detection=True&lt;/code&gt;: zero API calls, negligible latency, enabled by default.&lt;/p&gt;

&lt;p&gt;Even false positives are cheap: one ModelRetry message, then the model tries a different approach.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tomorrow:&lt;/strong&gt; LimitWarnerCapability — teaching agents to know their context window is almost full.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/vstorm-co/pydantic-deep/main/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;GitHub: &lt;a href="https://github.com/vstorm-co/pydantic-deep" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deep&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;OSS portal: &lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;oss.vstorm.co&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Pydantic Deep Agents 0.3.3: ACP, Thinking, Lifecycle Hooks, and Opinionated Defaults</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Thu, 02 Apr 2026 15:29:56 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-deep-agents-033-acp-thinking-lifecycle-hooks-and-opinionated-defaults-32g4</link>
      <guid>https://forem.com/deenuu1/pydantic-deep-agents-033-acp-thinking-lifecycle-hooks-and-opinionated-defaults-32g4</guid>
      <description>&lt;p&gt;We just released &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;pydantic-deep 0.3.3&lt;/a&gt; — and this is the biggest release since we open-sourced the project. ACP support, deep subagents by default, thinking, Anthropic caching, lifecycle hooks, skills as slash commands, and a provider setup wizard.&lt;/p&gt;

&lt;p&gt;Let's walk through the changes.&lt;/p&gt;

&lt;h2&gt;
  
  
  ACP: Your Agent in Any Editor
&lt;/h2&gt;

&lt;p&gt;ACP (Agent Client Protocol) is a standardized protocol that lets AI agents run inside editors. Think of it like LSP, but for AI agents instead of language servers.&lt;/p&gt;

&lt;p&gt;Our new &lt;code&gt;apps/acp/&lt;/code&gt; adapter exposes any pydantic-deep agent as an ACP-compatible server. The same agent you run in your terminal now runs in Zed with zero code changes.&lt;/p&gt;

&lt;p&gt;What you get:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Streaming text deltas&lt;/strong&gt; — real-time output, not waiting for completion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool call visibility&lt;/strong&gt; — see tool names, arguments, and results (not a black box)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model switching&lt;/strong&gt; — change from Claude to GPT mid-session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Session management&lt;/strong&gt; — conversation persistence across editor restarts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-detect provider&lt;/strong&gt; — reads your API keys, no manual config&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The adapter wraps the same &lt;code&gt;create_deep_agent()&lt;/code&gt; you already use. No new API to learn.&lt;/p&gt;

&lt;p&gt;Because ACP is a protocol (not a Zed-specific plugin), it'll work with any editor that adopts it. We expect more editors to support ACP in the coming months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subagents Are Now Deep Agents by Default
&lt;/h2&gt;

&lt;p&gt;This is the change that affects the most users.&lt;/p&gt;

&lt;p&gt;Previously, subagents were plain pydantic-ai Agents — lightweight, but limited. They couldn't read files, search the web, or remember things between runs.&lt;/p&gt;

&lt;p&gt;Now, every subagent (built-in and custom) is created via &lt;code&gt;create_deep_agent()&lt;/code&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filesystem access (read, write, edit, grep, glob)&lt;/li&gt;
&lt;li&gt;Web search and web fetch&lt;/li&gt;
&lt;li&gt;Persistent memory&lt;/li&gt;
&lt;li&gt;Large output eviction (auto-save to files when output exceeds 20K tokens)&lt;/li&gt;
&lt;li&gt;Orphaned tool call patching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your custom subagent doesn't specify &lt;code&gt;agent&lt;/code&gt; or &lt;code&gt;agent_factory&lt;/code&gt;, it automatically gets the full deep agent factory. You don't have to change anything — your subagents just got more capable.&lt;/p&gt;

&lt;p&gt;We also replaced &lt;code&gt;include_general_purpose_subagent&lt;/code&gt; with &lt;code&gt;include_builtin_subagents&lt;/code&gt;, which adds a "research" deep agent for codebase exploration and web research.&lt;/p&gt;

&lt;h2&gt;
  
  
  Thinking Enabled by Default
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;thinking="high"&lt;/code&gt; is now the default. This enables model reasoning via pydantic-ai's &lt;code&gt;Thinking&lt;/code&gt; capability.&lt;/p&gt;

&lt;p&gt;We support 7 levels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_deep_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# default
&lt;/span&gt;    &lt;span class="c1"&gt;# Options: True, False, "minimal", "low", "medium", "high", "xhigh"
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For models that don't support thinking (like GPT-4.1), the parameter is silently ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  Anthropic Prompt Caching — On by Default
&lt;/h2&gt;

&lt;p&gt;Three new defaults enabled automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_instructions&lt;/code&gt; — cache system prompt&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_tool_definitions&lt;/code&gt; — cache tool schemas&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;anthropic_cache_messages&lt;/code&gt; — cache conversation history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This significantly reduces token costs and latency for Anthropic models. For non-Anthropic models, these settings are silently ignored.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 New Lifecycle Hooks
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;HookEvent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Enum&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;BEFORE_RUN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;AFTER_RUN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;RUN_ERROR&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;BEFORE_MODEL_REQUEST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;before_model_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;AFTER_MODEL_REQUEST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;after_model_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These map directly to pydantic-ai's lifecycle hooks. Use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Session tracking&lt;/strong&gt; — log when an agent run starts and ends&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM call logging&lt;/strong&gt; — capture every model request for debugging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error alerts&lt;/strong&gt; — get notified when a run fails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost monitoring&lt;/strong&gt; — track token usage per request&lt;/li&gt;
&lt;/ul&gt;
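&lt;p&gt;For example, a cost monitor might key handlers off these events. The dispatcher below is hypothetical (pydantic-deep's actual registration mechanism may differ); only the &lt;code&gt;HookEvent&lt;/code&gt; enum mirrors the one above.&lt;/p&gt;

```python
from enum import Enum

class HookEvent(Enum):  # mirrors the enum shown above
    BEFORE_RUN = "before_run"
    AFTER_RUN = "after_run"
    RUN_ERROR = "run_error"
    BEFORE_MODEL_REQUEST = "before_model_request"
    AFTER_MODEL_REQUEST = "after_model_request"

# Hypothetical event handler: tracks token usage per model request
# and surfaces failures. Illustrates the use cases, not the real API.
class CostMonitor:
    def __init__(self):
        self.total_tokens = 0

    def on_event(self, event: HookEvent, payload: dict) -> None:
        if event is HookEvent.AFTER_MODEL_REQUEST:
            self.total_tokens += payload.get("usage_tokens", 0)
        elif event is HookEvent.RUN_ERROR:
            print(f"run failed: {payload.get('error')}")
```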

&lt;h2&gt;
  
  
  Skills as Slash Commands
&lt;/h2&gt;

&lt;p&gt;Skills now work as slash commands in the CLI. Type &lt;code&gt;/code-review&lt;/code&gt; and the skill activates directly from the picker.&lt;/p&gt;

&lt;p&gt;Discovery follows a 3-tier hierarchy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Built-in&lt;/strong&gt; (&lt;code&gt;apps/cli/skills/&lt;/code&gt;) — ships with pydantic-deep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User&lt;/strong&gt; (&lt;code&gt;~/.pydantic-deep/skills/&lt;/code&gt;) — your personal skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Project&lt;/strong&gt; (&lt;code&gt;.pydantic-deep/skills/&lt;/code&gt;) — project-specific skills&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Later sources override earlier ones by name, so you can customize built-in skills per project.&lt;/p&gt;
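&lt;p&gt;That override rule is effectively a dict merge in tier order. A sketch (the directory names come from the post; the merge function itself is illustrative):&lt;/p&gt;

```python
# Name-based override across the three skill tiers:
# built-in, then user, then project. Later tiers win on name clash.
def discover_skills(builtin: dict, user: dict, project: dict) -> dict:
    skills = {}
    for tier in (builtin, user, project):  # later tiers override earlier ones
        skills.update(tier)
    return skills
```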

&lt;h2&gt;
  
  
  Other Notable Changes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;compact_conversation&lt;/code&gt; tool&lt;/strong&gt; — the agent can manually trigger context compression with an optional focus topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Provider setup wizard&lt;/strong&gt; — first-run auto-detects missing API keys and guides through provider selection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/provider&lt;/code&gt; slash command&lt;/strong&gt; — switch AI provider and model mid-session&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/config&lt;/code&gt; slash command&lt;/strong&gt; — view and change settings interactively&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;approve_tools&lt;/code&gt; config&lt;/strong&gt; — choose which tools need user approval (default: &lt;code&gt;["execute"]&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced BASE_PROMPT&lt;/strong&gt; — Claude Code-inspired sections for code quality, careful execution, and formatting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context files simplified&lt;/strong&gt; to &lt;code&gt;AGENTS.md&lt;/code&gt; and &lt;code&gt;SOUL.md&lt;/code&gt; only&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Opinionated Defaults
&lt;/h2&gt;

&lt;p&gt;The philosophy behind 0.3.3: &lt;strong&gt;make the powerful thing the default thing.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Thinking&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;&lt;code&gt;"high"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prompt caching&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on (Anthropic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subagent type&lt;/td&gt;
&lt;td&gt;plain Agent&lt;/td&gt;
&lt;td&gt;deep agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max nesting depth&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Eviction limit&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;20K tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Patch tool calls&lt;/td&gt;
&lt;td&gt;off&lt;/td&gt;
&lt;td&gt;on&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You get a capable, production-ready agent with &lt;code&gt;create_deep_agent("anthropic:claude-opus-4-6")&lt;/code&gt;. No configuration needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep-agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or try the CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep-agents[cli]
pydantic-deep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Full changelog: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents/blob/main/CHANGELOG.md" rel="noopener noreferrer"&gt;CHANGELOG.md&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We build open-source AI agent tooling at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt;. pydantic-deep is our framework — modular, type-safe, production-tested across 30+ deployments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>Pydantic AI Capabilities, Hooks &amp; Agent Specs - What Changed and How Our Libraries Migrated</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 30 Mar 2026 14:38:10 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-ai-capabilities-hooks-agent-specs-what-changed-and-how-our-libraries-migrated-4i18</link>
      <guid>https://forem.com/deenuu1/pydantic-ai-capabilities-hooks-agent-specs-what-changed-and-how-our-libraries-migrated-4i18</guid>
      <description>&lt;p&gt;Pydantic AI just shipped the biggest API change since launch. Capabilities, hooks, and agent specs landed in v1.71+, and they fundamentally change how you extend agents.&lt;/p&gt;

&lt;p&gt;We maintain 5 open-source libraries built on top of Pydantic AI: pydantic-ai-shields (formerly pydantic-ai-middleware), pydantic-ai-subagents, pydantic-ai-summarization, pydantic-ai-backend, and the full-stack AI agent template. All five have been migrated. This article covers what changed, why it matters, and real before/after code from our repos.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Are Capabilities?
&lt;/h2&gt;

&lt;p&gt;Capabilities are reusable, composable units of agent behavior. Instead of threading multiple configuration arguments separately -- tools here, instructions there, model settings somewhere else -- a capability bundles everything into a single &lt;code&gt;capabilities&lt;/code&gt; parameter on the &lt;code&gt;Agent&lt;/code&gt; constructor.&lt;/p&gt;

&lt;p&gt;Each capability can provide:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt; (via &lt;code&gt;get_toolset()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instructions&lt;/strong&gt; (static strings or dynamic callables)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model settings&lt;/strong&gt; (per-step configuration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle hooks&lt;/strong&gt; (before/after/wrap patterns for runs, model requests, tool calls)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool preparation&lt;/strong&gt; (filter or modify tool definitions per step)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The base class is &lt;code&gt;AbstractCapability&lt;/code&gt;. You subclass it, override the methods you need, and pass instances to the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AbstractCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;MyCapability&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;AnotherCapability&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multiple capabilities compose automatically. Before-hooks fire in order (cap1 then cap2), after-hooks fire reversed (cap2 then cap1), and wrap-hooks nest as middleware layers. This is not something we had to build -- the framework handles it.&lt;/p&gt;
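&lt;p&gt;A toy demonstration of that ordering (not pydantic-ai's code; it only shows before-hooks firing in registration order and after-hooks reversed):&lt;/p&gt;

```python
# Toy composition: before-hooks fire cap1 then cap2, after-hooks
# fire cap2 then cap1, so each capability wraps the ones after it.
def compose(caps, core):
    calls = []

    def run():
        for cap in caps:                # before: cap1, then cap2
            calls.append(f"before:{cap}")
        result = core()
        for cap in reversed(caps):      # after: cap2, then cap1
            calls.append(f"after:{cap}")
        return result

    return run, calls
```

&lt;p&gt;Wrap-hooks follow the same nesting: the first capability's wrapper is outermost, like middleware layers.&lt;/p&gt;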

&lt;h2&gt;
  
  
  What Are Hooks?
&lt;/h2&gt;

&lt;p&gt;Hooks are the lifecycle interception points within capabilities. Pydantic AI provides hooks at four levels:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Run hooks&lt;/strong&gt; -- &lt;code&gt;before_run&lt;/code&gt;, &lt;code&gt;wrap_run&lt;/code&gt;, &lt;code&gt;after_run&lt;/code&gt;, &lt;code&gt;on_run_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node hooks&lt;/strong&gt; -- &lt;code&gt;before_node_run&lt;/code&gt;, &lt;code&gt;wrap_node_run&lt;/code&gt;, &lt;code&gt;after_node_run&lt;/code&gt;, &lt;code&gt;on_node_run_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model request hooks&lt;/strong&gt; -- &lt;code&gt;before_model_request&lt;/code&gt;, &lt;code&gt;wrap_model_request&lt;/code&gt;, &lt;code&gt;after_model_request&lt;/code&gt;, &lt;code&gt;on_model_request_error&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool hooks&lt;/strong&gt; -- split into validation and execution phases, each with before/wrap/after/error variants&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus &lt;code&gt;prepare_tools&lt;/code&gt; for filtering tool visibility per step.&lt;/p&gt;

&lt;p&gt;That's roughly 20 hook points across 4 lifecycle levels. Error hooks use a neat pattern: &lt;strong&gt;raise to propagate, return to recover&lt;/strong&gt;. If your error handler raises the original exception, it propagates unchanged. Raise a different exception to transform the error. Return a result to suppress it entirely.&lt;/p&gt;
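&lt;p&gt;A minimal sketch of that pattern, with a hypothetical handler and result type standing in for the real signatures:&lt;/p&gt;

```python
# Sketch of the "raise to propagate, return to recover" error-hook pattern.
# The handler name and RunResult type are illustrative stand-ins, not the
# actual Pydantic AI signatures.
import asyncio
from dataclasses import dataclass

@dataclass
class RunResult:
    output: str

async def on_run_error(ctx, error):
    # Return a fallback result to recover from a known transient failure;
    # re-raise (or raise a different exception) to propagate or transform.
    if isinstance(error, TimeoutError):
        return RunResult(output="(model call timed out; no answer)")
    raise error

async def run_with_handler():
    try:
        raise TimeoutError("model call timed out")
    except Exception as exc:
        return await on_run_error(None, exc)

result = asyncio.run(run_with_handler())
print(result.output)
```

A `TimeoutError` is swallowed and replaced by a fallback result; any other exception propagates unchanged.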

&lt;p&gt;For simple use cases, the &lt;code&gt;Hooks&lt;/code&gt; capability gives you decorator-based registration without subclassing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai.capabilities&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Hooks&lt;/span&gt;

&lt;span class="n"&gt;hooks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Hooks&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@hooks.on.before_model_request&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;log_request&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request_context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sending &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;request_context&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What Are Agent Specs?
&lt;/h2&gt;

&lt;p&gt;Agent specs separate agent configuration from code entirely. You define your agent in YAML or JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic:claude-opus-4-6&lt;/span&gt;
&lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;
&lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;WebSearch&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Thinking&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;effort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then load it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Capabilities that implement &lt;code&gt;get_serialization_name()&lt;/code&gt; and &lt;code&gt;from_spec()&lt;/code&gt; are automatically available. This means your custom capabilities can be YAML-driven too.&lt;/p&gt;
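&lt;p&gt;Conceptually, the loader only needs a name-to-class registry. The sketch below is a hypothetical illustration of that mechanism, not Pydantic AI's actual internals:&lt;/p&gt;

```python
# Hypothetical sketch of name-based spec loading: each capability class
# exposes get_serialization_name() and from_spec() (method names from the
# article), and a loader resolves YAML entries against a registry.
REGISTRY = {}

class Thinking:
    def __init__(self, effort="low"):
        self.effort = effort

    @classmethod
    def get_serialization_name(cls):
        return "Thinking"

    @classmethod
    def from_spec(cls, spec):
        return cls(**(spec or {}))

REGISTRY[Thinking.get_serialization_name()] = Thinking

def load_capability(entry):
    # A spec entry is either a bare name ("WebSearch") or a one-key mapping
    # with config ({"Thinking": {"effort": "high"}}).
    if isinstance(entry, str):
        return REGISTRY[entry].from_spec(None)
    (name, config), = entry.items()
    return REGISTRY[name].from_spec(config)

cap = load_capability({"Thinking": {"effort": "high"}})
print(cap.effort)
```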

&lt;h2&gt;
  
  
  How Our Libraries Migrated
&lt;/h2&gt;

&lt;h3&gt;
  
  
  pydantic-ai-middleware to pydantic-ai-shields
&lt;/h3&gt;

&lt;p&gt;This was the most dramatic change. Our middleware library had grown to include &lt;code&gt;MiddlewareAgent&lt;/code&gt;, &lt;code&gt;MiddlewareChain&lt;/code&gt;, &lt;code&gt;ParallelMiddleware&lt;/code&gt;, &lt;code&gt;ConditionalMiddleware&lt;/code&gt;, &lt;code&gt;PipelineSpec&lt;/code&gt;, config loaders, a compiler -- a whole parallel abstraction layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We deleted all of it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The v0.3.0 release renamed the package to &lt;code&gt;pydantic-ai-shields&lt;/code&gt; and rebuilt everything as capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before (middleware era):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_middleware&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MiddlewareAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CostTrackingMiddleware&lt;/span&gt;

&lt;span class="n"&gt;middleware_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MiddlewareAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;middlewares&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;CostTrackingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budget_limit_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;middleware_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;After (capabilities era):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_shields&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CostTracking&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PromptInjection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;PiiDetector&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="nc"&gt;CostTracking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;budget_usd&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;PromptInjection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sensitivity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="nc"&gt;PiiDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detect&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ssn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;credit_card&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No wrapper agent. No middleware chain. The shields are just capabilities that hook into &lt;code&gt;before_run&lt;/code&gt;, &lt;code&gt;after_run&lt;/code&gt;, &lt;code&gt;prepare_tools&lt;/code&gt;, and &lt;code&gt;before_tool_execute&lt;/code&gt; as needed.&lt;/p&gt;

&lt;p&gt;The new package ships 10 capabilities: five infrastructure pieces (&lt;code&gt;CostTracking&lt;/code&gt;, &lt;code&gt;ToolGuard&lt;/code&gt;, &lt;code&gt;InputGuard&lt;/code&gt;, &lt;code&gt;OutputGuard&lt;/code&gt;, &lt;code&gt;AsyncGuardrail&lt;/code&gt;) and five zero-dependency content shields (&lt;code&gt;PromptInjection&lt;/code&gt;, &lt;code&gt;PiiDetector&lt;/code&gt;, &lt;code&gt;SecretRedaction&lt;/code&gt;, &lt;code&gt;BlockedKeywords&lt;/code&gt;, &lt;code&gt;NoRefusals&lt;/code&gt;).&lt;/p&gt;
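&lt;p&gt;To give a feel for what a content shield does, here is a toy regex-based version of the kind of check something like &lt;code&gt;PiiDetector&lt;/code&gt; might run before a prompt reaches the model. The real shield's detection logic is not shown in this post, so treat this as an illustration only:&lt;/p&gt;

```python
# Illustrative sketch (not the pydantic-ai-shields implementation) of a
# pre-model PII check: find and redact email addresses in the user prompt.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact_emails(prompt):
    return EMAIL.sub("[REDACTED_EMAIL]", prompt)

print(redact_emails("Contact jane.doe@example.com for details"))
# Contact [REDACTED_EMAIL] for details
```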

&lt;h3&gt;
  
  
  pydantic-ai-subagents
&lt;/h3&gt;

&lt;p&gt;The subagents library now exposes a &lt;code&gt;SubAgentCapability&lt;/code&gt; that bundles the subagent toolset with dynamic instructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;subagents_pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SubAgentCapability&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SubAgentConfig&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;SubAgentCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;subagents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;SubAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;researcher&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Researches topics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a research assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The capability provides tools via &lt;code&gt;get_toolset()&lt;/code&gt; and injects instructions via &lt;code&gt;get_instructions()&lt;/code&gt;. It also supports agent spec serialization.&lt;/p&gt;
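&lt;p&gt;The shape is easy to picture with a toy stand-in. The two method names follow the article; everything else here is illustrative:&lt;/p&gt;

```python
# Hypothetical sketch of a capability that bundles a toolset (one delegation
# tool per configured subagent) with the instructions that teach the model
# to use it.
class DelegationCapability:
    def __init__(self, subagent_names):
        self.subagent_names = subagent_names

    def get_toolset(self):
        return [f"delegate_to_{name}" for name in self.subagent_names]

    def get_instructions(self):
        names = ", ".join(self.subagent_names)
        return f"You can delegate tasks to these subagents: {names}."

cap = DelegationCapability(["researcher"])
print(cap.get_toolset())
print(cap.get_instructions())
```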

&lt;h3&gt;
  
  
  pydantic-ai-summarization
&lt;/h3&gt;

&lt;p&gt;Four capabilities replace the old middleware-based context management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SummarizationCapability&lt;/code&gt;&lt;/strong&gt; -- triggers LLM summarization when thresholds are reached&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SlidingWindowCapability&lt;/code&gt;&lt;/strong&gt; -- zero-cost alternative that discards oldest messages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;LimitWarnerCapability&lt;/code&gt;&lt;/strong&gt; -- injects warnings when limits approach&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;ContextManagerCapability&lt;/code&gt;&lt;/strong&gt; -- full package: token tracking, auto-compression, tool output truncation
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_summarization.capability&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ContextManagerCapability&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ContextManagerCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100_000&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  pydantic-ai-backend
&lt;/h3&gt;

&lt;p&gt;Our filesystem toolkit became &lt;code&gt;ConsoleCapability&lt;/code&gt; -- bundling tools, instructions, and permission enforcement:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_backends&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ConsoleCapability&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai_backends.permissions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;READONLY_RULESET&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4.1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;capabilities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ConsoleCapability&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;permissions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;READONLY_RULESET&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Delete your abstraction layers.&lt;/strong&gt; We removed thousands of lines of middleware code. The framework does it better now.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Composition is free.&lt;/strong&gt; Multiple capabilities stack without you writing any merge logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Agent specs change deployment.&lt;/strong&gt; Define agent behavior in YAML, deploy by changing a config file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The migration is mechanical.&lt;/strong&gt; For each middleware hook, there's a direct capability equivalent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Think in capabilities, not agents.&lt;/strong&gt; The old pattern: build a specialized agent. The new pattern: build a capability, attach it to any agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Links
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.pydantic.dev/capabilities/" rel="noopener noreferrer"&gt;Pydantic AI capabilities docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.pydantic.dev/hooks/" rel="noopener noreferrer"&gt;Pydantic AI hooks docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-middleware" rel="noopener noreferrer"&gt;pydantic-ai-shields&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-subagents" rel="noopener noreferrer"&gt;pydantic-ai-subagents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-summarization" rel="noopener noreferrer"&gt;pydantic-ai-summarization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/pydantic-ai-backend" rel="noopener noreferrer"&gt;pydantic-ai-backend&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/vstorm-co/full-stack-fastapi-nextjs-llm-template" rel="noopener noreferrer"&gt;full-stack AI agent template&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Full RAG Pipeline: 4 Vector Stores, Hybrid Search, and Reranking in One Template</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Wed, 25 Mar 2026 01:25:12 +0000</pubDate>
      <link>https://forem.com/deenuu1/full-rag-pipeline-4-vector-stores-hybrid-search-and-reranking-in-one-template-1ef0</link>
      <guid>https://forem.com/deenuu1/full-rag-pipeline-4-vector-stores-hybrid-search-and-reranking-in-one-template-1ef0</guid>
      <description>&lt;h1&gt;
  
  
  We Added Full RAG to Our Open-Source AI Template: 4 Vector Stores, Hybrid Search, and Reranking
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;One template, every RAG decision already made — from vector store to reranking strategy.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You know the drill. You want to add RAG to your AI app. So you start: pick a vector database, write an embedding pipeline, figure out chunking, wire up retrieval, add it to your agent as a tool, build a frontend to manage documents...&lt;/p&gt;

&lt;p&gt;Three weeks later you have a working prototype. Then someone asks "can we try Qdrant instead of Milvus?" and you realize your vector store is hardcoded in 14 places.&lt;/p&gt;

&lt;p&gt;We just shipped v0.2.2 of our open-source full-stack AI template, and RAG was the biggest addition. Not a toy demo — a production pipeline with 4 vector stores, 4 embedding providers, hybrid search, reranking, document versioning, and a management dashboard. All configurable. All swappable.&lt;/p&gt;

&lt;p&gt;Here's what we built and why.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt; — an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;. Connect with me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: 5 Steps, Every One Configurable
&lt;/h2&gt;

&lt;p&gt;Every RAG system does the same thing: &lt;strong&gt;parse → chunk → embed → store → search&lt;/strong&gt;. The difference is how many decisions you have to make at each step.&lt;/p&gt;

&lt;p&gt;In our template, each step is a pluggable abstraction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Document Upload
  │
  ├── Parse: PyMuPDF (default) | LlamaParse (130+ formats) | python-docx
  │
  ├── Chunk: recursive (default) | markdown | fixed
  │     └── chunk_size=512, overlap=50 (configurable via env vars)
  │
  ├── Embed: OpenAI | Voyage | Gemini (multimodal) | SentenceTransformers (local)
  │     └── dimensions auto-derived from model name
  │
  ├── Store: Milvus | Qdrant | ChromaDB | pgvector
  │
  └── Search: vector | hybrid (BM25 + vector + RRF) | + reranking (Cohere | CrossEncoder)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You pick your stack during project generation. The template wires everything up. No glue code.&lt;/p&gt;
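&lt;p&gt;To make the chunking step concrete, here is a simplified character-based sketch of overlapped chunking using the defaults from the diagram. The template's actual recursive splitter works on separators rather than raw characters, so this is an illustration only:&lt;/p&gt;

```python
# Simplified character-based illustration of chunking with overlap
# (chunk_size=512, overlap=50 mirror the configurable defaults above).
def chunk(text, chunk_size=512, overlap=50):
    step = chunk_size - overlap  # each chunk starts 462 chars after the last
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

pieces = chunk("x" * 1000)
print(len(pieces), len(pieces[0]))
```

A 1000-character input yields three chunks, and each chunk repeats the last 50 characters of the previous one.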

&lt;h2&gt;
  
  
  4 Vector Stores, 1 Interface
&lt;/h2&gt;

&lt;p&gt;The biggest design decision was making vector stores swappable. We implemented &lt;code&gt;BaseVectorStore&lt;/code&gt; with four backends:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BaseVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ABC&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;insert_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SearchResult&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;delete_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_collection_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;CollectionInfo&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
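&lt;p&gt;As a toy illustration of that interface (not one of the template's four backends), an in-memory store might look like this, with a fake bag-of-words overlap standing in for real embedding similarity:&lt;/p&gt;

```python
# Minimal in-memory sketch of a backend implementing the interface above.
# Illustrative only: real backends manage schemas, indexes, and embeddings;
# here the "similarity" is just word overlap between query and document.
import asyncio
from dataclasses import dataclass

@dataclass
class Document:
    id: str
    content: str

@dataclass
class SearchResult:
    content: str
    score: float

class InMemoryVectorStore:
    def __init__(self):
        self._collections = {}  # collection_name: {document_id: Document}

    async def insert_document(self, collection_name, document):
        self._collections.setdefault(collection_name, {})[document.id] = document

    async def search(self, collection_name, query, limit=4):
        docs = self._collections.get(collection_name, {}).values()
        query_terms = set(query.lower().split())
        scored = [
            SearchResult(d.content, len(query_terms.intersection(d.content.lower().split())))
            for d in docs
        ]
        scored.sort(key=lambda r: r.score, reverse=True)
        return scored[:limit]

    async def delete_document(self, collection_name, document_id):
        self._collections.get(collection_name, {}).pop(document_id, None)

async def demo():
    store = InMemoryVectorStore()
    await store.insert_document("kb", Document("1", "building safety rules"))
    await store.insert_document("kb", Document("2", "holiday schedule"))
    hits = await store.search("kb", "safety in buildings", limit=1)
    return hits[0].content

print(asyncio.run(demo()))
```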



&lt;p&gt;&lt;strong&gt;Milvus&lt;/strong&gt; — production-grade, runs as 3 Docker services (etcd + MinIO + Milvus). Best for large-scale deployments. Cosine similarity with IVF_FLAT indexing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qdrant&lt;/strong&gt; — single Docker service, great balance of performance and simplicity. Our default recommendation for most teams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChromaDB&lt;/strong&gt; — embedded mode, zero Docker required. Perfect for prototyping and local development. Just &lt;code&gt;pip install chromadb&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;pgvector&lt;/strong&gt; — uses your existing PostgreSQL. No new infrastructure. HNSW indexing. If you already have Postgres, this is the lowest-friction option.&lt;/p&gt;

&lt;p&gt;Switching between them? One environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your .env:&lt;/span&gt;
&lt;span class="nv"&gt;VECTOR_STORE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;qdrant    &lt;span class="c"&gt;# or: milvus, chromadb, pgvector&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The template handles connection strings, Docker services, schema creation, and index configuration automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Search: Why Vector-Only Isn't Enough
&lt;/h2&gt;

&lt;p&gt;Pure vector search works well for semantic queries ("documents about building safety"). It fails on exact matches ("find contract #2024-0847") because embeddings don't preserve exact strings.&lt;/p&gt;

&lt;p&gt;Our hybrid search combines both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Vector search (semantic)
&lt;/span&gt;    &lt;span class="n"&gt;raw_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fetch_multiplier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: BM25 keyword search
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_hybrid_enabled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;bm25_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_bm25_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;fetch_multiplier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;raw_results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Rerank (optional)
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;should_rerank&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rerank_service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rerank_service&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;raw_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fusion uses &lt;strong&gt;Reciprocal Rank Fusion (RRF)&lt;/strong&gt; — a simple but effective algorithm that combines rankings from multiple sources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_rrf_fuse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bm25_results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;rank&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;sorted_by_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable it with one env var: &lt;code&gt;RAG_HYBRID_SEARCH=true&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reranking: The Quality Multiplier
&lt;/h2&gt;

&lt;p&gt;Initial retrieval casts a wide net. Reranking narrows it down. We support two options:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cohere Reranker&lt;/strong&gt; (API) — the fastest way to improve retrieval quality. Send your results + query, get them re-scored by a model trained specifically for relevance ranking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rerank&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rerank-v3.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;top_n&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;CrossEncoder&lt;/strong&gt; (local) — runs a SentenceTransformers cross-encoder model locally. No API calls, no data leaves your infrastructure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pairs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pairs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Runs locally on CPU/GPU
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pipeline is: retrieve 3× more results than needed → rerank → return top-k. This consistently improves precision without touching your embeddings or vector store.&lt;/p&gt;
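
&lt;p&gt;As a minimal sketch of that over-retrieve-then-rerank loop (the &lt;code&gt;score&lt;/code&gt; function here is a hypothetical word-overlap stand-in, not the template's actual reranker):&lt;/p&gt;

```python
# Sketch of the over-retrieve -> rerank -> top-k pattern described above.
# `score` is a toy relevance function; a real pipeline would call a Cohere
# or cross-encoder reranker instead.

def score(query: str, doc: str) -> int:
    # Hypothetical scorer: count words shared with the query.
    return len(set(query.lower().split()).intersection(doc.lower().split()))

def search(query: str, candidates: list[str], limit: int = 2) -> list[str]:
    # Step 1: over-retrieve 3x more candidates than the caller asked for.
    raw = candidates[: limit * 3]
    # Step 2: rerank the wide net, then return only the top-k.
    reranked = sorted(raw, key=lambda d: score(query, d), reverse=True)
    return reranked[:limit]

docs = [
    "hybrid search combines bm25 and vectors",
    "reranking improves retrieval precision",
    "cooking pasta requires boiling water",
    "vector search uses embeddings",
]
top = search("vector search precision", docs, limit=2)
```

&lt;p&gt;Swapping the toy scorer for a real reranker keeps the same shape: widen the net, re-score, slice.&lt;/p&gt;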

&lt;h2&gt;
  
  
  Document Versioning: SHA256 Dedup
&lt;/h2&gt;

&lt;p&gt;Re-ingesting a document shouldn't create duplicates. Our pipeline uses content hashing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;replace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;document&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filepath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Check for existing version by source path or content hash
&lt;/span&gt;    &lt;span class="n"&gt;existing_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_find_existing_by_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;source_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;existing_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_find_existing_by_hash&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content_hash&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Replace old chunks with new ones
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;delete_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;existing_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert_document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;collection_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Google Drive sync? Same logic — changed files get re-embedded, unchanged files are skipped.&lt;/p&gt;
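
&lt;p&gt;The hashing step itself fits in a few lines. This is an illustrative sketch (the names &lt;code&gt;content_hash&lt;/code&gt;, &lt;code&gt;seen_hashes&lt;/code&gt;, and &lt;code&gt;should_reingest&lt;/code&gt; are assumptions, not the pipeline's actual API):&lt;/p&gt;

```python
import hashlib

# Minimal sketch of SHA256 content-hash dedup, assuming one hash is stored
# per ingested document. Helper names are illustrative, not the template's
# actual API.

def content_hash(text: str) -> str:
    # Identical content always produces the same digest, so a re-ingest of
    # an unchanged file can be detected and skipped.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

seen_hashes: set[str] = set()

def should_reingest(text: str) -> bool:
    digest = content_hash(text)
    if digest in seen_hashes:
        return False  # unchanged content: skip re-embedding
    seen_hashes.add(digest)
    return True

first = should_reingest("quarterly report v1")
repeat = should_reingest("quarterly report v1")
changed = should_reingest("quarterly report v2")
```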

&lt;h2&gt;
  
  
  4 Embedding Providers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;API Key?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;text-embedding-3-small&lt;/td&gt;
&lt;td&gt;1536&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Voyage&lt;/td&gt;
&lt;td&gt;voyage-3&lt;/td&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;gemini-embedding-exp-03-07&lt;/td&gt;
&lt;td&gt;3072&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SentenceTransformers&lt;/td&gt;
&lt;td&gt;all-MiniLM-L6-v2&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;No (local)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Dimensions are auto-derived from the model name — no manual configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;EMBEDDING_DIMENSIONS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;voyage-3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-embedding-exp-03-07&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3072&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
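
&lt;p&gt;A minimal sketch of that lookup (the fallback for unknown models is an assumption, not the template's documented behavior):&lt;/p&gt;

```python
# Derive embedding dimensions from the model name, mirroring the
# EMBEDDING_DIMENSIONS table above.

EMBEDDING_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "voyage-3": 1024,
    "gemini-embedding-exp-03-07": 3072,
    "all-MiniLM-L6-v2": 384,
}

def embedding_dimension(model: str, default: int = 1536) -> int:
    # Unknown models fall back to a default instead of failing at startup;
    # the fallback value here is an assumption, not the template's behavior.
    return EMBEDDING_DIMENSIONS.get(model, default)

dim = embedding_dimension("voyage-3")
```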



&lt;p&gt;Gemini is the interesting one — it supports &lt;strong&gt;multimodal embeddings&lt;/strong&gt;. Text and images in the same vector space. We use it for image description extraction from PDFs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agent Integration
&lt;/h2&gt;

&lt;p&gt;RAG becomes an agent tool — &lt;code&gt;search_knowledge_base&lt;/code&gt; — available to all 5 AI frameworks (Pydantic AI, LangChain, LangGraph, CrewAI, DeepAgents):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_knowledge_base&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;documents&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;collections&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Multi-collection search
&lt;/span&gt;    &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search with automatic reranking &amp;amp; hybrid search if enabled.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results include source attribution: filename, page number, chunk number, and similarity score. The agent's system prompt instructs it to cite sources with &lt;code&gt;[1]&lt;/code&gt;, &lt;code&gt;[2]&lt;/code&gt; references.&lt;/p&gt;
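
&lt;p&gt;A sketch of how attributed results can be rendered into a citable context block; the exact result schema (&lt;code&gt;filename&lt;/code&gt;, &lt;code&gt;page&lt;/code&gt;, &lt;code&gt;score&lt;/code&gt; fields) is assumed from the description above:&lt;/p&gt;

```python
from dataclasses import dataclass

# Illustrative sketch of source attribution: number each retrieved chunk so
# the agent can cite it as [1], [2]. The result fields are assumptions based
# on the attribution described above, not the template's actual schema.

@dataclass
class SearchResult:
    content: str
    filename: str
    page: int
    score: float

def format_context(results: list[SearchResult]) -> str:
    lines = []
    for i, r in enumerate(results, start=1):
        # Prefix each chunk with its citation index and source location.
        lines.append(f"[{i}] ({r.filename}, p.{r.page}) {r.content}")
    return "\n".join(lines)

ctx = format_context([
    SearchResult("RRF combines rankings.", "rag.pdf", 3, 0.91),
    SearchResult("Rerankers improve precision.", "rag.pdf", 7, 0.88),
])
```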

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAG is a pipeline of 5 decisions&lt;/strong&gt; (parse, chunk, embed, store, search) — our template makes each one configurable without code changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector-only search misses exact matches&lt;/strong&gt; — hybrid (BM25 + vector + RRF) catches both semantic and keyword queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reranking is the cheapest quality improvement&lt;/strong&gt; — 3× over-retrieve + rerank consistently beats tuning embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document versioning prevents duplicate chunks&lt;/strong&gt; — SHA256 content hash + source path tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One env var switches everything&lt;/strong&gt; — &lt;code&gt;VECTOR_STORE=pgvector&lt;/code&gt;, &lt;code&gt;RAG_HYBRID_SEARCH=true&lt;/code&gt;, &lt;code&gt;EMBEDDING_MODEL=voyage-3&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — generates production-ready FastAPI + Next.js AI apps with full RAG pipeline&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Related:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co/tools/ai-agent-configurator/" rel="noopener noreferrer"&gt;AI Agent Configurator&lt;/a&gt; — configure 75+ options visually, download as ZIP&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co/guides/" rel="noopener noreferrer"&gt;Step-by-step guides&lt;/a&gt; — 50 tutorials across 5 frameworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;More from Vstorm's open-source ecosystem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;All our open-source projects&lt;/a&gt; — 13 packages for the Pydantic AI ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vstorm-co/awesome-pydantic-ai" rel="noopener noreferrer"&gt;awesome-pydantic-ai&lt;/a&gt; — curated list of Pydantic AI resources and tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;vstorm.co&lt;/a&gt; — our consultancy (30+ AI agent implementations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this was useful, follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for daily AI agent insights.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
    <item>
      <title>It's been a while since my last post, but I'm back with new content for you.</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:38:26 +0000</pubDate>
      <link>https://forem.com/deenuu1/its-been-a-while-since-my-last-post-but-im-back-with-new-content-for-you-24bo</link>
      <guid>https://forem.com/deenuu1/its-been-a-while-since-my-last-post-but-im-back-with-new-content-for-you-24bo</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/deenuu1" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F997289%2F50074490-7c28-44da-9a80-f389f20d3691.jpeg" alt="deenuu1"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;From 0 to Production AI Agent in 30 Minutes — Full-Stack Template with 5 AI Frameworks&lt;/h2&gt;
      &lt;h3&gt;Kacper Włodarczyk ・ Mar 17&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#python&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#softwareengineering&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>From 0 to Production AI Agent in 30 Minutes — Full-Stack Template with 5 AI Frameworks</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Tue, 17 Mar 2026 12:37:41 +0000</pubDate>
      <link>https://forem.com/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o</link>
      <guid>https://forem.com/deenuu1/from-0-to-production-ai-agent-in-30-minutes-full-stack-template-with-5-ai-frameworks-3b4o</guid>
      <description>&lt;p&gt;Every AI project starts the same way.&lt;/p&gt;

&lt;p&gt;You need a FastAPI backend. Then authentication — JWT tokens, refresh logic, user management. Then a database — PostgreSQL, migrations, async connections. Then WebSocket streaming for real-time AI responses. Then a frontend — Next.js, state management, chat UI. Then Docker. Then CI/CD.&lt;/p&gt;

&lt;p&gt;Three days of boilerplate before you write a single line of AI code.&lt;/p&gt;

&lt;p&gt;I've set up this stack from scratch more times than I'd like to admit. After the third project where I copy-pasted the same auth middleware, the same WebSocket handler, the same Docker Compose config — I decided to build a generator that does all of it in one command.&lt;/p&gt;

&lt;p&gt;The result: &lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — an open-source full-stack template with 5 AI frameworks, 75+ configuration options, and a web configurator that generates your entire project in minutes.&lt;/p&gt;

&lt;p&gt;614 stars on GitHub. Used by teams at NVIDIA, Pfizer, TikTok, and others. And you can go from zero to a running production AI agent in about 30 minutes.&lt;/p&gt;

&lt;p&gt;Let me walk you through exactly how.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Kacper, AI Engineer at &lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;Vstorm&lt;/a&gt; — an Applied Agentic AI Engineering Consultancy. We've shipped 30+ production AI agent implementations and open-source our tooling at &lt;a href="https://github.com/vstorm-co" rel="noopener noreferrer"&gt;github.com/vstorm-co&lt;/a&gt;. Connect with me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Open the Web Configurator
&lt;/h2&gt;

&lt;p&gt;Go to &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;oss.vstorm.co/full-stack-ai-agent-template/configurator/&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;No CLI installation needed. No pip. Just a browser.&lt;/p&gt;

&lt;p&gt;The configurator gives you a visual interface to pick every option for your project. Database, auth, AI framework, background tasks, observability, frontend — all of it. You see the full config before you generate anything.&lt;/p&gt;

&lt;p&gt;Alternatively, if you prefer the terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This launches the interactive wizard that walks you through the same options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Pick a Preset (or Go Custom)
&lt;/h2&gt;

&lt;p&gt;The template ships with three presets that cover the most common use cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Preset&lt;/th&gt;
&lt;th&gt;What you get&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--minimal&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bare FastAPI app — no database, no auth, no extras&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--preset ai-agent&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PostgreSQL + JWT auth + AI agent + WebSocket streaming + conversation persistence + Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;--preset production&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full production setup — Redis, caching, rate limiting, Sentry, Prometheus, Kubernetes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For this walkthrough, I'll use the &lt;strong&gt;AI Agent&lt;/strong&gt; preset with Pydantic AI — the most common starting point for AI applications:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastapi-fullstack create my_ai_app &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; pydantic_ai &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single command generates a full-stack project with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FastAPI backend with async PostgreSQL&lt;/li&gt;
&lt;li&gt;JWT authentication with user management&lt;/li&gt;
&lt;li&gt;Pydantic AI agent with WebSocket streaming&lt;/li&gt;
&lt;li&gt;Conversation persistence (chat history saved to DB)&lt;/li&gt;
&lt;li&gt;Redis for caching and sessions&lt;/li&gt;
&lt;li&gt;Next.js 15 frontend with React 19 and Tailwind CSS v4&lt;/li&gt;
&lt;li&gt;Docker Compose for the full stack&lt;/li&gt;
&lt;li&gt;GitHub Actions CI/CD&lt;/li&gt;
&lt;li&gt;Logfire observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 3: Look at What You Got
&lt;/h2&gt;

&lt;p&gt;The generated project follows a clean layered architecture — Repository + Service pattern, inspired by real production codebases:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;my_ai_app/
├── backend/
│   ├── app/
│   │   ├── main.py              # FastAPI app with lifespan
│   │   ├── api/routes/v1/       # Versioned API endpoints
│   │   ├── core/                # Config, security, middleware
│   │   ├── db/models/           # SQLAlchemy models
│   │   ├── schemas/             # Pydantic schemas
│   │   ├── repositories/        # Data access layer
│   │   ├── services/            # Business logic
│   │   ├── agents/              # AI agents (this is where your code goes)
│   │   └── commands/            # Django-style CLI commands
│   ├── cli/                     # Project CLI
│   ├── tests/                   # pytest test suite
│   └── alembic/                 # Database migrations
├── frontend/
│   ├── src/
│   │   ├── app/                 # Next.js App Router
│   │   ├── components/          # React components (chat UI included)
│   │   ├── hooks/               # useChat, useWebSocket
│   │   └── stores/              # Zustand state management
├── docker-compose.yml
├── Makefile
├── CLAUDE.md                    # AI coding assistant context
└── AGENTS.md                    # Multi-agent project guide
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;CLAUDE.md&lt;/code&gt; and &lt;code&gt;AGENTS.md&lt;/code&gt; files — the generated project is optimized for AI coding assistants like Claude Code, Cursor, and Copilot. It follows progressive disclosure best practices so your AI assistant understands the project structure immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Start Everything with Docker
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;my_ai_app
make docker-up        &lt;span class="c"&gt;# Backend + PostgreSQL + Redis&lt;/span&gt;
make docker-frontend  &lt;span class="c"&gt;# Next.js frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Two commands. The entire stack is running:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: &lt;a href="http://localhost:8000" rel="noopener noreferrer"&gt;http://localhost:8000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Docs&lt;/strong&gt;: &lt;a href="http://localhost:8000/docs" rel="noopener noreferrer"&gt;http://localhost:8000/docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: &lt;a href="http://localhost:3000" rel="noopener noreferrer"&gt;http://localhost:3000&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin Panel&lt;/strong&gt;: &lt;a href="http://localhost:8000/admin" rel="noopener noreferrer"&gt;http://localhost:8000/admin&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you prefer running without Docker, the template generates a Makefile with shortcuts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make &lt;span class="nb"&gt;install&lt;/span&gt;       &lt;span class="c"&gt;# Install Python + Node dependencies&lt;/span&gt;
make docker-db     &lt;span class="c"&gt;# Start just PostgreSQL&lt;/span&gt;
make db-migrate    &lt;span class="c"&gt;# Create initial migration&lt;/span&gt;
make db-upgrade    &lt;span class="c"&gt;# Apply migrations&lt;/span&gt;
make create-admin  &lt;span class="c"&gt;# Create admin user&lt;/span&gt;
make run           &lt;span class="c"&gt;# Start backend&lt;/span&gt;
&lt;span class="nb"&gt;cd &lt;/span&gt;frontend &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; bun dev  &lt;span class="c"&gt;# Start frontend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Your AI Agent Is Already Working
&lt;/h2&gt;

&lt;p&gt;Open &lt;code&gt;http://localhost:3000&lt;/code&gt;, log in, and start chatting. The AI agent is already wired up — WebSocket streaming, conversation history, tool calls — all functional out of the box.&lt;/p&gt;

&lt;p&gt;Here's what the generated agent looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/agents/assistant.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic_ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dataclasses&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dataclass&lt;/span&gt;

&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai:gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;search_database&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Search the database for relevant information.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Access user context and database via ctx.deps
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type-safe. Dependency injection built in. Tool calling with full context access. This isn't a toy example — it's the same pattern we use in production at Vstorm.&lt;/p&gt;

&lt;p&gt;The WebSocket endpoint handles streaming automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agent_ws&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text_delta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Customize the AI Layer
&lt;/h2&gt;

&lt;p&gt;Here's the key insight: &lt;strong&gt;everything except the AI agent is production-ready infrastructure that you don't need to touch&lt;/strong&gt;. Auth works. Database works. Streaming works. Frontend works.&lt;/p&gt;

&lt;p&gt;You modify one directory: &lt;code&gt;app/agents/&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Want to change from OpenAI to Anthropic? Update the model string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic:claude-sonnet-4-5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want to add a tool? Add a function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.tool&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;RunContext&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Deps&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Get current weather for a city.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.weather.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want to switch to LangChain or CrewAI entirely? Regenerate the project with a different &lt;code&gt;--ai-framework&lt;/code&gt; flag. The rest of the stack stays the same.&lt;/p&gt;

&lt;h2&gt;
  
  
  5 AI Frameworks, One Template
&lt;/h2&gt;

&lt;p&gt;The template supports five AI frameworks, all with the same backend infrastructure:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Framework&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;th&gt;Observability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pydantic AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Type-safe agents, dependency injection&lt;/td&gt;
&lt;td&gt;Logfire&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangChain&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Chains, existing LangChain tooling&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LangGraph&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex multi-step workflows, ReAct agents&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CrewAI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-agent crews, role-based agents&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DeepAgents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Claude Code-style agentic coding, HITL&lt;/td&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You pick the framework when generating the project. The WebSocket streaming, conversation persistence, auth, and frontend all work the same way regardless of which framework you choose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate with LangGraph&lt;/span&gt;
fastapi-fullstack create my_app &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; langgraph &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs

&lt;span class="c"&gt;# Generate with CrewAI&lt;/span&gt;
fastapi-fullstack create my_app &lt;span class="nt"&gt;--preset&lt;/span&gt; ai-agent &lt;span class="nt"&gt;--ai-framework&lt;/span&gt; crewai &lt;span class="nt"&gt;--frontend&lt;/span&gt; nextjs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  75+ Configuration Options
&lt;/h2&gt;

&lt;p&gt;Beyond AI frameworks, the template covers the full spectrum of production needs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Databases:&lt;/strong&gt; PostgreSQL (async), MongoDB (async), SQLite&lt;br&gt;
&lt;strong&gt;ORMs:&lt;/strong&gt; SQLAlchemy, SQLModel&lt;br&gt;
&lt;strong&gt;Auth:&lt;/strong&gt; JWT + refresh tokens, API keys, Google OAuth&lt;br&gt;
&lt;strong&gt;Background tasks:&lt;/strong&gt; Celery, Taskiq, ARQ&lt;br&gt;
&lt;strong&gt;Observability:&lt;/strong&gt; Logfire, LangSmith, Sentry, Prometheus&lt;br&gt;
&lt;strong&gt;Infrastructure:&lt;/strong&gt; Docker, Kubernetes, GitHub Actions, GitLab CI, Traefik, Nginx&lt;br&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; Next.js 15 with React 19, TypeScript, Tailwind CSS v4, dark mode, i18n&lt;br&gt;
&lt;strong&gt;Extras:&lt;/strong&gt; Redis caching, rate limiting, SQLAdmin panel, webhooks, S3 file storage, RAG with Milvus&lt;/p&gt;

&lt;p&gt;Every option is a boolean flag. No Jinja template hacking. No post-generation cleanup. The generator produces clean code that only includes what you selected.&lt;/p&gt;
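&lt;p&gt;As an illustration, a single command combining several of these options might look like the sketch below. The flag names here are assumptions for the sake of example, so verify them against &lt;code&gt;fastapi-fullstack create --help&lt;/code&gt;:&lt;/p&gt;

```shell
# Hypothetical flag names -- check the CLI's --help output for the real ones.
fastapi-fullstack create my_app \
  --preset production \
  --db postgres \
  --auth jwt \
  --tasks celery \
  --frontend nextjs
```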
&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The web configurator at &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;oss.vstorm.co&lt;/a&gt; lets you visually configure and download a full-stack AI project&lt;/strong&gt; — no CLI needed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three presets (minimal, ai-agent, production) cover 90% of use cases&lt;/strong&gt; — customize from there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;5 AI frameworks share the same infrastructure&lt;/strong&gt; — switch frameworks without rewriting your backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The generated code is production-grade, not a prototype&lt;/strong&gt; — layered architecture, async everywhere, type-safe.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You modify &lt;code&gt;app/agents/&lt;/code&gt; and nothing else&lt;/strong&gt; — auth, streaming, persistence, frontend are done.&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Try it yourself
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/full-stack-ai-agent-template" rel="noopener noreferrer"&gt;full-stack-ai-agent-template&lt;/a&gt; — Production-ready full-stack AI agent template with 5 frameworks and 75+ options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastapi-fullstack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use the &lt;a href="https://oss.vstorm.co/full-stack-ai-agent-template/configurator/" rel="noopener noreferrer"&gt;Web Configurator&lt;/a&gt; — no installation needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More from Vstorm's open-source ecosystem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://oss.vstorm.co" rel="noopener noreferrer"&gt;All our open-source projects&lt;/a&gt; — 13 packages for the Pydantic AI ecosystem&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vstorm-co/awesome-pydantic-ai" rel="noopener noreferrer"&gt;awesome-pydantic-ai&lt;/a&gt; — curated list of Pydantic AI resources and tools&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;vstorm.co&lt;/a&gt; — our consultancy (30+ AI agent implementations)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this was useful, follow me on &lt;a href="https://www.linkedin.com/in/kacper-wlodarczyk/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; for daily AI agent insights.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>python</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Pydantic-DeepAgents: A Lightweight, Production-Ready Framework for Building Autonomous AI Agents</title>
      <dc:creator>Kacper Włodarczyk</dc:creator>
      <pubDate>Mon, 22 Dec 2025 01:19:27 +0000</pubDate>
      <link>https://forem.com/deenuu1/pydantic-deepagents-a-lightweight-production-ready-framework-for-building-autonomous-ai-agents-2l3i</link>
      <guid>https://forem.com/deenuu1/pydantic-deepagents-a-lightweight-production-ready-framework-for-building-autonomous-ai-agents-2l3i</guid>
      <description>&lt;p&gt;&lt;em&gt;Inspired by LangChain deepagents — but simpler, type-safe, and with Docker sandboxing built-in&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In 2025, autonomous AI agents are no longer just research prototypes — they’re powering real-world automation, code generation tools, data pipelines, and intelligent assistants. However, many popular agent frameworks come with heavy dependencies, complex graphs, and a steep learning curve that makes production deployment challenging.&lt;/p&gt;

&lt;p&gt;That’s why we at &lt;strong&gt;Vstorm&lt;/strong&gt; built &lt;strong&gt;Pydantic-DeepAgents&lt;/strong&gt; — a minimal yet powerful open-source framework that extends &lt;strong&gt;Pydantic-AI&lt;/strong&gt; with everything you need to create reliable, production-grade agents.&lt;/p&gt;

&lt;p&gt;GitHub repository: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz36u3scp2j9q9veuk8o6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz36u3scp2j9q9veuk8o6.png" alt=" " width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What makes Pydantic-DeepAgents different?
&lt;/h3&gt;

&lt;p&gt;We were heavily inspired by LangChain’s excellent &lt;a href="https://github.com/langchain-ai/deepagents" rel="noopener noreferrer"&gt;deepagents&lt;/a&gt; project — a clean implementation of “deep agent” patterns including planning loops, tool calling, subagent delegation, and human-in-the-loop workflows.&lt;/p&gt;

&lt;p&gt;Instead of reinventing the wheel, we asked: &lt;em&gt;What if we built the same powerful patterns, but fully in the Pydantic-AI ecosystem?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The result is a framework that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Keeps dependencies lightweight (no LangGraph, no massive ecosystem)&lt;/li&gt;
&lt;li&gt;Leverages Pydantic’s native type-safety and validation for structured outputs&lt;/li&gt;
&lt;li&gt;Adds production-focused features missing from many alternatives&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Planning &amp;amp; Reasoning&lt;/strong&gt; — TodoToolset for autonomous task breakdown and self-correction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem Access&lt;/strong&gt; — Full read/write operations with FilesystemToolset&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subagent Delegation&lt;/strong&gt; — Break complex tasks into specialized subagents (SubAgentToolset)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible Skills System&lt;/strong&gt; — Define new agent capabilities with simple Markdown prompts (perfect for rapid iteration)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple Backends&lt;/strong&gt; — In-memory, persistent filesystem, secure &lt;strong&gt;DockerSandbox&lt;/strong&gt; (isolated code execution), and CompositeBackend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Uploads&lt;/strong&gt; — Seamless processing of uploaded files via &lt;code&gt;run_with_files()&lt;/code&gt; or &lt;code&gt;deps.upload_file()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Management&lt;/strong&gt; — Automatic summarization for long-running conversations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-Loop&lt;/strong&gt; — Built-in confirmation workflows for critical actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streaming Support&lt;/strong&gt; — Token-by-token responses for responsive UIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Outputs&lt;/strong&gt; — Type-safe Pydantic models via &lt;code&gt;output_type&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
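&lt;p&gt;The Markdown-defined skills are the quickest way to see the design: a skill is just a prompt file the agent loads as a capability. The file layout below is an assumption for illustration; see the repository's examples for the actual format:&lt;/p&gt;

```markdown
# summarize-csv  (hypothetical skill file -- structure is illustrative)

Summarize an uploaded CSV file.

## Instructions
1. Read the file with the filesystem toolset.
2. Report the row count, column names, and three notable observations.
3. Keep the summary under 150 words.
```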

&lt;h3&gt;
  
  
  See It in Action
&lt;/h3&gt;

&lt;p&gt;We’ve included a complete full-stack demo application (FastAPI backend + streaming web UI) that demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live agent reasoning traces&lt;/li&gt;
&lt;li&gt;File uploads and processing&lt;/li&gt;
&lt;li&gt;Human approval steps&lt;/li&gt;
&lt;li&gt;Streaming responses&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Demo app: &lt;a href="https://github.com/vstorm-co/pydantic-deepagents/tree/main/examples/full_app" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents/tree/main/examples/full_app&lt;/a&gt;&lt;br&gt;
Quick video walkthrough: &lt;a href="https://drive.google.com/file/d/1hqgXkbAgUrsKOWpfWdF48cqaxRht-8od/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1hqgXkbAgUrsKOWpfWdF48cqaxRht-8od/view?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to choose Pydantic-DeepAgents?
&lt;/h3&gt;

&lt;p&gt;Choose it when you want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clean, maintainable agent architecture without framework bloat&lt;/li&gt;
&lt;li&gt;Strong guarantees around data validation and structured responses&lt;/li&gt;
&lt;li&gt;Secure execution (Docker sandbox out of the box)&lt;/li&gt;
&lt;li&gt;Fast prototyping with Markdown-defined skills&lt;/li&gt;
&lt;li&gt;Easy deployment in production environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s particularly great if you’re already using Pydantic-AI, prefer minimalism, or need agents that interact safely with files and external tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Started Today
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pydantic-deep
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check out the repository, star it if you find it useful, and feel free to open issues or PRs — we’d love contributions!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vstorm-co/pydantic-deepagents" rel="noopener noreferrer"&gt;https://github.com/vstorm-co/pydantic-deepagents&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We’re excited to see what you build with it.&lt;/p&gt;

&lt;p&gt;— Team at Vstorm (&lt;a href="https://vstorm.co" rel="noopener noreferrer"&gt;https://vstorm.co&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>programming</category>
      <category>python</category>
      <category>langchain</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
