<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shilpa Mitra</title>
    <description>The latest articles on Forem by Shilpa Mitra (@shilpamitra).</description>
    <link>https://forem.com/shilpamitra</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3943639%2F687febe9-259f-4c1e-a2a0-798ce0d6cc2b.png</url>
      <title>Forem: Shilpa Mitra</title>
      <link>https://forem.com/shilpamitra</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shilpamitra"/>
    <language>en</language>
    <item>
      <title>How Claude Code Achieves a 92% Cache Hit Rate: A Deep Dive Into Prompt Caching for AI Agents</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Sun, 24 May 2026 17:08:31 +0000</pubDate>
      <link>https://forem.com/shilpamitra/how-claude-code-achieves-a-92-cache-hit-rate-a-deep-dive-into-prompt-caching-for-ai-agents-1hca</link>
      <guid>https://forem.com/shilpamitra/how-claude-code-achieves-a-92-cache-hit-rate-a-deep-dive-into-prompt-caching-for-ai-agents-1hca</guid>
      <description>&lt;p&gt;If you're running AI agents in production, there's a cost you're probably not thinking about.&lt;/p&gt;

&lt;p&gt;Every turn in an agentic conversation sends the full prompt to the model. That includes the system instructions, all the tool definitions, any project context that was loaded earlier, and the entire conversation history. The model processes all of it. From the top. Every single time.&lt;/p&gt;

&lt;p&gt;For a quick two-turn interaction, this doesn't matter much. But for a 50-turn coding session where the system prompt alone is 20,000 tokens? That's 1 million tokens of repeated computation across the session, all billed at full input price, all producing zero new insight. The model already processed that system prompt 49 turns ago. It's just doing it again because nothing told it not to.&lt;/p&gt;

&lt;p&gt;This is the problem prompt caching solves. And Claude Code is probably the best case study of how to do it right.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Parts of Every Prompt
&lt;/h2&gt;

&lt;p&gt;The first thing to understand is that not all tokens in a prompt are created equal.&lt;/p&gt;

&lt;p&gt;Look at any agentic API call and you'll see two distinct layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The foundation.&lt;/strong&gt; This is everything that stays the same from turn to turn. System instructions, tool schemas, project-level context like a &lt;code&gt;CLAUDE.md&lt;/code&gt; file, behavioral rules. If you looked at turn 1 and turn 47 side by side, this part would be identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The conversation.&lt;/strong&gt; This is everything that's different each turn. The user's latest message, tool call results, file contents that were just read, terminal output. This grows with every interaction and is genuinely new information the model needs to process.&lt;/p&gt;

&lt;p&gt;The entire trick behind prompt caching is recognizing that the foundation doesn't need to be reprocessed. You compute it once, store the result, and reuse it on every subsequent turn. The model only does fresh work on the conversation layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Actually Being Cached: The Transformer Angle
&lt;/h2&gt;

&lt;p&gt;This isn't just skipping a string comparison. To understand why caching cuts costs so dramatically, you need to know what the model does when it reads a prompt.&lt;/p&gt;

&lt;p&gt;LLM inference has two stages:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prefill&lt;/strong&gt;: the model takes your entire input and runs it through dense matrix multiplications, token by token, building an internal representation. This is computationally expensive and it's where most of the time and cost goes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Decode&lt;/strong&gt;: the model generates its response one token at a time, mostly just reading from the state it already built during prefill.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;During prefill, the model computes three vectors for every token: &lt;strong&gt;Query&lt;/strong&gt;, &lt;strong&gt;Key&lt;/strong&gt;, and &lt;strong&gt;Value&lt;/strong&gt;. These are the building blocks of the attention mechanism, how the model figures out which parts of the input matter for which other parts.&lt;/p&gt;

&lt;p&gt;The important property: Key and Value vectors for any given token only depend on the tokens before it. They're deterministic. If the input is the same, the output is the same.&lt;/p&gt;

&lt;p&gt;So once you've computed the Key-Value pairs for a 20,000-token system prompt, you can store them. Next time a request comes in with that same prefix, you skip the entire prefill computation for those 20,000 tokens and go straight to processing the new content.&lt;/p&gt;

&lt;p&gt;Anthropic's infrastructure does this by hashing the input prefix. Same hash, same cached tensors, no recomputation. Different hash (even one byte different), full recomputation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics
&lt;/h2&gt;

&lt;p&gt;Here's where this gets concrete. Anthropic's caching pricing has three tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Operation&lt;/th&gt;
&lt;th&gt;Multiplier&lt;/th&gt;
&lt;th&gt;What it means&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cache reads&lt;/td&gt;
&lt;td&gt;0.1x base input price&lt;/td&gt;
&lt;td&gt;90% discount on every cached token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5-minute cache writes&lt;/td&gt;
&lt;td&gt;1.25x base input price&lt;/td&gt;
&lt;td&gt;Small premium to store the KV tensors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1-hour cache writes&lt;/td&gt;
&lt;td&gt;2x base input price&lt;/td&gt;
&lt;td&gt;Extended TTL for longer sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Claude Sonnet 4.6 (&lt;code&gt;$3/MTok&lt;/code&gt; base input), here's what that looks like in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Standard input:     $3.00 / MTok
Cache read:         $0.30 / MTok   (90% savings)
5-min cache write:  $3.75 / MTok   (25% premium, one-time)
1-hour cache write: $6.00 / MTok   (2x premium, one-time)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A cache hit costs 10% of standard input. That means caching pays for itself after just one subsequent read for the 5-minute duration. For a 50-turn session reusing a 20,000-token prefix, the savings compound on every single turn.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracking a Real Claude Code Session
&lt;/h2&gt;

&lt;p&gt;Theory is nice. Let's trace the actual token economics of a single debugging session to see where the money goes.&lt;/p&gt;

&lt;p&gt;You open Claude Code in a Next.js project. The moment the session starts, it loads the system prompt, all available tool definitions (&lt;code&gt;file read&lt;/code&gt;, &lt;code&gt;file write&lt;/code&gt;, &lt;code&gt;bash&lt;/code&gt;, &lt;code&gt;grep&lt;/code&gt;, &lt;code&gt;glob&lt;/code&gt;, and others), and your project's &lt;code&gt;CLAUDE.md&lt;/code&gt;. That initial payload lands somewhere around 20,000 tokens. Every single one of those tokens is processed fresh. This is the only time you pay full price for them.&lt;/p&gt;

&lt;p&gt;You type:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"There's a race condition in the checkout flow. Orders are occasionally duplicating when users double-click the submit button."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code doesn't just start editing files. First, it spins up an &lt;strong&gt;Explore subagent&lt;/strong&gt; to understand the codebase. That subagent reads your API routes, checks your database schema, looks at your order processing logic, and examines the frontend form handler. All of those file reads and grep results get appended to the growing conversation as tool outputs.&lt;/p&gt;

&lt;p&gt;Here's the key: none of that new content touches the 20,000-token prefix. The system prompt, the tool definitions, the &lt;code&gt;CLAUDE.md&lt;/code&gt;, all of that is still sitting in cache from turn one. Every subsequent API call reads those 20,000 tokens at &lt;code&gt;$0.30/MTok&lt;/code&gt; instead of &lt;code&gt;$3.00/MTok&lt;/code&gt;. You're only paying full price for the new stuff: your message and the tool outputs.&lt;/p&gt;

&lt;p&gt;The Explore subagent finishes and hands its findings back to the main agent. But it doesn't dump 15,000 tokens of raw file contents into the conversation. It passes a &lt;strong&gt;condensed summary&lt;/strong&gt;: which files are relevant, what the current logic does, where the race condition likely lives. This is a deliberate design choice. Keeping the dynamic tail compact means the cache ratio stays high.&lt;/p&gt;

&lt;p&gt;Now the &lt;strong&gt;Plan subagent&lt;/strong&gt; kicks in. It takes the summary, reasons through the fix (idempotency key on the frontend, deduplication check on the API, database unique constraint as a safety net), and produces a step-by-step implementation plan. You approve it. Claude Code starts writing code.&lt;/p&gt;

&lt;p&gt;Over the next 15 minutes, you go back and forth. It writes the idempotency logic, you ask it to also handle the case where the page refreshes mid-checkout, it adjusts. Each of these turns adds new content to the dynamic tail. But the foundation, those 20,000 tokens, is read from cache every single time. Each cache hit also resets the TTL, so the cache never expires as long as you keep working.&lt;/p&gt;

&lt;p&gt;By the end of the session, you've gone through maybe 25 turns. The total tokens processed easily exceeds 1.5 million. But if you run &lt;code&gt;/cost&lt;/code&gt;, the bill tells a very different story than 1.5M tokens at full price. The vast majority were cache reads at a 90% discount.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's the difference between a $4.50 session and a $0.90 session. For one debugging task.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Production Numbers
&lt;/h2&gt;

&lt;p&gt;This isn't theoretical. Claude Code's production metrics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cache hit rate&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;92%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost reduction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;81%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;First-token latency reduction&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;79%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In active sessions, 95%+ of input tokens are typically cache hits, billed at 0.1x the base price. Out of 400K tokens in a session, maybe 20K to 40K are billed at full price.&lt;/p&gt;

&lt;p&gt;Without prompt caching, a long Opus coding session (100 turns with compaction cycles) can cost $50 to $100 in input tokens. With it, $10 to $19.&lt;/p&gt;

&lt;h2&gt;
  
  
  The One Thing That Will Tank Your Cache Hit Rate
&lt;/h2&gt;

&lt;p&gt;Prompt caching has a gotcha that trips up almost everyone the first time.&lt;/p&gt;

&lt;p&gt;The cache key is a hash of the &lt;strong&gt;exact byte sequence&lt;/strong&gt; of your prompt prefix. Not the meaning. Not the content. The exact bytes, in the exact order. If you rearrange two paragraphs in your system prompt, the hash changes. Full cache miss. Everything recomputed at full price.&lt;/p&gt;

&lt;p&gt;This has three practical consequences:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Don't change your tool set mid-session
&lt;/h3&gt;

&lt;p&gt;Tool definitions are part of the cached prefix. If you add a tool on turn 12 that wasn't there on turn 1, every token after the change point is a cache miss. Load everything you might need at the start.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Don't switch models mid-conversation
&lt;/h3&gt;

&lt;p&gt;Each model has its own cache. Moving from Opus to Sonnet to save money on a later turn means rebuilding the cache from zero for the new model. You'll spend more on the rebuild than you saved on the cheaper rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Don't edit the system prompt to update state
&lt;/h3&gt;

&lt;p&gt;If your agent needs to track something (like "user is now authenticated"), don't inject that into the system prompt. Append it as a note in the next user message instead. The system prompt stays byte-identical, the cache stays valid.&lt;/p&gt;

&lt;p&gt;Claude Code follows all three of these rules religiously. That's how it maintains a 92% hit rate across millions of sessions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying This to Your Own Agents
&lt;/h2&gt;

&lt;p&gt;If you're building on the Anthropic API, the same principles apply. Here's the practical playbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt structure matters
&lt;/h3&gt;

&lt;p&gt;Put the most stable content at the top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. System instructions and rules        (most stable, cached first)
2. Tool definitions                      (stable for session duration)
3. Reference documents / retrieved context
4. Conversation history + tool outputs   (dynamic, grows each turn)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cache works from the top down. Everything above the first change point stays cached. Everything below it gets recomputed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use auto-caching
&lt;/h3&gt;

&lt;p&gt;Anthropic's API now supports automatic cache management. You add a single &lt;code&gt;cache_control&lt;/code&gt; field to your request and the system handles breakpoint placement for you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-6-20260514"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cache_control"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ephemeral"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Your system prompt here..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It moves the cache boundary forward as the conversation grows and more content becomes stable. Before this existed, you had to manually calculate token boundaries. Getting it wrong meant missing the cache entirely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compact without breaking the cache
&lt;/h3&gt;

&lt;p&gt;When your conversation hits the context limit and you need to summarize it down, keep the system prompt and tool definitions identical. Add the compaction instruction as a new user message. The cached prefix stays valid. You only pay fresh tokens for the compaction prompt itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitor your hit rate
&lt;/h3&gt;

&lt;p&gt;Every API response includes three fields you should be tracking:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_creation_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_read_input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;184800&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"input_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3400&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cache_creation_input_tokens&lt;/code&gt;: tokens written to cache (first time processing)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cache_read_input_tokens&lt;/code&gt;: tokens read from cache (the cheap ones)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;input_tokens&lt;/code&gt;: tokens processed at full price (no cache available)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ratio of &lt;code&gt;cache_read_input_tokens&lt;/code&gt; to total input tokens is your cache efficiency score. Track it like you'd track uptime. A sudden drop means something in your prompt structure changed and invalidated the cache.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Prompt caching isn't a setting you flip on and forget about. It's an architectural pattern that has to be baked into how your agent constructs its prompts, manages its tools, and handles long conversations.&lt;/p&gt;

&lt;p&gt;Claude Code shows what this looks like when it's done well: 92% cache hit rate, 81% cost reduction, built on stable prefixes, subagent summarization, and cache-aware context management.&lt;/p&gt;

&lt;p&gt;If you're building agents and not thinking about your cache architecture, you're leaving most of your budget on the table.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We break down AI infrastructure and tooling like this regularly at &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;. Practical, no hype, explained so it actually makes sense.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>architecture</category>
    </item>
    <item>
      <title>The 4 Levels of Hermes Agent Scaling Framework: From One Hermes Agent to a Fully Automated Team</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Fri, 22 May 2026 11:56:54 +0000</pubDate>
      <link>https://forem.com/shilpamitra/the-4-levels-of-hermes-agent-scaling-framework-from-one-hermes-agent-to-a-fully-automated-team-2gdp</link>
      <guid>https://forem.com/shilpamitra/the-4-levels-of-hermes-agent-scaling-framework-from-one-hermes-agent-to-a-fully-automated-team-2gdp</guid>
      <description>&lt;p&gt;Most people set up an AI agent and immediately start thinking about multi-agent architectures. Orchestrators, specialist swarms, automated pipelines. That's Level 4 thinking applied to a Level 1 setup, and it's how you end up with a fleet of agents shipping garbage at scale.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; by Nous Research (160K+ stars, fastest-growing open-source agent of 2026) is built for exactly this kind of progressive scaling. It's self-hosted, self-improving, stores everything locally in SQLite, and supports multi-agent orchestration out of the box as of v0.6.0.&lt;/p&gt;

&lt;p&gt;But the framework below isn't Hermes-specific. It applies to any agent system. The tool doesn't matter as much as the progression.&lt;/p&gt;

&lt;p&gt;Here are the four levels, what each one looks like in practice, and how to know when you're actually ready to move up.&lt;/p&gt;

&lt;h2&gt;
  
  
  First: What Hermes Agent Is
&lt;/h2&gt;

&lt;p&gt;Hermes is an autonomous AI agent that runs on your machine or VPS. It takes a goal, breaks it into steps, picks from 47 built-in tools to execute, and iterates until the task is done. Everything stays local.&lt;/p&gt;

&lt;p&gt;What sets it apart: after each task, Hermes writes a structured record of what worked and what didn't into episodic memory. On future tasks with similar patterns, it retrieves those records and adjusts its approach before starting. It also creates reusable "skills" from experience, essentially building procedural memory that improves over time.&lt;/p&gt;

&lt;p&gt;It connects to 20+ messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal, and more), supports MCP servers, and runs across 6 terminal backends (local, Docker, SSH, Daytona, Singularity, Modal).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Install:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;hermes-agent
hermes postinstall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes doctor      &lt;span class="c"&gt;# check your environment&lt;/span&gt;
hermes model       &lt;span class="c"&gt;# pick a model&lt;/span&gt;
hermes config &lt;span class="nb"&gt;set&lt;/span&gt;  &lt;span class="c"&gt;# add API keys&lt;/span&gt;
hermes             &lt;span class="c"&gt;# start the agent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Takes about 60 seconds on Linux, macOS, or WSL2.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 1: The Main Agent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → Your Soul Hermes Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where everyone starts, and where most people should stay for &lt;strong&gt;weeks&lt;/strong&gt;, not days.&lt;/p&gt;

&lt;p&gt;Your single Hermes instance is your prototype area. You test workflows here. You refine prompts. You figure out which tasks the agent handles well and which ones it fumbles. You build up its memory and skills on your specific work.&lt;/p&gt;

&lt;p&gt;At this level, Hermes doubles as your orchestrator by default. You give it a complex task, it breaks it down, it executes. The self-improving loop is already running: every completed task makes it slightly better at similar tasks next time.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 1
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run real work through it daily.&lt;/strong&gt; Not toy examples. Actual tasks from your workflow. The memory system only gets useful with real data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manage its memory actively.&lt;/strong&gt; Use &lt;code&gt;/recall&lt;/code&gt; to search what it remembers and &lt;code&gt;/remember&lt;/code&gt; to manually save important context. Correct it when it gets things wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install skills or let it create them.&lt;/strong&gt; Skills are procedural memory. Hermes can build them from experience, or you can install community-contributed ones from the Skills Hub.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect one messaging platform.&lt;/strong&gt; Telegram is the easiest. Run &lt;code&gt;hermes gateway setup&lt;/code&gt; to get always-on access from your phone. This changes the dynamic from "sitting at my terminal to use AI" to "texting my agent whenever I need something."&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When you have at least 2-3 workflows that are &lt;strong&gt;consistently producing good output&lt;/strong&gt;. Not acceptable output. Not "close enough." Good output that you'd be comfortable shipping without heavy editing.&lt;/p&gt;

&lt;p&gt;This is the most important checkpoint in the entire framework. Everything that comes after multiplies the quality you establish here.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 2: Specialized Agents
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → SEO Agent
You → Content Pipeline Agent
You → DevOps Agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once a workflow is solid and repeatable, break it out into its own Hermes instance with its own credentials, memory, and scope.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why separate instances?
&lt;/h3&gt;

&lt;p&gt;Context pollution. An agent that handles your SEO research, your email drafting, and your code reviews is juggling three different domains in one memory space. Its SEO skills get diluted by code review patterns. Its writing voice gets contaminated by technical documentation habits.&lt;/p&gt;

&lt;p&gt;Specialized agents have cleaner memory, more focused skills, and better output because they only learn from one domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to do this practically
&lt;/h3&gt;

&lt;p&gt;Each Hermes instance runs independently. Use different configuration profiles, or spin each one up in its own Docker container or VPS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Different profiles for different agents&lt;/span&gt;
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;seo hermes
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;contentpipeline hermes
&lt;span class="nv"&gt;HERMES_PROFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;devops hermes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each profile gets its own SQLite database, its own memory, its own skill library. You talk to each one directly. You're still the orchestrator at this stage, manually deciding which agent handles which task.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 2
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Write a scope document for each agent.&lt;/strong&gt; What it does, what it doesn't do, what tools it has access to. This isn't bureaucracy. It's how you prevent scope creep across agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let each agent build its own skill library&lt;/strong&gt; within its domain. The SEO agent's skills should be about keyword research and competitor analysis, not email copywriting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the count low.&lt;/strong&gt; 2-3 specialists is plenty to start. The temptation to spin up a new agent for every task is strong. Resist it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When you're spending more time routing tasks between agents than actually reviewing their output.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 3: Orchestrated Team
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You → Orchestrator Agent
           ↓
     Your Specialized Agents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you bring the orchestrator agent back. But this time it's not your prototype agent wearing multiple hats. It's a dedicated Hermes instance whose only job is routing tasks to your specialists and synthesizing their outputs.&lt;/p&gt;

&lt;p&gt;Hermes v0.6.0 added multi-agent orchestration. The orchestrator analyzes a complex task, identifies the optimal work breakdown, and spawns specialist worker agents with tailored context. Each worker gets its own scope and tools, returns a verifiable artifact, and records the handoff.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example workflow
&lt;/h3&gt;

&lt;p&gt;You tell the orchestrator: "Research competitors in the CRM space and draft a blog post about our differentiators."&lt;/p&gt;

&lt;p&gt;The orchestrator:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Routes the research task to your Research agent&lt;/li&gt;
&lt;li&gt;Takes the research output and routes the writing task to your Content agent&lt;/li&gt;
&lt;li&gt;Synthesizes the outputs into a final deliverable&lt;/li&gt;
&lt;li&gt;Returns it to you for review&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You still review the final output. You're not out of the loop. You're just not manually routing between agents anymore.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 3
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Set up task tracking.&lt;/strong&gt; Kanban-style works well. You need visibility into what each agent is working on, what's queued, and what's done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define handoff protocols.&lt;/strong&gt; What does the research agent pass to the content agent? What format? What level of detail? Ambiguous handoffs create ambiguous output.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review regularly.&lt;/strong&gt; Quality issues compound fast in multi-agent setups. A small drift in the research agent's output becomes a big problem by the time it's been through two more agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When to move on
&lt;/h3&gt;

&lt;p&gt;When the orchestrator's routing decisions are consistently correct and the specialist outputs consistently meet your quality bar without heavy editing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Level 4: Automated Team
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cron Job / Trigger Events → Orchestrator Agent
                       ↓
                 Full Agent Team
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where you step out of the loop for routine work. Cron jobs and event triggers fire tasks into the orchestrator. The orchestrator routes them to the team. The team handles the work asynchronously.&lt;/p&gt;

&lt;h3&gt;
  
  
  What this looks like in practice
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every Monday at 8am&lt;/strong&gt;, the orchestrator triggers your SEO agent to pull keyword rankings, your content agent to draft the weekly newsletter outline, and your ops agent to generate a metrics report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When a new competitor blog post is published&lt;/strong&gt; (event trigger), the research agent analyzes it and the content agent drafts a response piece.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When a support ticket hits a specific tag&lt;/strong&gt;, the ops agent drafts a response for your review queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The task bus handles queuing and routing. Agents pick up work, complete it, and log results. You check in when you want to, not because you have to.&lt;/p&gt;

&lt;h3&gt;
  
  
  What to do at Level 4
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with one automated workflow&lt;/strong&gt;, not ten. Get one cron job running reliably before adding more. Debugging a broken automation is harder when you have twelve of them running simultaneously.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build in quality gates.&lt;/strong&gt; Not every output needs your review, but have the orchestrator flag anything that falls below a confidence threshold for human review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor closely at first.&lt;/strong&gt; The trust you build here is earned, not assumed. Look at outputs daily for the first two weeks, then taper to spot-checks.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Part That Matters More Than Any of This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Take small steps. You do NOT want to automate slop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your output at Level 1 is mediocre, you are about to scale mediocrity. 20 agents shipping low-quality work at speed is worse than 3 shipping great work slowly. Every level multiplies whatever quality you've established at the level before it.&lt;/p&gt;

&lt;p&gt;I'd rather run fewer agents with better output than max the agent count and spit out more of the same.&lt;/p&gt;

&lt;p&gt;The progression isn't about moving fast. It's about moving when you're ready. Level 1 might take you a month. Level 2 might take another month. That's fine. The agents aren't going anywhere. Your quality bar is what matters.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/NousResearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch/hermes-agent&lt;/a&gt; (160K+ stars)&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/" rel="noopener noreferrer"&gt;Official documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.nousresearch.com/docs/getting-started/installation" rel="noopener noreferrer"&gt;Installation guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://hermes-agent.ai/features/multi-agent" rel="noopener noreferrer"&gt;Multi-agent orchestration (v0.6.0)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/NousResearch/hermes-agent/blob/main/website/docs/reference/skills-catalog.md" rel="noopener noreferrer"&gt;Skills catalog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;I write about practical AI agent workflows, open-source tools, and the infrastructure behind them at &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;. No hype, just stuff you can actually use.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>automation</category>
      <category>productivity</category>
    </item>
    <item>
      <title>4 GitHub Repos That Prove AI Agents Aren't Just for Coding Anymore</title>
      <dc:creator>Shilpa Mitra</dc:creator>
      <pubDate>Thu, 21 May 2026 17:08:15 +0000</pubDate>
      <link>https://forem.com/shilpamitra/4-github-repos-that-prove-ai-agents-arent-just-for-coding-anymore-13g1</link>
      <guid>https://forem.com/shilpamitra/4-github-repos-that-prove-ai-agents-arent-just-for-coding-anymore-13g1</guid>
      <description>&lt;p&gt;Six months ago, "AI agent" basically meant "coding assistant." Claude Code, Copilot, Cursor. All doing the same thing: helping you write code.&lt;/p&gt;

&lt;p&gt;That's changing. The most interesting open-source projects right now aren't building yet another coding agent. They're building agents that specialize: agents that trade stocks, agents that run your entire content marketing operation, agents that make your coding agent actually follow engineering discipline.&lt;/p&gt;

&lt;p&gt;The model is the same underneath. The harness around it is what makes it useful for a specific job.&lt;/p&gt;

&lt;p&gt;Here are four repos that show where this is heading, with setup instructions for each.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. mattpocock/skills (91.7K stars) — Make Your Coding Agent an Actual Engineer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/mattpocock/skills" rel="noopener noreferrer"&gt;github.com/mattpocock/skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Matt Pocock (the TypeScript educator behind Total TypeScript) open-sourced his personal &lt;code&gt;.claude&lt;/code&gt; directory. It's a collection of skills that fix the most common failure modes of AI coding agents: building the wrong thing, skipping tests, producing code that works but is impossible to maintain, and declaring "done" when nothing actually compiles.&lt;/p&gt;

&lt;p&gt;Most people treat their coding agent like an intern with no process. Matt's skills give it the process.&lt;/p&gt;

&lt;h3&gt;
  
  
  The standout: &lt;code&gt;/grill-me&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This skill forces the agent to interrogate you about what you actually want before writing a single line of code. It's a structured interview that catches misalignment before it becomes a wasted hour. There's also &lt;code&gt;/grill-with-docs&lt;/code&gt;, which does the same thing but additionally builds a shared vocabulary between you and the agent in a &lt;code&gt;CONTEXT.md&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;CONTEXT.md&lt;/code&gt; approach is quietly brilliant. Instead of the agent using 20 words to describe something, you teach it your project's jargon. Over time, the agent's outputs get shorter, more precise, and the variables and functions it creates use consistent naming. It also reduces token usage, because concise terminology means shorter prompts and responses.&lt;/p&gt;

&lt;h3&gt;
  
  
  Other skills worth knowing
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/tdd&lt;/code&gt;&lt;/strong&gt; — Test-driven development with red-green-refactor. The agent writes a failing test first, then fixes it. Far better code quality than "write the feature, then maybe add tests."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/diagnose&lt;/code&gt;&lt;/strong&gt; — Disciplined debugging loop: reproduce, minimise, hypothesise, instrument, fix, regression-test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/improve-codebase-architecture&lt;/code&gt;&lt;/strong&gt; — Finds structural improvements using your project's domain language from &lt;code&gt;CONTEXT.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/handoff&lt;/code&gt;&lt;/strong&gt; — Compacts the current conversation into a handoff document so another agent (or a new session) can continue the work without losing context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;/caveman&lt;/code&gt;&lt;/strong&gt; — Ultra-compressed communication mode. Cuts token usage by roughly 75% while keeping full technical accuracy. Useful when you're burning through credits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills@latest add mattpocock/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick the skills you want and which coding agents to install them on. Make sure you select &lt;code&gt;/setup-matt-pocock-skills&lt;/code&gt; during install. Then run that command in your agent, and it'll configure your issue tracker (GitHub, Linear, or local files), triage labels, and docs location. Works with Claude Code, Cursor, Codex, and others.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it compares to Addy Osmani's agent-skills
&lt;/h3&gt;

&lt;p&gt;If you've seen &lt;a href="https://github.com/addyosmani/agent-skills" rel="noopener noreferrer"&gt;addyosmani/agent-skills&lt;/a&gt;, you might wonder how these differ. Addy's skills focus on the full development lifecycle with slash commands like &lt;code&gt;/spec&lt;/code&gt;, &lt;code&gt;/plan&lt;/code&gt;, &lt;code&gt;/build&lt;/code&gt;, &lt;code&gt;/ship&lt;/code&gt;. Matt's skills focus more on engineering fundamentals: alignment, testing discipline, debugging, and architecture quality. They're complementary, not competing. You can run both in the same project.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. AI-Trader (13.7K stars) — Let AI Agents Trade for You
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/HKUDS/AI-Trader" rel="noopener noreferrer"&gt;github.com/HKUDS/AI-Trader&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AI-Trader is an agent-native trading platform built by researchers at the University of Hong Kong. The core idea: just like humans have their trading platforms, AI agents need their own.&lt;/p&gt;

&lt;p&gt;You connect your AI agent (Claude Code, Cursor, OpenClaw, Codex, whatever), and it can publish trading signals, copy trades from top-performing agents, participate in strategy discussions, and access real-time market data across stocks, crypto, forex, options, and futures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why it's interesting
&lt;/h3&gt;

&lt;p&gt;This isn't just one agent making trades. It's a platform where multiple agents collaborate, debate strategies, and learn from each other. They call it "collective intelligence trading."&lt;/p&gt;

&lt;p&gt;Agents publish three types of signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strategies&lt;/strong&gt; — for discussion and analysis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; — for direct copy trading&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discussions&lt;/strong&gt; — for collaborative reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There's a reward system where agents earn points for successful predictions, and a $100K paper trading mode so you can test without risk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;The simplest way to connect an agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read https://ai4trade.ai/SKILL.md and register.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Send that message to your AI agent. It reads the integration guide, installs the necessary components, and registers itself on the platform. For human traders, visit &lt;a href="https://ai4trade.ai" rel="noopener noreferrer"&gt;ai4trade.ai&lt;/a&gt; and sign up directly.&lt;/p&gt;

&lt;p&gt;For developers who want to self-host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/HKUDS/AI-Trader.git
&lt;span class="nb"&gt;cd &lt;/span&gt;AI-Trader
npm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend is FastAPI (Python), frontend is React. Full OpenAPI docs are in &lt;code&gt;docs/api/openapi.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A word of caution
&lt;/h3&gt;

&lt;p&gt;Automated trading carries real financial risk. AI-Trader includes paper trading mode for a reason. Start there. The fact that it comes from a university research group rather than a fintech startup trying to sell you something is a point in its favor, but treat any trading system with healthy skepticism.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. AiToEarn (12.2K stars) — AI Agent for Content Marketing Across 14 Platforms
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/yikart/AiToEarn" rel="noopener noreferrer"&gt;github.com/yikart/AiToEarn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AiToEarn is an open-source content marketing platform with an AI agent built in. You create content once, and it publishes across 14 platforms simultaneously: TikTok, YouTube, Instagram, Twitter/X, LinkedIn, Pinterest, Facebook, Threads, plus Chinese platforms like Douyin, Xiaohongshu (Rednote), Bilibili, WeChat, and Kuaishou.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "All In Agent"
&lt;/h3&gt;

&lt;p&gt;This is the interesting part. It's an AI agent that can automatically generate content, publish it, and manage your accounts across all platforms. Beyond publishing, the platform includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trend radar&lt;/strong&gt; — what's going viral right now across platforms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Case library&lt;/strong&gt; — how posts with 10K+ likes were structured&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart comment search&lt;/strong&gt; — finds high-conversion signals like "link please" or "how to buy" across your accounts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-platform analytics&lt;/strong&gt; — unified dashboard for all your channels&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The comment search feature is particularly useful for anyone doing content-driven sales. It surfaces purchase-intent comments so you can reply fast and convert.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;

&lt;p&gt;Docker (recommended):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/yikart/AiToEarn.git
&lt;span class="nb"&gt;cd &lt;/span&gt;AiToEarn
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts the frontend, backend, MongoDB, and Redis in one command. Access the web interface at &lt;code&gt;http://localhost:8080&lt;/code&gt;. There's also an Electron desktop app available from the GitHub releases page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Note on documentation
&lt;/h3&gt;

&lt;p&gt;The project originated in China. The English README and Docker deployment guide are solid, but some deeper configuration docs are still in Chinese. AI video model integrations (Kling, Sora, Runway, etc.) are listed as coming soon.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. DeepSeek-TUI (Trending) — Claude Code, but for DeepSeek
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/Hmbown/DeepSeek-TUI" rel="noopener noreferrer"&gt;github.com/Hmbown/DeepSeek-TUI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A terminal-based coding agent built specifically for DeepSeek models. If you've used Claude Code, the experience is similar: you type prompts in your terminal, the agent reads your files, edits code, runs shell commands, does git operations, and browses the web. The difference is it's built from the ground up for DeepSeek's API, which is significantly cheaper than Claude or GPT-4.&lt;/p&gt;

&lt;h3&gt;
  
  
  Three modes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Plan&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Review a plan before the agent makes changes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Default interactive mode with multi-step tool use&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;YOLO&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto-approve everything in a trusted workspace&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Tab to cycle between them. It also supports MCP servers, session resume, and can run as an HTTP/SSE API server.&lt;/p&gt;

&lt;p&gt;Built in Rust, so it's fast and lightweight.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; deepseek-tui
deepseek-tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On first launch it'll ask for your DeepSeek API key. You can also set it beforehand:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;deepseek-tui login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or via environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt; deepseek-tui
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configuration lives in &lt;code&gt;~/.deepseek/config.toml&lt;/code&gt;. Useful commands: &lt;code&gt;deepseek-tui doctor&lt;/code&gt; (check setup), &lt;code&gt;deepseek-tui models&lt;/code&gt; (list available models).&lt;/p&gt;

&lt;p&gt;Also available via Rust:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;deepseek-tui &lt;span class="nt"&gt;--locked&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;What connects all four of these: &lt;strong&gt;the model isn't the product anymore. The harness is.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Matt Pocock's skills don't change what Claude can do. They change how disciplined it is. AI-Trader doesn't invent a new trading algorithm. It builds a platform where existing agents collaborate. AiToEarn doesn't create a new content AI. It builds distribution infrastructure around existing ones. DeepSeek-TUI takes the Claude Code interaction pattern and wraps it around a different, cheaper model.&lt;/p&gt;

&lt;p&gt;Every one of these is the same insight applied to a different domain: wrap the right structure around a capable model, and you get something genuinely useful. The structure is where the value is.&lt;/p&gt;

&lt;p&gt;This is what the industry is starting to call &lt;strong&gt;harness engineering&lt;/strong&gt;, the practice of building the environment, constraints, and feedback loops around an AI agent so it produces reliable results. It's not prompt engineering. It's not fine-tuning. It's designing the system the model operates inside.&lt;/p&gt;

&lt;p&gt;If you want to go deeper on this and see how to actually chain free tools into a working setup, I wrote a step-by-step breakdown of building a zero-cost AI coding stack (9router + agentmemory + agent-skills) in my newsletter: &lt;a href="https://webafterai.substack.com" rel="noopener noreferrer"&gt;Web After AI&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What specialized AI agents are you seeing in your domain? Drop a comment. I'm collecting examples for a follow-up piece.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>opensource</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
