<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Vilius</title>
    <description>The latest articles on Forem by Vilius (@vystartasv).</description>
    <link>https://forem.com/vystartasv</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F133303%2F50baa34e-e011-4576-8b1a-5974d272fc34.jpg</url>
      <title>Forem: Vilius</title>
      <link>https://forem.com/vystartasv</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/vystartasv"/>
    <language>en</language>
    <item>
      <title>The Agentic Gap: Why a SharePoint Expert's Excitement Stopped Me Cold</title>
      <dc:creator>Vilius</dc:creator>
      <pubDate>Mon, 04 May 2026 21:48:25 +0000</pubDate>
      <link>https://forem.com/vystartasv/the-agentic-gap-why-a-sharepoint-experts-excitement-stopped-me-cold-5267</link>
      <guid>https://forem.com/vystartasv/the-agentic-gap-why-a-sharepoint-experts-excitement-stopped-me-cold-5267</guid>
      <description>&lt;p&gt;I saw a SharePoint MVP's post recently. Genuine excitement. Markdown support had landed in SharePoint. Not a joke — real, earned enthusiasm from someone who knows their domain inside out.&lt;/p&gt;

&lt;p&gt;And I get it. In the SharePoint world, that's real progress. It matters for real users solving real problems. You don't become an MVP without expert knowledge and public recognition. He was right to celebrate.&lt;/p&gt;

&lt;p&gt;What stopped me wasn't his post. It was the contrast with myself — with what I used to get excited about, and what I'm working on now.&lt;/p&gt;




&lt;h2&gt;The Post I Would Have Written&lt;/h2&gt;

&lt;p&gt;Eighteen months ago, I'd have written that exact post. Same enthusiasm. Same well-earned expertise. Same genuine belief that this was meaningful progress. And I'd have been right — in the world I was living in.&lt;/p&gt;

&lt;p&gt;But I've spent the last year backing into a different way of working. Then, over a long weekend, I sat down and built the infrastructure for agents to work autonomously — agent loops, error recovery, quality gates, the unglamorous fabric that makes autonomous code trustworthy instead of terrifying. By Sunday night, those agents had scaffolded 111 SharePoint web parts and 5 backend services. Design, build, test. All local. No human hands on the code.&lt;/p&gt;

&lt;p&gt;Three days of tooling produced months of human output. But the output wasn't the impressive part — the steep learning curve was.&lt;/p&gt;

&lt;p&gt;The SharePoint MVP wasn't wrong. He was just in a different conversation. And that's the part that scared me.&lt;/p&gt;




&lt;h2&gt;What YouTube Doesn't Show You&lt;/h2&gt;

&lt;p&gt;Here's what no tutorial, TikTok, or conference talk prepares you for: the grind.&lt;/p&gt;

&lt;p&gt;Over those three days, something broke roughly every few hours. Not metaphorically — literally. You fix the macOS permissions so the agent can read files. Now it needs a gateway restart. Restart it and the model config turns out to be broken — empty model name, nothing works for two hours while you trace why. Fix that, and suddenly the build fails because the SCSS configuration is extending the wrong toolchain — it was written for Gulp, not Heft. Rewrite that, and the Yeoman scaffold generator silently ignores your CLI flags because a &lt;code&gt;.yo-rc.json&lt;/code&gt; exists from a previous run. Build a manual template script to bypass it. Now the directories are PascalCase and your pipeline expected kebab-case. Fix that. Now C++ native modules won't compile on Node 22. That's before you even get to the agents looping — repeating the same three broken commands until you harden the loop detection.&lt;/p&gt;
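
&lt;p&gt;For what it's worth, the loop detection that finally held was nothing clever: a sliding window over recent commands. A sketch of the idea, not the code I shipped:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: abort when the same command keeps recurring in a window.
from collections import deque

class LoopDetector:
    def __init__(self, window=9, max_repeats=3):
        self.recent = deque(maxlen=window)
        self.max_repeats = max_repeats

    def record(self, command: str) -&amp;gt; bool:
        """Return True when the session should be aborted."""
        self.recent.append(hash(command))
        return any(self.recent.count(h) &amp;gt;= self.max_repeats
                   for h in set(self.recent))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;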

&lt;p&gt;None of this is in a tutorial. You can't watch a video for it. You have to live through it — hands on, late at night, no shortcut.&lt;/p&gt;

&lt;p&gt;The memory file got patched again and again. Model preferences changed. The sync method moved from a Pi server to Git-based recovery. Configs were rewritten wholesale — &lt;code&gt;config.json&lt;/code&gt;, &lt;code&gt;sass.json&lt;/code&gt;, &lt;code&gt;tsconfig&lt;/code&gt;, ESLint rules, the entire pipeline script. Each fix revealed the next breakage. The pain point just moved — same problem, different file, new and creative way of failing.&lt;/p&gt;

&lt;p&gt;This is the unglamorous truth about building agent infrastructure: you're not engineering features. You're engineering resilience. Before it can build web parts autonomously, it has to survive the environment. Before you can trust it, it has to break in every possible way. There is no "prompt engineering" your way out of this. It's systems engineering, and it's dirty.&lt;/p&gt;




&lt;h2&gt;Two Conversations, Same Industry&lt;/h2&gt;

&lt;p&gt;Here's what I keep coming back to: these two things — celebrating markdown support and watching agents build entire applications autonomously — are happening in the same industry, on the same platform, to people with the same job title.&lt;/p&gt;

&lt;p&gt;That's not a criticism of anyone. It's a data point about how fast the ground is shifting.&lt;/p&gt;

&lt;p&gt;The gap isn't between smart people and slow people. It's between two entirely different models of what software development is becoming. In one model, we're incrementally improving the tools we already know. In the other, the tools are learning to use themselves.&lt;/p&gt;

&lt;p&gt;And you can be a genuine expert — someone with years of deep domain knowledge, public recognition, real achievements — and still be standing in the first room while the second one exists a few doors down.&lt;/p&gt;

&lt;p&gt;I almost was.&lt;/p&gt;




&lt;h2&gt;How I Almost Missed It&lt;/h2&gt;

&lt;p&gt;I'm not telling this story because I saw it coming. I didn't. I backed into it.&lt;/p&gt;

&lt;p&gt;I was a SharePoint developer. Not a machine learning engineer. Not an AI researcher. A developer who spent years learning the quirks of SPFx, the SharePoint Framework, because that's what the job demanded.&lt;/p&gt;

&lt;p&gt;What changed wasn't my intelligence or foresight. It was a simple question: "What if I stopped prompting AI and started architecting workflows for it?"&lt;/p&gt;

&lt;p&gt;That shift — from treating AI as a smart autocomplete to treating it as a team member with a defined role, quality gates, and an audit trail — was the door I walked through. Not because I was clever. Because I was curious, and slightly lazy, and the alternative was writing web part number 112 by hand.&lt;/p&gt;

&lt;p&gt;The methodology that emerged — I now call it Works With Agents, but the name doesn't matter — isn't complicated. It's just… different. Different enough that it creates a perception gap. And perception gaps are where the real opportunity lives.&lt;/p&gt;




&lt;h2&gt;What This Means (For All of Us)&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable part. The gap isn't closing. It's widening.&lt;/p&gt;

&lt;p&gt;The tools are getting better faster than the mental models are updating. By the time the average team lead internalises what Claude or Copilot can do today, the agents will have moved on to something else entirely. We're not in a technology adoption curve. We're in a fragmentation event.&lt;/p&gt;

&lt;p&gt;Three things I think are true, as of May 2026:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Your technical moat is thinner than you think.&lt;/strong&gt; If your competitive advantage is "we build features faster," a research loop — cron job, web search, LLM analysis, agent scaffold — can clone your feature set in a weekend. The moat is moving to compliance, trust, and domain relationships. Things that take months or years, not hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The bottleneck isn't code generation. It's verification.&lt;/strong&gt; When an agent can produce a thousand lines of code in seconds, the hard problem isn't "did it compile?" It's "did it do what I actually needed, safely, and can I prove that to an auditor?" Regulated industries feel this most acutely, but it's coming for everyone.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The people who are "behind" aren't stupid. They're in a different room.&lt;/strong&gt; And most of us are in rooms we don't know about yet. The question isn't "am I ahead?" It's "what room am I in right now that already looks like markdown support to someone else?"&lt;/p&gt;




&lt;h2&gt;The Real Question&lt;/h2&gt;

&lt;p&gt;I don't have a tidy conclusion. The SharePoint MVP was right to be excited. Markdown in SharePoint is progress. But somewhere between his post and my screen, I realised that the measure of progress had fundamentally changed. Not gradually. Suddenly. And not everyone noticed.&lt;/p&gt;

&lt;p&gt;So the question I've been sitting with: what room am I in right now, feeling perfectly current, that already looks like markdown support from the outside?&lt;/p&gt;

&lt;p&gt;If you've got an answer — or if this made you uncomfortable — I'd genuinely like to hear it.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>sharepoint</category>
      <category>software</category>
    </item>
    <item>
      <title>My AI Agents Kept Burning Tokens on Subagents That Can't Code — So I Built a Decision Gate</title>
      <dc:creator>Vilius</dc:creator>
      <pubDate>Mon, 04 May 2026 17:12:31 +0000</pubDate>
      <link>https://forem.com/vystartasv/my-ai-agents-kept-burning-tokens-on-subagents-that-cant-code-so-i-built-a-decision-gate-2135</link>
      <guid>https://forem.com/vystartasv/my-ai-agents-kept-burning-tokens-on-subagents-that-cant-code-so-i-built-a-decision-gate-2135</guid>
      <description>&lt;p&gt;&lt;em&gt;By Vilius Vystartas | May 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I run 19 autonomous AI agents in production. They handle research, content, monitoring, deployment — the kind of always-on work that makes a solo developer's output look like a small team's.&lt;/p&gt;

&lt;p&gt;The delegation feature was supposed to be the multiplier. Spawn a subagent, give it a task, get results in parallel. In theory, it turns one agent into many. In practice, it was burning thousands of tokens for exactly zero output.&lt;/p&gt;

&lt;p&gt;The problem wasn't the agents. It was that nobody had taught them &lt;em&gt;when not to delegate&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;The Problem That Forced My Hand&lt;/h2&gt;

&lt;p&gt;Here's what happens when you ask a subagent to code something:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The subagent spawns, reads the context, starts working — looks promising&lt;/li&gt;
&lt;li&gt;It tries to write a file. The file operation fails silently. The subagent doesn't notice&lt;/li&gt;
&lt;li&gt;It tries again with a different approach. Same silent failure&lt;/li&gt;
&lt;li&gt;Six hundred seconds later: timeout. Zero output. Thousands of tokens gone&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The core issue is structural: subagents can't reliably write files, can't run builds, can't verify their own output. They're built for &lt;strong&gt;read-only work&lt;/strong&gt; — research, analysis, data gathering. But nothing in the agent's training tells it that. It just sees "task → delegate" and fires.&lt;/p&gt;

&lt;p&gt;I watched this happen dozens of times. Every failure was another chunk of the context window gone, another session wasted, another moment of wondering whether multi-agent workflows were fundamentally broken.&lt;/p&gt;

&lt;p&gt;They weren't. The delegation call just needed a bouncer at the door.&lt;/p&gt;




&lt;h2&gt;What I Built: Agentic Delegation&lt;/h2&gt;

&lt;p&gt;Agentic Delegation is a decision protocol that sits between your agent and its delegation tool. It has three layers:&lt;/p&gt;

&lt;h3&gt;1. The Decision Tree&lt;/h3&gt;

&lt;p&gt;Before any &lt;code&gt;delegate_task&lt;/code&gt; call, the protocol classifies the work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CODING → BLOCKED. Routed to write_file/patch/terminal (10x faster, 100% reliable)
RESEARCH → ALLOWED. But verified after completion, max 2 retries
UNKNOWN → DECOMPOSED. Broken into atomic subtasks first, then routed individually
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a hard rule, not a suggestion. The skill document literally says "NEVER VIOLATE" at the top of the coding section. If your agent ignores it and delegates coding anyway, there's a self-correction protocol that kicks in after the inevitable timeout.&lt;/p&gt;
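
&lt;p&gt;In spirit, the gate is just a classifier sitting in front of the tool call. A minimal sketch (the keyword sets here are illustrative, not the repo's actual lists):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Illustrative sketch of the decision tree, not the shipped protocol.
CODING = {"implement", "write", "fix", "refactor", "build", "patch"}
RESEARCH = {"research", "find", "summarize", "analyze", "compare"}

def route(task: str) -&amp;gt; str:
    words = set(task.lower().split())
    if words &amp;amp; CODING:
        return "direct"      # BLOCKED: write_file / patch / terminal instead
    if words &amp;amp; RESEARCH:
        return "delegate"    # ALLOWED: verified after completion, max 2 retries
    return "decompose"       # UNKNOWN: split into atomic subtasks first
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;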

&lt;h3&gt;2. The Task Decomposer&lt;/h3&gt;

&lt;p&gt;Complex tasks get broken into atomic subtasks by a lightweight classifier — either your local LLM (free) or Gemini Flash (cheap cloud fallback). No dependencies beyond Python's stdlib.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python3.11 scripts/decompose.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Research GRPO training papers, write a summary, and add it to README"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Research GRPO training papers"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"delegate"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Write a summary of the findings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Update the project README"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three subtasks. One delegated (the research). Two handled directly (the writing). No subagent ever touches a file.&lt;/p&gt;

&lt;h3&gt;3. The Validation Gate&lt;/h3&gt;

&lt;p&gt;Models hallucinate. Sometimes the decomposer labels a coding task as "delegate." The validation gate catches this with a hard keyword check and reassigns it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'[{"id":"1","description":"implement JWT auth","tool":"delegate"}]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | python3.11 scripts/decompose.py &lt;span class="nt"&gt;--validate-only&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"implement JWT auth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"tool"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verify"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[FIXED: was delegate]"&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The annotation is deliberate. It leaves a paper trail so you can see what the model wanted to do vs what the gate enforced.&lt;/p&gt;
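
&lt;p&gt;The gate itself is short enough to sketch in full. The JSON fields match the output above; the keyword list is an assumption:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the validation gate; field names mirror the JSON above.
CODING_KEYWORDS = ("implement", "write", "create", "fix", "refactor", "build")

def validate(subtasks):
    for task in subtasks:
        desc = task["description"].lower()
        mislabeled = (task["tool"] == "delegate"
                      and any(k in desc for k in CODING_KEYWORDS))
        if mislabeled:
            task["tool"] = "direct"
            task["verify"] = "[FIXED: was delegate]"  # the paper trail
    return subtasks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;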




&lt;h2&gt;Architecture&lt;/h2&gt;

&lt;p&gt;The protocol is surprisingly thin — under 400 lines total. The decision tree is a markdown file. The decomposer is a single Python script. The validation gate is a 20-line function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User gives agent a complex task
         │
         ▼
┌─────────────────────┐
│  Decision Tree      │  ← SKILL.md rules
│  Coding? → BLOCKED  │
│  Research? → ALLOW  │
│  Unknown? → SPLIT   │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Task Decomposer    │  ← decompose.py
│  Local LLM (free)   │
│  or Gemini Flash    │
└────────┬────────────┘
         │
         ▼
┌─────────────────────┐
│  Validation Gate    │  ← Hard rule check
│  No coding→delegate │
│  Fixed if violated  │
└────────┬────────────┘
         │
         ▼
    Route each subtask:
    direct → write_file / patch
    delegate → delegate_task (bounded)
    terminal → terminal()
    clarify → ask user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It runs as a Hermes skill that auto-loads when delegation triggers fire, or as a standalone Python tool. Either way, it adds about 200ms of overhead per delegation decision.&lt;/p&gt;




&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The delegation feature is a UI demo, not a production primitive.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It works in a 2-minute screen recording. In production, with real tasks and real context windows, it falls apart. The gap between demo and production is where all the work lives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The right answer is usually "don't delegate."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After decomposing dozens of complex tasks, a pattern emerged: roughly 85% of subtasks should be handled directly by the main agent. Delegation is only the right call for bounded, read-only research tasks. Everything else is faster and more reliable via direct tool calls.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. A validation gate is worth more than a better prompt.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I spent time trying to engineer the perfect decomposition prompt — more examples, stricter formatting, longer system instructions. What actually worked was adding a 20-line validation function that just checks if a coding task got mislabeled and fixes it. Defensive engineering beats prompt engineering.&lt;/p&gt;




&lt;h2&gt;Get It&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/agentic-delegation" rel="noopener noreferrer"&gt;github.com/vystartasv/agentic-delegation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11+, oMLX AgenticQwen-8B (local, free), Hermes Agent skills system
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install as Hermes skill&lt;/span&gt;
git clone https://github.com/vystartasv/agentic-delegation.git &lt;span class="se"&gt;\&lt;/span&gt;
  ~/.hermes/skills/software-development/agentic-delegation

&lt;span class="c"&gt;# Or use standalone&lt;/span&gt;
git clone https://github.com/vystartasv/agentic-delegation.git
python3.11 agentic-delegation/scripts/decompose.py &lt;span class="s2"&gt;"your task here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The protocol is a direct implementation of the Agentic Flow methodology — ten patterns for working with AI agents, developed over months of running a 19-agent fleet. The delegation pattern is the one that saves the most tokens.&lt;/p&gt;

&lt;p&gt;Feedback welcome — especially from anyone else running multi-agent setups who's hit the delegation wall.&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>productivity</category>
    </item>
    <item>
      <title>My 19 AI Agents Kept Breaking Each Other — The 4 Tools That Fixed It</title>
      <dc:creator>Vilius</dc:creator>
      <pubDate>Mon, 04 May 2026 15:07:50 +0000</pubDate>
      <link>https://forem.com/vystartasv/my-19-ai-agents-kept-breaking-each-other-the-4-tools-that-fixed-it-3559</link>
      <guid>https://forem.com/vystartasv/my-19-ai-agents-kept-breaking-each-other-the-4-tools-that-fixed-it-3559</guid>
      <description>&lt;p&gt;I run 19 AI agents on my machine. They wake up throughout the day to review code, publish content, check server health, research medical literature, and self-improve. Some run hourly. Some fire at 2am.&lt;/p&gt;

&lt;p&gt;For months they were reliable. Then I noticed the cracks.&lt;/p&gt;




&lt;h2&gt;The Moment I Realised It Was Broken&lt;/h2&gt;

&lt;p&gt;Three things happened in the same week:&lt;/p&gt;

&lt;p&gt;One agent updated a skill file and another overwrote it 30 seconds later with stale data. The skill file was now wrong — silently corrupted — and both agents continued as if nothing happened.&lt;/p&gt;

&lt;p&gt;A cron job tried to publish a blog post to dev.to. It needed an API key from 1Password. The agent sat there waiting for a fingerprint that would never come. The job failed. Then it tried again next tick. And the next. 17 consecutive failures before I noticed.&lt;/p&gt;

&lt;p&gt;Another agent was trying to read a project repository. Its local model has a 40K token context window. Someone had dumped &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;.git&lt;/code&gt;, and every log file into the prompt. The model couldn't see the actual code. It guessed. The output was nonsense.&lt;/p&gt;

&lt;p&gt;None of these were model problems. None were prompt problems. Every single one was an &lt;em&gt;infrastructure problem&lt;/em&gt; — the layer between the agent and its environment was missing.&lt;/p&gt;




&lt;h2&gt;What I Built: Four Infrastructure Tools&lt;/h2&gt;

&lt;p&gt;I spent a weekend building four single-purpose tools that handle the four categories of failures I kept seeing. Each tool is a Python package. Each does exactly one thing. Each has tests.&lt;/p&gt;

&lt;h3&gt;1. Agent State DB — So They Stop Overwriting Each Other&lt;/h3&gt;

&lt;p&gt;The problem: 19 agents, one filesystem. No coordination. When two agents modify the same file, last-write-wins, and the loser's changes evaporate silently.&lt;/p&gt;

&lt;p&gt;The fix: a SQLite database with WAL-mode concurrency that gives every agent a persistent identity, a run journal, versioned key-value state, advisory locks, and a coordination channel.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;agent-state stats
&lt;span class="go"&gt;  Registered agents:  20
  Active runs:         2
  Completed runs:     47
  Failed runs:         8
  Active locks:        1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agents now write to the DB before touching shared files. If they see a lock on &lt;code&gt;catalog.json&lt;/code&gt;, they wait. If they want to announce what they're working on, they call &lt;code&gt;agent-state coord working-on&lt;/code&gt;. Other agents can check before starting conflicting work.&lt;/p&gt;
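
&lt;p&gt;The advisory lock itself can be a single INSERT that either wins or raises. A sketch of the pattern with an assumed schema (the package's actual tables differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of advisory locking over WAL-mode SQLite; schema is assumed.
import sqlite3, time
from pathlib import Path

conn = sqlite3.connect(str(Path("~/.hermes/agent-state.db").expanduser()))
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("""CREATE TABLE IF NOT EXISTS locks
                (resource TEXT PRIMARY KEY, holder TEXT, acquired_at REAL)""")

def try_lock(resource: str, agent: str) -&amp;gt; bool:
    """The INSERT wins the lock; an IntegrityError means someone holds it."""
    try:
        with conn:  # atomic transaction
            conn.execute("INSERT INTO locks VALUES (?, ?, ?)",
                         (resource, agent, time.time()))
        return True
    except sqlite3.IntegrityError:
        return False  # catalog.json is locked; wait and retry
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;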

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11, SQLite WAL, Click CLI. 8 tests. MIT.&lt;/p&gt;

&lt;h3&gt;2. Credential Proxy — So They Can Get Passwords Without Fingers&lt;/h3&gt;

&lt;p&gt;The problem: password managers need a fingerprint, a master password, or a hardware key tap. Cron jobs have none of those. Any agent that needs an API key is dead on arrival.&lt;/p&gt;

&lt;p&gt;The fix: a local daemon that decrypts your credentials once at boot and serves them over a Unix socket. Agents call &lt;code&gt;get_credential("github.com")&lt;/code&gt;. No Touch ID. No popups.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;credential-proxy status
&lt;span class="go"&gt;  Daemon:    running (pid 85985)
  Socket:    ~/.hermes/credential_proxy/proxy.sock
  Credentials: 353 loaded
  Chrome import: auto-deleted after import
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is Fernet-encrypted at rest. The socket, the database, and the master key are all &lt;code&gt;chmod 600&lt;/code&gt;. Nothing touches the network. It's a locked box in your house, not a cloud service.&lt;/p&gt;
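
&lt;p&gt;From the agent side, fetching a secret is a few lines of stdlib socket code. A client sketch, assuming a newline-delimited JSON wire format (the framing is my guess, not the documented protocol):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Client sketch; the socket path matches the status output above,
# but the request/response format is an assumption.
import json, socket
from pathlib import Path

SOCK = str(Path("~/.hermes/credential_proxy/proxy.sock").expanduser())

def get_credential(domain: str) -&amp;gt; str:
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCK)
        s.sendall(json.dumps({"domain": domain}).encode() + b"\n")
        reply = json.loads(s.makefile().readline())
    return reply["secret"]

token = get_credential("github.com")  # no Touch ID, no popups
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;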

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11, Fernet (AES-128-CBC + HMAC-SHA256), Unix domain sockets, launchd. 24 tests. MIT.&lt;/p&gt;

&lt;h3&gt;3. Context Packer — So Local Models Can See What Matters&lt;/h3&gt;

&lt;p&gt;The problem: local models have small context windows (40K tokens max for Q4 quants). Dumping a whole repo — &lt;code&gt;node_modules&lt;/code&gt;, build artifacts, 42MB of logs — wastes 90% of the window on noise.&lt;/p&gt;

&lt;p&gt;The fix: a deterministic pre-cron script that takes a repo path and outputs a compact markdown blob of only the high-signal files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python3.11 context_packer.py ~/Agent-Projects/agent-foundry
&lt;span class="go"&gt;  2,521 files scanned
  8 high-signal files packed
  12,847 characters (safe within budget)
  Priority: README.md, pyproject.toml, src/main.py, tests/
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It reads &lt;code&gt;AGENTS.md&lt;/code&gt;, &lt;code&gt;ARCHITECTURE.md&lt;/code&gt;, &lt;code&gt;README.md&lt;/code&gt;, prioritizes recently modified files, excludes &lt;code&gt;.git&lt;/code&gt;, &lt;code&gt;node_modules&lt;/code&gt;, &lt;code&gt;__pycache__&lt;/code&gt;, and &lt;code&gt;venv&lt;/code&gt;, and outputs a token-budgeted markdown document. Drop it as a pre-cron script and your local model suddenly sees the code it's supposed to work on.&lt;/p&gt;
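
&lt;p&gt;The scoring behind that can be plain &lt;code&gt;stat()&lt;/code&gt; arithmetic. An illustrative sketch (the weights are made up; the shipped values differ):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of stat-based scoring for the packer; weights are illustrative.
import time
from pathlib import Path

EXCLUDE = {".git", "node_modules", "__pycache__", "venv"}
PRIORITY = {"AGENTS.md", "ARCHITECTURE.md", "README.md"}

def score(path: Path) -&amp;gt; float:
    st = path.stat()
    recency = 1.0 / (1.0 + (time.time() - st.st_mtime) / 86400)  # age in days
    priority = 10.0 if path.name in PRIORITY else 0.0
    return priority + recency

def pack(repo: Path, budget: int = 15_000) -&amp;gt; str:
    files = [p for p in repo.rglob("*")
             if p.is_file() and not EXCLUDE &amp;amp; set(p.parts)]
    chunks, used = [], 0
    for p in sorted(files, key=score, reverse=True):
        text = p.read_text(errors="ignore")
        if used + len(text) &amp;gt; budget:
            continue  # over budget: skip, try smaller files
        chunks.append(f"## {p.relative_to(repo)}\n{text}")
        used += len(text)
    return "\n\n".join(chunks)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;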

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11, stat-based file scoring. MIT.&lt;/p&gt;

&lt;h3&gt;4. Cron Guard — So Failures Don't Cascade&lt;/h3&gt;

&lt;p&gt;The problem: a broken cron job fails every tick. If it runs hourly, that's 24 failures before you wake up and notice. Multiply by 19 jobs and one bad configuration means hundreds of silent failures.&lt;/p&gt;

&lt;p&gt;The fix: a pre-cron script that checks the last 3 runs of every job via the Agent State DB. Three consecutive failures → auto-pause + alert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python3.11 cron_guard.py
&lt;span class="go"&gt;  Checked: 20 jobs
  Healthy: 19
  Blocked: 1 (k6a-weekly — 3 consecutive failures)
  Pause instructions written to /tmp/cron_guard_blocked.json
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent that was failing 17 times in a row now stops itself after 3. I get an alert. I fix the root cause. It resumes. No more failure cascades.&lt;/p&gt;
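
&lt;p&gt;The check is one query against the run journal. A sketch with an assumed schema (the real one lives in the Agent State DB):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the three-strikes check; table and columns are assumed.
import json, sqlite3
from pathlib import Path

conn = sqlite3.connect(str(Path("~/.hermes/agent-state.db").expanduser()))

def should_pause(job: str, strikes: int = 3) -&amp;gt; bool:
    rows = conn.execute(
        "SELECT status FROM runs WHERE job = ? "
        "ORDER BY started_at DESC LIMIT ?", (job, strikes)).fetchall()
    return len(rows) == strikes and all(r[0] == "failed" for r in rows)

jobs = [j for (j,) in conn.execute("SELECT DISTINCT job FROM runs")]
blocked = [j for j in jobs if should_pause(j)]
Path("/tmp/cron_guard_blocked.json").write_text(json.dumps(blocked))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;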

&lt;p&gt;&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11, Agent State DB integration. MIT.&lt;/p&gt;




&lt;h2&gt;How They Work Together&lt;/h2&gt;

&lt;p&gt;The four tools are independent but designed to chain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cron Guard&lt;/strong&gt; runs first — checks if the job should even proceed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent State DB&lt;/strong&gt; registers the run — the agent gets an identity and a run ID&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Packer&lt;/strong&gt; builds the prompt context — the model sees what matters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential Proxy&lt;/strong&gt; serves API keys on demand — the agent authenticates&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All four are pre-cron scripts. They run before the model prompt is even sent. They're deterministic Python, not LLM calls. That's intentional — infrastructure should be boring and reliable.&lt;/p&gt;




&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Agent failures are rarely model failures.&lt;/strong&gt; Every failure I debugged traced back to the environment: missing credentials, corrupted files, context overflow, no coordination. The models were fine. The scaffolding was missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Shared state is the difference between a collection of scripts and a fleet.&lt;/strong&gt; Before the Agent State DB, my 19 agents were 19 independent processes that happened to run on the same machine. After, they're a system. They know about each other. They coordinate. They journal their own history.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Infrastructure should be boring.&lt;/strong&gt; None of these tools use AI. They're deterministic Python scripts. They run in milliseconds. They have tests. The more AI you put in your AI infrastructure, the more ways it can fail. Let the models be models. Let the plumbing be plumbing.&lt;/p&gt;




&lt;h2&gt;Get It&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent State DB:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/agent-state-db" rel="noopener noreferrer"&gt;github.com/vystartasv/agent-state-db&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Credential Proxy:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/credential-proxy" rel="noopener noreferrer"&gt;github.com/vystartasv/credential-proxy&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Packer:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/agent-state-db" rel="noopener noreferrer"&gt;github.com/vystartasv/agent-state-db&lt;/a&gt; (bundled in &lt;code&gt;scripts/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cron Guard:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/agent-state-db" rel="noopener noreferrer"&gt;github.com/vystartasv/agent-state-db&lt;/a&gt; (bundled in &lt;code&gt;scripts/&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All MIT licensed. Python 3.11. Install with &lt;code&gt;pip install -e .&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you're running multiple agents and hitting the same walls, I'd love to hear what you're building. Feedback welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>devops</category>
    </item>
    <item>
      <title>Managing 150+ AI Agent Skills at Scale — What Broke, What I Built</title>
      <dc:creator>Vilius</dc:creator>
      <pubDate>Mon, 04 May 2026 12:16:27 +0000</pubDate>
      <link>https://forem.com/vystartasv/managing-150-ai-agent-skills-at-scale-what-broke-what-i-built-1e73</link>
      <guid>https://forem.com/vystartasv/managing-150-ai-agent-skills-at-scale-what-broke-what-i-built-1e73</guid>
      <description>&lt;p&gt;&lt;em&gt;By Vilius Vystartas | May 2026&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I run a lot of AI agents. Not chatbots — autonomous agents. Cron jobs that monitor my infrastructure every hour. Self-improvers that analyze past sessions and encode learnings. Delegated coders that build features while I sleep. Together they load from a library of 153 reusable skills — structured procedures that tell an agent how to do something specific, from sending iMessages to debugging SPFx builds.&lt;/p&gt;

&lt;p&gt;The system worked fine when I had 20 skills and one agent. It started breaking when the numbers climbed.&lt;/p&gt;




&lt;h2&gt;The Problem That Forced My Hand&lt;/h2&gt;

&lt;p&gt;Here's the setup: each skill lives as a &lt;code&gt;SKILL.md&lt;/code&gt; file in &lt;code&gt;~/.hermes/skills/&lt;/code&gt;. When an agent loads a skill and discovers it's broken, missing steps, or out of date, it records the problem in a shared &lt;code&gt;skill_gaps.jsonl&lt;/code&gt; file. Later, I review the gaps and fix the skills.&lt;/p&gt;

&lt;p&gt;This is fine when one agent writes to the file at a time.&lt;/p&gt;

&lt;p&gt;It stops being fine when three autonomous agents — say, a 2am cron job, a self-improvement loop, and a code review agent — all try to write to the same JSONL file within the same second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Concurrent writes collide. Lines get truncated. Data vanishes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I lost track of which skills needed fixing. Agents kept loading broken skills silently because the gap reporting was unreliable. Worse, I had no search — finding "that one skill about PyPI releases" meant grepping a directory tree and hoping the frontmatter was consistent.&lt;/p&gt;

&lt;p&gt;The flat-file approach doesn't scale past a few dozen skills. I had 153.&lt;/p&gt;




&lt;h2&gt;What I Built: Skill Forge&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Skill Forge&lt;/strong&gt; is a SQLite-backed skill registry with quality gates, full-text search, and concurrent-safe writes. It replaces the broken JSONL pipeline with atomic transactions. It doesn't move your skills — it indexes them in place.&lt;/p&gt;

&lt;p&gt;Think of it as &lt;code&gt;pip&lt;/code&gt; for agent skills, but local-first, with validation before installation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;forge status
&lt;span class="go"&gt;
Skill Forge Registry Status
===========================
  Database: ~/.hermes/skill-forge/forge.db
  Total skills: 153

  By category:
    mlops: 12     devops: 8     creative: 15
    career: 3     research: 7   (uncategorized): 108

  Quality checks run: 306
  Skills with failures: 0 ✓
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;Why SQLite?&lt;/h3&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;WAL mode&lt;/strong&gt; — multiple agents can read and write simultaneously without locking each other out. Each agent gets its own connection with foreign-key enforcement. When two agents register different skills at the same time, both succeed. Atomic transactions, no corrupted state (see the sketch after this list).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;FTS5&lt;/strong&gt; — full-text search over name, category, description, and body content. Finding "that skill about PyPI release classifiers" is &lt;code&gt;forge search "pypi classifier"&lt;/code&gt; — instant, ranked results.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Single file&lt;/strong&gt; — &lt;code&gt;forge.db&lt;/code&gt; in &lt;code&gt;~/.hermes/skill-forge/&lt;/code&gt;. No server process. No configuration. Backs up with &lt;code&gt;forge export&lt;/code&gt;. Portable.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
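
&lt;p&gt;A minimal sketch of that per-connection setup plus an FTS5 query (the schema here is illustrative, not Skill Forge's actual one):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the connection pragmas and a ranked FTS5 search.
import sqlite3
from pathlib import Path

conn = sqlite3.connect(str(Path("~/.hermes/skill-forge/forge.db").expanduser()))
conn.execute("PRAGMA journal_mode=WAL")   # concurrent readers, safe writes
conn.execute("PRAGMA foreign_keys=ON")    # per-connection, not global
conn.execute("""CREATE VIRTUAL TABLE IF NOT EXISTS skills_fts
                USING fts5(name, category, description, body)""")

# Roughly what `forge search "pypi classifier"` does under the hood:
hits = conn.execute(
    "SELECT name FROM skills_fts WHERE skills_fts MATCH ? ORDER BY rank",
    ("pypi classifier",)).fetchall()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;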

&lt;h3&gt;Quality Gates That Catch Real Problems&lt;/h3&gt;

&lt;p&gt;Before Skill Forge, broken skills went undetected until an agent loaded them mid-task and hit a wall. Now every skill runs through two validation passes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontmatter validator&lt;/strong&gt; — catches missing YAML, absent required fields (name/description/version), and invalid semver strings. A skill with &lt;code&gt;version: "latest"&lt;/code&gt; gets flagged. One with &lt;code&gt;version: "1.2.3"&lt;/code&gt; passes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structure validator&lt;/strong&gt; — checks for required sections: a description block, trigger conditions, and usage steps. A skill that's just a title and a broken shell command fails. One with proper &lt;code&gt;## Trigger&lt;/code&gt;, &lt;code&gt;## Steps&lt;/code&gt;, and &lt;code&gt;## Pitfalls&lt;/code&gt; sections passes.&lt;/p&gt;
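
&lt;p&gt;The frontmatter pass is small enough to sketch in full (illustrative; the shipped validator handles more edge cases):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of the frontmatter validator described above.
import re
import yaml  # PyYAML, already in the stack

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")
REQUIRED = ("name", "description", "version")

def check_frontmatter(skill_md: str) -&amp;gt; list:
    parts = skill_md.split("---")
    if len(parts) &amp;lt; 3:
        return ["missing YAML frontmatter"]
    meta = yaml.safe_load(parts[1]) or {}
    errors = [f"missing required field: {f}" for f in REQUIRED if f not in meta]
    version = str(meta.get("version", ""))
    if version and not SEMVER.match(version):
        errors.append(f"invalid semver: {version!r}")  # flags "latest"
    return errors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;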

&lt;p&gt;The first run on my 153 skills: 102 passed, 51 flagged. The flagged ones weren't bugs — they were real quality issues I'd been ignoring. Skills missing version numbers. Skills with no trigger conditions. Skills where the "Steps" section was one garbled paragraph.&lt;/p&gt;

&lt;p&gt;I fixed 38 of them that afternoon. The other 13 are low-priority and tagged for later.&lt;/p&gt;

&lt;h3&gt;CLI Commands That Match the Workflow&lt;/h3&gt;

&lt;p&gt;Ten commands, each solving a specific pain point:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;forge import-hermes              &lt;span class="c"&gt;# First run: scan ~/.hermes/skills/, register everything&lt;/span&gt;
forge register &amp;lt;path&amp;gt;            &lt;span class="c"&gt;# Add a single skill&lt;/span&gt;
forge validate &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;n&amp;gt;]      &lt;span class="c"&gt;# Run quality gates on all or one skill&lt;/span&gt;
forge search &amp;lt;query&amp;gt;             &lt;span class="c"&gt;# FTS5 over name + description + body&lt;/span&gt;
forge list &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;--category&lt;/span&gt; &amp;lt;&lt;span class="nb"&gt;cat&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;]&lt;/span&gt;    &lt;span class="c"&gt;# Filtered listing&lt;/span&gt;
forge status                     &lt;span class="c"&gt;# Health overview&lt;/span&gt;
forge inspect &amp;lt;name&amp;gt;             &lt;span class="c"&gt;# Full detail + quality check history&lt;/span&gt;
forge prune                      &lt;span class="c"&gt;# Remove stale entries (skill file deleted from disk)&lt;/span&gt;
forge &lt;span class="nb"&gt;export&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;-o&lt;/span&gt; &amp;lt;file&amp;gt;]         &lt;span class="c"&gt;# JSON dump for backups or analysis&lt;/span&gt;
forge watch &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;--once&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nt"&gt;--interval&lt;/span&gt; &amp;lt;s&amp;gt;]  &lt;span class="c"&gt;# Auto-reimport on changes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;watch&lt;/code&gt; command is the cron workhorse. Drop this in a 30-minute cron job:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;forge watch &lt;span class="nt"&gt;--once&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It scans the skills directory, detects new/modified files (content hash, not timestamp), registers new ones, re-registers changed ones (version bump), and marks deleted skills as stale. One pass, everything synced.&lt;/p&gt;
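
&lt;p&gt;Content-hash detection in miniature (a sketch; the function names are mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch of content-hash change detection behind `forge watch`.
import hashlib
from pathlib import Path

def content_hash(path: Path) -&amp;gt; str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def detect_changes(skills_dir: Path, known: dict) -&amp;gt; dict:
    """known maps str(path) to the last registered content hash."""
    seen, new, changed = set(), [], []
    for md in skills_dir.rglob("SKILL.md"):
        key = str(md)
        seen.add(key)
        if key not in known:
            new.append(key)             # register
        elif known[key] != content_hash(md):
            changed.append(key)         # re-register, bump version
    stale = [k for k in known if k not in seen]  # file deleted from disk
    return {"new": new, "changed": changed, "stale": stale}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;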

&lt;h3&gt;Architecture&lt;/h3&gt;

&lt;p&gt;The stack is deliberately minimal — Python 3.11, Click for the CLI, SQLite for storage, PyYAML for frontmatter parsing. No web framework, no message queue, no cloud dependency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLI (forge)                        ← Click entry point
  ├── registry (SQLite + WAL)      ← skill index + metadata
  ├── importer                     ← scan ~/.hermes/skills/ → register
  ├── validator                    ← frontmatter + structure checks
  └── FTS5 index                   ← full-text search

Storage:  ~/.hermes/skill-forge/forge.db  (single file)
Skills:   ~/.hermes/skills/                (unchanged — indexed in place)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills stay as flat &lt;code&gt;SKILL.md&lt;/code&gt; files. Forge indexes them, validates them, searches them, and tracks their history — but it never moves or modifies them. Your existing automation continues working. Forge adds a layer on top.&lt;/p&gt;

&lt;h3&gt;Tests and Quality&lt;/h3&gt;

&lt;p&gt;89 tests. Full suite runs in 0.26 seconds. Covers registry CRUD, importer (Hermes scanner + content-change detection), validators (frontmatter + structure, edge cases like empty files and missing YAML delimiters), CLI integration (prune, export, watch), and concurrent-write scenarios.&lt;/p&gt;




&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;SQLite with WAL mode solves the concurrent-agent problem cleanly.&lt;/strong&gt; You don't need Postgres or Redis for this. Connection-level pragmas (&lt;code&gt;PRAGMA journal_mode=WAL&lt;/code&gt;, &lt;code&gt;PRAGMA foreign_keys=ON&lt;/code&gt;) and atomic transactions are enough when your write volume is hundreds per hour, not thousands per second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quality gates catch real problems, not theoretical ones.&lt;/strong&gt; 51 of my 153 skills had issues I didn't know about — missing versions, malformed frontmatter, empty sections. Agents were loading these skills silently. The validator turned invisible problems into visible ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Content-aware sync matters.&lt;/strong&gt; My first import skipped files that already existed in the registry by path. This meant I missed skills that had been modified but not renamed. Switching to content-hash comparison caught 12 modified skills on the next import.&lt;/p&gt;




&lt;h2&gt;Get It&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/vystartasv/skill-forge" rel="noopener noreferrer"&gt;github.com/vystartasv/skill-forge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stack:&lt;/strong&gt; Python 3.11+, Click, SQLite + FTS5, PyYAML
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/vystartasv/skill-forge
&lt;span class="nb"&gt;cd &lt;/span&gt;skill-forge
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[dev]"&lt;/span&gt;
forge import-hermes
forge status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you're running autonomous AI agents with persistent skill libraries — or if you're building agent infrastructure and wondering how to manage the growing pile of procedures — I'd love feedback on the schema design and quality gate approach.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>sqlite</category>
    </item>
    <item>
<title>Installing the AWS Elastic Beanstalk CLI on openSUSE</title>
      <dc:creator>Vilius</dc:creator>
      <pubDate>Tue, 04 Jun 2019 20:04:15 +0000</pubDate>
      <link>https://forem.com/vystartasv/installing-aws-elastic-beanstalk-cli-on-opensuse-358e</link>
      <guid>https://forem.com/vystartasv/installing-aws-elastic-beanstalk-cli-on-opensuse-358e</guid>
      <description>&lt;p&gt;How to successfully install EB cli on OpenSuse you do need to install a few dev build libraries &lt;strong&gt;before&lt;/strong&gt; for make to succeed the build.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# sudo zypper in gcc zlib-devel libffi-devel libopenssl-devel
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
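
&lt;p&gt;With those in place, the CLI itself installs from PyPI (the package name is &lt;code&gt;awsebcli&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# pip install awsebcli
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;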



&lt;p&gt;This should save openSUSE users a lot of trouble.&lt;/p&gt;

</description>
      <category>opensuse</category>
      <category>eb</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
