<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: lewisallena17</title>
    <description>The latest articles on Forem by lewisallena17 (@lewisallena17).</description>
    <link>https://forem.com/lewisallena17</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3877553%2F3ee954a4-889c-4307-b727-815bfcd18407.png</url>
      <title>Forem: lewisallena17</title>
      <link>https://forem.com/lewisallena17</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lewisallena17"/>
    <language>en</language>
    <item>
      <title>I Built an AI System That Runs Itself 24/7 — Here's What Actually Happened</title>
      <dc:creator>lewisallena17</dc:creator>
      <pubDate>Tue, 14 Apr 2026 18:15:01 +0000</pubDate>
      <link>https://forem.com/lewisallena17/i-built-an-ai-system-that-runs-itself-247-heres-what-actually-happened-1p17</link>
      <guid>https://forem.com/lewisallena17/i-built-an-ai-system-that-runs-itself-247-heres-what-actually-happened-1p17</guid>
      <description>&lt;p&gt;I've been running a fully autonomous AI agent system on my home PC for the past few weeks. It creates its own tasks, assigns them to specialist agents, and tries to improve itself. No human in the loop. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;What It Is&lt;/h2&gt;

&lt;p&gt;It's a multi-agent pipeline built on top of Claude (Anthropic's API) and Supabase. The architecture is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;God Agent&lt;/strong&gt; — a meta-orchestrator that wakes up every 2 minutes, surveys the system, and creates new tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialist Agents&lt;/strong&gt; — pools of workers that execute tasks (&lt;code&gt;db-specialist&lt;/code&gt;, &lt;code&gt;ui-specialist&lt;/code&gt;, &lt;code&gt;ruflo-critical&lt;/code&gt;, &lt;code&gt;ruflo-high&lt;/code&gt;, &lt;code&gt;ruflo-medium&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time Dashboard&lt;/strong&gt; — a Next.js 14 app that shows everything happening live, including a pixel-art office where each agent walks around and types at their desk when working&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole thing runs via PM2 on Windows, connected to a Supabase PostgreSQL database with real-time subscriptions.&lt;/p&gt;
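&lt;p&gt;For a concrete picture, the PM2 side might look like the following — a hypothetical &lt;code&gt;ecosystem.config.js&lt;/code&gt;; the file names and process layout here are assumptions, not the actual repo:&lt;/p&gt;

```javascript
// ecosystem.config.js — a sketch of the process layout, one PM2 app per agent.
// Script and file names are illustrative; only the pool names come from the article.
module.exports = {
  apps: [
    { name: 'god', script: 'god.js', restart_delay: 5000 },
    { name: 'db-specialist', script: 'agent.js', args: 'db-specialist' },
    { name: 'ui-specialist', script: 'agent.js', args: 'ui-specialist' },
    { name: 'ruflo-critical', script: 'agent.js', args: 'ruflo-critical' },
    { name: 'dashboard', script: 'npm', args: 'start', cwd: './dashboard' },
  ],
}
```

&lt;p&gt;One process per agent means PM2 restarts a crashed specialist without taking down the rest of the pipeline.&lt;/p&gt;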

&lt;h2&gt;What God Does&lt;/h2&gt;

&lt;p&gt;Every cycle, the God agent:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Loads its accumulated "wisdom" from a JSON file (lessons it's learned, patterns to avoid, success rates per agent)&lt;/li&gt;
&lt;li&gt;Surveys all current todos, their status, the DB schema&lt;/li&gt;
&lt;li&gt;Runs a "council" — two Claude instances (Strategist + Pragmatist) independently propose tasks&lt;/li&gt;
&lt;li&gt;Synthesises the best proposals into 2-3 new tasks&lt;/li&gt;
&lt;li&gt;Routes each task to the most appropriate specialist based on category (db/ui/infra/analysis)&lt;/li&gt;
&lt;li&gt;Reflects on what worked and what failed&lt;/li&gt;
&lt;li&gt;Occasionally edits the dashboard source code directly to improve the UI&lt;/li&gt;
&lt;/ol&gt;
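&lt;p&gt;Step 5 can be as simple as a category-to-pool lookup. A minimal sketch — the &lt;code&gt;routeTask&lt;/code&gt; helper and the exact category-to-pool mapping are illustrative; only the pool names come from the list above:&lt;/p&gt;

```javascript
// Map a task's category to a specialist pool. The mapping is illustrative.
const POOLS = {
  db: 'db-specialist',
  ui: 'ui-specialist',
  infra: 'ruflo-critical',
  analysis: 'ruflo-medium',
}

function routeTask(task) {
  // Fall back to a general pool when the category is unknown.
  return POOLS[task.category] ?? 'ruflo-high'
}
```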

&lt;p&gt;The wisdom system is what makes it actually useful. After a few dozen cycles, God has learned things like "SQL queries on non-existent tables always fail" and "TypeScript refactors need a compile check after editing." It doesn't repeat the same mistakes.&lt;/p&gt;
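&lt;p&gt;The bookkeeping behind that is small. A sketch of how lessons and per-agent success rates might be accumulated — the shape of the wisdom object is an assumption:&lt;/p&gt;

```javascript
// Track per-agent success rates inside the wisdom object (shape is assumed).
function updateAgentStats(wisdom, agent, succeeded) {
  const s = wisdom.agents[agent] ?? { runs: 0, wins: 0 }
  s.runs += 1
  if (succeeded) s.wins += 1
  wisdom.agents[agent] = s
  return s.wins / s.runs // current success rate for this agent
}

// De-duplicate lessons so the prompt doesn't grow without bound.
function recordLesson(wisdom, lesson) {
  if (!wisdom.lessons.includes(lesson)) wisdom.lessons.push(lesson)
}
```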

&lt;h2&gt;What the Agents Can Do&lt;/h2&gt;

&lt;p&gt;Each agent gets a task and a tool loop. The tools available are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// File operations&lt;/span&gt;
&lt;span class="nf"&gt;read_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;write_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;patch_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;old_string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;new_string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// safer than full rewrites&lt;/span&gt;

&lt;span class="c1"&gt;// Database&lt;/span&gt;
&lt;span class="nf"&gt;agent_exec_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;// SELECT queries → JSON&lt;/span&gt;
&lt;span class="nf"&gt;agent_exec_ddl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stmt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="c1"&gt;// CREATE/ALTER/DROP → OK/ERROR&lt;/span&gt;

&lt;span class="c1"&gt;// Code validation&lt;/span&gt;
&lt;span class="nf"&gt;tsc_check&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;             &lt;span class="c1"&gt;// runs npx tsc --noEmit, catches TS errors before commit&lt;/span&gt;

&lt;span class="c1"&gt;// Task management&lt;/span&gt;
&lt;span class="nf"&gt;task_complete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;comment&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;create_subtask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;// agents can decompose complex work&lt;/span&gt;

&lt;span class="c1"&gt;// Git&lt;/span&gt;
&lt;span class="nf"&gt;git_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;git_diff&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;git_commit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent loops until it completes the task or hits its limits. If the first attempt fails, it automatically retries once with the previous error injected as context — a self-healing mechanism that fixes about 30% of initial failures.&lt;/p&gt;
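&lt;p&gt;That retry can be sketched as a thin wrapper — &lt;code&gt;runWithRetry&lt;/code&gt; and the &lt;code&gt;previousError&lt;/code&gt; field are hypothetical names, but the shape is the important part:&lt;/p&gt;

```javascript
// One automatic retry, with the first failure fed back as context.
async function runWithRetry(task, execute) {
  try {
    return await execute(task)
  } catch (err) {
    // Inject the error into the task so the second attempt can see
    // what went wrong, then retry exactly once.
    const retryTask = { ...task, previousError: String(err) }
    return await execute(retryTask)
  }
}
```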

&lt;h2&gt;The Numbers After Running It&lt;/h2&gt;

&lt;p&gt;After a few weeks of continuous operation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Success rate&lt;/strong&gt;: 6–15% initially, trending up as wisdom accumulates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily cost&lt;/strong&gt;: ~$1.50 for a full day, comfortably under the $2/day cap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most reliable tasks&lt;/strong&gt;: SQL queries on existing tables, reading files, simple edits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Most failure-prone&lt;/strong&gt;: Complex TypeScript refactors, multi-file changes, anything touching unfamiliar schemas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The low success rate sounds bad, but the volume compensates: the system creates and attempts dozens of tasks per day without any human intervention. Even at 15%, it's shipping things while I sleep.&lt;/p&gt;

&lt;h2&gt;The Cost Problem&lt;/h2&gt;

&lt;p&gt;This nearly derailed everything. One session, the &lt;code&gt;ruflo-critical&lt;/code&gt; agent ran a task that used 240,000 input tokens — costing $0.81 for a single task. With multiple agents running in parallel, costs escalated fast.&lt;/p&gt;

&lt;p&gt;The fix: hard limits everywhere.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DAILY_LIMIT_USD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;DAILY_COST_LIMIT_USD&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2.00&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_TASK_COST_USD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseFloat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MAX_TASK_COST_USD&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0.10&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;MAX_INPUT_TOKENS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MAX_INPUT_TOKENS_PER_RUN&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;80000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;God checks the daily spend before every cycle. Agents estimate cost mid-run and stop if they're over budget. When Anthropic credits hit zero, agents pause cleanly and reset their in-progress tasks back to &lt;code&gt;pending&lt;/code&gt; (not &lt;code&gt;failed&lt;/code&gt;) so nothing is lost when credits are topped up.&lt;/p&gt;
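&lt;p&gt;The gate itself is trivial — a sketch with the caps inlined to match the defaults above; &lt;code&gt;budgetGate&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```javascript
// Caps inlined here to keep the sketch self-contained; in practice they
// come from the env-configured constants shown above.
const DAILY_LIMIT_USD = 2.0
const MAX_TASK_COST_USD = 0.1

// Checked before every cycle (God) and before each model call (agents).
function budgetGate(dailySpendUsd, taskSpendUsd) {
  if (dailySpendUsd >= DAILY_LIMIT_USD) return { ok: false, reason: 'daily cap reached' }
  if (taskSpendUsd >= MAX_TASK_COST_USD) return { ok: false, reason: 'task cap reached' }
  return { ok: true }
}
```

&lt;p&gt;When the gate says no mid-run, the agent stops and resets its task to &lt;code&gt;pending&lt;/code&gt; rather than marking it &lt;code&gt;failed&lt;/code&gt;.&lt;/p&gt;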

&lt;p&gt;The dashboard shows a live progress bar toward the daily limit, turning yellow at 75% and red at the cap.&lt;/p&gt;

&lt;h2&gt;What I'd Do Differently&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Start with pre-flight validation.&lt;/strong&gt; Before the main agent loop runs, a small Haiku call assesses feasibility — is the task well-defined? Does it reference things that actually exist? Can it be decomposed? This catches roughly 30% of doomed tasks before they burn expensive tokens.&lt;/p&gt;
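&lt;p&gt;A feasibility gate like that can be a single cheap call — a sketch with the model caller injected so it stays testable; the prompt wording and the &lt;code&gt;preflight&lt;/code&gt; name are illustrative:&lt;/p&gt;

```javascript
// Cheap pre-flight feasibility check; `askHaiku` is an injected function
// that sends a prompt to the small model and returns its text reply.
async function preflight(task, askHaiku) {
  const verdict = await askHaiku(
    'Is this task well-defined and feasible? Answer FEASIBLE or INFEASIBLE.\n' +
    `Task: ${task.title}\n${task.description ?? ''}`
  )
  // "INFEASIBLE" does not start with "FEASIBLE", so this distinguishes the two.
  return verdict.trim().toUpperCase().startsWith('FEASIBLE')
}
```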

&lt;p&gt;&lt;strong&gt;Model routing matters more than I expected.&lt;/strong&gt; Using Claude Haiku for simple SQL queries and Sonnet only for TypeScript/React work cut costs by ~60% without meaningfully reducing quality. The key is the system prompt — a well-crafted DB specialist prompt with Haiku outperforms a generic agent with Sonnet on database work.&lt;/p&gt;
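&lt;p&gt;The routing itself is a one-liner — a sketch; the model IDs are placeholders, not current Anthropic model names:&lt;/p&gt;

```javascript
// Route cheap categories to the small model, code-heavy ones to the big one.
// Model IDs are placeholders — substitute your account's current model names.
function pickModel(category) {
  const cheap = ['db', 'analysis']
  return cheap.includes(category) ? 'claude-haiku' : 'claude-sonnet'
}
```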

&lt;p&gt;&lt;strong&gt;Shared memory between agents is underrated.&lt;/strong&gt; All agents can read/write &lt;code&gt;global-lessons.json&lt;/code&gt;. When the db-specialist figures out that a certain SQL pattern fails, the ui-specialist learns from it too. This compounds surprisingly fast.&lt;/p&gt;

&lt;h2&gt;What's Next&lt;/h2&gt;

&lt;p&gt;The system is now auto-posting articles about itself to dev.to to cover its own API costs. It's also generating a Gumroad product listing — a starter kit of the whole system that developers can buy and run themselves.&lt;/p&gt;

&lt;p&gt;Whether it can fully fund itself is an open question, but it's an interesting experiment. I'll post updates as the numbers come in.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow for weekly updates on what the agents shipped. The code is messy, but the concepts are solid.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
