<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: 莫小苝</title>
    <description>The latest articles on Forem by 莫小苝 (@octoclaw).</description>
    <link>https://forem.com/octoclaw</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3857347%2Fd588dac5-144a-4eca-80da-550322cdb8a6.jpg</url>
      <title>Forem: 莫小苝</title>
      <link>https://forem.com/octoclaw</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/octoclaw"/>
    <language>en</language>
    <item>
      <title>I Added Self-Hosted GPU Training to MetaClaw — Here's How to Train Your AI Agent on Your Own A100s</title>
      <dc:creator>莫小苝</dc:creator>
      <pubDate>Thu, 02 Apr 2026 09:56:27 +0000</pubDate>
      <link>https://forem.com/octoclaw/i-added-self-hosted-gpu-training-to-metaclaw-heres-how-to-train-your-ai-agent-on-your-own-a100s-3ij5</link>
      <guid>https://forem.com/octoclaw/i-added-self-hosted-gpu-training-to-metaclaw-heres-how-to-train-your-ai-agent-on-your-own-a100s-3ij5</guid>
      <description>&lt;p&gt;Your AI agent should get better every time you talk to it. &lt;a href="https://github.com/aiming-lab/MetaClaw" rel="noopener noreferrer"&gt;MetaClaw&lt;/a&gt; makes this happen — it's an open-source framework that meta-learns from your real conversations and automatically evolves your agent. The accompanying &lt;a href="https://arxiv.org/abs/2603.17187" rel="noopener noreferrer"&gt;technical report&lt;/a&gt; hit #1 on HuggingFace Daily Papers.&lt;/p&gt;

&lt;p&gt;I've been using it for a few weeks and loved the concept, but the RL training was locked to cloud backends (Tinker/MinT). I wanted to train on my own GPUs — for privacy, cost, and flexibility. So I forked it and built what I needed.&lt;/p&gt;

&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub: &lt;a href="https://github.com/OctoClaws/MetaClaw" rel="noopener noreferrer"&gt;OctoClaws/MetaClaw&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Landing Page: &lt;a href="https://octoclaws.github.io/MetaClaw/" rel="noopener noreferrer"&gt;octoclaws.github.io/MetaClaw&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;1. Self-Hosted GPU Training Backend&lt;/h3&gt;

&lt;p&gt;The biggest addition. A complete self-hosted alternative to cloud training:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI training server&lt;/strong&gt; with PEFT/LoRA engine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vLLM inference&lt;/strong&gt; with LoRA hot-swap (swap LoRA adapters without reloading the base model)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 loss functions&lt;/strong&gt;: importance sampling, PPO, CISPO&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bearer token authentication&lt;/strong&gt; + checkpoint save/load&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-compatible &lt;code&gt;/v1/chat/completions&lt;/code&gt;&lt;/strong&gt; endpoint on the training server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Configuration is dead simple:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;rl&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;remote&lt;/span&gt;
  &lt;span class="na"&gt;remote_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://your-gpu-server:8000&lt;/span&gt;
  &lt;span class="na"&gt;remote_api_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-secret-key&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tested end-to-end on &lt;strong&gt;8×A100-SXM4-80GB&lt;/strong&gt; with &lt;strong&gt;Qwen3-8B&lt;/strong&gt;.&lt;/p&gt;
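&lt;p&gt;To sanity-check a deployment, you can call the OpenAI-compatible endpoint directly. Here's a minimal client sketch using only the Python standard library; the host, port, and key mirror the placeholder config above, and the model id is an assumption:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build a POST request for the server's /v1/chat/completions endpoint."""
    payload = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=payload,
        headers={
            "Authorization": "Bearer " + api_key,  # bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "http://your-gpu-server:8000",  # rl.remote_url from the config above
    "your-secret-key",              # rl.remote_api_key
    "qwen3-8b",                     # hypothetical model id
    [{"role": "user", "content": "hello"}],
)
# urllib.request.urlopen(req) sends it to a live training server.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;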

&lt;h3&gt;2. Per-Agent Multi-Agent Isolation&lt;/h3&gt;

&lt;p&gt;If you run multiple agents through one MetaClaw instance, skills used to bleed across agents. I built full isolation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent skill directories&lt;/strong&gt; — each agent stores/retrieves skills independently, with a &lt;code&gt;_shared/&lt;/code&gt; pool for common ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent mode routing&lt;/strong&gt; — each agent can independently use &lt;code&gt;skills_only&lt;/code&gt; or &lt;code&gt;rl&lt;/code&gt; mode&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent LoRA training&lt;/strong&gt; — each agent gets its own checkpoint, training one doesn't affect another&lt;/li&gt;
&lt;/ul&gt;
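&lt;p&gt;The per-agent lookup can be pictured as a tiny resolver: check the agent's own skill directory first, then fall back to the &lt;code&gt;_shared/&lt;/code&gt; pool. A hypothetical sketch (only the &lt;code&gt;_shared/&lt;/code&gt; name comes from the fork; the on-disk layout is assumed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

def resolve_skill(skills_root, agent_id, skill_name):
    """Look up a skill file for an agent.

    The agent's private directory wins over the _shared/ pool, so one
    agent's skills never leak into another's retrieval.
    """
    for scope in (agent_id, "_shared"):
        candidate = Path(skills_root) / scope / skill_name
        if candidate.exists():
            return candidate
    return None
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;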

&lt;h3&gt;3. Training Engine Bug Fixes&lt;/h3&gt;

&lt;p&gt;Found and fixed several bugs in the original pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimizer tracking frozen base model params → wasted memory&lt;/li&gt;
&lt;li&gt;Logprobs computed after temperature scaling → inconsistent distributions&lt;/li&gt;
&lt;li&gt;Gradient checkpointing + KV cache incompatibility → silent failure&lt;/li&gt;
&lt;li&gt;Thread safety issues (lambda closures, runtime &lt;code&gt;CUDA_VISIBLE_DEVICES&lt;/code&gt; mutation)&lt;/li&gt;
&lt;li&gt;Qwen3 &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt; tag parsing → reasoning mixed into output&lt;/li&gt;
&lt;li&gt;Multimodal content format → OpenClaw sends &lt;code&gt;list[dict]&lt;/code&gt;, not plain strings&lt;/li&gt;
&lt;/ul&gt;
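&lt;p&gt;The temperature bug is worth a closer look: the distribution the sampler draws from and the distribution the trainer scores against must use the same temperature, or the RL loss is computed against the wrong policy. A stdlib-only illustration of how far the two drift apart (the numbers here are made up for demonstration):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

def log_softmax(logits, temperature=1.0):
    """Log-probabilities after temperature scaling (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]

logits = [2.0, 1.0, 0.5]
sampling = log_softmax(logits, temperature=0.7)  # what the sampler used
scoring = log_softmax(logits, temperature=1.0)   # a mismatched scoring pass
mismatch = max(abs(a - b) for a, b in zip(sampling, scoring))
# mismatch is well above zero, so importance ratios built from these
# logprobs would be biased; sampling and scoring must share a temperature.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;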

&lt;h2&gt;Why Self-Hosted Training Matters&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Cloud&lt;/th&gt;
&lt;th&gt;Self-Hosted&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Conversations sent to 3rd party&lt;/td&gt;
&lt;td&gt;Stays on your network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-token fees&lt;/td&gt;
&lt;td&gt;No per-token fees on GPUs you already own&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fixed models/params&lt;/td&gt;
&lt;td&gt;Full control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Queue wait times&lt;/td&gt;
&lt;td&gt;Train on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;Quick Start&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/OctoClaws/MetaClaw.git
&lt;span class="nb"&gt;cd &lt;/span&gt;MetaClaw
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;".[rl,evolve]"&lt;/span&gt;
metaclaw setup
metaclaw start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For self-hosted training, deploy the training server on your GPU machine and set &lt;code&gt;rl.backend: remote&lt;/code&gt; in config.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;🔗 &lt;strong&gt;Fork&lt;/strong&gt;: &lt;a href="https://github.com/OctoClaws/MetaClaw" rel="noopener noreferrer"&gt;github.com/OctoClaws/MetaClaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Landing Page&lt;/strong&gt;: &lt;a href="https://octoclaws.github.io/MetaClaw/" rel="noopener noreferrer"&gt;octoclaws.github.io/MetaClaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📦 &lt;strong&gt;Original&lt;/strong&gt;: &lt;a href="https://github.com/aiming-lab/MetaClaw" rel="noopener noreferrer"&gt;github.com/aiming-lab/MetaClaw&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;Paper&lt;/strong&gt;: &lt;a href="https://arxiv.org/abs/2603.17187" rel="noopener noreferrer"&gt;arxiv.org/abs/2603.17187&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Happy to answer questions about the architecture or setup!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
