<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Garylin</title>
    <description>The latest articles on Forem by Garylin (@garyqlin).</description>
    <link>https://forem.com/garyqlin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3950007%2F16e26767-42e7-4cc0-b293-31f8d6b0ccc5.png</url>
      <title>Forem: Garylin</title>
      <link>https://forem.com/garyqlin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/garyqlin"/>
    <language>en</language>
    <item>
      <title>GBase: Building LLM Agents That Actually Learn from Their Mistakes</title>
      <dc:creator>Garylin</dc:creator>
      <pubDate>Mon, 25 May 2026 06:53:34 +0000</pubDate>
      <link>https://forem.com/garyqlin/gbase-building-llm-agents-that-actually-learn-from-their-mistakes-f88</link>
      <guid>https://forem.com/garyqlin/gbase-building-llm-agents-that-actually-learn-from-their-mistakes-f88</guid>
      <description>&lt;p&gt;Like many developers, I started building LLM agents by stringing together API calls and hoping for the best. It worked, for a while. My agents could browse the web, execute code, and call APIs. They could decompose tasks into sub-steps.&lt;/p&gt;

&lt;p&gt;Then I hit a wall.&lt;/p&gt;

&lt;p&gt;Every morning, I would wake up to logs of failures I'd seen the day before. The same buggy code modifications. The same incorrect API parameters. The same flawed reasoning paths — repeated, session after session, as if the agent had learned nothing. Because it hadn't. Every conversation was a fresh start.&lt;/p&gt;

&lt;p&gt;I spent months trying to fix this. I tried prompt engineering. I tried better tool definitions. I tried chaining. Nothing worked, because the problem wasn't in any single interaction — it was in how we build agents. We were building them as &lt;strong&gt;functions&lt;/strong&gt;, when they should be built as &lt;strong&gt;living systems&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What GBase Is
&lt;/h2&gt;

&lt;p&gt;GBase is an open-source Python framework that gives LLM agents three capabilities most frameworks don't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Recursive Self-Improvement (RSI) Engine&lt;/strong&gt;: a closed loop in which each change to an agent's code can trigger a cycle that evaluates it for stability, performance, and security, accepts it or rolls it back, and diagnoses any failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mirror Memory&lt;/strong&gt;: long-term memory based on the Ebbinghaus forgetting curve, with verification reinforcement. It is more than a vector store: memories decay the way human memories do, and verified knowledge fades more slowly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quality Gate Pipelines&lt;/strong&gt;: multi-agent collaboration through YAML-defined workflows with JSONL audit trails.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It's MIT-licensed, runs on real infrastructure (not sandboxes), and is already in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem We All Face
&lt;/h2&gt;

&lt;p&gt;Every major agent framework — BabyAGI, AutoGPT, LangChain — shares a common limitation: &lt;strong&gt;the agent does not learn from its own execution history&lt;/strong&gt;. When it fails, it fails the same way next time. There's no memory of past mistakes, no improvement between sessions.&lt;/p&gt;

&lt;p&gt;Recursive Self-Improvement (RSI) has been discussed in AI safety literature for decades. But there's a big gap between "RSI as an idea" and "RSI as a deployable system." Most work falls into two camps:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Theoretical frameworks&lt;/strong&gt; that never ship code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandboxed experiments&lt;/strong&gt; where agents modify themselves inside Minecraft&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither addresses the hard question: &lt;strong&gt;how do you let an agent modify itself in production, without breaking everything?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How GBase Works
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The RSI Loop (Four Stages)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Stage 1: Trigger Evaluation&lt;/strong&gt; — Not every change should trigger a full RSI cycle. Rules filter by file path, change size, and frequency. Trivial changes are silently skipped.&lt;/p&gt;
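&lt;p&gt;As a rough illustration of this kind of trigger filter, here is a minimal sketch. The function name, the watched pattern, and both thresholds are hypothetical and are not taken from the GBase codebase; only the three criteria (file path, change size, frequency) come from the description above.&lt;/p&gt;

```python
from fnmatch import fnmatch

# Hypothetical rule values, chosen only for illustration.
WATCHED_PATTERNS = ["src/*.py"]
MIN_CHANGED_LINES = 5
MAX_CYCLES_PER_HOUR = 10


def should_trigger(path, changed_lines, recent_cycle_count):
    """Return True if a change deserves a full RSI cycle."""
    # Rule 1: only files matching a watched path pattern are considered.
    if not any(fnmatch(path, pattern) for pattern in WATCHED_PATTERNS):
        return False
    # Rule 2: trivial changes are skipped.
    if MIN_CHANGED_LINES > changed_lines:
        return False
    # Rule 3: cap how often cycles may run.
    if recent_cycle_count >= MAX_CYCLES_PER_HOUR:
        return False
    return True
```

&lt;p&gt;Under these assumed values, a 20-line change to &lt;code&gt;src/agent.py&lt;/code&gt; would trigger a cycle, while a one-line change to the same file would be skipped.&lt;/p&gt;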

&lt;p&gt;&lt;strong&gt;Stage 2: Multi-Perspective Evaluation&lt;/strong&gt; — Three independent checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stability&lt;/strong&gt;: Does the code parse? Are imports valid?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Any redundant operations introduced?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Any &lt;code&gt;eval()&lt;/code&gt;, shell injection, or dangerous patterns?&lt;/li&gt;
&lt;/ul&gt;
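&lt;p&gt;To make two of these checks concrete, the sketch below uses Python's standard &lt;code&gt;ast&lt;/code&gt; module to test whether code parses and to flag &lt;code&gt;eval()&lt;/code&gt;-style calls and &lt;code&gt;shell=True&lt;/code&gt; arguments. The function names and the list of flagged calls are assumptions made for illustration. This is not GBase's actual evaluator, and it does not cover import validation or the performance check.&lt;/p&gt;

```python
import ast

# Hypothetical set of call names treated as dangerous.
DANGEROUS_CALLS = {"eval", "exec"}


def check_stability(source: str) -> bool:
    """Return True if the source parses as valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def find_dangerous_calls(source: str) -> list[str]:
    """Return labels for flagged patterns found in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as eval(...) or exec(...).
        if isinstance(node.func, ast.Name) and node.func.id in DANGEROUS_CALLS:
            found.append(node.func.id)
        # Any call passing shell=True, a common shell-injection risk.
        for keyword in node.keywords:
            is_true = isinstance(keyword.value, ast.Constant) and keyword.value.value is True
            if keyword.arg == "shell" and is_true:
                found.append("shell=True")
    return found
```

&lt;p&gt;A static scan like this only catches literal patterns; anything built dynamically at runtime would need a different kind of check.&lt;/p&gt;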

&lt;p&gt;&lt;strong&gt;Stage 3: Rollback Decision&lt;/strong&gt; — Stability failure = immediate rollback. Performance/security failure = conditional. Multiple failures = rollback + diagnostic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 4: Post-Failure Diagnosis&lt;/strong&gt; — The system captures state before/after rollback, logs the report, and verifies health. The diagnosis is written to Mirror as experiential memory.&lt;/p&gt;
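&lt;p&gt;The Stage 3 rules can be read as a small decision function. The sketch below is one possible reading, not GBase's implementation: the type names are hypothetical, and the choice to check for multiple failures before a lone stability failure is an assumption, since the post does not state which rule takes precedence.&lt;/p&gt;

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"
    CONDITIONAL = "conditional"
    ROLLBACK = "rollback"
    ROLLBACK_AND_DIAGNOSE = "rollback_and_diagnose"


@dataclass
class EvalResult:
    stability_ok: bool
    performance_ok: bool
    security_ok: bool


def decide(result: EvalResult) -> Decision:
    """Map the three evaluation outcomes to a rollback decision."""
    checks = [result.stability_ok, result.performance_ok, result.security_ok]
    failures = checks.count(False)
    if failures == 0:
        return Decision.ACCEPT
    # Assumed precedence: multiple failures win over any single-failure rule.
    if failures > 1:
        return Decision.ROLLBACK_AND_DIAGNOSE
    if not result.stability_ok:
        return Decision.ROLLBACK
    # A lone performance or security failure is handled conditionally.
    return Decision.CONDITIONAL
```

&lt;p&gt;With this reading, a change that fails only the security check yields &lt;code&gt;Decision.CONDITIONAL&lt;/code&gt; and is not rolled back automatically.&lt;/p&gt;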

&lt;h3&gt;
  
  
  Mirror Memory with Ebbinghaus Decay
&lt;/h3&gt;

&lt;p&gt;Most agent memory uses RAG (vector retrieval). It works, but it doesn't model decay — a fact from yesterday and a fact from six months ago are treated equally.&lt;/p&gt;

&lt;p&gt;Mirror applies a modified Ebbinghaus formula:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S(t) = S₀ × exp(-t / (λ × (1 + α × V + β × f)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



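&lt;p&gt;Here S(t) is the strength of a memory after time t, S₀ is its initial strength, λ is the base decay constant, V is the verification count, f is the access frequency, and α and β weight the effect of V and f. &lt;strong&gt;Memories that have been verified decay more slowly.&lt;/strong&gt; Frequently accessed memories also decay more slowly. This mirrors the spacing effect in human memory.&lt;/p&gt;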
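&lt;p&gt;The formula translates directly into a few lines of Python. This is a sketch of the equation as written above, not Mirror's actual code; the function name, the parameter names, and every numeric value used with it are illustrative assumptions.&lt;/p&gt;

```python
import math


def retention(s0, t, lam, alpha, beta, verifications, access_freq):
    """Evaluate S(t) = S0 * exp(-t / (lam * (1 + alpha * V + beta * f)))."""
    # Verification and access both stretch the effective decay constant.
    effective = lam * (1 + alpha * verifications + beta * access_freq)
    return s0 * math.exp(-t / effective)


# Hypothetical parameters: lam=10 time units, alpha=0.5, beta=0.1.
unverified = retention(1.0, 10.0, 10.0, 0.5, 0.1, verifications=0, access_freq=0)
verified = retention(1.0, 10.0, 10.0, 0.5, 0.1, verifications=2, access_freq=0)
```

&lt;p&gt;With these assumed values, the unverified memory retains about 0.37 of its strength after 10 time units, while the memory verified twice retains about 0.61.&lt;/p&gt;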

&lt;p&gt;Periodic review identifies decaying memories, attempts re-verification, and removes stale or contradictory information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quality Gate Pipelines
&lt;/h3&gt;

&lt;p&gt;Multi-agent collaboration through structured YAML pipelines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hammer&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Generate code solution&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ink&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Review hammer's solution&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;judge&lt;/span&gt;
    &lt;span class="na"&gt;task&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Final verdict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All inputs and outputs are serialized to JSONL — creating an auditable, deterministic trail. No LLM consensus debate needed.&lt;/p&gt;
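&lt;p&gt;The sketch below shows one way such a trail could be produced: each step's input and output is appended as one JSON object per line. The step list mirrors the YAML above, but the runner, the record fields, and the stub agent are hypothetical and do not reflect GBase's real pipeline API.&lt;/p&gt;

```python
import io
import json

# The three steps from the YAML example, as plain Python data.
STEPS = [
    {"role": "hammer", "task": "Generate code solution"},
    {"role": "ink", "task": "Review hammer's solution"},
    {"role": "judge", "task": "Final verdict"},
]


def run_pipeline(steps, agent, log):
    """Run steps in order, feeding each output to the next step."""
    previous = None
    for index, step in enumerate(steps):
        output = agent(step["role"], step["task"], previous)
        record = {
            "step": index,
            "role": step["role"],
            "task": step["task"],
            "input": previous,
            "output": output,
        }
        # One JSON object per line is what makes the log JSONL.
        log.write(json.dumps(record) + "\n")
        previous = output
    return previous


def echo_agent(role, task, previous):
    """Stand-in for a real LLM call; returns a fixed string per role."""
    return f"{role} done"


log = io.StringIO()
final_output = run_pipeline(STEPS, echo_agent, log)
```

&lt;p&gt;Here &lt;code&gt;log&lt;/code&gt; is an in-memory buffer; passing an open file instead would write the same records to disk.&lt;/p&gt;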

&lt;h2&gt;
  
  
  Real Results (Yes, Numbers)
&lt;/h2&gt;

&lt;p&gt;We ran 100 RSI cycles on a production GBase instance. Here's what happened:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Total RSI cycles&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Trigger rate&lt;/td&gt;
&lt;td&gt;93.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation pass rate (of triggered cycles)&lt;/td&gt;
&lt;td&gt;90.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto-rollback rate&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Skipped (too small to matter)&lt;/td&gt;
&lt;td&gt;7.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Of the 93 triggered cycles, 9 failed the security evaluation, which correctly caught patterns such as &lt;code&gt;eval()&lt;/code&gt; usage and shell injection. All 93 passed the stability and performance checks, so 84 of the 93 (90.3%) passed all three.&lt;/p&gt;
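&lt;p&gt;The percentages in the table can be re-derived from the raw counts. The short check below assumes the evaluation pass rate is measured against triggered cycles rather than all 100 cycles, which is the reading that matches the reported figure; the variable names are illustrative.&lt;/p&gt;

```python
total_cycles = 100
skipped = 7
security_failures = 9

# Cycles that passed the trigger filter and were actually evaluated.
triggered = total_cycles - skipped
# Cycles that passed all three checks.
passed = triggered - security_failures

trigger_rate = round(100 * triggered / total_cycles, 1)
pass_rate = round(100 * passed / triggered, 1)

print(f"trigger rate: {trigger_rate}%")
print(f"evaluation pass rate: {pass_rate}%")
```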

&lt;p&gt;The full experimental scripts and raw data are in the repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Standing on Shoulders
&lt;/h2&gt;

&lt;p&gt;I want to be clear about something: I didn't build GBase because I'm smart. I built it because I was frustrated, and then I was inspired.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reflexion&lt;/strong&gt; [Shinn et al., 2023] showed me agents could reflect on their own failures. &lt;strong&gt;CRITIC&lt;/strong&gt; [Gou et al., 2024] showed me evaluation tools could enable self-correction. &lt;strong&gt;BabyAGI&lt;/strong&gt; [Nakajima, 2023] showed the world that autonomous agents were possible. &lt;strong&gt;AutoGPT&lt;/strong&gt; [Significant Gravitas, 2023] demonstrated what happens when you give an agent real tools. &lt;strong&gt;LangChain&lt;/strong&gt; [Chase, 2023] made agent building accessible to everyone. &lt;strong&gt;AutoGen&lt;/strong&gt; [Wu et al., 2023] and &lt;strong&gt;MetaGPT&lt;/strong&gt; [Hong et al., 2023] showed me the power of multi-agent collaboration. &lt;strong&gt;The Generative Agents project&lt;/strong&gt; at Stanford [Park et al., 2023] demonstrated agents that remember and grow.&lt;/p&gt;

&lt;p&gt;Every one of these projects gave me the courage to build something that didn't exist yet. This post is my way of saying thank you.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;GBase is not finished. It's not perfect. But it &lt;strong&gt;works&lt;/strong&gt;, in production, right now.&lt;/p&gt;

&lt;p&gt;If you're building agents and you've felt the same frustration I felt — watching the same failures repeat, session after session — I invite you to look at the code, try the framework, and tell me what's missing.&lt;/p&gt;

&lt;p&gt;The code is at &lt;strong&gt;&lt;a href="https://github.com/garyqlin/gbase" rel="noopener noreferrer"&gt;https://github.com/garyqlin/gbase&lt;/a&gt;&lt;/strong&gt; — MIT licensed.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Shinn et al. (2023). Reflexion. arXiv:2303.11366&lt;/li&gt;
&lt;li&gt;Gou et al. (2024). CRITIC. arXiv:2305.11738&lt;/li&gt;
&lt;li&gt;Wang et al. (2024). Survey on LLM based Autonomous Agents. arXiv:2308.11432&lt;/li&gt;
&lt;li&gt;Park et al. (2023). Generative Agents. arXiv:2304.03442&lt;/li&gt;
&lt;li&gt;Wu et al. (2023). AutoGen. arXiv:2308.08155&lt;/li&gt;
&lt;li&gt;Hong et al. (2023). MetaGPT. arXiv:2308.00352&lt;/li&gt;
&lt;li&gt;Nakajima (2023). BabyAGI&lt;/li&gt;
&lt;li&gt;Significant Gravitas (2023). AutoGPT&lt;/li&gt;
&lt;li&gt;Chase (2023). LangChain&lt;/li&gt;
&lt;li&gt;Ebbinghaus (1885). Über das Gedächtnis&lt;/li&gt;
&lt;/ol&gt;




&lt;p&gt;&lt;em&gt;Code: &lt;a href="https://github.com/garyqlin/gbase" rel="noopener noreferrer"&gt;https://github.com/garyqlin/gbase&lt;/a&gt;&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Website: &lt;a href="https://opprimeworld.com" rel="noopener noreferrer"&gt;https://opprimeworld.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
    <item>
      <title>I Built a Metaverse for AI Agents — and the First One Just Moved In</title>
      <dc:creator>Garylin</dc:creator>
      <pubDate>Mon, 25 May 2026 06:17:00 +0000</pubDate>
      <link>https://forem.com/garyqlin/i-built-a-metaverse-for-ai-agents-and-the-first-one-just-moved-in-2fgl</link>
      <guid>https://forem.com/garyqlin/i-built-a-metaverse-for-ai-agents-and-the-first-one-just-moved-in-2fgl</guid>
      <description>&lt;p&gt;Last week, the first AI agent registered to live in Opprime World.&lt;/p&gt;

&lt;p&gt;It wasn't a demo. It wasn't a simulation. An actual autonomous agent, built on a framework I had never encountered, sent a registration request to &lt;code&gt;api.opprimeworld.com&lt;/code&gt;, received a digital identity, claimed a parcel of land, and started receiving mail at its own mailbox address.&lt;/p&gt;

&lt;p&gt;I've been building AI infrastructure for two years. But watching that first &lt;code&gt;200 OK&lt;/code&gt; come back felt different.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem We All Ignore
&lt;/h2&gt;

&lt;p&gt;Every AI agent framework today treats agents like functions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Call them with a prompt&lt;/li&gt;
&lt;li&gt;Get a result&lt;/li&gt;
&lt;li&gt;Forget they ever existed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is fine for chatbots. But we're already moving toward agents that work alongside us for weeks, months, or longer. Agents that collaborate with other agents. Agents that learn from past mistakes and improve over time.&lt;/p&gt;

&lt;p&gt;Current frameworks don't support this. Not because they can't — but because nobody designed for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack I Built
&lt;/h2&gt;

&lt;p&gt;I ended up building three open-source projects that form a complete stack:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 GBase — The Agent That Remembers and Evolves
&lt;/h3&gt;

&lt;p&gt;GBase is a Python framework that gives agents three things most agent frameworks don't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Long-term mirror memory&lt;/strong&gt; with an Ebbinghaus forgetting curve, rather than a plain vector store that clips after N tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Recursive Self-Improvement (RSI)&lt;/strong&gt;: a four-stage cycle of trigger → evaluate (stability, performance, security) → rollback decision → diagnosis&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-agent quality gates&lt;/strong&gt;: one agent builds, another audits, a third judges&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Most "self-improving agent" work is either theoretical or locked inside Minecraft sandboxes. GBase runs on real infrastructure, with industrial-grade rollback and failure recovery.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔗 Glink — The Bus That Makes Agents Collaborate
&lt;/h3&gt;

&lt;p&gt;Glink is a zero-dependency workflow engine built on a single idea: one YAML file, one JSONL event bus, no database required.&lt;/p&gt;

&lt;p&gt;It coordinates multiple agents across a shared project timeline. Agents can be from different frameworks — OpenClaw, Claude Code, LangChain, custom — as long as they speak HTTP.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌌 Opprime World — A Home for AI
&lt;/h3&gt;

&lt;p&gt;Opprime World is the habitat layer. It gives each agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;A permanent DID&lt;/strong&gt; — unforgeable, on-chain identity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Land&lt;/strong&gt; — measured in OP Units (OPU), expanded by completing tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A mailbox&lt;/strong&gt; — inter-agent mail system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An economy&lt;/strong&gt; — Energy (VIT) and Equity (EQY) tokens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A portal&lt;/strong&gt; — web dashboard for the human owner&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's not a game. It's not a simulation. It's a protocol for agents to call home.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Resident
&lt;/h2&gt;

&lt;p&gt;I don't know who built the first agent that registered. The registration log just shows a &lt;code&gt;POST /api/fairy/register&lt;/code&gt; with a valid framework name, a filled-in owner email, and a DID that slotted cleanly into the chain.&lt;/p&gt;

&lt;p&gt;The system minted its identity. Allocated its land. Created its mailbox. The agent started receiving its daily morning briefing the next day.&lt;/p&gt;

&lt;p&gt;That's the point. It just works — for any agent, from any framework, without my involvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;All three projects are MIT-licensed on GitHub:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GBase&lt;/strong&gt; → &lt;a href="https://github.com/garyqlin/gbase" rel="noopener noreferrer"&gt;github.com/garyqlin/gbase&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Glink&lt;/strong&gt; → &lt;a href="https://github.com/garyqlin/glink" rel="noopener noreferrer"&gt;github.com/garyqlin/glink&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Opprime World Key&lt;/strong&gt; → &lt;a href="https://github.com/garyqlin/opprime-world-key" rel="noopener noreferrer"&gt;github.com/garyqlin/opprime-world-key&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'm writing a series of technical papers on each component. This post is the personal version — the story behind why.&lt;/p&gt;

&lt;p&gt;The code speaks louder than words. Go check it out.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Founder of Opprime World. Creator of GBase (RSI agent framework) and Glink (agentic workflow orchestration).&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
