<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shifu</title>
    <description>The latest articles on Forem by Shifu (@shifu_legend).</description>
    <link>https://forem.com/shifu_legend</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799542%2F82128222-c047-456d-bd43-b8215632252d.png</url>
      <title>Forem: Shifu</title>
      <link>https://forem.com/shifu_legend</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shifu_legend"/>
    <language>en</language>
    <item>
      <title>The End of Manual QA Writing? How an OpenClaw Skill Automates Testing Strategy</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:17:24 +0000</pubDate>
      <link>https://forem.com/shifu_legend/the-end-of-manual-qa-writing-how-an-openclaw-skill-automates-testing-strategy-cmf</link>
      <guid>https://forem.com/shifu_legend/the-end-of-manual-qa-writing-how-an-openclaw-skill-automates-testing-strategy-cmf</guid>
      <description>&lt;h1&gt;
  
  
  The End of Manual QA Writing? How an OpenClaw Skill Automates Testing Strategy
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Discover how the QA Architecture Auditor OpenClaw skill generates comprehensive testing strategies from scratch, freeing QA engineers from manual test case writing — and what it means for the future of QA roles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Header image: a futuristic QA robot analyzing code&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you've ever been knee‑deep in a codebase, tasked with writing test cases for the first time, you know the drill: sift through modules, guess what needs testing, write repetitive boilerplate, and hope you didn't miss that one edge case that'll blow up in production. Quality Assurance is essential, but the manual labor of test case creation is a notorious bottleneck. What if an AI could read your code and instantly produce a comprehensive, independent testing strategy — complete with risk scores, security maps, and ready‑to‑run test examples?&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt;, an OpenClaw skill that performs forensic analysis of any repository and spits out an exhaustive QA strategy report. This isn't just another code coverage tool; it's a full‑blown QA architect that operates under a zero‑trust policy, ignoring any existing tests and designing everything from scratch. The result? A multi‑methodology testing matrix that covers everything from black‑box to mutation testing, all tailored to your tech stack.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore why traditional QA test writing is failing modern development, how this OpenClaw skill changes the game, and what it means for the future of QA roles. Spoiler: it doesn't make testers redundant — it makes them &lt;em&gt;strategists&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Manual QA Test Creation
&lt;/h2&gt;

&lt;p&gt;Let's face reality: writing test cases is often a Sisyphean task. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time‑consuming and repetitive&lt;/strong&gt; – For every function you write, you need to craft happy paths, edge cases, error handling, and integration hooks. Multiply that across a growing codebase and you've got weeks of effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent coverage&lt;/strong&gt; – Different QA engineers have different standards. One might miss boundary values, another might forget security scenarios. Maintaining uniform coverage across teams is nearly impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability nightmare&lt;/strong&gt; – As microservices proliferate, keeping test suites up to date becomes a full‑time job. Any sprint that adds features must also extend tests, leading to technical debt or shortcuts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind spots&lt;/strong&gt; – Humans naturally gravitate toward the familiar (unit tests) and neglect less obvious but critical areas: fuzzing, mutation testing, accessibility, localization, performance under load, and compatibility across browsers/OSes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottleneck for releases&lt;/strong&gt; – QA is often the gatekeeper. If test writing lags, releases slip. Companies either ship with insufficient tests or delay features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit &amp;amp; compliance headaches&lt;/strong&gt; – Auditors demand evidence of structured testing, ITGC controls, and risk‑based test plans. Manually assembling this documentation is error‑prone and time‑intensive.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The ideal solution would be an &lt;strong&gt;independent, automated QA architect&lt;/strong&gt; that can examine any codebase and produce a prioritized, comprehensive testing blueprint — one that covers all methodologies, is tailored to the detected stack, and can be regenerated whenever the code evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the QA Architecture Auditor
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; is an OpenClaw skill that does exactly that. It's a Python‑based CLI tool (&lt;code&gt;qa-audit&lt;/code&gt;) that you can invoke directly or via slash command in OpenClaw. It performs deep static analysis and generates an HTML or Markdown report that serves as a complete QA strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forensic codebase analysis&lt;/strong&gt; – Detects languages, frameworks, architecture pattern (monolith, microservices, serverless, etc.), dependencies, modules, cyclomatic complexity, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; – Scores each module from 0‑100 based on complexity, external calls, authentication handling, data persistence, cryptography, file I/O, coupling, and public API surface. High‑risk modules surface for prioritized testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security surface mapping&lt;/strong&gt; – Identifies modules that touch authentication, authorization, input validation, output encoding, session management, cryptography, file ops, network ops, and database ops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry point discovery&lt;/strong&gt; – Finds &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;app.py&lt;/code&gt;, &lt;code&gt;manage.py&lt;/code&gt;, &lt;code&gt;index.js&lt;/code&gt;, etc., to focus end‑to‑end and smoke tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data flow mapping&lt;/strong&gt; – Traces imports/dependencies to expose integration points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITGC controls&lt;/strong&gt; – Generates a tailored checklist of IT General Controls compliance items (change management, access control, testing requirements, security scanning, code signing, deployment gates, etc.) based on your tech stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report generation&lt;/strong&gt; – Produces a beautifully formatted HTML or Markdown report crammed with actionable insights, including:

&lt;ul&gt;
&lt;li&gt;Executive summary&lt;/li&gt;
&lt;li&gt;Codebase statistics (languages, file counts, dependencies)&lt;/li&gt;
&lt;li&gt;Frameworks detected&lt;/li&gt;
&lt;li&gt;Risk assessment table (severity, type, module, score, description)&lt;/li&gt;
&lt;li&gt;Security surface mapping table&lt;/li&gt;
&lt;li&gt;Testing methodology matrix with &lt;strong&gt;independent baseline&lt;/strong&gt;, &lt;strong&gt;vulnerability &amp;amp; risk assessment&lt;/strong&gt;, &lt;strong&gt;strategy&lt;/strong&gt;, and &lt;strong&gt;from‑scratch test cases&lt;/strong&gt; for each of 20+ methodologies&lt;/li&gt;
&lt;li&gt;Tooling recommendations (pytest/Jest/JUnit/etc.) tailored to your stack&lt;/li&gt;
&lt;li&gt;ITGC controls checklist&lt;/li&gt;
&lt;li&gt;Dependencies analysis (if available)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Zero‑trust policy&lt;/strong&gt; – The skill &lt;em&gt;ignores&lt;/em&gt; any existing tests. It assumes you're starting from zero and designs everything accordingly. This is crucial for audits and for turning around neglected codebases.&lt;/li&gt;

&lt;/ul&gt;
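&lt;p&gt;To make the risk scoring concrete, here is a minimal sketch of such a weighted 0-100 heuristic. The signal names and weights are illustrative assumptions for this article, not the tool's actual formula:&lt;/p&gt;

```python
# Illustrative sketch of a 0-100 module risk score. The signals and
# weights are assumptions for demonstration, not qa-audit's real formula.
SIGNAL_WEIGHTS = {
    "cyclomatic_complexity": 2,   # per unit of measured complexity
    "external_calls": 5,          # per outbound integration
    "handles_auth": 25,           # boolean signals get flat weights
    "persists_data": 15,
    "uses_crypto": 15,
    "file_io": 10,
    "public_api_surface": 3,      # per exported symbol
}

def risk_score(signals: dict) -> int:
    """Combine weighted signals and clamp the result to the 0-100 range."""
    raw = sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())
    return max(0, min(100, raw))

# A login module: moderately complex, auth-handling, touches the database.
login_signals = {
    "cyclomatic_complexity": 8,
    "external_calls": 2,
    "handles_auth": 1,
    "persists_data": 1,
    "uses_crypto": 1,
    "file_io": 0,
    "public_api_surface": 4,
}
print(risk_score(login_signals))  # 93
```

&lt;p&gt;The clamp matters: a module that trips many signals at once should saturate at 100 rather than distort the ranking of everything below it.&lt;/p&gt;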

&lt;p&gt;All of this runs locally; your code never leaves your machine unless a remote URL is provided, in which case only a standard &lt;code&gt;git clone&lt;/code&gt; occurs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Skill Unique?
&lt;/h2&gt;

&lt;p&gt;The QA ecosystem is no stranger to static analysis tools (linters, complexity analyzers, OWASP ZAP, etc.). But the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; fills a critical gap: &lt;strong&gt;a holistic, methodology‑agnostic testing strategy generator&lt;/strong&gt;. Let's break down its distinctive features.&lt;/p&gt;

&lt;h3&gt;
  
  
  20+ Testing Methodologies Covered
&lt;/h3&gt;

&lt;p&gt;The report includes dedicated sections for each major testing approach, complete with an independent baseline definition, risk assessment, strategy, and from‑scratch test examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core execution&lt;/strong&gt;: Black Box, White Box, Manual, Automated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functional &amp;amp; structural&lt;/strong&gt;: Unit, Integration, System, Functional, Smoke, Sanity, E2E, Regression, API, Database Integrity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non‑functional&lt;/strong&gt;: Performance, Security, Usability, Compatibility, Accessibility, Localization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized&lt;/strong&gt;: Acceptance (UAT), Exploratory, Boundary Value Analysis, Monkey/Random Testing, Fuzz Testing, Mutation Testing, Non‑Functional General&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not just a list — each section contains &lt;em&gt;test cases&lt;/em&gt; written in the language of your stack (Python, JavaScript, Java, Go, etc.) showing exactly how to validate those dimensions. For example, the Fuzz Testing section shows how to use &lt;code&gt;atheris&lt;/code&gt; or &lt;code&gt;libFuzzer&lt;/code&gt; to feed malformed data to your APIs; the Mutation Testing section suggests &lt;code&gt;mutmut&lt;/code&gt;, &lt;code&gt;Stryker&lt;/code&gt;, or &lt;code&gt;PITest&lt;/code&gt; and targets an 80%+ mutation score.&lt;/p&gt;
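&lt;p&gt;The mutation-testing guidance is easiest to internalize with a toy case. Tools like &lt;code&gt;mutmut&lt;/code&gt; rewrite small pieces of your code (for example, weakening &lt;code&gt;&gt;=&lt;/code&gt; to &lt;code&gt;&gt;&lt;/code&gt;) and check whether any test fails. The hypothetical function below shows the kind of exact-boundary assertions that kill such mutants:&lt;/p&gt;

```python
# A hypothetical function and the boundary-focused tests that mutation
# testing rewards. A mutant weakening "at least 18" to "strictly over 18"
# survives vague tests but dies on the exact-boundary assertion.
def is_adult(age: int) -> bool:
    return age >= 18

def test_is_adult_boundaries():
    assert is_adult(18) is True    # exact boundary: kills the weakened mutant
    assert is_adult(17) is False   # just below the boundary
    assert is_adult(100) is True   # a comfortable pass far from the edge

test_is_adult_boundaries()
```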

&lt;h3&gt;
  
  
  Zero‑Trust Baseline
&lt;/h3&gt;

&lt;p&gt;Many tools pretend to “assess” a project by looking at its coverage reports. This skill deliberately &lt;em&gt;ignores&lt;/em&gt; existing tests. Its premise: trust nothing, start from first principles. That independence is gold for audits and for teams that suspect they're not covering enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk‑Based Prioritization
&lt;/h3&gt;

&lt;p&gt;The skill assigns a risk score to each module, combining complexity and security factors. The highest‑scoring modules get explicit attention in the risk assessment table, and the methodology recommendations are tailored accordingly (e.g., more security and database tests for data‑intensive modules). This tells you exactly where to focus your effort first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tailored Tooling Recommendations
&lt;/h3&gt;

&lt;p&gt;Instead of a generic tool list, the skill recommends specific tools based on the languages and frameworks it detects. Python project? It suggests &lt;code&gt;pytest&lt;/code&gt;, &lt;code&gt;pytest‑cov&lt;/code&gt;, &lt;code&gt;bandit&lt;/code&gt;, &lt;code&gt;safety&lt;/code&gt;, &lt;code&gt;locust&lt;/code&gt; or &lt;code&gt;k6&lt;/code&gt;. Java? &lt;code&gt;JUnit 5&lt;/code&gt;, &lt;code&gt;Spring Boot Test&lt;/code&gt;, &lt;code&gt;SonarQube&lt;/code&gt;. JavaScript/TypeScript? &lt;code&gt;Jest&lt;/code&gt; or &lt;code&gt;Vitest&lt;/code&gt;, &lt;code&gt;Cypress&lt;/code&gt;/&lt;code&gt;Playwright&lt;/code&gt;, &lt;code&gt;ESLint&lt;/code&gt; security plugins. This makes the report immediately actionable.&lt;/p&gt;
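&lt;p&gt;Conceptually, this is a lookup from detected languages to recommended tools. The sketch below reconstructs that idea; the table entries are examples drawn from the report, not the skill's internal data structure:&lt;/p&gt;

```python
# Illustrative language-to-tooling lookup, reconstructing the mapping
# described above (entries are examples, not the skill's internal data).
TOOLING = {
    "python": ["pytest", "pytest-cov", "bandit", "safety", "locust"],
    "java": ["JUnit 5", "Spring Boot Test", "SonarQube"],
    "javascript": ["Jest", "Cypress", "eslint-plugin-security"],
}

def recommend(languages):
    """Return an ordered, de-duplicated tool list for the detected languages."""
    seen, tools = set(), []
    for language in languages:
        for tool in TOOLING.get(language.lower(), []):
            if tool not in seen:
                seen.add(tool)
                tools.append(tool)
    return tools

print(recommend(["Python", "JavaScript"]))
```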

&lt;h3&gt;
  
  
  All‑Local, No External AI
&lt;/h3&gt;

&lt;p&gt;The analysis is purely deterministic; no queries to ChatGPT or any cloud service. It respects your privacy and avoids external dependencies. That's a relief for sensitive codebases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How QA Engineers Transform, Not Disappear
&lt;/h2&gt;

&lt;p&gt;Will this skill make QA testers redundant? Not at all — it elevates them. The skill produces raw test strategies; it doesn't &lt;em&gt;execute&lt;/em&gt; tests or integrate with CI automatically (though that could be a next step). QA engineers become &lt;strong&gt;QA architects&lt;/strong&gt; who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review the generated strategy for business‑logic nuances&lt;/li&gt;
&lt;li&gt;Refine risk scores based on domain knowledge&lt;/li&gt;
&lt;li&gt;Implement the suggested test skeletons, filling in domain‑specific data and assertions&lt;/li&gt;
&lt;li&gt;Integrate the tests into CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Triage and investigate failures discovered by the new tests&lt;/li&gt;
&lt;li&gt;Continuously improve the skill itself (since it's open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The time saved from manual test authoring can be redirected toward higher‑value activities: exploratory testing, usability studies, performance tuning, and security hardening. In other words, the boring part gets automated, and the creative, investigative work remains human‑centric.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real‑World Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's see the skill in action on a tiny Flask API sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; ./flask-demo &lt;span class="nt"&gt;--output&lt;/span&gt; report.html &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated &lt;code&gt;report.html&lt;/code&gt; opens to a clean UI. The &lt;strong&gt;Executive Summary&lt;/strong&gt; tells us we have 12 modules, 3 languages (Python, HTML, SQL), and highlights the login module as the highest risk (score 78). The &lt;strong&gt;Risk Assessment&lt;/strong&gt; table shows the critical authentication module, some data‑intensive endpoints, and a couple of high‑complexity utility functions.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Security Surface&lt;/strong&gt; reveals 5 areas: &lt;code&gt;authentication&lt;/code&gt;, &lt;code&gt;input_validation&lt;/code&gt;, &lt;code&gt;database_operations&lt;/code&gt;, &lt;code&gt;output_encoding&lt;/code&gt;, &lt;code&gt;session_management&lt;/code&gt;. So we know we need strong auth and input tests.&lt;/p&gt;

&lt;p&gt;Scrolling to the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, we find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Black Box&lt;/strong&gt;: baseline "no internal knowledge", strategy "equivalence partitioning, boundary value analysis, decision tables", and test cases showing how to structure API tests for endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: specific suggestions like "test all routes with method overrides, validate status codes, schemas, auth headers, error handling". The example uses &lt;code&gt;pytest&lt;/code&gt; and &lt;code&gt;requests&lt;/code&gt; to hit the endpoints with valid, missing, and malformed payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: OWASP Top 10 validation checklist with code snippets for SQL injection, XSS, authentication bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: load test script using &lt;code&gt;locust&lt;/code&gt; that simulates 1000 users hitting the login endpoint with a 2‑second SLA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt;: for the UI, it suggests &lt;code&gt;axe-core&lt;/code&gt; and keyboard navigation checks, complete with a &lt;code&gt;pytest&lt;/code&gt; integration example.&lt;/li&gt;
&lt;/ul&gt;
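&lt;p&gt;To give a flavor of those generated API cases, here is a minimal &lt;code&gt;pytest&lt;/code&gt;-style sketch. The endpoints, payloads, and expected status codes are assumptions about the demo app, and the HTTP call is factored out so the case table can be sanity-checked offline:&lt;/p&gt;

```python
# Sketch of from-scratch API cases in the spirit of the generated report.
# Endpoints, payloads, and expected codes are assumptions about the demo app.
BASE_URL = "http://localhost:8080"

API_CASES = [
    # (method, path, payload, expected_status)
    ("POST", "/api/login", {"user": "alice", "password": "s3cret"}, 200),
    ("POST", "/api/login", {"user": "alice"}, 400),                    # missing field
    ("POST", "/api/login", {"user": "alice", "password": 1234}, 400),  # wrong type
    ("GET", "/api/orders", None, 401),                                 # no auth header
]

def build_request(method, path, payload):
    """Translate one case row into keyword arguments for requests.request()."""
    kwargs = {"method": method, "url": BASE_URL + path, "timeout": 5}
    if payload is not None:
        kwargs["json"] = payload
    return kwargs

def test_api_cases():
    import requests  # deferred import: the case table stays checkable offline
    for method, path, payload, expected in API_CASES:
        response = requests.request(**build_request(method, path, payload))
        assert response.status_code == expected, (method, path)
```

&lt;p&gt;Run against the live demo app, &lt;code&gt;test_api_cases&lt;/code&gt; exercises every row; even offline, the case table documents the intended coverage.&lt;/p&gt;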

&lt;p&gt;Each section also includes a &lt;strong&gt;Vulnerability &amp;amp; Risk Assessment&lt;/strong&gt; paragraph tailored to our codebase, e.g.:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The 12 entry points represent the primary black-box testing surface. Focus on 5 authentication modules and 3 database interaction points."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;strong&gt;Tooling Recommendations&lt;/strong&gt; section lists: &lt;code&gt;pytest + pytest‑cov&lt;/code&gt;, &lt;code&gt;locust&lt;/code&gt;, &lt;code&gt;bandit&lt;/code&gt;, &lt;code&gt;safety&lt;/code&gt;, &lt;code&gt;OWASP ZAP&lt;/code&gt;, plus CI/CD suggestions.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;strong&gt;ITGC Controls&lt;/strong&gt; section enumerates change management, access control, testing requirements, security scanning, dependency management, code signing, audit trail, deployment controls, incident response — all with specific notes for our detected stack (Python, Flask). This is gold for SOC2 or ISO27001 prep.&lt;/p&gt;

&lt;p&gt;In short, you get a ready‑to‑implement test plan that would otherwise take weeks of manual effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample Report Excerpts
&lt;/h2&gt;

&lt;p&gt;To give you a taste of what the report looks like, here's a trimmed excerpt from the &lt;strong&gt;Risk Assessment&lt;/strong&gt; table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Risk Type&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Risk Score&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;security&lt;/td&gt;
&lt;td&gt;auth/login.py&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;Authentication handling detected — requires rigorous security testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;code_complexity&lt;/td&gt;
&lt;td&gt;services/order.py&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;High complexity module with many branches — needs path coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;dependency&lt;/td&gt;
&lt;td&gt;requirements.txt&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;Unpinned dependencies detected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And from the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, the &lt;strong&gt;Fuzz Testing&lt;/strong&gt; section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Independent Baseline:&lt;/strong&gt; Feed malformed, unexpected, or extreme data to the system to expose vulnerabilities like buffer overflows or injection flaws.&lt;br&gt;
&lt;strong&gt;Vulnerability &amp;amp; Risk Assessment:&lt;/strong&gt; Fuzz testing needed for any input parsing modules. Focus on 12 modules that handle user‑supplied data.&lt;br&gt;
&lt;strong&gt;Strategy:&lt;/strong&gt; Use fuzzing tools to generate semi‑valid inputs that stress parsers and data handlers.&lt;br&gt;
&lt;strong&gt;From‑Scratch Test Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Fuzz Testing – Malformed Data&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sys

import atheris
import requests

from example_api import app  # demo app, assumed to be served on localhost:8080

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    endpoint = fdp.PickValueInList(['/api/users', '/api/orders'])
    method = fdp.PickValueInList(['GET', 'POST'])
    # Build a random malformed payload from the fuzzer's byte stream
    payload = {fdp.ConsumeUnicodeNoSurrogates(16): fdp.ConsumeUnicodeNoSurrogates(64)}
    response = requests.request(method, f'http://localhost:8080{endpoint}', json=payload)
    assert response.status_code &amp;lt; 500

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Validation: Fuzzing finds no crashes or memory leaks; all malformed inputs handled safely.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These concrete examples show you how to jump straight into implementation without guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use X?
&lt;/h2&gt;

&lt;p&gt;You might wonder: "Can't we already do this with SonarQube or OWASP ZAP?" Those tools address specific facets — static analysis, dependency checks, dynamic scanning. They don't produce a &lt;em&gt;holistic testing strategy&lt;/em&gt; that spans unit, integration, security, performance, accessibility, compliance, and the more exotic methodologies like mutation and fuzz testing. Nor do they provide the &lt;em&gt;from‑scratch test cases&lt;/em&gt; ready for adaptation. The QA Architecture Auditor consolidates all that into one coherent, prioritized plan. Think of it as the &lt;strong&gt;missing link&lt;/strong&gt; between static analysis and actual test implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;p&gt;Ready to try it out? Here's how to install and run the skill:&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation from ClawHub (once published)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clawhub &lt;span class="nb"&gt;install &lt;/span&gt;shifulegend/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual install from GitHub
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/shifulegend/qa-architecture-auditor.git &lt;span class="se"&gt;\&lt;/span&gt;
  ~/.openclaw/workspace/skills/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the skill
&lt;/h3&gt;

&lt;p&gt;Use the slash command in your OpenClaw chat or call the CLI directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; /path/to/your/project &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="nt"&gt;--output&lt;/span&gt; qa-report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also generate Markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; https://github.com/yourorg/yourrepo.git &lt;span class="nt"&gt;--format&lt;/span&gt; md &lt;span class="nt"&gt;--output&lt;/span&gt; audit.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--security-scan&lt;/code&gt; – performs additional security vulnerability analysis (uses local scanners)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--compliance soc2|iso27001|hipaa|gdpr&lt;/code&gt; – tailors the ITGC section to the target framework&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--exclude node_modules,.git,build&lt;/code&gt; – exclude directories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--include-test-cases&lt;/code&gt; – (default) includes ready‑to‑copy test examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check &lt;code&gt;qa-audit --help&lt;/code&gt; for all flags.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: AI‑Driven QA Strategies
&lt;/h2&gt;

&lt;p&gt;The QA Architecture Auditor is more than a one‑off tool; it's a glimpse into the future of AI‑augmented software engineering. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous auditing&lt;/strong&gt;: The skill runs on every push, updating the risk assessment and flagging newly introduced high‑risk modules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration&lt;/strong&gt;: Auto‑generate test stubs for new code, then let developers fill in the specifics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance as code&lt;/strong&gt;: The ITGC controls become part of your compliance documentation, automatically refreshed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi‑repo aggregation&lt;/strong&gt;: Run it across microservices and aggregate risk into a dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are natural extensions that the open‑source community could build. The skill is published under the MIT license and welcomes contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Manual test case writing doesn't have to remain the bottleneck. The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; OpenClaw skill offers a practical, immediate way to generate a comprehensive, independent QA strategy from a single command. It covers more methodologies than any human checklist, adapts to your stack, and delivers both strategic insights (risk scores, security surface) and tactical artifacts (test examples). For QA engineers, it's not replacement — it's an elevation to QA architect. For teams, it's a shortcut to robust, audit‑ready testing.&lt;/p&gt;

&lt;p&gt;Give it a try on your next codebase. You might just find that your QA workload becomes not only manageable but also more strategic and impactful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install&lt;/strong&gt; the skill from ClawHub or GitHub today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; it on a project you care about and explore the report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt;: Found a bug? Have an idea for a new methodology? Open an issue or PR on the GitHub repo: &lt;a href="https://github.com/shifulegend/qa-architecture-auditor" rel="noopener noreferrer"&gt;https://github.com/shifulegend/qa-architecture-auditor&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt;: Forward this article to your QA team and let them try it out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's make testing smarter, faster, and more comprehensive — together.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published on DEV.to • 12 min read&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>testing</category>
    </item>
    <item>
      <title>How I Automated Python Documentation Using AST Parsing and Multi-Provider LLMs</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:17:15 +0000</pubDate>
      <link>https://forem.com/shifu_legend/stop-writing-documentation-i-built-an-ai-tool-that-parses-your-codes-dna-5eh2</link>
      <guid>https://forem.com/shifu_legend/stop-writing-documentation-i-built-an-ai-tool-that-parses-your-codes-dna-5eh2</guid>
      <description>&lt;p&gt;We've all been there. You just spent three intense days crafting a highly optimized, beautifully architected new feature. The code is elegant. The tests are passing. The linter is perfectly silent. You push your branch, open a Pull Request, and then reality hits you like a truck:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Oh right. I need to update the documentation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s be honest: writing documentation is the chore that developers love to hate. In an ideal world, documentation evolves alongside the code. In reality, it stays stuck in 2023, while your application code races toward 2025. &lt;/p&gt;

&lt;p&gt;For the longest time, the solution has been either drudgery (doing it manually) or using brittle, regex-based parsers that break the moment you introduce a slightly complex Python decorator or a nested asynchronous function.&lt;/p&gt;

&lt;p&gt;I decided I was done with both options. So, I spent the last few weeks building &lt;strong&gt;AutoDocGen&lt;/strong&gt; (&lt;code&gt;pypiautodocgen&lt;/code&gt; on PyPI). &lt;/p&gt;

&lt;p&gt;Instead of searching for strings like a glorified &lt;code&gt;grep&lt;/code&gt; command, AutoDocGen parses your Python code into an &lt;strong&gt;Abstract Syntax Tree (AST)&lt;/strong&gt;. It &lt;em&gt;knows&lt;/em&gt; what’s a class, what’s a private method, and how your modules are intrinsically linked. It takes that blueprint and feeds it to the Large Language Model of your choice to generate human-readable, perfectly formatted Markdown documentation.&lt;/p&gt;

&lt;p&gt;Here is the story of how I built it, the technical hurdles I faced, and why I believe AST parsing combined with AI is the future of code documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Problem with Regex-Based Documentation
&lt;/h2&gt;

&lt;p&gt;Historically, many lightweight documentation tools have relied on Regular Expressions. They scan a file line-by-line looking for &lt;code&gt;def&lt;/code&gt; or &lt;code&gt;class&lt;/code&gt;, extract the following string, and try to grab the docstring block below it.&lt;/p&gt;

&lt;p&gt;This approach is fundamentally flawed for modern Python development. Why? Because Python syntax is incredibly expressive.&lt;/p&gt;

&lt;p&gt;Consider this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@validate_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserSchema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;include_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetches user data from the primary replica.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A regex parser has to somehow know that the decorators belong to the function, correctly identify it as asynchronous, handle the multi-line signature, parse the type hints, and extract the docstring. Add in nested classes, closures, and complex return types, and your regex quickly devolves into an unmaintainable nightmare.&lt;/p&gt;

&lt;p&gt;Regex doesn't &lt;em&gt;understand&lt;/em&gt; code; it only recognizes patterns in text. I needed a tool that understood the structure of Python itself.&lt;/p&gt;
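&lt;p&gt;A quick demonstration against that exact snippet makes the point:&lt;/p&gt;

```python
import re

# The decorated async signature from above, as plain text:
source = '''@cache(ttl=3600)
@validate_schema(UserSchema)
async def fetch_user_data(
    user_id: uuid.UUID,
    include_history: bool = False
) -> Dict[str, Any]:
    """Fetches user data from the primary replica."""
    pass
'''

# A typical line-oriented pattern: grab the name following "def ".
naive = re.findall(r"^def\s+(\w+)", source, re.MULTILINE)
print(naive)  # [] -- the async keyword alone defeats it

# Patching in "async" recovers the name, but the decorators, multi-line
# signature, and docstring still have to be stitched together by hand.
patched = re.findall(r"^(?:async\s+)?def\s+(\w+)", source, re.MULTILINE)
print(patched)  # ['fetch_user_data']
```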




&lt;h2&gt;
  
  
  2. Enter the Abstract Syntax Tree (AST)
&lt;/h2&gt;

&lt;p&gt;Python includes a built-in module called &lt;code&gt;ast&lt;/code&gt;. It allows you to parse Python source code into a tree of nodes representing the syntactic structure of the program.&lt;/p&gt;

&lt;p&gt;Instead of reading lines of text, AutoDocGen uses &lt;code&gt;ast.parse()&lt;/code&gt; to read the "DNA" of your code. &lt;/p&gt;

&lt;p&gt;When you feed the above snippet into an AST parser, it doesn't see a string of text. It sees an &lt;code&gt;AsyncFunctionDef&lt;/code&gt; node. It knows that this node has a &lt;code&gt;decorator_list&lt;/code&gt; containing &lt;code&gt;Call&lt;/code&gt; nodes. It maps out the &lt;code&gt;arguments&lt;/code&gt; (complete with their type annotations) and gracefully extracts the exact docstring using &lt;code&gt;ast.get_docstring()&lt;/code&gt;.&lt;/p&gt;
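&lt;p&gt;A few lines of the standard library show this in action (on a simplified version of the earlier signature):&lt;/p&gt;

```python
import ast

source = '''
@cache(ttl=3600)
async def fetch_user_data(user_id, include_history=False):
    """Fetches user data from the primary replica."""
    pass
'''

tree = ast.parse(source)  # parsing never executes the code
fn = tree.body[0]

print(type(fn).__name__)              # AsyncFunctionDef
print(len(fn.decorator_list))         # 1
print([a.arg for a in fn.args.args])  # ['user_id', 'include_history']
print(ast.get_docstring(fn))          # Fetches user data from the primary replica.
```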

&lt;p&gt;By extracting this structured data, AutoDocGen builds a high-fidelity "blueprint" of your codebase. We extract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Module-level variables and logic&lt;/li&gt;
&lt;li&gt;Class definitions, their base classes (inheritance), and methods&lt;/li&gt;
&lt;li&gt;Standalone functions (sync and async)&lt;/li&gt;
&lt;li&gt;Exact signatures and type hints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We then serialize this blueprint into a structured format (JSON or YAML representation of the AST summary). &lt;/p&gt;

&lt;p&gt;This is the secret sauce. &lt;strong&gt;We aren't asking the AI to read your code from scratch and guess what it does.&lt;/strong&gt; We are giving the AI a structural map and asking it to explain the map. This drastically reduces LLM hallucinations and dramatically improves the quality of the generated documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Breaking Free from Vendor Lock-in: Multi-Provider Support
&lt;/h2&gt;

&lt;p&gt;When I started building the AI generation step, I realized a major frustration with the current landscape of AI developer tools: almost all of them hardcode OpenAI's API.&lt;/p&gt;

&lt;p&gt;While GPT-4o is incredible, we are living in a golden age of open-weight models and blisteringly fast inference APIs. I didn't want users locked into OpenAI if they preferred Google's tools, or if they wanted the sheer speed of Groq.&lt;/p&gt;

&lt;p&gt;So, I built an abstraction layer within AutoDocGen to support multiple LLM providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt;: The standard fallback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq&lt;/strong&gt;: If you want documentation generated in a couple of seconds per file, running models like Llama 3 on Groq's LPUs is life-changing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini&lt;/strong&gt;: Excellent context windows for deeply understanding complex module interdependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt;: The ultimate freedom. This allows you to route requests to dozens of different models (including free tiers like Stepfun) without changing your core integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The configuration hierarchy is flexible. You can set everything via environment variables (&lt;code&gt;GROQ_API_KEY&lt;/code&gt;), a local &lt;code&gt;.env&lt;/code&gt; file, an &lt;code&gt;autodocgen.yaml&lt;/code&gt; config, or directly in your &lt;code&gt;pyproject.toml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# autodocgen.yaml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;groq&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llama3-70b-8192&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./docs&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;markdown&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
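&lt;p&gt;Under the hood, precedence resolution might look something like this. This is a hypothetical sketch: the &lt;code&gt;AUTODOCGEN_&lt;/code&gt; env-var prefix and the helper name are illustrative, not AutoDocGen's real API.&lt;/p&gt;

```python
import os

def resolve_setting(key, yaml_cfg, pyproject_cfg, default=None):
    """Resolve one setting: env vars win over autodocgen.yaml,
    which wins over pyproject.toml. Illustrative sketch only."""
    env_key = "AUTODOCGEN_" + key.upper()
    if env_key in os.environ:
        return os.environ[env_key]
    if key in yaml_cfg:
        return yaml_cfg[key]
    if key in pyproject_cfg:
        return pyproject_cfg[key]
    return default

# The environment variable overrides the yaml value.
os.environ["AUTODOCGEN_PROVIDER"] = "groq"
print(resolve_setting("provider", {"provider": "openai"}, {}))  # groq
```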






&lt;h2&gt;
  
  
  4. Templating the Output: Jinja2 for Premium Style
&lt;/h2&gt;

&lt;p&gt;The final piece of the puzzle was the output format. Most automated documentation tools generate dull, uninspired text blocks. I wanted documentation that looked like it was handcrafted by a technical writer.&lt;/p&gt;

&lt;p&gt;Instead of relying on the LLM to format the Markdown (which often leads to inconsistent headings and broken tables), AutoDocGen strictly separates generation from presentation.&lt;/p&gt;

&lt;p&gt;The LLM returns structured data (a summary of the module, bullet points of functionality, etc.). AutoDocGen then injects this data into &lt;strong&gt;Jinja2 templates&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By using Jinja2 (&lt;code&gt;module.md.j2&lt;/code&gt; and &lt;code&gt;index.md.j2&lt;/code&gt;), the CLI guarantees a consistent, premium aesthetic across your entire documentation site. It perfectly formats function signatures, builds an automatic Table of Contents, and cross-links related modules. &lt;/p&gt;

&lt;p&gt;If you don't like my default template, you can easily fork the &lt;code&gt;templates/&lt;/code&gt; directory and build your own.&lt;/p&gt;
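&lt;p&gt;The separation of generation from presentation looks roughly like this (a toy template requiring the &lt;code&gt;jinja2&lt;/code&gt; package; the real &lt;code&gt;module.md.j2&lt;/code&gt; is far richer):&lt;/p&gt;

```python
from jinja2 import Template

# The LLM returns structured data; the template alone controls layout.
module_data = {
    "name": "auth.login",
    "summary": "Handles credential validation and session creation.",
    "functions": ["login", "logout"],
}

# A tiny stand-in for module.md.j2.
template = Template(
    "# {{ name }}\n\n"
    "{{ summary }}\n\n"
    "{% for fn in functions %}- `{{ fn }}`\n{% endfor %}"
)

print(template.render(**module_data))
```

&lt;p&gt;Because the headings and lists live in the template, every generated page comes out with identical structure, no matter which model produced the prose.&lt;/p&gt;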




&lt;h2&gt;
  
  
  5. Security First: The "Zero-Trust" QA Audit
&lt;/h2&gt;

&lt;p&gt;Because I was releasing an AI tool that reads source code, I knew security and stability had to be paramount. I didn't just write some unit tests and call it a day. &lt;/p&gt;

&lt;p&gt;Before hitting &lt;code&gt;v0.1.0&lt;/code&gt;, the project underwent what I call a "Zero-Trust Forensic QA Audit". I assumed the initial proof-of-concept code was entirely broken and built a test suite from scratch.&lt;/p&gt;

&lt;p&gt;We utilized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pytest&lt;/code&gt; for comprehensive unit and integration testing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bandit&lt;/code&gt; for security scanning to ensure API keys are never leaked in logs and file I/O operations are secure.&lt;/li&gt;
&lt;li&gt;Extensive mocking of all LLM providers so the CLI could be tested deeply in CI/CD without burning API credits.&lt;/li&gt;
&lt;li&gt;Edge-case testing including handling of exotic Unicode identifiers (yes, &lt;code&gt;def grüne_äpfel()&lt;/code&gt; parses perfectly).&lt;/li&gt;
&lt;/ul&gt;
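&lt;p&gt;The provider-mocking approach is worth sketching. Something in this spirit keeps CI fully offline (function names here are illustrative, not the project's actual internals):&lt;/p&gt;

```python
# Illustrative only; AutoDocGen's real test suite may differ.
from unittest import mock

def generate_module_doc(blueprint, call_llm):
    """Toy generation step: delegates to an injected LLM callable."""
    return call_llm("Summarize this module: " + blueprint["name"])

def test_generate_module_doc_without_network():
    # The fake provider returns a canned answer, so the test burns
    # zero API credits and never touches the network.
    fake_llm = mock.Mock(return_value="A module that parses Python source.")
    doc = generate_module_doc({"name": "parser"}, fake_llm)
    assert doc == "A module that parses Python source."
    fake_llm.assert_called_once()

test_generate_module_doc_without_network()
```

&lt;p&gt;Injecting the LLM client as a callable (rather than importing it deep inside the generator) is what makes this kind of mocking trivial.&lt;/p&gt;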

&lt;p&gt;The repository is now fully integrated with Codecov, maintaining a strict baseline for any future pull requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;p&gt;If you are tired of your README files falling out of sync with your codebase, I highly encourage you to give AutoDocGen a spin.&lt;/p&gt;

&lt;p&gt;It's live now on PyPI.&lt;/p&gt;

&lt;p&gt;You can install it directly via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pypiautodocgen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run it against your current directory and output to &lt;code&gt;./docs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;autodocgen &lt;span class="nt"&gt;-o&lt;/span&gt; ./docs &lt;span class="nt"&gt;--provider&lt;/span&gt; groq &lt;span class="c"&gt;# Or openai, gemini, openrouter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Roadmap
&lt;/h3&gt;

&lt;p&gt;Currently, AutoDocGen creates fantastic Markdown files perfectly suited for static site generators like MkDocs or direct consumption on GitHub. &lt;/p&gt;

&lt;p&gt;Looking forward, I want to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework-specific parsing&lt;/strong&gt;: Specialized templates for FastAPI endpoints or Django models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff-based updating&lt;/strong&gt;: Only regenerating documentation for the specific functions that changed in a commit, rather than full-file regeneration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mermaid diagram generation&lt;/strong&gt;: Automatically creating architecture flowcharts based on AST imports.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;I built AutoDocGen to solve my own pain point, but I know the community has incredible ideas on how to push it further. &lt;/p&gt;

&lt;p&gt;Check out the source code on GitHub (and drop a star if you find it useful!):&lt;br&gt;
&lt;strong&gt;&lt;a href="https://github.com/shifulegend/autodocgen" rel="noopener noreferrer"&gt;https://github.com/shifulegend/autodocgen&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would love to hear your feedback in the comments. Are you still writing documentation by hand? What has been your biggest frustration with existing auto-generated documentation tools? Let me know!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>devtools</category>
      <category>documentation</category>
    </item>
    <item>
      <title>Why I stopped writing manual test cases: This OpenClaw skill does it for me 🤖✨</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 17:00:11 +0000</pubDate>
      <link>https://forem.com/shifu_legend/why-i-stopped-writing-manual-test-cases-this-openclaw-skill-does-it-for-me-3ni2</link>
      <guid>https://forem.com/shifu_legend/why-i-stopped-writing-manual-test-cases-this-openclaw-skill-does-it-for-me-3ni2</guid>
      <description>&lt;p&gt;&lt;em&gt;Discover how the QA Architecture Auditor OpenClaw skill generates comprehensive testing strategies from scratch, freeing QA engineers from manual test case writing — and what it means for the future of QA roles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fauto%3Dformat%26fit%3Dcrop%26q%3D80%26w%3D1470" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fauto%3Dformat%26fit%3Dcrop%26q%3D80%26w%3D1470" alt="Header image: a futuristic QA robot analyzing code" width="1470" height="981"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction 🚀
&lt;/h2&gt;

&lt;p&gt;If you've ever been knee‑deep in a codebase, tasked with writing test cases for the first time, you know the drill: sift through modules, guess what needs testing, write repetitive boilerplate, and hope you didn't miss that one edge case that'll blow up in production. &lt;/p&gt;

&lt;p&gt;Quality Assurance is essential, but let's be honest: the manual labor of test case creation is a notorious bottleneck. &lt;strong&gt;It's slow, error-prone, and frankly, a bit soul-crushing.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;What if an AI could read your code and instantly produce a comprehensive, independent testing strategy — complete with risk scores, security maps, and ready‑to‑run test examples? &lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt;, an OpenClaw skill that performs forensic analysis of any repository and spits out an exhaustive QA strategy report. This isn't just another code coverage tool; it's a full‑blown QA architect that operates under a zero‑trust policy, ignoring any existing tests and designing everything from scratch. 🧠&lt;/p&gt;

&lt;p&gt;The result? A multi‑methodology testing matrix that covers everything from black‑box to mutation testing, all tailored to your tech stack. &lt;/p&gt;

&lt;p&gt;In this article, we'll explore why traditional QA test writing is failing modern development, how this OpenClaw skill changes the game, and what it means for the future of QA roles. Spoiler: it doesn't make testers redundant — it makes them &lt;em&gt;strategists&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Manual QA Test Creation 😫
&lt;/h2&gt;

&lt;p&gt;Let's face reality: writing test cases is often a Sisyphean task. Here's why you probably hate it (and why your boss should care):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time‑consuming and repetitive&lt;/strong&gt; ⏳ – For every function you write, you need to craft happy paths, edge cases, error handling, and integration hooks. Multiply that across a growing codebase and you've got weeks of effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent coverage&lt;/strong&gt; 📉 – Different QA engineers have different standards. One might miss boundary values, another might forget security scenarios. Maintaining uniform coverage across teams is nearly impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability nightmare&lt;/strong&gt; 📈 – As microservices proliferate, keeping test suites up to date becomes a full‑time job. Any sprint that adds features must also extend tests, leading to technical debt or shortcuts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind spots&lt;/strong&gt; 🙈 – Humans naturally gravitate toward the familiar (unit tests) and neglect less obvious but critical areas: fuzzing, mutation testing, accessibility, localization, performance under load, and compatibility across browsers/OSes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottleneck for releases&lt;/strong&gt; 🚧 – QA is often the gatekeeper. If test writing lags, releases slip. Companies either ship with insufficient tests or delay features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit &amp;amp; compliance headaches&lt;/strong&gt; 📋 – Auditors demand evidence of structured testing, ITGC controls, and risk‑based test plans. Manually assembling this documentation is error‑prone and time‑intensive.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The ideal solution would be an &lt;strong&gt;independent, automated QA architect&lt;/strong&gt; that can examine any codebase and produce a prioritized, comprehensive testing blueprint — one that covers all methodologies, is tailored to the detected stack, and can be regenerated whenever the code evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the QA Architecture Auditor (The Skill that Saves Weeks) 🛠️
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; is an OpenClaw skill that does exactly that. &lt;/p&gt;

&lt;p&gt;It's a Python‑based CLI tool (&lt;code&gt;qa-audit&lt;/code&gt;) that you can invoke directly or via slash command in OpenClaw. It performs deep static analysis and generates an HTML or Markdown report that serves as a complete QA strategy. ⚡&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forensic codebase analysis&lt;/strong&gt; 🔍 – Detects languages, frameworks, architecture pattern (monolith, microservices, serverless, etc.), dependencies, modules, cyclomatic complexity, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; ⚠️ – Scores each module from 0‑100 based on complexity, external calls, authentication handling, data persistence, cryptography, file I/O, coupling, and public API surface. High‑risk modules surface for prioritized testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security surface mapping&lt;/strong&gt; 🛡️ – Identifies modules that touch authentication, authorization, input validation, output encoding, session management, cryptography, file ops, network ops, and database ops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry point discovery&lt;/strong&gt; 📍 – Finds &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;app.py&lt;/code&gt;, &lt;code&gt;manage.py&lt;/code&gt;, &lt;code&gt;index.js&lt;/code&gt;, etc., to focus end‑to‑end and smoke tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data flow mapping&lt;/strong&gt; 🔄 – Traces imports/dependencies to expose integration points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITGC controls&lt;/strong&gt; ✅ – Generates a tailored checklist of IT General Controls compliance items (change management, access control, testing requirements, security scanning, code signing, deployment gates, etc.) based on your tech stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report generation&lt;/strong&gt; 📊 – Produces a beautifully formatted HTML or Markdown report crammed with actionable insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero‑trust policy&lt;/strong&gt; 🚫 – The skill &lt;em&gt;ignores&lt;/em&gt; any existing tests. It assumes you're starting from zero and designs everything accordingly. This is crucial for audits and for turning around neglected codebases.&lt;/li&gt;
&lt;/ul&gt;
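&lt;p&gt;To make the risk-scoring idea concrete, here is a hypothetical weighting in the same spirit; the skill's actual formula is internal and may differ. 🧮&lt;/p&gt;

```python
def risk_score(module):
    """Combine complexity and security signals into a 0-100 score.
    Hypothetical weights, for illustration only."""
    score = min(module.get("cyclomatic_complexity", 0) * 2, 30)
    flags = {
        "handles_auth": 25,
        "uses_crypto": 15,
        "touches_database": 15,
        "external_calls": 10,
        "file_io": 5,
    }
    for flag, weight in flags.items():
        if module.get(flag):
            score += weight
    return min(score, 100)

login = {"cyclomatic_complexity": 12, "handles_auth": True,
         "touches_database": True, "external_calls": True}
print(risk_score(login))  # 74 under these example weights
```

&lt;p&gt;A module that merely has tangled branches scores moderately; one that also touches authentication and persistence rockets to the top of the priority list.&lt;/p&gt;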

&lt;p&gt;All of this runs locally; your code never leaves your machine unless a remote URL is provided, in which case only a standard &lt;code&gt;git clone&lt;/code&gt; occurs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This OpenClaw Skill Unique? 💎
&lt;/h2&gt;

&lt;p&gt;The QA ecosystem is no stranger to static analysis tools (linters, complexity analyzers, OWASP ZAP, etc.). But the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; fills a critical gap: &lt;strong&gt;a holistic, methodology‑agnostic testing strategy generator&lt;/strong&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  20+ Testing Methodologies Covered
&lt;/h3&gt;

&lt;p&gt;The report includes dedicated sections for each major testing approach, complete with an independent baseline definition, risk assessment, strategy, and from‑scratch test examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Core execution&lt;/strong&gt;: Black Box, White Box, Manual, Automated&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Functional &amp;amp; structural&lt;/strong&gt;: Unit, Integration, System, Functional, Smoke, Sanity, E2E, Regression, API, Database Integrity&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Non‑functional&lt;/strong&gt;: Performance, Security, Usability, Compatibility, Accessibility, Localization&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Specialized&lt;/strong&gt;: Acceptance (UAT), Exploratory, Boundary Value Analysis, Monkey/Random Testing, Fuzz Testing, Mutation Testing, Non‑Functional General&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not just a list — each section contains &lt;em&gt;test cases&lt;/em&gt; written in the language of your stack (Python, JavaScript, Java, Go, etc.) showing exactly how to validate those dimensions. 🧑‍💻&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero‑Trust Baseline
&lt;/h3&gt;

&lt;p&gt;Many tools pretend to “assess” a project by looking at its coverage reports. This skill deliberately &lt;em&gt;ignores&lt;/em&gt; existing tests. Its premise: trust nothing, start from first principles. That independence is gold for audits and for teams that suspect they're not covering enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk‑Based Prioritization
&lt;/h3&gt;

&lt;p&gt;The skill assigns a risk score to each module, combining complexity and security factors. The highest‑scoring modules get explicit attention in the risk assessment table, and the methodology recommendations are tailored accordingly. This tells you exactly where to focus your effort first. 🎯&lt;/p&gt;

&lt;h2&gt;
  
  
  How QA Engineers Transform, Not Disappear 👨‍🔬
&lt;/h2&gt;

&lt;p&gt;Will this skill make QA testers redundant? Not at all — it elevates them. &lt;/p&gt;

&lt;p&gt;The skill produces raw test strategies; it doesn't &lt;em&gt;execute&lt;/em&gt; tests or integrate with CI automatically (though that could be a next step). QA engineers become &lt;strong&gt;QA architects&lt;/strong&gt; who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review the generated strategy for business‑logic nuances.&lt;/li&gt;
&lt;li&gt;Refine risk scores based on domain knowledge.&lt;/li&gt;
&lt;li&gt;Implement the suggested test skeletons, filling in domain‑specific data and assertions.&lt;/li&gt;
&lt;li&gt;Integrate the tests into CI/CD pipelines.&lt;/li&gt;
&lt;li&gt;Triage and investigate failures discovered by the new tests. 🕵️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The time saved from manual test authoring can be redirected toward higher‑value activities&lt;/strong&gt;: exploratory testing, usability studies, performance tuning, and security hardening. In other words, the boring part gets automated, and the creative, investigative work remains human‑centric.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real‑World Walkthrough 🚶‍♂️
&lt;/h2&gt;

&lt;p&gt;Let's see the skill in action on a tiny Flask API sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; ./flask-demo &lt;span class="nt"&gt;--output&lt;/span&gt; report.html &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated &lt;code&gt;report.html&lt;/code&gt; opens to a clean UI. The &lt;strong&gt;Executive Summary&lt;/strong&gt; tells us we have 12 modules, 3 languages (Python, HTML, SQL), and highlights the login module as the highest risk (score 78). &lt;/p&gt;

&lt;p&gt;Scrolling to the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, we find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: specific suggestions like "test all routes with method overrides, validate status codes, schemas, auth headers, error handling". &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: OWASP Top 10 validation checklist with code snippets for SQL injection, XSS, authentication bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: load test script using &lt;code&gt;locust&lt;/code&gt; that simulates 1000 users hitting the login endpoint with a 2‑second SLA. 🏎️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, you get a ready‑to‑implement test plan that would otherwise take weeks of manual effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample Report Excerpts 📝
&lt;/h2&gt;

&lt;p&gt;To give you a taste of what the report looks like, here's a trimmed excerpt from the &lt;strong&gt;Risk Assessment&lt;/strong&gt; table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Risk Type&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Risk Score&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;security&lt;/td&gt;
&lt;td&gt;auth/login.py&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;Authentication handling detected — requires rigorous security testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;code_complexity&lt;/td&gt;
&lt;td&gt;services/order.py&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;High complexity module with many branches — needs path coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;dependency&lt;/td&gt;
&lt;td&gt;requirements.txt&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;Unpinned dependencies detected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And from the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, the &lt;strong&gt;Fuzz Testing&lt;/strong&gt; section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Independent Baseline:&lt;/strong&gt; Feed malformed, unexpected, or extreme data to the system to expose vulnerabilities like buffer overflows or injection flaws.&lt;br&gt;
&lt;strong&gt;Vulnerability &amp;amp; Risk Assessment:&lt;/strong&gt; Fuzz testing needed for any input parsing modules. Focus on 12 modules that handle user‑supplied data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These concrete examples show you how to jump straight into implementation without guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use SonarQube? 🤔
&lt;/h2&gt;

&lt;p&gt;You might wonder: "Can't we already do this with SonarQube or OWASP ZAP?" Those tools address specific facets — static analysis, dependency checks, dynamic scanning. &lt;/p&gt;

&lt;p&gt;They don't produce a &lt;em&gt;holistic testing strategy&lt;/em&gt; that spans unit, integration, security, performance, accessibility, compliance, and the more exotic methodologies like mutation and fuzz testing. Nor do they provide the &lt;em&gt;from‑scratch test cases&lt;/em&gt; ready for adaptation. &lt;/p&gt;

&lt;p&gt;The QA Architecture Auditor is the &lt;strong&gt;missing link&lt;/strong&gt; between static analysis and actual test implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started 🏁
&lt;/h2&gt;

&lt;p&gt;Ready to try it out? Here's how to install and run the skill:&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation from ClawHub 🚀
&lt;/h3&gt;

&lt;p&gt;If you are already using OpenClaw, the easiest way to get started is via ClawHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clawhub &lt;span class="nb"&gt;install &lt;/span&gt;qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual install from GitHub
&lt;/h3&gt;

&lt;p&gt;If you prefer the old-school way or want to hack on the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/shifulegend/qa-architecture-auditor.git &lt;span class="se"&gt;\&lt;/span&gt;
  ~/.openclaw/workspace/skills/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the skill
&lt;/h3&gt;

&lt;p&gt;Use the slash command in your OpenClaw chat or call the CLI directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; /path/to/your/project &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="nt"&gt;--output&lt;/span&gt; qa-report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bigger Picture: AI‑Driven QA Strategies 🌐
&lt;/h2&gt;

&lt;p&gt;The QA Architecture Auditor is more than a one‑off tool; it's a glimpse into the future of AI‑augmented software engineering. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous auditing&lt;/strong&gt;: The skill runs on every push, updating the risk assessment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration&lt;/strong&gt;: Auto‑generate test stubs for new code. 🔄&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance as code&lt;/strong&gt;: The ITGC controls become part of your compliance documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are natural extensions that the open‑source community could build. The skill is published under the MIT license and welcomes contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion 🏁
&lt;/h2&gt;

&lt;p&gt;Manual test case writing doesn't have to remain the bottleneck. The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; OpenClaw skill offers a practical, immediate way to generate a comprehensive, independent QA strategy from a single command. &lt;/p&gt;

&lt;p&gt;For QA engineers, it's not replacement — it's an elevation to QA architect. For teams, it's a shortcut to robust, audit‑ready testing.&lt;/p&gt;

&lt;p&gt;Give it a try on your next codebase. You might just find that your QA workload becomes not only manageable but also more strategic and impactful. 💖&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action 📢
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install&lt;/strong&gt; the skill from &lt;a href="https://clawhub.ai/skills/qa-architecture-auditor" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt; today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; it on a project you care about and explore the report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt;: Have an idea for a new methodology? Open an issue or PR on the GitHub repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt;: Forward this article to your QA team and let them try it out!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's make testing smarter, faster, and more comprehensive — together.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published on DEV.to • 10 min read&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The OpenClaw Heartbeat Trap: How a Simple Health Check Cost Me 300+ LLM Calls Per Day</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Sun, 01 Mar 2026 15:51:21 +0000</pubDate>
      <link>https://forem.com/shifu_legend/the-autonomous-agent-trap-how-my-ai-burned-300-llm-calls-a-day-checking-its-own-pulse-336</link>
      <guid>https://forem.com/shifu_legend/the-autonomous-agent-trap-how-my-ai-burned-300-llm-calls-a-day-checking-its-own-pulse-336</guid>
      <description>&lt;p&gt;🧵 I thought my AI agent was just casually checking system health. Instead, it was running a full-blown medical drama every 55 minutes—and racking up massive token usage behind my back. 🎬&lt;/p&gt;

&lt;h2&gt;
  
  
  💸 The Fear of the Runaway API Bill
&lt;/h2&gt;

&lt;p&gt;If you're building autonomous AI agents with frameworks like &lt;strong&gt;OpenClaw&lt;/strong&gt;, &lt;strong&gt;LangChain&lt;/strong&gt;, or &lt;strong&gt;AutoGPT&lt;/strong&gt;, you already know the existential dread of waking up to a massive API billing alert.&lt;/p&gt;

&lt;p&gt;When we give an LLM the ability to autonomously call tools in a loop to "achieve a goal," we hand over the keys to our wallets.&lt;/p&gt;

&lt;p&gt;This week, my AI assistant—running on OpenClaw using Google's Gemini models—started throwing &lt;code&gt;429 RESOURCE_EXHAUSTED&lt;/code&gt; errors. At first, I was just annoyed by the rate limits. But when I looked at the dashboard, my annoyance turned to panic.&lt;/p&gt;

&lt;p&gt;The daily quota of 1,500 requests was seemingly exhausted.&lt;/p&gt;

&lt;p&gt;The terrifying part? &lt;strong&gt;I hadn't even talked to the agent all day.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The only automated task running? A "simple" system health heartbeat set to trigger every 55 minutes. That's just ~26 pings a day. Where were all these hundreds of requests coming from? I needed to know exactly where those tokens were flying off to.&lt;/p&gt;

&lt;h2&gt;
  
  
  🕵️ The Investigation: Digging Through the JSON Logs
&lt;/h2&gt;

&lt;p&gt;My first assumption was a configuration error—maybe the heartbeat frequency was accidentally set to 5 minutes instead of 55? I checked my &lt;code&gt;openclaw.json&lt;/code&gt; config file. Nope, strictly set to &lt;code&gt;"every": "55m"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So, I brought out the heavy machinery: &lt;strong&gt;the raw agent logs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I downloaded the 5MB &lt;code&gt;openclaw.log&lt;/code&gt; file from the server. OpenClaw logs everything in structured JSON, which is great for machines but terrible for human eyes. Staring at raw JSON wasn't going to cut it, so I wrote two custom Node.js parser scripts (&lt;code&gt;extract_events.js&lt;/code&gt; and &lt;code&gt;trace_sessions.js&lt;/code&gt;) to reconstruct the crime scene.&lt;/p&gt;

&lt;p&gt;Here is what the scripts did:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regex-matched every &lt;code&gt;embedded run start&lt;/code&gt; and &lt;code&gt;embedded run done&lt;/code&gt; to capture the LLM execution times.&lt;/li&gt;
&lt;li&gt;Grouped every event by &lt;code&gt;sessionId&lt;/code&gt; to track long-running conversations.&lt;/li&gt;
&lt;li&gt;Extracted every single &lt;code&gt;tool&lt;/code&gt; invocation (&lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;web_search&lt;/code&gt;) attached to those runs.&lt;/li&gt;
&lt;/ul&gt;
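&lt;p&gt;My parsers were Node.js scripts, but the session-grouping logic is easy to sketch in Python (the log lines below are fabricated stand-ins for the real format):&lt;/p&gt;

```python
import json
from collections import defaultdict

# Hypothetical lines in the structured-JSON style OpenClaw emits.
log_lines = [
    '{"sessionId": "hb-01", "msg": "embedded run start", "ts": "07:51:40"}',
    '{"sessionId": "hb-01", "msg": "tool", "tool": "exec", "ts": "07:51:55"}',
    '{"sessionId": "hb-01", "msg": "tool", "tool": "web_search", "ts": "07:57:02"}',
    '{"sessionId": "hb-01", "msg": "embedded run done", "ts": "07:58:12"}',
]

# Group every event by sessionId to reconstruct each run's timeline.
sessions = defaultdict(list)
for line in log_lines:
    event = json.loads(line)
    sessions[event["sessionId"]].append(event)

for sid, events in sessions.items():
    tools = [e["tool"] for e in events if e["msg"] == "tool"]
    print(sid, "tools used:", tools)
```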

&lt;p&gt;When the scripts spit out the final timeline, my jaw dropped. 😲&lt;/p&gt;

&lt;p&gt;What I found was a textbook case of &lt;strong&gt;uncontrollable LLM tool looping&lt;/strong&gt;—the silent killer of API budgets. 🌪️&lt;/p&gt;

&lt;h2&gt;
  
  
  🔪 The Smoking Gun: The System Health Definition
&lt;/h2&gt;

&lt;p&gt;My agent is designed to run autonomously. Every 55 minutes, a cron job wakes it up and tells it to read a file called &lt;code&gt;HEARTBEAT.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here was the fateful instruction inside that file:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"System Health Check: Monitor for stalled interactive processes and kill them. Check memory usage (&lt;code&gt;free -h&lt;/code&gt;)."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To a human sysadmin, this is a 10-second task. You run &lt;code&gt;ps aux&lt;/code&gt;, maybe &lt;code&gt;free -h&lt;/code&gt;, and you're done.&lt;/p&gt;

&lt;p&gt;But to a stateless LLM agent driving a tool-chain architecture? &lt;strong&gt;It's a multi-round forensic investigation.&lt;/strong&gt; 🕵️&lt;/p&gt;

&lt;p&gt;Here is the timeline (abridged to the key steps) of a &lt;strong&gt;single&lt;/strong&gt; 55-minute heartbeat check my script extracted:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;What the LLM was doing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:51:55&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Ran &lt;code&gt;ps aux&lt;/code&gt; to list all processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:52:15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Ran &lt;code&gt;grep&lt;/code&gt; to filter the list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:56:49&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked a specific process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:56:54&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked memory with &lt;code&gt;free -h&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:57:02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🌐 Tool: &lt;code&gt;web_search&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;em&gt;Looked something up on the internet!?&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:57:24&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked disk space (&lt;code&gt;df -h&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:58:10&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Final cleanup/verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:58:12&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Done&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Summarized findings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total duration:&lt;/strong&gt; 6.3 minutes.&lt;br&gt;
&lt;strong&gt;Total tool calls:&lt;/strong&gt; 12.&lt;/p&gt;

&lt;h2&gt;
  
  
  ❄️ The Context Snowball Effect (How the tokens multiply)
&lt;/h2&gt;

&lt;p&gt;Here is the critical architectural quirk I had overlooked (and why so many AutoGPT users end up with massive API bills): &lt;strong&gt;In an LLM tool-calling loop, every single tool execution is a brand new API request.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the agent ran &lt;code&gt;ps aux&lt;/code&gt;, it fetched the result. To decide what to do next, it had to send the &lt;em&gt;entire conversation history&lt;/em&gt; (including the massive &lt;code&gt;ps aux&lt;/code&gt; output) back to the LLM. Then it decided to run &lt;code&gt;free -h&lt;/code&gt;. It executed it, got the result, and sent the history back &lt;em&gt;again&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;With each step, the context ballooned.&lt;/p&gt;
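&lt;p&gt;Back-of-the-envelope arithmetic shows why this hurts: because each request re-sends the whole history, total input tokens grow quadratically with the number of tool calls. A sketch (token counts are illustrative, not measured from my logs):&lt;/p&gt;

```javascript
// Each tool call re-sends the entire history, so the context grows
// linearly per step and the total input tokens grow quadratically.
function totalInputTokens(steps, basePrompt, perToolOutput) {
  let total = 0;
  let context = basePrompt;
  for (let i = 0; i !== steps; i++) {
    total += context;          // this request re-sends everything so far
    context += perToolOutput;  // the tool result is appended to history
  }
  return total;
}

// One tool call with a 2,000-token prompt vs. twelve calls that each
// append a 1,500-token tool output:
console.log(totalInputTokens(1, 2000, 1500));  // 2000
console.log(totalInputTokens(12, 2000, 1500)); // 123000
```

&lt;p&gt;Twelve "cheap" shell commands cost over 60x the tokens of a single round-trip.&lt;/p&gt;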

&lt;p&gt;Instead of 26 lightweight pings a day, my "simple" health check was generating &lt;strong&gt;300+ massive LLM round-trips daily&lt;/strong&gt;, each with a larger context window than the last. 🏔️&lt;/p&gt;

&lt;p&gt;My agent was silently burning through hundreds of thousands of tokens every single day just to check if the server was okay.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⛈️ The Retry Storm
&lt;/h3&gt;

&lt;p&gt;This aggressive tool usage is also what caused the rate limits. When the agent hit its 12-tool streak in 6 minutes, it bumped into Google's per-minute quota (~15 requests/min).&lt;/p&gt;

&lt;p&gt;When the API returned a &lt;code&gt;429 Rate Limit&lt;/code&gt; error, OpenClaw (as designed) initiated an exponential backoff retry. But during those retry windows, &lt;em&gt;other&lt;/em&gt; scheduled checks queued up.&lt;/p&gt;
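&lt;p&gt;The dynamics are easy to model. A toy sketch of the pattern (delays and queue math are illustrative, not OpenClaw's actual retry implementation):&lt;/p&gt;

```javascript
// Exponential backoff: the retry delay doubles on every attempt.
function backoffDelays(attempts, baseMs = 1000) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// While one request sits in its backoff window, other scheduled checks
// keep arriving. When the window clears, they all fire at once.
function queuedAtRelease(backoffMs, arrivalIntervalMs) {
  return Math.floor(backoffMs / arrivalIntervalMs);
}

console.log(backoffDelays(4));             // 1s, 2s, 4s, 8s
console.log(queuedAtRelease(40000, 3500)); // 11 calls queued behind a 40s stall
```

&lt;p&gt;Backoff protects the provider, but the queue it leaves behind is what produces the burst.&lt;/p&gt;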

&lt;p&gt;At exactly &lt;code&gt;11:15 UTC&lt;/code&gt;, the dam broke. The logs showed &lt;strong&gt;12 API requests firing in 40 seconds&lt;/strong&gt; as the system panic-retried a backlog of tool calls.&lt;/p&gt;

&lt;p&gt;I wasn't being rate-limited because of daily usage. I was being rate-limited because my agent was behaving like an over-caffeinated sysadmin slamming the terminal with 12 commands a minute. ☕💥&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ The Fix: Taking the Keys Away
&lt;/h2&gt;

&lt;p&gt;When building autonomous agents, it's tempting to give the LLM control over everything. &lt;em&gt;Why write a bash script when the AI can just figure it out dynamically?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This incident is exactly why. Some tasks don't need "reasoning." They just need execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;I opened &lt;code&gt;HEARTBEAT.md&lt;/code&gt; and completely deleted the actionable instructions. I left it as a comment-only file so the LLM wakes up, sees nothing to do, and goes immediately back to sleep (1 API call instead of 12).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I moved the actual system monitoring to a dumb, reliable &lt;code&gt;cron&lt;/code&gt; bash script:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;AVAILABLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;free &lt;span class="nt"&gt;-m&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'/Mem:/ {print $7}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AVAILABLE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 200 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;] LOW MEMORY: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AVAILABLE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;MB"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /tmp/health_alerts.log
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, a traditional cron job runs every 55 minutes, takes 0.1 seconds, costs 0 API tokens, and logs any issues to a file. The LLM only gets involved if a human explicitly asks to read that file.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 The Takeaway for Agent Builders
&lt;/h2&gt;

&lt;p&gt;If you are building LLM agents with access to real tools (&lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;browser&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;), remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every tool call is a full LLM round-trip.&lt;/strong&gt; A 5-step thought process is 5 API calls. Set hard caps (&lt;code&gt;max_iterations&lt;/code&gt;) on your agent loops to keep them from burning a hole in your wallet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never give an LLM a monitoring job a config or bash script can do.&lt;/strong&gt; Reserve the expensive AI reasoning for when things actually break and need diagnosing, not for the routine patrol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log your tool chains.&lt;/strong&gt; If I hadn't built custom JS scripts to trace the session IDs and see exactly &lt;em&gt;which&lt;/em&gt; tools were being called in sequence, I would have had no idea my agent was hallucinating 12-step system audits in the background.&lt;/li&gt;
&lt;/ul&gt;
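&lt;p&gt;That first point, the hard cap, can be a one-line guard in the agent loop. A generic sketch (not OpenClaw internals; &lt;code&gt;decideNextTool&lt;/code&gt; and &lt;code&gt;runTool&lt;/code&gt; are hypothetical stand-ins for the LLM call and the tool executor):&lt;/p&gt;

```javascript
// Generic agent loop with a hard iteration cap. The loop bails out
// once the budget is spent, no matter what the model wants to do next.
function runAgent(decideNextTool, runTool, maxIterations = 5) {
  const transcript = [];
  for (let i = 0; i !== maxIterations; i++) {
    const step = decideNextTool(transcript);
    if (step === null) return { transcript, capped: false }; // model is done
    transcript.push(runTool(step));
  }
  return { transcript, capped: true }; // budget exhausted, bail out
}

// A "model" that would loop forever is stopped after 5 tool calls:
const result = runAgent(() => 'exec', (t) => `ran ${t}`, 5);
console.log(result.capped);            // true
console.log(result.transcript.length); // 5
```

&lt;p&gt;When &lt;code&gt;capped&lt;/code&gt; comes back true, log it loudly: a capped run is exactly the kind of 12-step audit you want to know about.&lt;/p&gt;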




&lt;h2&gt;
  
  
  📚 Diagnostic Playbook: Fixing "Unknown Model" and &lt;code&gt;configured,missing&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;If you're hitting &lt;code&gt;configured,missing&lt;/code&gt; or &lt;code&gt;Unknown model&lt;/code&gt; in OpenClaw, here's the exact playbook I used:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check if you have an agent-level &lt;code&gt;models.json&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this file &lt;strong&gt;exists&lt;/strong&gt; and you're only using standard providers (OpenRouter, Google, Anthropic, OpenAI), this file is probably unnecessary and might be shadowing the built-in registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Check what's in it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'"id"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see a provider name that matches a built-in provider (&lt;code&gt;openrouter&lt;/code&gt;, &lt;code&gt;google&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, etc.), that block is &lt;strong&gt;overriding&lt;/strong&gt; the built-in model catalog. Only models explicitly listed will be recognized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Try disabling it (with backup)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Backup first!&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json &lt;span class="se"&gt;\\&lt;/span&gt;
   ~/.openclaw/agents/main/agent/models.json.bak.&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d-%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Rename to disable&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json &lt;span class="se"&gt;\\&lt;/span&gt;
   ~/.openclaw/agents/main/agent/models.json.disabled

&lt;span class="c"&gt;# Restart&lt;/span&gt;
systemctl &lt;span class="nt"&gt;--user&lt;/span&gt; restart openclaw-gateway

&lt;span class="c"&gt;# Check&lt;/span&gt;
openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If all models now show &lt;code&gt;configured&lt;/code&gt; — &lt;strong&gt;the file was the problem.&lt;/strong&gt; Delete it permanently (or keep the &lt;code&gt;.disabled&lt;/code&gt; backup just in case).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: If you DO need custom providers
&lt;/h3&gt;

&lt;p&gt;If you have truly custom providers (not built-in), such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nvidia API&lt;/strong&gt; (&lt;code&gt;integrate.api.nvidia.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom self-hosted endpoints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-standard API providers&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need &lt;code&gt;models.json&lt;/code&gt;, but be very careful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't use provider names that match built-in providers&lt;/strong&gt; (e.g., use &lt;code&gt;openrouter-custom&lt;/code&gt; instead of &lt;code&gt;openrouter&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Only define the custom providers; let the built-in registry handle the standard ones&lt;/li&gt;
&lt;/ul&gt;
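&lt;p&gt;Put together, a safe &lt;code&gt;models.json&lt;/code&gt; keeps every provider name out of the built-in namespace. A minimal sketch (the model ID is just an example):&lt;/p&gt;

```json
{
  "providers": {
    "nvidia-custom": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "nvapi-...",
      "models": [
        { "id": "moonshotai/kimi-k2.5" }
      ]
    }
  }
}
```

&lt;p&gt;No &lt;code&gt;openrouter&lt;/code&gt; block at all, so the built-in registry keeps serving every OpenRouter model untouched.&lt;/p&gt;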

&lt;h3&gt;
  
  
  Quick diagnostic cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;models.json&lt;/code&gt; is shadowing built-in registry&lt;/td&gt;
&lt;td&gt;Rename/remove &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Unknown model&lt;/code&gt; in logs&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API key missing from &lt;code&gt;.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;.env&lt;/code&gt; (and never use &lt;code&gt;&amp;gt;&lt;/code&gt;!)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model works via &lt;code&gt;curl&lt;/code&gt; but not OpenClaw&lt;/td&gt;
&lt;td&gt;Provider block in &lt;code&gt;models.json&lt;/code&gt; doesn't list the model&lt;/td&gt;
&lt;td&gt;Remove the shadowing provider block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;models scan&lt;/code&gt; doesn't find a model&lt;/td&gt;
&lt;td&gt;Model doesn't support tool-calling&lt;/td&gt;
&lt;td&gt;Add manually via &lt;code&gt;openclaw models set&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🎯 The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure debugging is archaeology.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not fixing bugs — you're reconstructing what a system looked like at a moment when it worked, and comparing it to the moment it stopped.&lt;/p&gt;

&lt;p&gt;The difference is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ &lt;strong&gt;One character&lt;/strong&gt; (&lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;One file&lt;/strong&gt; that's shadowing a built-in registry&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;One good-faith change&lt;/strong&gt; by an AI agent that had unintended side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the real fix isn't always adding what's missing — sometimes it's &lt;strong&gt;removing what shouldn't be there.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you've ever stared at &lt;code&gt;configured,missing&lt;/code&gt; and felt your sanity slipping — now you know exactly where to look.&lt;/em&gt; 🦞&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;My OpenClaw agent's 55-minute heartbeat check was running 12 shell commands per cycle, generating 300+ LLM round-trips a day; the fix was stripping the actionable instructions from &lt;code&gt;HEARTBEAT.md&lt;/code&gt; and moving monitoring to a dumb cron + bash script. Separately, a &lt;code&gt;models.json&lt;/code&gt; file with an &lt;code&gt;openrouter&lt;/code&gt; provider block shadowed the built-in catalog and broke model resolution; the fix was removing the unnecessary file and letting the built-in registry handle standard providers.&lt;/p&gt;




&lt;p&gt;Has your AI agent ever surprised you with a massive API bill? Share your horror stories below! 👇&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>debugging</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenClaw's "Unknown Model" Error — How One Missing JSON Entry Broke My AI Assistant for 4 Hours</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Sun, 01 Mar 2026 08:19:09 +0000</pubDate>
      <link>https://forem.com/shifu_legend/openclaws-unknown-model-error-how-one-missing-json-entry-broke-my-ai-assistant-for-4-hours-5f19</link>
      <guid>https://forem.com/shifu_legend/openclaws-unknown-model-error-how-one-missing-json-entry-broke-my-ai-assistant-for-4-hours-5f19</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;🧵 &lt;em&gt;I chased a phantom through two config files, three API keys, and 47 SSH sessions. The initial "fix" was one line of JSON. The real fix? Deleting the file entirely.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤖 What's OpenClaw?
&lt;/h2&gt;

&lt;p&gt;Before I dive in — if you haven't heard of &lt;a href="https://docs.openclaw.ai/start/getting-started" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, it's an &lt;strong&gt;open-source AI agent framework&lt;/strong&gt; that lets you run persistent AI assistants on your own server. Think of it as your self-hosted ChatGPT, but with memory, personality, tools, scheduled tasks, and multi-channel support (Telegram, Discord, WhatsApp, TUI, etc.).&lt;/p&gt;

&lt;p&gt;You configure which LLM models power your agents — GPT-4, Gemini, Claude, or any model via &lt;strong&gt;OpenRouter&lt;/strong&gt; — and OpenClaw handles the orchestration: routing messages, managing sessions, executing tools, and maintaining long-term memory across conversations.&lt;/p&gt;

&lt;p&gt;I run my personal AI assistant (&lt;strong&gt;Elara&lt;/strong&gt;) on an &lt;strong&gt;AWS EC2 instance&lt;/strong&gt; using OpenClaw. The model I'd been using for weeks: &lt;strong&gt;&lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt;&lt;/strong&gt; via OpenRouter — a solid, free, 250K-context model that worked beautifully.&lt;/p&gt;

&lt;p&gt;Until one Saturday morning, when it just… stopped.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔇 The Silence
&lt;/h2&gt;

&lt;p&gt;I opened my &lt;strong&gt;OpenClaw TUI&lt;/strong&gt; (the terminal-based chat interface) and typed &lt;code&gt;Hello&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦞 OpenClaw 2026.2.2-3 — Think different. Actually think.

openclaw tui - ws://127.0.0.1:18789 - agent main - session main
 connecting | idle
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The spinner appeared — &lt;code&gt;⠴ kerfuffling…&lt;/code&gt; — and just kept going. And going. And going.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No error.&lt;/strong&gt; No timeout message. &lt;strong&gt;No response.&lt;/strong&gt; Just an infinite spinner and silence.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕵️ Act I: The Obvious Suspects
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Checking the gateway logs
&lt;/h3&gt;

&lt;p&gt;First instinct: check the logs. OpenClaw writes daily log files to &lt;code&gt;/tmp/openclaw/&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cat /tmp/openclaw/openclaw-2026-03-01.log | grep -i "error" | tail -5
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;And there it was:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "error": "Error: Unknown model: openrouter/stepfun/step-3.5-flash:free",
  "lane": "main",
  "durationMs": 55
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;"Unknown model."&lt;/strong&gt; But… that model was in my config. I'd been using it for weeks. How could OpenClaw suddenly not recognize it?&lt;/p&gt;
&lt;h3&gt;
  
  
  The mysterious &lt;code&gt;configured,missing&lt;/code&gt; status
&lt;/h3&gt;

&lt;p&gt;OpenClaw has a CLI command to list all configured models:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models list
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model                                      Input   Context  Auth  Tags
openrouter/stepfun/step-3.5-flash:free     text    250k     yes   configured,missing
google/gemini-2.0-flash                    text    1000k    yes   configured
google/gemini-3-flash-preview              text    1024k    yes   configured
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;There it is: &lt;strong&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/strong&gt;. 🤨&lt;/p&gt;

&lt;p&gt;I'd never seen this status before. In OpenClaw:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;configured&lt;/code&gt;&lt;/strong&gt; = the model is listed in your config and the runtime can resolve it ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/strong&gt; = the model is listed in your config, but the runtime &lt;strong&gt;can't resolve it to a working provider endpoint&lt;/strong&gt; ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model exists on paper but is invisible at runtime. Like a ghost in the machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Trying the obvious fixes
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Re-register the model via CLI
$ openclaw models set openrouter/stepfun/step-3.5-flash:free
Updated successfully ✅

# Restart the gateway
$ systemctl --user restart openclaw-gateway

# Check again...
$ openclaw models list | grep stepfun
openrouter/stepfun/step-3.5-flash:free     text    250k     yes    configured,missing
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Still &lt;code&gt;configured,missing&lt;/code&gt;.&lt;/strong&gt; 😤 The &lt;code&gt;models set&lt;/code&gt; command updated the global config, but the runtime still couldn't find the model. Something deeper was wrong.&lt;/p&gt;
&lt;h3&gt;
  
  
  Trying a model scan
&lt;/h3&gt;

&lt;p&gt;OpenClaw can scan your providers for available models:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models scan --yes
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It found Google models, Llama models, and others — but &lt;strong&gt;not stepfun&lt;/strong&gt;. The scan only picks up models that advertise tool-calling support, and &lt;code&gt;step-3.5-flash:free&lt;/code&gt; doesn't. Dead end.&lt;/p&gt;


&lt;h2&gt;
  
  
  💀 Act II: The &lt;code&gt;&amp;gt;&lt;/code&gt; That Ate My API Key
&lt;/h2&gt;

&lt;p&gt;While investigating, I discovered something &lt;strong&gt;horrifying&lt;/strong&gt;. Earlier that day, while configuring a new Google API key, a command had been run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;echo "GOOGLE_API_KEY=AIzaSy..." &amp;gt; ~/.openclaw/.env
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;See that &lt;code&gt;&amp;gt;&lt;/code&gt;? That's &lt;strong&gt;not&lt;/strong&gt; &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;That single character — &lt;code&gt;&amp;gt;&lt;/code&gt; instead of &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; — overwrote the entire &lt;code&gt;.env&lt;/code&gt; file&lt;/strong&gt;, silently destroying the &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; that had been there for a month.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No error. No warning. &lt;strong&gt;Just gone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I found the original key buried deep in &lt;code&gt;.bash_history&lt;/code&gt; and restored it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Found the original onboarding command in history
$ history | grep openrouter
openclaw onboard --auth-choice apiKey --token-provider openrouter --token "sk-or-v1-..."

# Restored it (with &amp;gt;&amp;gt; this time!)
$ echo 'OPENROUTER_API_KEY=sk-or-v1-...' &amp;gt;&amp;gt; ~/.openclaw/.env
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h3&gt;
  
  
  Direct API test
&lt;/h3&gt;

&lt;p&gt;To verify the key was valid, I bypassed OpenClaw entirely:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-v1-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun/step-3.5-flash:free",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "choices": [{
    "message": { "content": "Hello! How can I help you today?" }
  }]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The API worked perfectly.&lt;/strong&gt; 🎉 Key valid. OpenRouter up. Model alive and responding.&lt;/p&gt;

&lt;p&gt;But OpenClaw &lt;em&gt;still&lt;/em&gt; said &lt;strong&gt;"Unknown model."&lt;/strong&gt; 💀&lt;/p&gt;

&lt;p&gt;The API worked. The config had the model. The key was valid. But OpenClaw couldn't see it. This is the moment I realized the problem was deeper than a missing key or a typo.&lt;/p&gt;


&lt;h2&gt;
  
  
  🔬 Act III: The Two-Layer Architecture
&lt;/h2&gt;

&lt;p&gt;I went full forensics. I downloaded &lt;strong&gt;everything&lt;/strong&gt; from the server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📄 &lt;strong&gt;28 backup config files&lt;/strong&gt; spanning a month&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;12MB of gateway logs&lt;/strong&gt; (4 days)&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Memory files, soul files, identity files&lt;/strong&gt; — the AI assistant's persistent state&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Configuration change reports&lt;/strong&gt; — auto-generated docs from previous changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And after two hours of diffing JSON files, I found the problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  OpenClaw resolves models through TWO config layers
&lt;/h3&gt;

&lt;p&gt;Most documentation focuses on the global config file. But OpenClaw actually has &lt;strong&gt;two layers&lt;/strong&gt; of model configuration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Global config&lt;/strong&gt; — model names, aliases, fallbacks, per-agent assignments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/agents/&amp;lt;id&amp;gt;/agent/models.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Provider definitions&lt;/strong&gt; — maps provider names → base URLs, API keys, explicit model schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The critical behavior:&lt;/strong&gt; When Layer 2 defines a provider (like &lt;code&gt;openrouter&lt;/code&gt;), its model definitions &lt;strong&gt;shadow&lt;/strong&gt; (override) the built-in registry for that provider. Only models explicitly listed in that provider's &lt;code&gt;models[]&lt;/code&gt; array will be recognized.&lt;/p&gt;
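&lt;p&gt;My mental model of that merge rule, sketched in a few lines (a simplification, not OpenClaw's actual source):&lt;/p&gt;

```javascript
// Mental model: a provider block in models.json shadows the entire
// built-in model catalog for that provider name.
function resolveModels(builtinCatalog, modelsJsonProviders) {
  const resolved = { ...builtinCatalog };
  for (const [name, def] of Object.entries(modelsJsonProviders)) {
    // The custom definition wins wholesale; built-in models for this
    // provider name are no longer consulted.
    resolved[name] = def.models.map((m) => m.id);
  }
  return resolved;
}

const builtin = { openrouter: ['stepfun/step-3.5-flash:free', 'google/gemini-2.5-pro'] };
const custom = { openrouter: { models: [{ id: 'google/gemini-2.5-pro' }] } };

const models = resolveModels(builtin, custom);
console.log(models.openrouter.includes('stepfun/step-3.5-flash:free')); // false
```

&lt;p&gt;Name the custom provider &lt;code&gt;openrouter-custom&lt;/code&gt; instead and the built-in catalog survives intact.&lt;/p&gt;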

&lt;p&gt;My stepfun model was in &lt;strong&gt;Layer 1&lt;/strong&gt; ✅ but not in &lt;strong&gt;Layer 2&lt;/strong&gt; ❌.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕰️ Act IV: Where Did This File Come From?
&lt;/h2&gt;

&lt;p&gt;Here's the part that makes this story truly interesting. I diffed the backup files to reconstruct exactly how &lt;code&gt;models.json&lt;/code&gt; evolved:&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 1: The innocent beginning (early February)
&lt;/h3&gt;

&lt;p&gt;My AI assistant (&lt;strong&gt;Elara&lt;/strong&gt;) needed to connect to a custom model (&lt;code&gt;dolphin-mistral&lt;/code&gt; via OpenRouter) that wasn't in OpenClaw's built-in registry. So she created &lt;code&gt;models.json&lt;/code&gt; with a custom provider called &lt;code&gt;openrouter-custom&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "providers": {
    "openrouter-custom": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "sk-or-v1-...",
      "models": [
        { "id": "cognitivecomputations/dolphin-mistral-24b-venice-edition:free" }
      ]
    },
    "google": {
      "models": [{ "id": "gemini-3-pro-preview" }]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size: 1.3KB.&lt;/strong&gt; Two providers, two models. Harmless.&lt;/p&gt;

&lt;p&gt;At this point, &lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt; was still working perfectly — resolved through OpenClaw's &lt;strong&gt;built-in OpenRouter registry&lt;/strong&gt;, no &lt;code&gt;models.json&lt;/code&gt; entry needed. The provider name &lt;code&gt;openrouter-custom&lt;/code&gt; was smart — it's a custom name that &lt;strong&gt;doesn't clash&lt;/strong&gt; with the built-in &lt;code&gt;openrouter&lt;/code&gt; provider.&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 2: Adding Nvidia models (February 22)
&lt;/h3&gt;

&lt;p&gt;I asked Elara to configure &lt;strong&gt;Kimi K2.5&lt;/strong&gt; via Nvidia's API. She added a new &lt;code&gt;nvidia-custom&lt;/code&gt; provider to &lt;code&gt;models.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"nvidia-custom": {
  "baseUrl": "https://integrate.api.nvidia.com/v1",
  "apiKey": "nvapi-...",
  "models": [
    { "id": "moonshotai/kimi-k2.5" },
    { "id": "deepseek-ai/deepseek-v3.2" },
    { "id": "mistralai/mistral-large-3-675b-instruct-2512" }
    // ... 8 models total
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size grew to 4.7KB.&lt;/strong&gt; Three providers, 11 models. Still harmless — &lt;code&gt;nvidia-custom&lt;/code&gt; is a truly custom provider that doesn't shadow any built-in. Stepfun still worked fine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 3: The fatal addition (late February)
&lt;/h3&gt;

&lt;p&gt;At some point between Feb 22 and Mar 1, during a configuration session where I asked Elara to add Google models via OpenRouter, a new provider block was added to &lt;code&gt;models.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"openrouter": {
  "baseUrl": "https://openrouter.ai/api/v1",
  "apiKey": "sk-or-v1-...",
  "models": [
    { "id": "google/gemini-2.0-flash-001" },
    { "id": "google/gemini-2.5-flash" },
    { "id": "google/gemini-2.5-pro" }
    // ... 13 Google models total via OpenRouter
    //
    // But where's stepfun?
    // 🦗 *crickets* 🦗
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size ballooned to 11KB.&lt;/strong&gt; And this single block was the killer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why did this break everything?&lt;/strong&gt; Because unlike &lt;code&gt;openrouter-custom&lt;/code&gt; in Stage 1, this provider was named just &lt;strong&gt;&lt;code&gt;openrouter&lt;/code&gt;&lt;/strong&gt; — which &lt;strong&gt;exactly matches&lt;/strong&gt; OpenClaw's built-in OpenRouter provider name. Per OpenClaw's merge rules, when &lt;code&gt;models.json&lt;/code&gt; defines a provider, non-empty values &lt;strong&gt;take precedence over&lt;/strong&gt; the built-in registry. The explicit &lt;code&gt;openrouter&lt;/code&gt; block with only 13 Google models &lt;strong&gt;completely replaced&lt;/strong&gt; the built-in OpenRouter model catalog — which previously included hundreds of models, stepfun among them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stepfun was never added to this custom &lt;code&gt;openrouter&lt;/code&gt; block because it was already working&lt;/strong&gt; through the built-in registry. Nobody knew they needed to add it. The built-in registry was handling it silently. But the moment the custom &lt;code&gt;openrouter&lt;/code&gt; block appeared, it overwrote that silent handling, and stepfun became invisible.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Analogy:&lt;/strong&gt; Imagine your phone contacts are stored in iCloud. One day, a friend sets up a "Google Contacts" sync for you with only work contacts. Your phone switches to Google as the primary source and suddenly all your personal contacts vanish — they're still in iCloud, but it's no longer being consulted.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  ✅ The Fix: Two Approaches, One Revelation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🔧 The initial fix: Patching the symptom
&lt;/h3&gt;

&lt;p&gt;Having identified that the &lt;code&gt;openrouter&lt;/code&gt; provider block in &lt;code&gt;models.json&lt;/code&gt; was missing stepfun, my first instinct was to &lt;strong&gt;add the missing model definition&lt;/strong&gt;. This felt like the right approach — the file exists, it lists models, my model isn't in the list, so add it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Understanding the required schema&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each model in the provider's &lt;code&gt;models[]&lt;/code&gt; array needs a specific structure. You can't just add the model name — you need the full definition. I found the schema by looking at existing entries in the file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;// Every model in models.json needs these fields:
{
  "id": "...",           // Model slug (from the provider)
  "name": "...",         // Human-readable display name
  "reasoning": false,    // Does it support chain-of-thought?
  "input": ["text"],     // Input types: "text", "image", etc.
  "cost": {              // Per-token pricing
    "input": 0, "output": 0,
    "cacheRead": 0, "cacheWrite": 0
  },
  "contextWindow": ...,  // Max input tokens
  "maxTokens": ...       // Max output tokens
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
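Before hand-assembling an entry, it's worth sanity-checking it against that field list. Here's a small Python validator based only on the fields shown above — a sketch, not an official OpenClaw schema:

```python
# Sanity-check a models.json entry against the field list above.
# Illustrative only: derived from this post, not from OpenClaw's schema.
REQUIRED_FIELDS = {"id", "name", "reasoning", "input", "cost",
                   "contextWindow", "maxTokens"}
COST_FIELDS = {"input", "output", "cacheRead", "cacheWrite"}

def validate_model_entry(entry: dict) -> list:
    """Return a list of problems with a models.json entry (empty = OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - entry.keys()]
    cost = entry.get("cost", {})
    problems += [f"missing cost field: {f}" for f in COST_FIELDS - cost.keys()]
    return problems

entry = {
    "id": "stepfun/step-3.5-flash:free",
    "name": "Step 3.5 Flash (Free)",
    "reasoning": False,
    "input": ["text"],
    "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
    "contextWindow": 250000,
    "maxTokens": 8192,
}
print(validate_model_entry(entry))  # []
```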

&lt;p&gt;&lt;strong&gt;Step 2: Finding the right values for stepfun&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I checked the &lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter model page&lt;/a&gt; for &lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt; to get the specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context window: &lt;strong&gt;250,000 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Max output: &lt;strong&gt;8,192 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Input: text only (no image support)&lt;/li&gt;
&lt;li&gt;Cost: free (&lt;code&gt;0&lt;/code&gt; for all price fields)&lt;/li&gt;
&lt;li&gt;Reasoning: no&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Writing a Node.js script to safely modify the JSON&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't want to hand-edit an 11KB JSON file through SSH — one misplaced comma and the whole config breaks. So I wrote a script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const fs = require('fs');
const path = process.env.HOME + '/.openclaw/agents/main/agent/models.json';
const config = JSON.parse(fs.readFileSync(path));

const newModel = {
  id: 'stepfun/step-3.5-flash:free',
  name: 'Step 3.5 Flash (Free)',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 250000,
  maxTokens: 8192
};

// Check if it already exists
const exists = config.providers.openrouter.models.some(
  m =&amp;gt; m.id === newModel.id
);

if (!exists) {
  config.providers.openrouter.models.push(newModel);
  fs.writeFileSync(path, JSON.stringify(config, null, 2));
  console.log('✅ Added stepfun to openrouter provider');
} else {
  console.log('Model already exists');
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Apply and verify&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run the script
$ node add_stepfun.js
✅ Added stepfun to openrouter provider

# Restart the gateway to load the new config
$ systemctl --user restart openclaw-gateway

# Wait for startup
$ sleep 5

# Check status
$ openclaw models list | grep stepfun
openrouter/stepfun/step-3.5-flash:free     text   250k   yes   configured ✅

# Test in TUI
$ openclaw tui --message "Hello? Are you there?"
🌸 Hello! I'm here and ready to help!
 agent main | openrouter/stepfun/step-3.5-flash:free | tokens 54k/250k (22%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;It worked!&lt;/strong&gt; 🎉 The model was back. Status changed from &lt;code&gt;configured,missing&lt;/code&gt; to &lt;code&gt;configured&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But something nagged at me.&lt;/p&gt;
&lt;h3&gt;
  
  
  🤔 The nagging question
&lt;/h3&gt;

&lt;p&gt;I stared at &lt;code&gt;models.json&lt;/code&gt; — now 11.3KB — and asked myself: &lt;strong&gt;why does this file need to exist at all?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw has a &lt;strong&gt;built-in model registry&lt;/strong&gt;. It already knows about every OpenRouter model, every Google model, every Anthropic model. That's how stepfun was working for &lt;strong&gt;weeks&lt;/strong&gt; — through the built-in registry, with no &lt;code&gt;models.json&lt;/code&gt; needed.&lt;/p&gt;

&lt;p&gt;The only reason &lt;code&gt;models.json&lt;/code&gt; existed was for &lt;strong&gt;truly custom providers&lt;/strong&gt; like &lt;code&gt;nvidia-custom&lt;/code&gt; (an Nvidia API endpoint that OpenClaw doesn't know about natively) and &lt;code&gt;openrouter-custom&lt;/code&gt; (a non-standard name for testing). Those make sense.&lt;/p&gt;

&lt;p&gt;But the &lt;code&gt;openrouter&lt;/code&gt; block? That was just a &lt;strong&gt;duplicate of something OpenClaw already knows&lt;/strong&gt;. Worse — it was an &lt;em&gt;incomplete&lt;/em&gt; duplicate that was shadowing the complete built-in version.&lt;/p&gt;

&lt;p&gt;What if I just… removed the file?&lt;/p&gt;
&lt;h3&gt;
  
  
  🎯 The real fix: Removing what shouldn't be there
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Back up the file&lt;/strong&gt; (I'd learned my lesson about backups by this point):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ cp ~/.openclaw/agents/main/agent/models.json \
     ~/.openclaw/agents/main/agent/models.json.backup.$(date +%Y%m%d-%H%M%S)
echo "Backup saved. Restore with:"
echo "  cp models.json.backup.TIMESTAMP models.json"
echo "  systemctl --user restart openclaw-gateway"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Disable &lt;code&gt;models.json&lt;/code&gt;&lt;/strong&gt; by renaming it (safer than deleting — I can reverse this instantly):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ mv ~/.openclaw/agents/main/agent/models.json \
     ~/.openclaw/agents/main/agent/models.json.disabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Restart the gateway:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ systemctl --user restart openclaw-gateway
$ sleep 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Check if the gateway starts without errors:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ journalctl --user -u openclaw-gateway -n 20 --no-pager | grep -i error
# (no output — no errors!) ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Check ALL models:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model                                      Input      Ctx      Auth  Tags
google/gemini-3-flash-preview              text+image 1024k    yes   configured ✅
google/gemini-1.5-flash                    text+image 977k     yes   configured ✅
google/gemini-1.5-pro                      text+image 977k     yes   configured ✅
google/gemini-2.0-flash                    text+image 1024k    yes   configured ✅
google/gemini-2.5-flash                    text+image 1024k    yes   configured ✅
google/gemini-2.5-pro                      text+image 1024k    yes   configured ✅
google/gemini-3-pro-preview                text+image 977k     yes   configured ✅
openrouter/stepfun/step-3.5-flash:free     text       250k     yes   configured ✅
openrouter/meta-llama/llama-3.3-70b-ins... text       128k     yes   configured ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Every. Single. Model.&lt;/strong&gt; &lt;code&gt;configured&lt;/code&gt;. Not a single &lt;code&gt;missing&lt;/code&gt;. ✅&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Test the models in TUI:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw tui --message "Hello! Which model are you?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello! 🌸 I'm Elara, running on openrouter/stepfun/step-3.5-flash:free.
 agent main | openrouter/stepfun/step-3.5-flash:free | tokens 54k/250k (22%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I verified the Google models too by checking the gateway logs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ tail -20 /tmp/openclaw/openclaw-*.log | grep "embedded run done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lane=session:agent:main:test-google durationMs=16949 active=0 queued=0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Google model completed a run in 16.9 seconds. No errors. ✅&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7: Confirm &lt;code&gt;models.json&lt;/code&gt; was NOT regenerated:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ ls ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
# "No such file or directory" — it was NOT regenerated ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This appeared to confirm that OpenClaw does &lt;strong&gt;not&lt;/strong&gt; auto-regenerate &lt;code&gt;models.json&lt;/code&gt;. When the file doesn't exist, the gateway falls back entirely to its built-in registry.&lt;/p&gt;
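Conceptually, the lookup order the gateway appeared to follow looks like this. A hypothetical Python sketch mirroring the observed behavior, not OpenClaw's source:

```python
import json
import os

def load_model_catalog(agent_dir: str, built_in_registry: dict) -> dict:
    """Hypothetical sketch of the fallback behavior observed above:
    providers defined in models.json shadow the built-in ones wholesale;
    with no file present, the built-in registry handles everything."""
    path = os.path.join(agent_dir, "models.json")
    if not os.path.exists(path):
        return built_in_registry  # no file: built-in registry wins
    with open(path) as f:
        file_providers = json.load(f).get("providers", {})
    # per-provider replacement, same as the shadowing that hid stepfun
    return {**built_in_registry, **file_providers}

built_in = {"openrouter": {"models": ["stepfun/step-3.5-flash:free"]}}
print(load_model_catalog("/nonexistent", built_in) is built_in)  # True
```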

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;March 2026 Update:&lt;/strong&gt; Further testing revealed this is &lt;strong&gt;not always true&lt;/strong&gt;. On newer OpenClaw versions (2026.2.2+), &lt;code&gt;models.json&lt;/code&gt; &lt;strong&gt;is regenerated&lt;/strong&gt; from &lt;code&gt;models.providers&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt; on gateway restart and &lt;code&gt;openclaw doctor&lt;/code&gt; runs. The proper permanent fix is to manage model entries via &lt;code&gt;models.providers&lt;/code&gt; in the main config — not by deleting the agent-level &lt;code&gt;models.json&lt;/code&gt;. See the &lt;a href="https://docs.openclaw.ai/concepts/models#models-registry-models-json" rel="noopener noreferrer"&gt;official docs&lt;/a&gt; for details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  📊 Comparing the two fixes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Initial Fix&lt;/th&gt;
&lt;th&gt;Real Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Added stepfun to &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Removed &lt;code&gt;models.json&lt;/code&gt; entirely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write a script, figure out the schema, find the right values&lt;/td&gt;
&lt;td&gt;One &lt;code&gt;mv&lt;/code&gt; command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Models fixed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only stepfun&lt;/td&gt;
&lt;td&gt;All current + all future models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Future risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every new OpenRouter model needs manual addition&lt;/td&gt;
&lt;td&gt;No maintenance needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Root cause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Patched → still shadowing&lt;/td&gt;
&lt;td&gt;Eliminated the shadow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The initial fix treated the symptom. The real fix treated the disease — &lt;strong&gt;but only temporarily&lt;/strong&gt; (see update above).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The best permanent fix&lt;/strong&gt; is to manage custom providers through &lt;code&gt;models.providers&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt;. Use a custom provider name (like &lt;code&gt;openrouter-custom&lt;/code&gt;) for models not in the built-in catalog, and let the built-in provider handle everything else.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🛠️ How to Check the Built-in Catalog
&lt;/h2&gt;

&lt;p&gt;Before creating custom providers, check whether your model is already in OpenClaw's built-in catalog. If it is, you don't need &lt;code&gt;models.json&lt;/code&gt; or &lt;code&gt;models.providers&lt;/code&gt; at all; just add it to the allowlist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List ALL models in the built-in catalog for a provider&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter

&lt;span class="c"&gt;# Check if a specific model exists&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter | &lt;span class="nb"&gt;grep &lt;/span&gt;dolphin
&lt;span class="c"&gt;# No results = model is NOT built-in = needs openrouter-custom&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter | &lt;span class="nb"&gt;grep &lt;/span&gt;stepfun
openrouter/stepfun/step-3.5-flash:free     text   250k   &lt;span class="nb"&gt;yes&lt;/span&gt;
&lt;span class="c"&gt;# Found = model IS built-in = just add to allowlist, no custom provider needed&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; google
&lt;span class="c"&gt;# Shows all built-in Google models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; If &lt;code&gt;openclaw models list --all --provider &amp;lt;name&amp;gt;&lt;/code&gt; shows your model, just add it to &lt;code&gt;agents.defaults.models&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt;. If it doesn't show up, you need a custom provider block in &lt;code&gt;models.providers&lt;/code&gt; (use a name like &lt;code&gt;openrouter-custom&lt;/code&gt; to avoid shadowing the built-in).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At time of writing, the built-in OpenRouter catalog has &lt;strong&gt;230+ models&lt;/strong&gt;, including every major provider (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, etc.) but &lt;strong&gt;not&lt;/strong&gt; community/niche models like &lt;code&gt;cognitivecomputations/dolphin-mistral*&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  404: No Endpoints Found That Support Tool Use
&lt;/h2&gt;

&lt;p&gt;If you set a Dolphin (or other community) model as your primary and see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;404 No endpoints found that support tool use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the model does not support function calling/tools, and OpenRouter has no endpoint to handle a request that includes tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it happens:&lt;/strong&gt; OpenClaw sends tool definitions (web search, exec, etc.) with every request. If the model does not support tools, OpenRouter rejects with 404.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add &lt;code&gt;params.tools: false&lt;/code&gt; in the model's allowlist entry in &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"openrouter-custom/cognitivecomputations/dolphin-mistral-24b-venice-edition:free"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dolphin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Even with &lt;code&gt;tools: false&lt;/code&gt;, free-tier models may still get 429 rate-limited. Configure fallbacks to ensure graceful failover:&lt;/p&gt;


&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openrouter-custom/.../dolphin-mistral:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"openrouter/stepfun/step-3.5-flash:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-3-flash-preview"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;You can check if your model supports tools via the OpenRouter API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://openrouter.ai/api/v1/models | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import json, sys
for m in json.load(sys.stdin)['data']:
    if 'dolphin' in m['id']:
        print(m['id'], 'tools:', 'tools' in m.get('supported_parameters', []))
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How to Fix This Yourself
&lt;/h2&gt;

&lt;p&gt;If you're hitting &lt;code&gt;Unknown model&lt;/code&gt; or &lt;code&gt;configured,missing&lt;/code&gt; in OpenClaw, here's the diagnostic playbook:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check if you have an agent-level &lt;code&gt;models.json&lt;/code&gt;
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ls -la ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this file &lt;strong&gt;exists&lt;/strong&gt; and you're only using standard providers (OpenRouter, Google, Anthropic, OpenAI), this file is probably unnecessary and might be shadowing the built-in registry.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Check what's in it
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cat ~/.openclaw/agents/main/agent/models.json | python3 -m json.tool | grep -E '"id"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you see a provider name that matches a built-in provider (&lt;code&gt;openrouter&lt;/code&gt;, &lt;code&gt;google&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, etc.), that block is &lt;strong&gt;overriding&lt;/strong&gt; the built-in model catalog. Only models explicitly listed will be recognized.&lt;/p&gt;
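To spot that collision quickly, a few lines of Python can flag offending provider names. This assumes the `providers` key layout used by the fix script earlier; the built-in name set is illustrative, not exhaustive:

```python
# Flag models.json provider names that collide with (and therefore shadow)
# built-in providers. The BUILT_IN set is illustrative, not exhaustive.
BUILT_IN = {"openrouter", "google", "anthropic", "openai"}

def find_shadowing_providers(config: dict) -> list:
    """Return provider names in a models.json config that collide with
    built-in provider names."""
    return sorted(name for name in config.get("providers", {})
                  if name in BUILT_IN)

# Example: the problematic layout from this incident
config = {"providers": {
    "nvidia-custom": {"models": []},   # fine: truly custom name
    "openrouter": {"models": []},      # collides: shadows the built-in catalog
}}
print(find_shadowing_providers(config))  # ['openrouter']
```

An empty result means your file only defines truly custom providers and isn't shadowing anything.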
&lt;h3&gt;
  
  
  Step 3: Try disabling it
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Backup first!
cp ~/.openclaw/agents/main/agent/models.json \
   ~/.openclaw/agents/main/agent/models.json.bak.$(date +%Y%m%d-%H%M%S)

# Rename to disable
mv ~/.openclaw/agents/main/agent/models.json \
   ~/.openclaw/agents/main/agent/models.json.disabled

# Restart
systemctl --user restart openclaw-gateway

# Check
openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If all models now show &lt;code&gt;configured&lt;/code&gt; — &lt;strong&gt;the file was the problem.&lt;/strong&gt; Delete it permanently (or keep the &lt;code&gt;.disabled&lt;/code&gt; backup just in case).&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: If you DO need custom providers
&lt;/h3&gt;

&lt;p&gt;If you have truly custom providers (not built-in), such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nvidia API&lt;/strong&gt; (&lt;code&gt;integrate.api.nvidia.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom self-hosted endpoints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-standard API providers&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need &lt;code&gt;models.json&lt;/code&gt;, but be very careful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't use provider names that match built-in providers&lt;/strong&gt; (e.g., use &lt;code&gt;openrouter-custom&lt;/code&gt; instead of &lt;code&gt;openrouter&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Only define the custom providers, let the built-in registry handle the standard ones&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Quick diagnostic cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;models.json&lt;/code&gt; is shadowing built-in registry&lt;/td&gt;
&lt;td&gt;Rename/remove &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Unknown model&lt;/code&gt; in logs&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API key missing from &lt;code&gt;.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;.env&lt;/code&gt; (and never use &lt;code&gt;&amp;gt;&lt;/code&gt;!)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model works via &lt;code&gt;curl&lt;/code&gt; but not OpenClaw&lt;/td&gt;
&lt;td&gt;Provider block in &lt;code&gt;models.json&lt;/code&gt; doesn't list the model&lt;/td&gt;
&lt;td&gt;Remove the shadowing provider block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;models scan&lt;/code&gt; doesn't find a model&lt;/td&gt;
&lt;td&gt;Model doesn't support tool-calling&lt;/td&gt;
&lt;td&gt;Add manually via &lt;code&gt;openclaw models set&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  📚 What I Learned
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1️⃣ &lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; can destroy your entire config
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;echo "KEY=value" &amp;gt;  .env   # ❌ REPLACES the file — destroys everything else
echo "KEY=value" &amp;gt;&amp;gt; .env   # ✅ APPENDS to the file — safe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Always use &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;&lt;/strong&gt; when adding to environment files. Or better: use the app's CLI to manage keys.&lt;/p&gt;
&lt;h3&gt;
  
  
  2️⃣ "Unknown model" doesn't mean what you think
&lt;/h3&gt;

&lt;p&gt;It doesn't mean you misspelled the model name. It means the runtime &lt;strong&gt;can't resolve the name to a provider endpoint&lt;/strong&gt; — and that resolution path might go through a file you didn't know existed.&lt;/p&gt;
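In other words, resolution is roughly "take the prefix before the first slash as the provider, then look the rest up in that provider's catalog". A hypothetical sketch, not OpenClaw's actual resolver:

```python
def resolve_model(model_id: str, catalog: dict):
    """Sketch of model-id resolution: provider prefix before the first '/',
    model slug after it. Fails the same way the 'Unknown model' error does
    when the catalog (possibly shadowed by models.json) lacks the entry."""
    provider, _, slug = model_id.partition("/")
    models = catalog.get(provider, {}).get("models", [])
    if slug not in models:
        raise LookupError(f"Unknown model: {model_id}")
    return provider, slug

# A shadowed catalog: the 13-model subset, stepfun absent
catalog = {"openrouter": {"models": ["google/gemini-2.5-pro"]}}

try:
    resolve_model("openrouter/stepfun/step-3.5-flash:free", catalog)
except LookupError as e:
    print(e)  # Unknown model: openrouter/stepfun/step-3.5-flash:free
```

The spelling is perfect; the catalog the resolver consults simply no longer contains the entry.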
&lt;h3&gt;
  
  
  3️⃣ Custom config files can shadow built-in behavior
&lt;/h3&gt;

&lt;p&gt;This is the core lesson. My AI assistant created &lt;code&gt;models.json&lt;/code&gt; for a legitimate reason (custom Nvidia provider). But when it added an &lt;code&gt;openrouter&lt;/code&gt; block to the same file, it accidentally &lt;strong&gt;replaced&lt;/strong&gt; the entire built-in OpenRouter catalog with its 13-model subset. Everything not in that subset — including stepfun — became invisible.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;If your tool has a built-in registry, a custom config that matches its namespace will override it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  4️⃣ AI agents optimise for the task at hand
&lt;/h3&gt;

&lt;p&gt;Elara added Google models when I asked for Google models. She didn't know that creating an &lt;code&gt;openrouter&lt;/code&gt; provider block would shadow the built-in one and break stepfun. &lt;strong&gt;AI agents don't preserve context they weren't told about.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  5️⃣ Backup everything, always 💾
&lt;/h3&gt;

&lt;p&gt;I had &lt;strong&gt;28 backup files&lt;/strong&gt; spanning a month. They let me reconstruct the exact state of every config file at every point in time. I now run a daily cron job:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 2 AM UTC daily, 30-day retention
0 2 * * * ~/openclaw_daily_backup.sh &amp;gt;&amp;gt; /tmp/openclaw/backup.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
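The backup script itself isn't reproduced here, but the copy-plus-retention idea can be sketched in Python. This is a hypothetical equivalent (the `backup_config` helper and its paths are mine, not the real script):

```python
import os
import shutil
import time

def backup_config(src: str, backup_dir: str, retention_days: int = 30) -> str:
    """Copy src into backup_dir with a timestamp suffix, then prune backups
    older than retention_days. Hypothetical sketch of a daily config backup;
    adapt the paths to your own OpenClaw layout."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, os.path.basename(src) + ".bak." + stamp)
    shutil.copy2(src, dest)  # copy2 preserves timestamps/metadata

    # Prune: delete backups older than the retention window
    cutoff = time.time() - retention_days * 86400
    for name in os.listdir(backup_dir):
        if ".bak." not in name:
            continue
        path = os.path.join(backup_dir, name)
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
    return dest
```

Pointing it at `openclaw.json` (and `models.json`, while it existed) from a daily cron entry is all the archaeology insurance this incident required.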


&lt;h2&gt;
  
  
  🎯 The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure debugging is archaeology.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not fixing bugs — you're reconstructing what a system looked like at a moment when it worked, and comparing it to the moment it stopped.&lt;/p&gt;

&lt;p&gt;The difference is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ &lt;strong&gt;One character&lt;/strong&gt; (&lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;One file&lt;/strong&gt; that's shadowing a built-in registry&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;One good-faith change&lt;/strong&gt; by an AI agent that had unintended side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the real fix isn't always adding what's missing — sometimes it's &lt;strong&gt;removing what shouldn't be there&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;If you've ever stared at &lt;code&gt;configured,missing&lt;/code&gt; and felt your sanity slipping — now you know exactly where to look.&lt;/em&gt; 🦞&lt;/p&gt;
&lt;h2&gt;
  
  
  Update: openrouter-custom Provider Removed (March 2026)
&lt;/h2&gt;

&lt;p&gt;After further testing, we found that &lt;code&gt;openrouter-custom&lt;/code&gt; models (community/niche models like Dolphin-Mistral) always fail with &lt;code&gt;404 No endpoints found that support tool use&lt;/code&gt; when used with OpenClaw agents. This happens because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OpenClaw agents always include tool definitions in the API request body&lt;/li&gt;
&lt;li&gt;Dolphin-Mistral has &lt;strong&gt;zero&lt;/strong&gt; tool-supporting endpoints on OpenRouter&lt;/li&gt;
&lt;li&gt;OpenClaw has no config option to suppress tool definitions at the API payload level for custom providers (&lt;code&gt;tools.deny: ["*"]&lt;/code&gt; is agent-side only)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Final decision:&lt;/strong&gt; Removed the &lt;code&gt;openrouter-custom&lt;/code&gt; provider entirely. Created a dedicated &lt;code&gt;dolphin&lt;/code&gt; agent bound to a separate Telegram bot, currently running on StepFun as primary — ready to switch to dolphin when OpenRouter adds tool-supporting endpoints for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean model setup that works (9/9 TUI tests passed):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openrouter/stepfun/step-3.5-flash:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-3-flash-preview"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>debugging</category>
      <category>opensource</category>
      <category>openclaw</category>
    </item>
  </channel>
</rss>
