<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shifu</title>
    <description>The latest articles on Forem by Shifu (@shifu_legend).</description>
    <link>https://forem.com/shifu_legend</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799542%2F82128222-c047-456d-bd43-b8215632252d.png</url>
      <title>Forem: Shifu</title>
      <link>https://forem.com/shifu_legend</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shifu_legend"/>
    <language>en</language>
    <item>
      <title>The End of Manual QA Writing? How an OpenClaw Skill Automates Testing Strategy</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:17:24 +0000</pubDate>
      <link>https://forem.com/shifu_legend/the-end-of-manual-qa-writing-how-an-openclaw-skill-automates-testing-strategy-cmf</link>
      <guid>https://forem.com/shifu_legend/the-end-of-manual-qa-writing-how-an-openclaw-skill-automates-testing-strategy-cmf</guid>
      <description>&lt;h1&gt;
  
  
  The End of Manual QA Writing? How an OpenClaw Skill Automates Testing Strategy
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Discover how the QA Architecture Auditor OpenClaw skill generates comprehensive testing strategies from scratch, freeing QA engineers from manual test case writing — and what it means for the future of QA roles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Header image: a futuristic QA robot analyzing code&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;If you've ever been knee‑deep in a codebase, tasked with writing test cases for the first time, you know the drill: sift through modules, guess what needs testing, write repetitive boilerplate, and hope you didn't miss that one edge case that'll blow up in production. Quality Assurance is essential, but the manual labor of test case creation is a notorious bottleneck. What if an AI could read your code and instantly produce a comprehensive, independent testing strategy — complete with risk scores, security maps, and ready‑to‑run test examples?&lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt;, an OpenClaw skill that performs forensic analysis of any repository and spits out an exhaustive QA strategy report. This isn't just another code coverage tool; it's a full‑blown QA architect that operates under a zero‑trust policy, ignoring any existing tests and designing everything from scratch. The result? A multi‑methodology testing matrix that covers everything from black‑box to mutation testing, all tailored to your tech stack.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore why traditional QA test writing is failing modern development, how this OpenClaw skill changes the game, and what it means for the future of QA roles. Spoiler: it doesn't make testers redundant — it makes them &lt;em&gt;strategists&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Manual QA Test Creation
&lt;/h2&gt;

&lt;p&gt;Let's face reality: writing test cases is often a Sisyphean task. Here's why:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time‑consuming and repetitive&lt;/strong&gt; – For every function you write, you need to craft happy paths, edge cases, error handling, and integration hooks. Multiply that across a growing codebase and you've got weeks of effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent coverage&lt;/strong&gt; – Different QA engineers have different standards. One might miss boundary values, another might forget security scenarios. Maintaining uniform coverage across teams is nearly impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability nightmare&lt;/strong&gt; – As microservices proliferate, keeping test suites up to date becomes a full‑time job. Any sprint that adds features must also extend tests, leading to technical debt or shortcuts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind spots&lt;/strong&gt; – Humans naturally gravitate toward the familiar (unit tests) and neglect less obvious but critical areas: fuzzing, mutation testing, accessibility, localization, performance under load, and compatibility across browsers/OSes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottleneck for releases&lt;/strong&gt; – QA is often the gatekeeper. If test writing lags, releases slip. Companies either ship with insufficient tests or delay features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit &amp;amp; compliance headaches&lt;/strong&gt; – Auditors demand evidence of structured testing, ITGC controls, and risk‑based test plans. Manually assembling this documentation is error‑prone and time‑intensive.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The ideal solution would be an &lt;strong&gt;independent, automated QA architect&lt;/strong&gt; that can examine any codebase and produce a prioritized, comprehensive testing blueprint — one that covers all methodologies, is tailored to the detected stack, and can be regenerated whenever the code evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the QA Architecture Auditor
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; is an OpenClaw skill that does exactly that. It's a Python‑based CLI tool (&lt;code&gt;qa-audit&lt;/code&gt;) that you can invoke directly or via slash command in OpenClaw. It performs deep static analysis and generates an HTML or Markdown report that serves as a complete QA strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forensic codebase analysis&lt;/strong&gt; – Detects languages, frameworks, architecture pattern (monolith, microservices, serverless, etc.), dependencies, modules, cyclomatic complexity, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; – Scores each module from 0‑100 based on complexity, external calls, authentication handling, data persistence, cryptography, file I/O, coupling, and public API surface. High‑risk modules surface for prioritized testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security surface mapping&lt;/strong&gt; – Identifies modules that touch authentication, authorization, input validation, output encoding, session management, cryptography, file ops, network ops, and database ops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry point discovery&lt;/strong&gt; – Finds &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;app.py&lt;/code&gt;, &lt;code&gt;manage.py&lt;/code&gt;, &lt;code&gt;index.js&lt;/code&gt;, etc., to focus end‑to‑end and smoke tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data flow mapping&lt;/strong&gt; – Traces imports/dependencies to expose integration points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITGC controls&lt;/strong&gt; – Generates a tailored checklist of IT General Controls compliance items (change management, access control, testing requirements, security scanning, code signing, deployment gates, etc.) based on your tech stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report generation&lt;/strong&gt; – Produces a beautifully formatted HTML or Markdown report crammed with actionable insights, including:

&lt;ul&gt;
&lt;li&gt;Executive summary&lt;/li&gt;
&lt;li&gt;Codebase statistics (languages, file counts, dependencies)&lt;/li&gt;
&lt;li&gt;Frameworks detected&lt;/li&gt;
&lt;li&gt;Risk assessment table (severity, type, module, score, description)&lt;/li&gt;
&lt;li&gt;Security surface mapping table&lt;/li&gt;
&lt;li&gt;Testing methodology matrix with &lt;strong&gt;independent baseline&lt;/strong&gt;, &lt;strong&gt;vulnerability &amp;amp; risk assessment&lt;/strong&gt;, &lt;strong&gt;strategy&lt;/strong&gt;, and &lt;strong&gt;from‑scratch test cases&lt;/strong&gt; for each of 20+ methodologies&lt;/li&gt;
&lt;li&gt;Tooling recommendations (pytest/Jest/JUnit/etc.) tailored to your stack&lt;/li&gt;
&lt;li&gt;ITGC controls checklist&lt;/li&gt;
&lt;li&gt;Dependencies analysis (if available)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Zero‑trust policy&lt;/strong&gt; – The skill &lt;em&gt;ignores&lt;/em&gt; any existing tests. It assumes you're starting from zero and designs everything accordingly. This is crucial for audits and for turning around neglected codebases.&lt;/li&gt;

&lt;/ul&gt;
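&lt;p&gt;To make the risk scoring concrete, here is a minimal sketch of such a weighted 0-100 heuristic. The signal names and weights are illustrative assumptions for this article, not the tool's actual formula:&lt;/p&gt;

```python
# Illustrative sketch of a 0-100 module risk score. The signals and
# weights are assumptions for demonstration, not qa-audit's real formula.
SIGNAL_WEIGHTS = {
    "cyclomatic_complexity": 2,   # per unit of measured complexity
    "external_calls": 5,          # per outbound integration
    "handles_auth": 25,           # boolean signals get flat weights
    "persists_data": 15,
    "uses_crypto": 15,
    "file_io": 10,
    "public_api_surface": 3,      # per exported symbol
}

def risk_score(signals: dict) -> int:
    """Combine weighted signals and clamp the result to the 0-100 range."""
    raw = sum(SIGNAL_WEIGHTS[name] * value for name, value in signals.items())
    return max(0, min(100, raw))

# A login module: moderately complex, auth-handling, touches the database.
login_signals = {
    "cyclomatic_complexity": 8,
    "external_calls": 2,
    "handles_auth": 1,
    "persists_data": 1,
    "uses_crypto": 1,
    "file_io": 0,
    "public_api_surface": 4,
}
print(risk_score(login_signals))  # 93
```

&lt;p&gt;The clamp matters: a module that trips many signals at once should saturate at 100 rather than distort the ranking of everything below it.&lt;/p&gt;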

&lt;p&gt;All of this runs locally; your code never leaves your machine unless a remote URL is provided, in which case only a standard &lt;code&gt;git clone&lt;/code&gt; occurs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This Skill Unique?
&lt;/h2&gt;

&lt;p&gt;The QA ecosystem is no stranger to static analysis tools (linters, complexity analyzers, OWASP ZAP, etc.). But the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; fills a critical gap: &lt;strong&gt;a holistic, methodology‑agnostic testing strategy generator&lt;/strong&gt;. Let's break down its distinctive features.&lt;/p&gt;

&lt;h3&gt;
  
  
  20+ Testing Methodologies Covered
&lt;/h3&gt;

&lt;p&gt;The report includes dedicated sections for each major testing approach, complete with an independent baseline definition, risk assessment, strategy, and from‑scratch test examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core execution&lt;/strong&gt;: Black Box, White Box, Manual, Automated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functional &amp;amp; structural&lt;/strong&gt;: Unit, Integration, System, Functional, Smoke, Sanity, E2E, Regression, API, Database Integrity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non‑functional&lt;/strong&gt;: Performance, Security, Usability, Compatibility, Accessibility, Localization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized&lt;/strong&gt;: Acceptance (UAT), Exploratory, Boundary Value Analysis, Monkey/Random Testing, Fuzz Testing, Mutation Testing, Non‑Functional General&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not just a list — each section contains &lt;em&gt;test cases&lt;/em&gt; written in the language of your stack (Python, JavaScript, Java, Go, etc.) showing exactly how to validate those dimensions. For example, the Fuzz Testing section shows how to use &lt;code&gt;atheris&lt;/code&gt; or &lt;code&gt;libFuzzer&lt;/code&gt; to feed malformed data to your APIs; the Mutation Testing section suggests &lt;code&gt;mutmut&lt;/code&gt;, &lt;code&gt;Stryker&lt;/code&gt;, or &lt;code&gt;PITest&lt;/code&gt; and targets an 80%+ mutation score.&lt;/p&gt;
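&lt;p&gt;The mutation-testing guidance is easiest to internalize with a toy case. Tools like &lt;code&gt;mutmut&lt;/code&gt; rewrite small pieces of your code (for example, weakening &lt;code&gt;&gt;=&lt;/code&gt; to &lt;code&gt;&gt;&lt;/code&gt;) and check whether any test fails. The hypothetical function below shows the kind of exact-boundary assertions that kill such mutants:&lt;/p&gt;

```python
# A hypothetical function and the boundary-focused tests that mutation
# testing rewards. A mutant weakening "at least 18" to "strictly over 18"
# survives vague tests but dies on the exact-boundary assertion.
def is_adult(age: int) -> bool:
    return age >= 18

def test_is_adult_boundaries():
    assert is_adult(18) is True    # exact boundary: kills the weakened mutant
    assert is_adult(17) is False   # just below the boundary
    assert is_adult(100) is True   # a comfortable pass far from the edge

test_is_adult_boundaries()
```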

&lt;h3&gt;
  
  
  Zero‑Trust Baseline
&lt;/h3&gt;

&lt;p&gt;Many tools pretend to “assess” a project by looking at its coverage reports. This skill deliberately &lt;em&gt;ignores&lt;/em&gt; existing tests. Its premise: trust nothing, start from first principles. That independence is gold for audits and for teams that suspect they're not covering enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk‑Based Prioritization
&lt;/h3&gt;

&lt;p&gt;The skill assigns a risk score to each module, combining complexity and security factors. The highest‑scoring modules get explicit attention in the risk assessment table, and the methodology recommendations are tailored accordingly (e.g., more security and database tests for data‑intensive modules). This tells you exactly where to focus your effort first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tailored Tooling Recommendations
&lt;/h3&gt;

&lt;p&gt;Instead of a generic tool list, the skill recommends specific tools based on the languages and frameworks it detects. Python project? It suggests &lt;code&gt;pytest&lt;/code&gt;, &lt;code&gt;pytest‑cov&lt;/code&gt;, &lt;code&gt;bandit&lt;/code&gt;, &lt;code&gt;safety&lt;/code&gt;, &lt;code&gt;locust&lt;/code&gt; or &lt;code&gt;k6&lt;/code&gt;. Java? &lt;code&gt;JUnit 5&lt;/code&gt;, &lt;code&gt;Spring Boot Test&lt;/code&gt;, &lt;code&gt;SonarQube&lt;/code&gt;. JavaScript/TypeScript? &lt;code&gt;Jest&lt;/code&gt; or &lt;code&gt;Vitest&lt;/code&gt;, &lt;code&gt;Cypress&lt;/code&gt;/&lt;code&gt;Playwright&lt;/code&gt;, &lt;code&gt;ESLint&lt;/code&gt; security plugins. This makes the report immediately actionable.&lt;/p&gt;
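&lt;p&gt;Conceptually, this is a lookup from detected languages to recommended tools. The sketch below reconstructs that idea; the table entries are examples drawn from the report, not the skill's internal data structure:&lt;/p&gt;

```python
# Illustrative language-to-tooling lookup, reconstructing the mapping
# described above (entries are examples, not the skill's internal data).
TOOLING = {
    "python": ["pytest", "pytest-cov", "bandit", "safety", "locust"],
    "java": ["JUnit 5", "Spring Boot Test", "SonarQube"],
    "javascript": ["Jest", "Cypress", "eslint-plugin-security"],
}

def recommend(languages):
    """Return an ordered, de-duplicated tool list for the detected languages."""
    seen, tools = set(), []
    for language in languages:
        for tool in TOOLING.get(language.lower(), []):
            if tool not in seen:
                seen.add(tool)
                tools.append(tool)
    return tools

print(recommend(["Python", "JavaScript"]))
```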

&lt;h3&gt;
  
  
  All‑Local, No External AI
&lt;/h3&gt;

&lt;p&gt;The analysis is purely deterministic; no queries to ChatGPT or any cloud service. It respects your privacy and avoids external dependencies. That's a relief for sensitive codebases.&lt;/p&gt;

&lt;h2&gt;
  
  
  How QA Engineers Transform, Not Disappear
&lt;/h2&gt;

&lt;p&gt;Will this skill make QA testers redundant? Not at all — it elevates them. The skill produces raw test strategies; it doesn't &lt;em&gt;execute&lt;/em&gt; tests or integrate with CI automatically (though that could be a next step). QA engineers become &lt;strong&gt;QA architects&lt;/strong&gt; who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review the generated strategy for business‑logic nuances&lt;/li&gt;
&lt;li&gt;Refine risk scores based on domain knowledge&lt;/li&gt;
&lt;li&gt;Implement the suggested test skeletons, filling in domain‑specific data and assertions&lt;/li&gt;
&lt;li&gt;Integrate the tests into CI/CD pipelines&lt;/li&gt;
&lt;li&gt;Triage and investigate failures discovered by the new tests&lt;/li&gt;
&lt;li&gt;Continuously improve the skill itself (since it's open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The time saved from manual test authoring can be redirected toward higher‑value activities: exploratory testing, usability studies, performance tuning, and security hardening. In other words, the boring part gets automated, and the creative, investigative work remains human‑centric.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real‑World Walkthrough
&lt;/h2&gt;

&lt;p&gt;Let's see the skill in action on a tiny Flask API sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; ./flask-demo &lt;span class="nt"&gt;--output&lt;/span&gt; report.html &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated &lt;code&gt;report.html&lt;/code&gt; opens to a clean UI. The &lt;strong&gt;Executive Summary&lt;/strong&gt; tells us we have 12 modules, 3 languages (Python, HTML, SQL), and highlights the login module as the highest risk (score 78). The &lt;strong&gt;Risk Assessment&lt;/strong&gt; table shows the critical authentication module, some data‑intensive endpoints, and a couple of high‑complexity utility functions.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Security Surface&lt;/strong&gt; reveals 5 areas: &lt;code&gt;authentication&lt;/code&gt;, &lt;code&gt;input_validation&lt;/code&gt;, &lt;code&gt;database_operations&lt;/code&gt;, &lt;code&gt;output_encoding&lt;/code&gt;, &lt;code&gt;session_management&lt;/code&gt;. So we know we need strong auth and input tests.&lt;/p&gt;

&lt;p&gt;Scrolling to the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, we find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Black Box&lt;/strong&gt;: baseline "no internal knowledge", strategy "equivalence partitioning, boundary value analysis, decision tables", and test cases showing how to structure API tests for endpoints.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: specific suggestions like "test all routes with method overrides, validate status codes, schemas, auth headers, error handling". The example uses &lt;code&gt;pytest&lt;/code&gt; and &lt;code&gt;requests&lt;/code&gt; to hit the endpoints with valid, missing, and malformed payloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: OWASP Top 10 validation checklist with code snippets for SQL injection, XSS, authentication bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: load test script using &lt;code&gt;locust&lt;/code&gt; that simulates 1000 users hitting the login endpoint with a 2‑second SLA.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessibility&lt;/strong&gt;: for the UI, it suggests &lt;code&gt;axe-core&lt;/code&gt; and keyboard navigation checks, complete with a &lt;code&gt;pytest&lt;/code&gt; integration example.&lt;/li&gt;
&lt;/ul&gt;
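&lt;p&gt;To give a flavor of those generated API cases, here is a minimal &lt;code&gt;pytest&lt;/code&gt;-style sketch. The endpoints, payloads, and expected status codes are assumptions about the demo app, and the HTTP call is factored out so the case table can be sanity-checked offline:&lt;/p&gt;

```python
# Sketch of from-scratch API cases in the spirit of the generated report.
# Endpoints, payloads, and expected codes are assumptions about the demo app.
BASE_URL = "http://localhost:8080"

API_CASES = [
    # (method, path, payload, expected_status)
    ("POST", "/api/login", {"user": "alice", "password": "s3cret"}, 200),
    ("POST", "/api/login", {"user": "alice"}, 400),                    # missing field
    ("POST", "/api/login", {"user": "alice", "password": 1234}, 400),  # wrong type
    ("GET", "/api/orders", None, 401),                                 # no auth header
]

def build_request(method, path, payload):
    """Translate one case row into keyword arguments for requests.request()."""
    kwargs = {"method": method, "url": BASE_URL + path, "timeout": 5}
    if payload is not None:
        kwargs["json"] = payload
    return kwargs

def test_api_cases():
    import requests  # deferred import: the case table stays checkable offline
    for method, path, payload, expected in API_CASES:
        response = requests.request(**build_request(method, path, payload))
        assert response.status_code == expected, (method, path)
```

&lt;p&gt;Run against the live demo app, &lt;code&gt;test_api_cases&lt;/code&gt; exercises every row; even offline, the case table documents the intended coverage.&lt;/p&gt;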

&lt;p&gt;Each section also includes a &lt;strong&gt;Vulnerability &amp;amp; Risk Assessment&lt;/strong&gt; paragraph tailored to our codebase, e.g.:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The 12 entry points represent the primary black-box testing surface. Focus on 5 authentication modules and 3 database interaction points."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;strong&gt;Tooling Recommendations&lt;/strong&gt; section lists: &lt;code&gt;pytest + pytest‑cov&lt;/code&gt;, &lt;code&gt;locust&lt;/code&gt;, &lt;code&gt;bandit&lt;/code&gt;, &lt;code&gt;safety&lt;/code&gt;, &lt;code&gt;OWASP ZAP&lt;/code&gt;, plus CI/CD suggestions.&lt;/p&gt;

&lt;p&gt;Finally, the &lt;strong&gt;ITGC Controls&lt;/strong&gt; section enumerates change management, access control, testing requirements, security scanning, dependency management, code signing, audit trail, deployment controls, incident response — all with specific notes for our detected stack (Python, Flask). This is gold for SOC2 or ISO27001 prep.&lt;/p&gt;

&lt;p&gt;In short, you get a ready‑to‑implement test plan that would otherwise take weeks of manual effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample Report Excerpts
&lt;/h2&gt;

&lt;p&gt;To give you a taste of what the report looks like, here's a trimmed excerpt from the &lt;strong&gt;Risk Assessment&lt;/strong&gt; table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Risk Type&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Risk Score&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;security&lt;/td&gt;
&lt;td&gt;auth/login.py&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;Authentication handling detected — requires rigorous security testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;code_complexity&lt;/td&gt;
&lt;td&gt;services/order.py&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;High complexity module with many branches — needs path coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;dependency&lt;/td&gt;
&lt;td&gt;requirements.txt&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;Unpinned dependencies detected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And from the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, the &lt;strong&gt;Fuzz Testing&lt;/strong&gt; section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Independent Baseline:&lt;/strong&gt; Feed malformed, unexpected, or extreme data to the system to expose vulnerabilities like buffer overflows or injection flaws.&lt;br&gt;
&lt;strong&gt;Vulnerability &amp;amp; Risk Assessment:&lt;/strong&gt; Fuzz testing needed for any input parsing modules. Focus on 12 modules that handle user‑supplied data.&lt;br&gt;
&lt;strong&gt;Strategy:&lt;/strong&gt; Use fuzzing tools to generate semi‑valid inputs that stress parsers and data handlers.&lt;br&gt;
&lt;strong&gt;From‑Scratch Test Cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Fuzz Testing – Malformed Data&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sys

import atheris
import requests

from example_api import app  # demo app, assumed to be served on localhost:8080

def TestOneInput(data):
    fdp = atheris.FuzzedDataProvider(data)
    endpoint = fdp.PickValueInList(['/api/users', '/api/orders'])
    method = fdp.PickValueInList(['GET', 'POST'])
    # Build a random malformed payload from the fuzzer's byte stream
    payload = {fdp.ConsumeUnicodeNoSurrogates(16): fdp.ConsumeUnicodeNoSurrogates(64)}
    response = requests.request(method, f'http://localhost:8080{endpoint}', json=payload)
    assert response.status_code &amp;lt; 500

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Validation: Fuzzing finds no crashes or memory leaks; all malformed inputs handled safely.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These concrete examples show you how to jump straight into implementation without guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use X?
&lt;/h2&gt;

&lt;p&gt;You might wonder: "Can't we already do this with SonarQube or OWASP ZAP?" Those tools address specific facets — static analysis, dependency checks, dynamic scanning. They don't produce a &lt;em&gt;holistic testing strategy&lt;/em&gt; that spans unit, integration, security, performance, accessibility, compliance, and the more exotic methodologies like mutation and fuzz testing. Nor do they provide the &lt;em&gt;from‑scratch test cases&lt;/em&gt; ready for adaptation. The QA Architecture Auditor consolidates all that into one coherent, prioritized plan. Think of it as the &lt;strong&gt;missing link&lt;/strong&gt; between static analysis and actual test implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;p&gt;Ready to try it out? Here's how to install and run the skill:&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation from ClawHub (once published)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clawhub &lt;span class="nb"&gt;install &lt;/span&gt;shifulegend/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual install from GitHub
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/shifulegend/qa-architecture-auditor.git &lt;span class="se"&gt;\&lt;/span&gt;
  ~/.openclaw/workspace/skills/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the skill
&lt;/h3&gt;

&lt;p&gt;Use the slash command in your OpenClaw chat or call the CLI directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; /path/to/your/project &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="nt"&gt;--output&lt;/span&gt; qa-report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also generate Markdown:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; https://github.com/yourorg/yourrepo.git &lt;span class="nt"&gt;--format&lt;/span&gt; md &lt;span class="nt"&gt;--output&lt;/span&gt; audit.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--security-scan&lt;/code&gt; – performs additional security vulnerability analysis (uses local scanners)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--compliance soc2|iso27001|hipaa|gdpr&lt;/code&gt; – tailors the ITGC section to the target framework&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--exclude node_modules,.git,build&lt;/code&gt; – exclude directories&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--include-test-cases&lt;/code&gt; – (default) includes ready‑to‑copy test examples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check &lt;code&gt;qa-audit --help&lt;/code&gt; for all flags.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture: AI‑Driven QA Strategies
&lt;/h2&gt;

&lt;p&gt;The QA Architecture Auditor is more than a one‑off tool; it's a glimpse into the future of AI‑augmented software engineering. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous auditing&lt;/strong&gt;: The skill runs on every push, updating the risk assessment and flagging newly introduced high‑risk modules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration&lt;/strong&gt;: Auto‑generate test stubs for new code, then let developers fill in the specifics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance as code&lt;/strong&gt;: The ITGC controls become part of your compliance documentation, automatically refreshed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi‑repo aggregation&lt;/strong&gt;: Run it across microservices and aggregate risk into a dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are natural extensions that the open‑source community could build. The skill is published under the MIT license and welcomes contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Manual test case writing doesn't have to remain the bottleneck. The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; OpenClaw skill offers a practical, immediate way to generate a comprehensive, independent QA strategy from a single command. It covers more methodologies than any human checklist, adapts to your stack, and delivers both strategic insights (risk scores, security surface) and tactical artifacts (test examples). For QA engineers, it's not replacement — it's an elevation to QA architect. For teams, it's a shortcut to robust, audit‑ready testing.&lt;/p&gt;

&lt;p&gt;Give it a try on your next codebase. You might just find that your QA workload becomes not only manageable but also more strategic and impactful.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install&lt;/strong&gt; the skill from ClawHub or GitHub today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; it on a project you care about and explore the report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt;: Found a bug? Have an idea for a new methodology? Open an issue or PR on the GitHub repo: &lt;a href="https://github.com/shifulegend/qa-architecture-auditor" rel="noopener noreferrer"&gt;https://github.com/shifulegend/qa-architecture-auditor&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt;: Forward this article to your QA team and let them try it out.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's make testing smarter, faster, and more comprehensive — together.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published on DEV.to • 12 min read&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>productivity</category>
      <category>testing</category>
    </item>
    <item>
      <title>How I Automated Python Documentation Using AST Parsing and Multi-Provider LLMs</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:17:15 +0000</pubDate>
      <link>https://forem.com/shifu_legend/stop-writing-documentation-i-built-an-ai-tool-that-parses-your-codes-dna-5eh2</link>
      <guid>https://forem.com/shifu_legend/stop-writing-documentation-i-built-an-ai-tool-that-parses-your-codes-dna-5eh2</guid>
      <description>&lt;p&gt;We've all been there. You just spent three intense days crafting a highly optimized, beautifully architected new feature. The code is elegant. The tests are passing. The linter is perfectly silent. You push your branch, open a Pull Request, and then reality hits you like a truck:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Oh right. I need to update the documentation."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s be honest: writing documentation is the chore that developers love to hate. In an ideal world, documentation evolves alongside the code. In reality, it stays stuck in 2023, while your application code races toward 2025. &lt;/p&gt;

&lt;p&gt;For the longest time, the solution has been either drudgery (doing it manually) or using brittle, regex-based parsers that break the moment you introduce a slightly complex Python decorator or a nested asynchronous function.&lt;/p&gt;

&lt;p&gt;I decided I was done with both options. So, I spent the last few weeks building &lt;strong&gt;AutoDocGen&lt;/strong&gt; (&lt;code&gt;pypiautodocgen&lt;/code&gt; on PyPI). &lt;/p&gt;

&lt;p&gt;Instead of searching for strings like a glorified &lt;code&gt;grep&lt;/code&gt; command, AutoDocGen parses your Python code into an &lt;strong&gt;Abstract Syntax Tree (AST)&lt;/strong&gt;. It &lt;em&gt;knows&lt;/em&gt; what’s a class, what’s a private method, and how your modules are intrinsically linked. It takes that blueprint and feeds it to the Large Language Model of your choice to generate human-readable, perfectly formatted Markdown documentation.&lt;/p&gt;

&lt;p&gt;Here is the story of how I built it, the technical hurdles I faced, and why I believe AST parsing combined with AI is the future of code documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. The Problem with Regex-Based Documentation
&lt;/h2&gt;

&lt;p&gt;Historically, many lightweight documentation tools have relied on Regular Expressions. They scan a file line-by-line looking for &lt;code&gt;def&lt;/code&gt; or &lt;code&gt;class&lt;/code&gt;, extract the following string, and try to grab the docstring block below it.&lt;/p&gt;

&lt;p&gt;This approach is fundamentally flawed for modern Python development. Why? Because Python syntax is incredibly expressive.&lt;/p&gt;

&lt;p&gt;Consider this snippet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@cache&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nd"&gt;@validate_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;UserSchema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;UUID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="n"&gt;include_history&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetches user data from the primary replica.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A regex parser has to somehow know that the decorators belong to the function, correctly identify it as asynchronous, handle the multi-line signature, parse the type hints, and extract the docstring. Add in nested classes, closures, and complex return types, and your regex quickly devolves into an unmaintainable nightmare.&lt;/p&gt;

&lt;p&gt;Regex doesn't &lt;em&gt;understand&lt;/em&gt; code; it only recognizes patterns in text. I needed a tool that understood the structure of Python itself.&lt;/p&gt;
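&lt;p&gt;A quick demonstration against that exact snippet makes the point:&lt;/p&gt;

```python
import re

# The decorated async signature from above, as plain text:
source = '''@cache(ttl=3600)
@validate_schema(UserSchema)
async def fetch_user_data(
    user_id: uuid.UUID,
    include_history: bool = False
) -> Dict[str, Any]:
    """Fetches user data from the primary replica."""
    pass
'''

# A typical line-oriented pattern: grab the name following "def ".
naive = re.findall(r"^def\s+(\w+)", source, re.MULTILINE)
print(naive)  # [] -- the async keyword alone defeats it

# Patching in "async" recovers the name, but the decorators, multi-line
# signature, and docstring still have to be stitched together by hand.
patched = re.findall(r"^(?:async\s+)?def\s+(\w+)", source, re.MULTILINE)
print(patched)  # ['fetch_user_data']
```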




&lt;h2&gt;
  
  
  2. Enter the Abstract Syntax Tree (AST)
&lt;/h2&gt;

&lt;p&gt;Python includes a built-in module called &lt;code&gt;ast&lt;/code&gt;. It allows you to parse Python source code into a tree of nodes representing the syntactic structure of the program.&lt;/p&gt;

&lt;p&gt;Instead of reading lines of text, AutoDocGen uses &lt;code&gt;ast.parse()&lt;/code&gt; to read the "DNA" of your code. &lt;/p&gt;

&lt;p&gt;When you feed the above snippet into an AST parser, it doesn't see a string of text. It sees an &lt;code&gt;AsyncFunctionDef&lt;/code&gt; node. It knows that this node has a &lt;code&gt;decorator_list&lt;/code&gt; containing &lt;code&gt;Call&lt;/code&gt; nodes. It maps out the &lt;code&gt;arguments&lt;/code&gt; (complete with their type annotations) and gracefully extracts the exact docstring using &lt;code&gt;ast.get_docstring()&lt;/code&gt;.&lt;/p&gt;
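&lt;p&gt;A few lines of the standard library show this in action (on a simplified version of the earlier signature):&lt;/p&gt;

```python
import ast

source = '''
@cache(ttl=3600)
async def fetch_user_data(user_id, include_history=False):
    """Fetches user data from the primary replica."""
    pass
'''

tree = ast.parse(source)  # parsing never executes the code
fn = tree.body[0]

print(type(fn).__name__)              # AsyncFunctionDef
print(len(fn.decorator_list))         # 1
print([a.arg for a in fn.args.args])  # ['user_id', 'include_history']
print(ast.get_docstring(fn))          # Fetches user data from the primary replica.
```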

&lt;p&gt;By extracting this structured data, AutoDocGen builds a high-fidelity "blueprint" of your codebase. We extract:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Module-level variables and logic&lt;/li&gt;
&lt;li&gt;Class definitions, their base classes (inheritance), and methods&lt;/li&gt;
&lt;li&gt;Standalone functions (sync and async)&lt;/li&gt;
&lt;li&gt;Exact signatures and type hints&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We then serialize this blueprint into a structured format (JSON or YAML representation of the AST summary). &lt;/p&gt;

&lt;p&gt;This is the secret sauce. &lt;strong&gt;We aren't asking the AI to read your code from scratch and guess what it does.&lt;/strong&gt; We are giving the AI a structural map and asking it to explain the map. This drastically reduces LLM hallucinations and dramatically improves the quality of the generated documentation.&lt;/p&gt;




&lt;h2&gt;
  
  
  3. Breaking Free from Vendor Lock-in: Multi-Provider Support
&lt;/h2&gt;

&lt;p&gt;When I started building the AI generation step, I realized a major frustration with the current landscape of AI developer tools: almost all of them hardcode OpenAI's API.&lt;/p&gt;

&lt;p&gt;While GPT-4o is incredible, we are living in a golden age of open-weight models and blisteringly fast inference APIs. I didn't want users locked into OpenAI if they preferred Google's tools, or if they wanted the sheer speed of Groq.&lt;/p&gt;

&lt;p&gt;So, I built an abstraction layer within AutoDocGen to support multiple LLM providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt;: The standard fallback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Groq&lt;/strong&gt;: If you want documentation generated in a couple of seconds per file, running models like Llama 3 on Groq's LPUs is life-changing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Gemini&lt;/strong&gt;: Excellent context windows for deeply understanding complex module interdependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenRouter&lt;/strong&gt;: The ultimate freedom. This allows you to route requests to dozens of different models (including free tiers like Stepfun) without changing your core integration.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The configuration hierarchy is flexible. You can set everything via environment variables (&lt;code&gt;GROQ_API_KEY&lt;/code&gt;), a local &lt;code&gt;.env&lt;/code&gt; file, an &lt;code&gt;autodocgen.yaml&lt;/code&gt; config, or directly in your &lt;code&gt;pyproject.toml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# autodocgen.yaml&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="na"&gt;ai&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;groq&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;llama3-70b-8192&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./docs&lt;/span&gt;
  &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;markdown&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
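&lt;p&gt;Under the hood, precedence resolution might look something like this. This is a hypothetical sketch: the &lt;code&gt;AUTODOCGEN_&lt;/code&gt; env-var prefix and the helper name are illustrative, not AutoDocGen's real API.&lt;/p&gt;

```python
import os

def resolve_setting(key, yaml_cfg, pyproject_cfg, default=None):
    """Resolve one setting: env vars win over autodocgen.yaml,
    which wins over pyproject.toml. Illustrative sketch only."""
    env_key = "AUTODOCGEN_" + key.upper()
    if env_key in os.environ:
        return os.environ[env_key]
    if key in yaml_cfg:
        return yaml_cfg[key]
    if key in pyproject_cfg:
        return pyproject_cfg[key]
    return default

# The environment variable overrides the yaml value.
os.environ["AUTODOCGEN_PROVIDER"] = "groq"
print(resolve_setting("provider", {"provider": "openai"}, {}))  # groq
```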






&lt;h2&gt;
  
  
  4. Templating the Output: Jinja2 for Premium Style
&lt;/h2&gt;

&lt;p&gt;The final piece of the puzzle was the output format. Most automated documentation tools generate dull, uninspired text blocks. I wanted documentation that looked like it was handcrafted by a technical writer.&lt;/p&gt;

&lt;p&gt;Instead of relying on the LLM to format the Markdown (which often leads to inconsistent headings and broken tables), AutoDocGen strictly separates generation from presentation.&lt;/p&gt;

&lt;p&gt;The LLM returns structured data (a summary of the module, bullet points of functionality, etc.). AutoDocGen then injects this data into &lt;strong&gt;Jinja2 templates&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By using Jinja2 (&lt;code&gt;module.md.j2&lt;/code&gt; and &lt;code&gt;index.md.j2&lt;/code&gt;), the CLI guarantees a consistent, premium aesthetic across your entire documentation site. It perfectly formats function signatures, builds an automatic Table of Contents, and cross-links related modules. &lt;/p&gt;

&lt;p&gt;If you don't like my default template, you can easily fork the &lt;code&gt;templates/&lt;/code&gt; directory and build your own.&lt;/p&gt;
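&lt;p&gt;The separation of generation from presentation looks roughly like this (a toy template requiring the &lt;code&gt;jinja2&lt;/code&gt; package; the real &lt;code&gt;module.md.j2&lt;/code&gt; is far richer):&lt;/p&gt;

```python
from jinja2 import Template

# The LLM returns structured data; the template alone controls layout.
module_data = {
    "name": "auth.login",
    "summary": "Handles credential validation and session creation.",
    "functions": ["login", "logout"],
}

# A tiny stand-in for module.md.j2.
template = Template(
    "# {{ name }}\n\n"
    "{{ summary }}\n\n"
    "{% for fn in functions %}- `{{ fn }}`\n{% endfor %}"
)

print(template.render(**module_data))
```

&lt;p&gt;Because the headings and lists live in the template, every generated page comes out with identical structure, no matter which model produced the prose.&lt;/p&gt;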




&lt;h2&gt;
  
  
  5. Security First: The "Zero-Trust" QA Audit
&lt;/h2&gt;

&lt;p&gt;Because I was releasing an AI tool that reads source code, I knew security and stability had to be paramount. I didn't just write some unit tests and call it a day. &lt;/p&gt;

&lt;p&gt;Before hitting &lt;code&gt;v0.1.0&lt;/code&gt;, the project underwent what I call a "Zero-Trust Forensic QA Audit". I assumed the initial proof-of-concept code was entirely broken and built a test suite from scratch.&lt;/p&gt;

&lt;p&gt;We utilized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pytest&lt;/code&gt; for comprehensive unit and integration testing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bandit&lt;/code&gt; for security scanning to ensure API keys are never leaked in logs and file I/O operations are secure.&lt;/li&gt;
&lt;li&gt;Extensive mocking of all LLM providers so the CLI could be tested deeply in CI/CD without burning API credits.&lt;/li&gt;
&lt;li&gt;Edge-case testing including handling of exotic Unicode identifiers (yes, &lt;code&gt;def grüne_äpfel()&lt;/code&gt; parses perfectly).&lt;/li&gt;
&lt;/ul&gt;
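&lt;p&gt;The provider-mocking approach is worth sketching. Something in this spirit keeps CI fully offline (function names here are illustrative, not the project's actual internals):&lt;/p&gt;

```python
# Illustrative only; AutoDocGen's real test suite may differ.
from unittest import mock

def generate_module_doc(blueprint, call_llm):
    """Toy generation step: delegates to an injected LLM callable."""
    return call_llm("Summarize this module: " + blueprint["name"])

def test_generate_module_doc_without_network():
    # The fake provider returns a canned answer, so the test burns
    # zero API credits and never touches the network.
    fake_llm = mock.Mock(return_value="A module that parses Python source.")
    doc = generate_module_doc({"name": "parser"}, fake_llm)
    assert doc == "A module that parses Python source."
    fake_llm.assert_called_once()

test_generate_module_doc_without_network()
```

&lt;p&gt;Injecting the LLM client as a callable (rather than importing it deep inside the generator) is what makes this kind of mocking trivial.&lt;/p&gt;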

&lt;p&gt;The repository is now fully integrated with Codecov, maintaining a strict baseline for any future pull requests.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Get Started
&lt;/h2&gt;

&lt;p&gt;If you are tired of your README files falling out of sync with your codebase, I highly encourage you to give AutoDocGen a spin.&lt;/p&gt;

&lt;p&gt;It's live now on PyPI.&lt;/p&gt;

&lt;p&gt;You can install it directly via pip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pypiautodocgen
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run it against your current directory and output to &lt;code&gt;./docs&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;autodocgen &lt;span class="nt"&gt;-o&lt;/span&gt; ./docs &lt;span class="nt"&gt;--provider&lt;/span&gt; groq &lt;span class="c"&gt;# Or openai, gemini, openrouter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Roadmap
&lt;/h3&gt;

&lt;p&gt;Currently, AutoDocGen creates fantastic Markdown files perfectly suited for static site generators like MkDocs or direct consumption on GitHub. &lt;/p&gt;

&lt;p&gt;Looking forward, I want to explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework-specific parsing&lt;/strong&gt;: Specialized templates for FastAPI endpoints or Django models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Diff-based updating&lt;/strong&gt;: Only regenerating documentation for the specific functions that changed in a commit, rather than full-file regeneration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mermaid diagram generation&lt;/strong&gt;: Automatically creating architecture flowcharts based on AST imports.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;I built AutoDocGen to solve my own pain point, but I know the community has incredible ideas on how to push it further. &lt;/p&gt;

&lt;p&gt;Check out the source code on GitHub (and drop a star if you find it useful!):&lt;br&gt;
&lt;strong&gt;&lt;a href="https://github.com/shifulegend/autodocgen" rel="noopener noreferrer"&gt;https://github.com/shifulegend/autodocgen&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I would love to hear your feedback in the comments. Are you still writing documentation by hand? What has been your biggest frustration with existing auto-generated documentation tools? Let me know!&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>devtools</category>
      <category>documentation</category>
    </item>
    <item>
      <title>Why I stopped writing manual test cases: This OpenClaw skill does it for me 🤖✨</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Fri, 13 Mar 2026 17:00:11 +0000</pubDate>
      <link>https://forem.com/shifu_legend/why-i-stopped-writing-manual-test-cases-this-openclaw-skill-does-it-for-me-3ni2</link>
      <guid>https://forem.com/shifu_legend/why-i-stopped-writing-manual-test-cases-this-openclaw-skill-does-it-for-me-3ni2</guid>
      <description>&lt;p&gt;&lt;em&gt;Discover how the QA Architecture Auditor OpenClaw skill generates comprehensive testing strategies from scratch, freeing QA engineers from manual test case writing — and what it means for the future of QA roles.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fauto%3Dformat%26fit%3Dcrop%26q%3D80%26w%3D1470" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1550751827-4bd374c3f58b%3Fauto%3Dformat%26fit%3Dcrop%26q%3D80%26w%3D1470" alt="Header image: a futuristic QA robot analyzing code" width="1470" height="981"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction 🚀
&lt;/h2&gt;

&lt;p&gt;If you've ever been knee‑deep in a codebase, tasked with writing test cases for the first time, you know the drill: sift through modules, guess what needs testing, write repetitive boilerplate, and hope you didn't miss that one edge case that'll blow up in production. &lt;/p&gt;

&lt;p&gt;Quality Assurance is essential, but let's be honest: the manual labor of test case creation is a notorious bottleneck. &lt;strong&gt;It's slow, error-prone, and frankly, a bit soul-crushing.&lt;/strong&gt; &lt;/p&gt;

&lt;p&gt;What if an AI could read your code and instantly produce a comprehensive, independent testing strategy — complete with risk scores, security maps, and ready‑to‑run test examples? &lt;/p&gt;

&lt;p&gt;Enter the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt;, an OpenClaw skill that performs forensic analysis of any repository and spits out an exhaustive QA strategy report. This isn't just another code coverage tool; it's a full‑blown QA architect that operates under a zero‑trust policy, ignoring any existing tests and designing everything from scratch. 🧠&lt;/p&gt;

&lt;p&gt;The result? A multi‑methodology testing matrix that covers everything from black‑box to mutation testing, all tailored to your tech stack. &lt;/p&gt;

&lt;p&gt;In this article, we'll explore why traditional QA test writing is failing modern development, how this OpenClaw skill changes the game, and what it means for the future of QA roles. Spoiler: it doesn't make testers redundant — it makes them &lt;em&gt;strategists&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Manual QA Test Creation 😫
&lt;/h2&gt;

&lt;p&gt;Let's face reality: writing test cases is often a Sisyphean task. Here's why you probably hate it (and why your boss should care):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Time‑consuming and repetitive&lt;/strong&gt; ⏳ – For every function you write, you need to craft happy paths, edge cases, error handling, and integration hooks. Multiply that across a growing codebase and you've got weeks of effort.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent coverage&lt;/strong&gt; 📉 – Different QA engineers have different standards. One might miss boundary values, another might forget security scenarios. Maintaining uniform coverage across teams is nearly impossible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability nightmare&lt;/strong&gt; 📈 – As microservices proliferate, keeping test suites up to date becomes a full‑time job. Any sprint that adds features must also extend tests, leading to technical debt or shortcuts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blind spots&lt;/strong&gt; 🙈 – Humans naturally gravitate toward the familiar (unit tests) and neglect less obvious but critical areas: fuzzing, mutation testing, accessibility, localization, performance under load, and compatibility across browsers/OSes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bottleneck for releases&lt;/strong&gt; 🚧 – QA is often the gatekeeper. If test writing lags, releases slip. Companies either ship with insufficient tests or delay features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit &amp;amp; compliance headaches&lt;/strong&gt; 📋 – Auditors demand evidence of structured testing, ITGC controls, and risk‑based test plans. Manually assembling this documentation is error‑prone and time‑intensive.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The ideal solution would be an &lt;strong&gt;independent, automated QA architect&lt;/strong&gt; that can examine any codebase and produce a prioritized, comprehensive testing blueprint — one that covers all methodologies, is tailored to the detected stack, and can be regenerated whenever the code evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Meet the QA Architecture Auditor (The Skill that Saves Weeks) 🛠️
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; is an OpenClaw skill that does exactly that. &lt;/p&gt;

&lt;p&gt;It's a Python‑based CLI tool (&lt;code&gt;qa-audit&lt;/code&gt;) that you can invoke directly or via slash command in OpenClaw. It performs deep static analysis and generates an HTML or Markdown report that serves as a complete QA strategy. ⚡&lt;/p&gt;

&lt;h3&gt;
  
  
  Core capabilities
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forensic codebase analysis&lt;/strong&gt; 🔍 – Detects languages, frameworks, architecture pattern (monolith, microservices, serverless, etc.), dependencies, modules, cyclomatic complexity, and more.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; ⚠️ – Scores each module from 0‑100 based on complexity, external calls, authentication handling, data persistence, cryptography, file I/O, coupling, and public API surface. High‑risk modules surface for prioritized testing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security surface mapping&lt;/strong&gt; 🛡️ – Identifies modules that touch authentication, authorization, input validation, output encoding, session management, cryptography, file ops, network ops, and database ops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entry point discovery&lt;/strong&gt; 📍 – Finds &lt;code&gt;main&lt;/code&gt;, &lt;code&gt;app.py&lt;/code&gt;, &lt;code&gt;manage.py&lt;/code&gt;, &lt;code&gt;index.js&lt;/code&gt;, etc., to focus end‑to‑end and smoke tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data flow mapping&lt;/strong&gt; 🔄 – Traces imports/dependencies to expose integration points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ITGC controls&lt;/strong&gt; ✅ – Generates a tailored checklist of IT General Controls compliance items (change management, access control, testing requirements, security scanning, code signing, deployment gates, etc.) based on your tech stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Report generation&lt;/strong&gt; 📊 – Produces a beautifully formatted HTML or Markdown report crammed with actionable insights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero‑trust policy&lt;/strong&gt; 🚫 – The skill &lt;em&gt;ignores&lt;/em&gt; any existing tests. It assumes you're starting from zero and designs everything accordingly. This is crucial for audits and for turning around neglected codebases.&lt;/li&gt;
&lt;/ul&gt;
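&lt;p&gt;To make the risk-scoring idea concrete, here is a hypothetical weighting in the same spirit; the skill's actual formula is internal and may differ. 🧮&lt;/p&gt;

```python
def risk_score(module):
    """Combine complexity and security signals into a 0-100 score.
    Hypothetical weights, for illustration only."""
    score = min(module.get("cyclomatic_complexity", 0) * 2, 30)
    flags = {
        "handles_auth": 25,
        "uses_crypto": 15,
        "touches_database": 15,
        "external_calls": 10,
        "file_io": 5,
    }
    for flag, weight in flags.items():
        if module.get(flag):
            score += weight
    return min(score, 100)

login = {"cyclomatic_complexity": 12, "handles_auth": True,
         "touches_database": True, "external_calls": True}
print(risk_score(login))  # 74 under these example weights
```

&lt;p&gt;A module that merely has tangled branches scores moderately; one that also touches authentication and persistence rockets to the top of the priority list.&lt;/p&gt;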

&lt;p&gt;All of this runs locally; your code never leaves your machine unless a remote URL is provided, in which case only a standard &lt;code&gt;git clone&lt;/code&gt; occurs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes This OpenClaw Skill Unique? 💎
&lt;/h2&gt;

&lt;p&gt;The QA ecosystem is no stranger to static analysis tools (linters, complexity analyzers, OWASP ZAP, etc.). But the &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; fills a critical gap: &lt;strong&gt;a holistic, methodology‑agnostic testing strategy generator&lt;/strong&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  20+ Testing Methodologies Covered
&lt;/h3&gt;

&lt;p&gt;The report includes dedicated sections for each major testing approach, complete with an independent baseline definition, risk assessment, strategy, and from‑scratch test examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Core execution&lt;/strong&gt;: Black Box, White Box, Manual, Automated&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Functional &amp;amp; structural&lt;/strong&gt;: Unit, Integration, System, Functional, Smoke, Sanity, E2E, Regression, API, Database Integrity&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Non‑functional&lt;/strong&gt;: Performance, Security, Usability, Compatibility, Accessibility, Localization&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Specialized&lt;/strong&gt;: Acceptance (UAT), Exploratory, Boundary Value Analysis, Monkey/Random Testing, Fuzz Testing, Mutation Testing, Non‑Functional General&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's not just a list — each section contains &lt;em&gt;test cases&lt;/em&gt; written in the language of your stack (Python, JavaScript, Java, Go, etc.) showing exactly how to validate those dimensions. 🧑‍💻&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero‑Trust Baseline
&lt;/h3&gt;

&lt;p&gt;Many tools pretend to “assess” a project by looking at its coverage reports. This skill deliberately &lt;em&gt;ignores&lt;/em&gt; existing tests. Its premise: trust nothing, start from first principles. That independence is gold for audits and for teams that suspect they're not covering enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  Risk‑Based Prioritization
&lt;/h3&gt;

&lt;p&gt;The skill assigns a risk score to each module, combining complexity and security factors. The highest‑scoring modules get explicit attention in the risk assessment table, and the methodology recommendations are tailored accordingly. This tells you exactly where to focus your effort first. 🎯&lt;/p&gt;

&lt;h2&gt;
  
  
  How QA Engineers Transform, Not Disappear 👨‍🔬
&lt;/h2&gt;

&lt;p&gt;Will this skill make QA testers redundant? Not at all — it elevates them. &lt;/p&gt;

&lt;p&gt;The skill produces raw test strategies; it doesn't &lt;em&gt;execute&lt;/em&gt; tests or integrate with CI automatically (though that could be a next step). QA engineers become &lt;strong&gt;QA architects&lt;/strong&gt; who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review the generated strategy for business‑logic nuances.&lt;/li&gt;
&lt;li&gt;Refine risk scores based on domain knowledge.&lt;/li&gt;
&lt;li&gt;Implement the suggested test skeletons, filling in domain‑specific data and assertions.&lt;/li&gt;
&lt;li&gt;Integrate the tests into CI/CD pipelines.&lt;/li&gt;
&lt;li&gt;Triage and investigate failures discovered by the new tests. 🕵️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The time saved from manual test authoring can be redirected toward higher‑value activities&lt;/strong&gt;: exploratory testing, usability studies, performance tuning, and security hardening. In other words, the boring part gets automated, and the creative, investigative work remains human‑centric.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Real‑World Walkthrough 🚶‍♂️
&lt;/h2&gt;

&lt;p&gt;Let's see the skill in action on a tiny Flask API sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; ./flask-demo &lt;span class="nt"&gt;--output&lt;/span&gt; report.html &lt;span class="nt"&gt;--format&lt;/span&gt; html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated &lt;code&gt;report.html&lt;/code&gt; opens to a clean UI. The &lt;strong&gt;Executive Summary&lt;/strong&gt; tells us we have 12 modules, 3 languages (Python, HTML, SQL), and highlights the login module as the highest risk (score 78). &lt;/p&gt;

&lt;p&gt;Scrolling to the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, we find:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API&lt;/strong&gt;: specific suggestions like "test all routes with method overrides, validate status codes, schemas, auth headers, error handling". &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: OWASP Top 10 validation checklist with code snippets for SQL injection, XSS, authentication bypass.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: load test script using &lt;code&gt;locust&lt;/code&gt; that simulates 1000 users hitting the login endpoint with a 2‑second SLA. 🏎️&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, you get a ready‑to‑implement test plan that would otherwise take weeks of manual effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sample Report Excerpts 📝
&lt;/h2&gt;

&lt;p&gt;To give you a taste of what the report looks like, here's a trimmed excerpt from the &lt;strong&gt;Risk Assessment&lt;/strong&gt; table:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Severity&lt;/th&gt;
&lt;th&gt;Risk Type&lt;/th&gt;
&lt;th&gt;Module&lt;/th&gt;
&lt;th&gt;Risk Score&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CRITICAL&lt;/td&gt;
&lt;td&gt;security&lt;/td&gt;
&lt;td&gt;auth/login.py&lt;/td&gt;
&lt;td&gt;85&lt;/td&gt;
&lt;td&gt;Authentication handling detected — requires rigorous security testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HIGH&lt;/td&gt;
&lt;td&gt;code_complexity&lt;/td&gt;
&lt;td&gt;services/order.py&lt;/td&gt;
&lt;td&gt;72&lt;/td&gt;
&lt;td&gt;High complexity module with many branches — needs path coverage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MEDIUM&lt;/td&gt;
&lt;td&gt;dependency&lt;/td&gt;
&lt;td&gt;requirements.txt&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;Unpinned dependencies detected&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;And from the &lt;strong&gt;Testing Methodology Matrix&lt;/strong&gt;, the &lt;strong&gt;Fuzz Testing&lt;/strong&gt; section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Independent Baseline:&lt;/strong&gt; Feed malformed, unexpected, or extreme data to the system to expose vulnerabilities like buffer overflows or injection flaws.&lt;br&gt;
&lt;strong&gt;Vulnerability &amp;amp; Risk Assessment:&lt;/strong&gt; Fuzz testing needed for any input parsing modules. Focus on 12 modules that handle user‑supplied data.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These concrete examples show you how to jump straight into implementation without guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use SonarQube? 🤔
&lt;/h2&gt;

&lt;p&gt;You might wonder: "Can't we already do this with SonarQube or OWASP ZAP?" Those tools address specific facets — static analysis, dependency checks, dynamic scanning. &lt;/p&gt;

&lt;p&gt;They don't produce a &lt;em&gt;holistic testing strategy&lt;/em&gt; that spans unit, integration, security, performance, accessibility, compliance, and the more exotic methodologies like mutation and fuzz testing. Nor do they provide the &lt;em&gt;from‑scratch test cases&lt;/em&gt; ready for adaptation. &lt;/p&gt;

&lt;p&gt;The QA Architecture Auditor is the &lt;strong&gt;missing link&lt;/strong&gt; between static analysis and actual test implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Get Started 🏁
&lt;/h2&gt;

&lt;p&gt;Ready to try it out? Here's how to install and run the skill:&lt;/p&gt;

&lt;h3&gt;
  
  
  Installation from ClawHub 🚀
&lt;/h3&gt;

&lt;p&gt;If you are already using OpenClaw, the easiest way to get started is via ClawHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;clawhub &lt;span class="nb"&gt;install &lt;/span&gt;qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual install from GitHub
&lt;/h3&gt;

&lt;p&gt;If you prefer the old-school way or want to hack on the source code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/shifulegend/qa-architecture-auditor.git &lt;span class="se"&gt;\&lt;/span&gt;
  ~/.openclaw/workspace/skills/qa-architecture-auditor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running the skill
&lt;/h3&gt;

&lt;p&gt;Use the slash command in your OpenClaw chat or call the CLI directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/qa-audit &lt;span class="nt"&gt;--repo&lt;/span&gt; /path/to/your/project &lt;span class="nt"&gt;--format&lt;/span&gt; html &lt;span class="nt"&gt;--output&lt;/span&gt; qa-report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Bigger Picture: AI‑Driven QA Strategies 🌐
&lt;/h2&gt;

&lt;p&gt;The QA Architecture Auditor is more than a one‑off tool; it's a glimpse into the future of AI‑augmented software engineering. Imagine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Continuous auditing&lt;/strong&gt;: The skill runs on every push, updating the risk assessment.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD integration&lt;/strong&gt;: Auto‑generate test stubs for new code. 🔄&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance as code&lt;/strong&gt;: The ITGC controls become part of your compliance documentation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these are natural extensions that the open‑source community could build. The skill is published under the MIT license and welcomes contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion 🏁
&lt;/h2&gt;

&lt;p&gt;Manual test case writing doesn't have to remain the bottleneck. The &lt;strong&gt;QA Architecture Auditor&lt;/strong&gt; OpenClaw skill offers a practical, immediate way to generate a comprehensive, independent QA strategy from a single command. &lt;/p&gt;

&lt;p&gt;For QA engineers, it's not replacement — it's an elevation to QA architect. For teams, it's a shortcut to robust, audit‑ready testing.&lt;/p&gt;

&lt;p&gt;Give it a try on your next codebase. You might just find that your QA workload becomes not only manageable but also more strategic and impactful. 💖&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action 📢
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Install&lt;/strong&gt; the skill from &lt;a href="https://clawhub.ai/skills/qa-architecture-auditor" rel="noopener noreferrer"&gt;ClawHub&lt;/a&gt; today.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; it on a project you care about and explore the report.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt;: Have an idea for a new methodology? Open an issue or PR on the GitHub repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt;: Forward this article to your QA team and let them try it out!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's make testing smarter, faster, and more comprehensive — together.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Published on DEV.to • 10 min read&lt;/em&gt;&lt;/p&gt;

</description>
      <category>testing</category>
      <category>ai</category>
      <category>automation</category>
      <category>openclaw</category>
    </item>
    <item>
      <title>The OpenClaw Heartbeat Trap: How a Simple Health Check Cost Me 300+ LLM Calls Per Day</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Sun, 01 Mar 2026 15:51:21 +0000</pubDate>
      <link>https://forem.com/shifu_legend/the-autonomous-agent-trap-how-my-ai-burned-300-llm-calls-a-day-checking-its-own-pulse-336</link>
      <guid>https://forem.com/shifu_legend/the-autonomous-agent-trap-how-my-ai-burned-300-llm-calls-a-day-checking-its-own-pulse-336</guid>
      <description>&lt;p&gt;🧵 I thought my AI agent was just casually checking system health. Instead, it was running a full-blown medical drama every 55 minutes—and racking up massive token usage behind my back. 🎬&lt;/p&gt;

&lt;h2&gt;
  
  
  💸 The Fear of the Runaway API Bill
&lt;/h2&gt;

&lt;p&gt;If you're building autonomous AI agents with frameworks like &lt;strong&gt;OpenClaw&lt;/strong&gt;, &lt;strong&gt;LangChain&lt;/strong&gt;, or &lt;strong&gt;AutoGPT&lt;/strong&gt;, you already know the existential dread of waking up to a massive API billing alert.&lt;/p&gt;

&lt;p&gt;When we give an LLM the ability to autonomously call tools in a loop to "achieve a goal," we hand over the keys to our wallets.&lt;/p&gt;

&lt;p&gt;This week, my AI assistant—running on OpenClaw using Google's Gemini models—started throwing &lt;code&gt;429 RESOURCE_EXHAUSTED&lt;/code&gt; errors. At first, I was just annoyed by the rate limits. But when I looked at the dashboard, my annoyance turned to panic.&lt;/p&gt;

&lt;p&gt;The daily quota of 1,500 requests was seemingly exhausted.&lt;/p&gt;

&lt;p&gt;The terrifying part? &lt;strong&gt;I hadn't even talked to the agent all day.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The only automated task running? A "simple" system health heartbeat set to trigger every 55 minutes. That's just ~26 pings a day. Where were all these hundreds of requests coming from? I needed to know exactly where those tokens were flying off to.&lt;/p&gt;

&lt;h2&gt;
  
  
  🕵️ The Investigation: Digging Through the JSON Logs
&lt;/h2&gt;

&lt;p&gt;My first assumption was a configuration error—maybe the heartbeat frequency was accidentally set to 5 minutes instead of 55? I checked my &lt;code&gt;openclaw.json&lt;/code&gt; config file. Nope, strictly set to &lt;code&gt;"every": "55m"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So, I brought out the heavy machinery: &lt;strong&gt;the raw agent logs.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I downloaded the 5MB &lt;code&gt;openclaw.log&lt;/code&gt; file from the server. OpenClaw logs everything in structured JSON, which is great for machines but terrible for human eyes. Staring at raw JSON wasn't going to cut it, so I wrote two custom Node.js parser scripts (&lt;code&gt;extract_events.js&lt;/code&gt; and &lt;code&gt;trace_sessions.js&lt;/code&gt;) to reconstruct the crime scene.&lt;/p&gt;

&lt;p&gt;Here is what the scripts did:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regex-matched every &lt;code&gt;embedded run start&lt;/code&gt; and &lt;code&gt;embedded run done&lt;/code&gt; to capture the LLM execution times.&lt;/li&gt;
&lt;li&gt;Grouped every event by &lt;code&gt;sessionId&lt;/code&gt; to track long-running conversations.&lt;/li&gt;
&lt;li&gt;Extracted every single &lt;code&gt;tool&lt;/code&gt; invocation (&lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;read_file&lt;/code&gt;, &lt;code&gt;web_search&lt;/code&gt;) attached to those runs.&lt;/li&gt;
&lt;/ul&gt;
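&lt;p&gt;My parsers were Node.js scripts, but the session-grouping logic is easy to sketch in Python (the log lines below are fabricated stand-ins for the real format):&lt;/p&gt;

```python
import json
from collections import defaultdict

# Hypothetical lines in the structured-JSON style OpenClaw emits.
log_lines = [
    '{"sessionId": "hb-01", "msg": "embedded run start", "ts": "07:51:40"}',
    '{"sessionId": "hb-01", "msg": "tool", "tool": "exec", "ts": "07:51:55"}',
    '{"sessionId": "hb-01", "msg": "tool", "tool": "web_search", "ts": "07:57:02"}',
    '{"sessionId": "hb-01", "msg": "embedded run done", "ts": "07:58:12"}',
]

# Group every event by sessionId to reconstruct each run's timeline.
sessions = defaultdict(list)
for line in log_lines:
    event = json.loads(line)
    sessions[event["sessionId"]].append(event)

for sid, events in sessions.items():
    tools = [e["tool"] for e in events if e["msg"] == "tool"]
    print(sid, "tools used:", tools)
```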

&lt;p&gt;When the scripts spit out the final timeline, my jaw dropped. 😲&lt;/p&gt;

&lt;p&gt;What I found was a textbook case of &lt;strong&gt;uncontrollable LLM tool looping&lt;/strong&gt;—the silent killer of API budgets. 🌪️&lt;/p&gt;

&lt;h2&gt;
  
  
  🔪 The Smoking Gun: The System Health Definition
&lt;/h2&gt;

&lt;p&gt;My agent is designed to run autonomously. Every 55 minutes, a cron job wakes it up and tells it to read a file called &lt;code&gt;HEARTBEAT.md&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here was the fateful instruction inside that file:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"System Health Check: Monitor for stalled interactive processes and kill them. Check memory usage (&lt;code&gt;free -h&lt;/code&gt;)."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To a human sysadmin, this is a 10-second task. You run &lt;code&gt;ps aux&lt;/code&gt;, maybe &lt;code&gt;free -h&lt;/code&gt;, and you're done.&lt;/p&gt;

&lt;p&gt;But to a stateless LLM agent driving a tool-chain architecture? &lt;strong&gt;It's a multi-round forensic investigation.&lt;/strong&gt; 🕵️&lt;/p&gt;

&lt;p&gt;Here is the timeline (abridged to the key steps) of a &lt;strong&gt;single&lt;/strong&gt; 55-minute heartbeat check my script extracted:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;Action&lt;/th&gt;
&lt;th&gt;What the LLM was doing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:51:55&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Ran &lt;code&gt;ps aux&lt;/code&gt; to list all processes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:52:15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Ran &lt;code&gt;grep&lt;/code&gt; to filter the list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:56:49&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked a specific process&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:56:54&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked memory with &lt;code&gt;free -h&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:57:02&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🌐 Tool: &lt;code&gt;web_search&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;em&gt;Looked something up on the internet!?&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:57:24&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Checked disk space (&lt;code&gt;df -h&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:58:10&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;🛠️ Tool: &lt;code&gt;exec&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Final cleanup/verification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;07:58:12&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ &lt;strong&gt;Done&lt;/strong&gt;
&lt;/td&gt;
&lt;td&gt;Summarized findings&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total duration:&lt;/strong&gt; 6.3 minutes.&lt;br&gt;
&lt;strong&gt;Total tool calls:&lt;/strong&gt; 12.&lt;/p&gt;

&lt;h2&gt;
  
  
  ❄️ The Context Snowball Effect (How the tokens multiply)
&lt;/h2&gt;

&lt;p&gt;Here is the critical architectural quirk I had overlooked (and why so many AutoGPT users end up with massive API bills): &lt;strong&gt;In an LLM tool-calling loop, every single tool execution is a brand new API request.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When the agent ran &lt;code&gt;ps aux&lt;/code&gt;, it fetched the result. To decide what to do next, it had to send the &lt;em&gt;entire conversation history&lt;/em&gt; (including the massive &lt;code&gt;ps aux&lt;/code&gt; output) back to the LLM. Then it decided to run &lt;code&gt;free -h&lt;/code&gt;. It executed it, got the result, and sent the history back &lt;em&gt;again&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;With each step, the context ballooned.&lt;/p&gt;
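&lt;p&gt;Back-of-the-envelope arithmetic shows why this hurts: because each request re-sends the whole history, total input tokens grow quadratically with the number of tool calls. A sketch (token counts are illustrative, not measured from my logs):&lt;/p&gt;

```javascript
// Each tool call re-sends the entire history, so the context grows
// linearly per step and the total input tokens grow quadratically.
function totalInputTokens(steps, basePrompt, perToolOutput) {
  let total = 0;
  let context = basePrompt;
  for (let i = 0; i !== steps; i++) {
    total += context;          // this request re-sends everything so far
    context += perToolOutput;  // the tool result is appended to history
  }
  return total;
}

// One tool call with a 2,000-token prompt vs. twelve calls that each
// append a 1,500-token tool output:
console.log(totalInputTokens(1, 2000, 1500));  // 2000
console.log(totalInputTokens(12, 2000, 1500)); // 123000
```

&lt;p&gt;Twelve "cheap" shell commands cost over 60x the tokens of a single round-trip.&lt;/p&gt;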

&lt;p&gt;Instead of 26 lightweight pings a day, my "simple" health check was generating &lt;strong&gt;300+ massive LLM round-trips daily&lt;/strong&gt;, each with a larger context window than the last. 🏔️&lt;/p&gt;

&lt;p&gt;My agent was silently burning through hundreds of thousands of tokens every single day just to check if the server was okay.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⛈️ The Retry Storm
&lt;/h3&gt;

&lt;p&gt;This aggressive tool usage is also what caused the rate limits. When the agent hit its 12-tool streak in 6 minutes, it bumped into Google's per-minute quota (~15 requests/min).&lt;/p&gt;

&lt;p&gt;When the API returned a &lt;code&gt;429 Rate Limit&lt;/code&gt; error, OpenClaw (as designed) initiated an exponential backoff retry. But during those retry windows, &lt;em&gt;other&lt;/em&gt; scheduled checks queued up.&lt;/p&gt;
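&lt;p&gt;The dynamics are easy to model. A toy sketch of the pattern (delays and queue math are illustrative, not OpenClaw's actual retry implementation):&lt;/p&gt;

```javascript
// Exponential backoff: the retry delay doubles on every attempt.
function backoffDelays(attempts, baseMs = 1000) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// While one request sits in its backoff window, other scheduled checks
// keep arriving. When the window clears, they all fire at once.
function queuedAtRelease(backoffMs, arrivalIntervalMs) {
  return Math.floor(backoffMs / arrivalIntervalMs);
}

console.log(backoffDelays(4));             // 1s, 2s, 4s, 8s
console.log(queuedAtRelease(40000, 3500)); // 11 calls queued behind a 40s stall
```

&lt;p&gt;Backoff protects the provider, but the queue it leaves behind is what produces the burst.&lt;/p&gt;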

&lt;p&gt;At exactly &lt;code&gt;11:15 UTC&lt;/code&gt;, the dam broke. The logs showed &lt;strong&gt;12 API requests firing in 40 seconds&lt;/strong&gt; as the system panic-retried a backlog of tool calls.&lt;/p&gt;

&lt;p&gt;I wasn't being rate-limited because of daily usage. I was being rate-limited because my agent was behaving like an over-caffeinated sysadmin slamming the terminal with 12 commands a minute. ☕💥&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ The Fix: Taking the Keys Away
&lt;/h2&gt;

&lt;p&gt;When building autonomous agents, it's tempting to give the LLM control over everything. &lt;em&gt;Why write a bash script when the AI can just figure it out dynamically?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This incident is exactly why. Some tasks don't need "reasoning." They just need execution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;I opened &lt;code&gt;HEARTBEAT.md&lt;/code&gt; and completely deleted the actionable instructions. I left it as a comment-only file so the LLM wakes up, sees nothing to do, and goes immediately back to sleep (1 API call instead of 12).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I moved the actual system monitoring to a dumb, reliable &lt;code&gt;cron&lt;/code&gt; bash script:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nv"&gt;AVAILABLE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;free &lt;span class="nt"&gt;-m&lt;/span&gt; | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'/Mem:/ {print $7}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$AVAILABLE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 200 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"[&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;] LOW MEMORY: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AVAILABLE&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;MB"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /tmp/health_alerts.log
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, a traditional cron job runs every 55 minutes, takes 0.1 seconds, costs 0 API tokens, and logs any issues to a file. The LLM only gets involved if a human explicitly asks to read that file.&lt;/p&gt;

&lt;h2&gt;
  
  
  🧠 The Takeaway for Agent Builders
&lt;/h2&gt;

&lt;p&gt;If you are building LLM agents with access to real tools (&lt;code&gt;exec&lt;/code&gt;, &lt;code&gt;browser&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;), remember:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Every tool call is a full LLM round-trip.&lt;/strong&gt; A 5-step thought process is 5 API calls. Set hard caps (&lt;code&gt;max_iterations&lt;/code&gt;) on your agent loops to keep them from burning a hole in your wallet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never give an LLM a monitoring job a config or bash script can do.&lt;/strong&gt; Reserve the expensive AI reasoning for when things actually break and need diagnosing, not for the routine patrol.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log your tool chains.&lt;/strong&gt; If I hadn't built custom JS scripts to trace the session IDs and see exactly &lt;em&gt;which&lt;/em&gt; tools were being called in sequence, I would have had no idea my agent was hallucinating 12-step system audits in the background.&lt;/li&gt;
&lt;/ul&gt;
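&lt;p&gt;That first point, the hard cap, can be a one-line guard in the agent loop. A generic sketch (not OpenClaw internals; &lt;code&gt;decideNextTool&lt;/code&gt; and &lt;code&gt;runTool&lt;/code&gt; are hypothetical stand-ins for the LLM call and the tool executor):&lt;/p&gt;

```javascript
// Generic agent loop with a hard iteration cap. The loop bails out
// once the budget is spent, no matter what the model wants to do next.
function runAgent(decideNextTool, runTool, maxIterations = 5) {
  const transcript = [];
  for (let i = 0; i !== maxIterations; i++) {
    const step = decideNextTool(transcript);
    if (step === null) return { transcript, capped: false }; // model is done
    transcript.push(runTool(step));
  }
  return { transcript, capped: true }; // budget exhausted, bail out
}

// A "model" that would loop forever is stopped after 5 tool calls:
const result = runAgent(() => 'exec', (t) => `ran ${t}`, 5);
console.log(result.capped);            // true
console.log(result.transcript.length); // 5
```

&lt;p&gt;When &lt;code&gt;capped&lt;/code&gt; comes back true, log it loudly: a capped run is exactly the kind of 12-step audit you want to know about.&lt;/p&gt;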




&lt;h2&gt;
  
  
  📚 Diagnostic Playbook: Fixing "Unknown Model" and &lt;code&gt;configured,missing&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;If you're hitting &lt;code&gt;configured,missing&lt;/code&gt; or &lt;code&gt;Unknown model&lt;/code&gt; in OpenClaw, here's the exact playbook I used:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check if you have an agent-level &lt;code&gt;models.json&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; &lt;span class="nt"&gt;-la&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this file &lt;strong&gt;exists&lt;/strong&gt; and you're only using standard providers (OpenRouter, Google, Anthropic, OpenAI), this file is probably unnecessary and might be shadowing the built-in registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Check what's in it
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-E&lt;/span&gt; &lt;span class="s1"&gt;'"id"'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you see a provider name that matches a built-in provider (&lt;code&gt;openrouter&lt;/code&gt;, &lt;code&gt;google&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, etc.), that block is &lt;strong&gt;overriding&lt;/strong&gt; the built-in model catalog. Only models explicitly listed will be recognized.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Try disabling it (with backup)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Backup first!&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json &lt;span class="se"&gt;\\&lt;/span&gt;
   ~/.openclaw/agents/main/agent/models.json.bak.&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y%m%d-%H%M%S&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Rename to disable&lt;/span&gt;
&lt;span class="nb"&gt;mv&lt;/span&gt; ~/.openclaw/agents/main/agent/models.json &lt;span class="se"&gt;\\&lt;/span&gt;
   ~/.openclaw/agents/main/agent/models.json.disabled

&lt;span class="c"&gt;# Restart&lt;/span&gt;
systemctl &lt;span class="nt"&gt;--user&lt;/span&gt; restart openclaw-gateway

&lt;span class="c"&gt;# Check&lt;/span&gt;
openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If all models now show &lt;code&gt;configured&lt;/code&gt; — &lt;strong&gt;the file was the problem.&lt;/strong&gt; Delete it permanently (or keep the &lt;code&gt;.disabled&lt;/code&gt; backup just in case).&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: If you DO need custom providers
&lt;/h3&gt;

&lt;p&gt;If you have truly custom providers (not built-in), such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nvidia API&lt;/strong&gt; (&lt;code&gt;integrate.api.nvidia.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom self-hosted endpoints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-standard API providers&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need &lt;code&gt;models.json&lt;/code&gt;, but be very careful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't use provider names that match built-in providers&lt;/strong&gt; (e.g., use &lt;code&gt;openrouter-custom&lt;/code&gt; instead of &lt;code&gt;openrouter&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Only define the custom providers; let the built-in registry handle the standard ones&lt;/li&gt;
&lt;/ul&gt;
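&lt;p&gt;Put together, a safe &lt;code&gt;models.json&lt;/code&gt; keeps every provider name out of the built-in namespace. A minimal sketch (the model ID is just an example):&lt;/p&gt;

```json
{
  "providers": {
    "nvidia-custom": {
      "baseUrl": "https://integrate.api.nvidia.com/v1",
      "apiKey": "nvapi-...",
      "models": [
        { "id": "moonshotai/kimi-k2.5" }
      ]
    }
  }
}
```

&lt;p&gt;No &lt;code&gt;openrouter&lt;/code&gt; block at all, so the built-in registry keeps serving every OpenRouter model untouched.&lt;/p&gt;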

&lt;h3&gt;
  
  
  Quick diagnostic cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;models.json&lt;/code&gt; is shadowing built-in registry&lt;/td&gt;
&lt;td&gt;Rename/remove &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Unknown model&lt;/code&gt; in logs&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API key missing from &lt;code&gt;.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;.env&lt;/code&gt; (and never use &lt;code&gt;&amp;gt;&lt;/code&gt;!)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model works via &lt;code&gt;curl&lt;/code&gt; but not OpenClaw&lt;/td&gt;
&lt;td&gt;Provider block in &lt;code&gt;models.json&lt;/code&gt; doesn't list the model&lt;/td&gt;
&lt;td&gt;Remove the shadowing provider block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;models scan&lt;/code&gt; doesn't find a model&lt;/td&gt;
&lt;td&gt;Model doesn't support tool-calling&lt;/td&gt;
&lt;td&gt;Add manually via &lt;code&gt;openclaw models set&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🎯 The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure debugging is archaeology.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not fixing bugs — you're reconstructing what a system looked like at a moment when it worked, and comparing it to the moment it stopped.&lt;/p&gt;

&lt;p&gt;The difference is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ &lt;strong&gt;One character&lt;/strong&gt; (&lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;One file&lt;/strong&gt; that's shadowing a built-in registry&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;One good-faith change&lt;/strong&gt; by an AI agent that had unintended side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the real fix isn't always adding what's missing — sometimes it's &lt;strong&gt;removing what shouldn't be there.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If you've ever stared at &lt;code&gt;configured,missing&lt;/code&gt; and felt your sanity slipping — now you know exactly where to look.&lt;/em&gt; 🦞&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;My OpenClaw agent's 55-minute heartbeat check was running 12 shell commands per cycle, generating 300+ LLM round-trips a day; the fix was stripping the actionable instructions from &lt;code&gt;HEARTBEAT.md&lt;/code&gt; and moving monitoring to a dumb cron + bash script. Separately, a &lt;code&gt;models.json&lt;/code&gt; file with an &lt;code&gt;openrouter&lt;/code&gt; provider block shadowed the built-in catalog and broke model resolution; the fix was removing the unnecessary file and letting the built-in registry handle standard providers.&lt;/p&gt;




&lt;p&gt;Has your AI agent ever surprised you with a massive API bill? Share your horror stories below! 👇&lt;/p&gt;

</description>
      <category>openclaw</category>
      <category>ai</category>
      <category>debugging</category>
      <category>llm</category>
    </item>
    <item>
      <title>OpenClaw's "Unknown Model" Error — How One Missing JSON Entry Broke My AI Assistant for 4 Hours</title>
      <dc:creator>Shifu</dc:creator>
      <pubDate>Sun, 01 Mar 2026 08:19:09 +0000</pubDate>
      <link>https://forem.com/shifu_legend/openclaws-unknown-model-error-how-one-missing-json-entry-broke-my-ai-assistant-for-4-hours-5f19</link>
      <guid>https://forem.com/shifu_legend/openclaws-unknown-model-error-how-one-missing-json-entry-broke-my-ai-assistant-for-4-hours-5f19</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;🧵 &lt;em&gt;I chased a phantom through two config files, three API keys, and 47 SSH sessions. The initial "fix" was one line of JSON. The real fix? Deleting the file entirely.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤖 What's OpenClaw?
&lt;/h2&gt;

&lt;p&gt;Before I dive in — if you haven't heard of &lt;a href="https://docs.openclaw.ai/start/getting-started" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, it's an &lt;strong&gt;open-source AI agent framework&lt;/strong&gt; that lets you run persistent AI assistants on your own server. Think of it as your self-hosted ChatGPT, but with memory, personality, tools, scheduled tasks, and multi-channel support (Telegram, Discord, WhatsApp, TUI, etc.).&lt;/p&gt;

&lt;p&gt;You configure which LLM models power your agents — GPT-4, Gemini, Claude, or any model via &lt;strong&gt;OpenRouter&lt;/strong&gt; — and OpenClaw handles the orchestration: routing messages, managing sessions, executing tools, and maintaining long-term memory across conversations.&lt;/p&gt;

&lt;p&gt;I run my personal AI assistant (&lt;strong&gt;Elara&lt;/strong&gt;) on an &lt;strong&gt;AWS EC2 instance&lt;/strong&gt; using OpenClaw. The model I'd been using for weeks: &lt;strong&gt;&lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt;&lt;/strong&gt; via OpenRouter — a solid, free, 250K-context model that worked beautifully.&lt;/p&gt;

&lt;p&gt;Until one Saturday morning, when it just… stopped.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔇 The Silence
&lt;/h2&gt;

&lt;p&gt;I opened my &lt;strong&gt;OpenClaw TUI&lt;/strong&gt; (the terminal-based chat interface) and typed &lt;code&gt;Hello&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🦞 OpenClaw 2026.2.2-3 — Think different. Actually think.

openclaw tui - ws://127.0.0.1:18789 - agent main - session main
 connecting | idle
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The spinner appeared — &lt;code&gt;⠴ kerfuffling…&lt;/code&gt; — and just kept going. And going. And going.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No error.&lt;/strong&gt; No timeout message. &lt;strong&gt;No response.&lt;/strong&gt; Just an infinite spinner and silence.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕵️ Act I: The Obvious Suspects
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Checking the gateway logs
&lt;/h3&gt;

&lt;p&gt;First instinct: check the logs. OpenClaw writes daily log files to &lt;code&gt;/tmp/openclaw/&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cat /tmp/openclaw/openclaw-2026-03-01.log | grep -i "error" | tail -5
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;And there it was:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "error": "Error: Unknown model: openrouter/stepfun/step-3.5-flash:free",
  "lane": "main",
  "durationMs": 55
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;"Unknown model."&lt;/strong&gt; But… that model was in my config. I'd been using it for weeks. How could OpenClaw suddenly not recognize it?&lt;/p&gt;
&lt;h3&gt;
  
  
  The mysterious &lt;code&gt;configured,missing&lt;/code&gt; status
&lt;/h3&gt;

&lt;p&gt;OpenClaw has a CLI command to list all configured models:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models list
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model                                      Input   Context  Auth  Tags
openrouter/stepfun/step-3.5-flash:free     text    250k     yes   configured,missing
google/gemini-2.0-flash                    text    1000k    yes   configured
google/gemini-3-flash-preview              text    1024k    yes   configured
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;There it is: &lt;strong&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/strong&gt;. 🤨&lt;/p&gt;

&lt;p&gt;I'd never seen this status before. In OpenClaw:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;configured&lt;/code&gt;&lt;/strong&gt; = the model is listed in your config and the runtime can resolve it ✅&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/strong&gt; = the model is listed in your config, but the runtime &lt;strong&gt;can't resolve it to a working provider endpoint&lt;/strong&gt; ❌&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model exists on paper but is invisible at runtime. Like a ghost in the machine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Trying the obvious fixes
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Re-register the model via CLI
$ openclaw models set openrouter/stepfun/step-3.5-flash:free
Updated successfully ✅

# Restart the gateway
$ systemctl --user restart openclaw-gateway

# Check again...
$ openclaw models list | grep stepfun
openrouter/stepfun/step-3.5-flash:free     text    250k     yes    configured,missing
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Still &lt;code&gt;configured,missing&lt;/code&gt;.&lt;/strong&gt; 😤 The &lt;code&gt;models set&lt;/code&gt; command updated the global config, but the runtime still couldn't find the model. Something deeper was wrong.&lt;/p&gt;
&lt;h3&gt;
  
  
  Trying a model scan
&lt;/h3&gt;

&lt;p&gt;OpenClaw can scan your providers for available models:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models scan --yes
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;It found Google models, Llama models, and others — but &lt;strong&gt;not stepfun&lt;/strong&gt;. The scan only picks up models that advertise tool-calling support, and &lt;code&gt;step-3.5-flash:free&lt;/code&gt; doesn't. Dead end.&lt;/p&gt;


&lt;h2&gt;
  
  
  💀 Act II: The &lt;code&gt;&amp;gt;&lt;/code&gt; That Ate My API Key
&lt;/h2&gt;

&lt;p&gt;While investigating, I discovered something &lt;strong&gt;horrifying&lt;/strong&gt;. Earlier that day, while configuring a new Google API key, a command had been run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;echo "GOOGLE_API_KEY=AIzaSy..." &amp;gt; ~/.openclaw/.env
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;See that &lt;code&gt;&amp;gt;&lt;/code&gt;? That's &lt;strong&gt;not&lt;/strong&gt; &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;That single character — &lt;code&gt;&amp;gt;&lt;/code&gt; instead of &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; — overwrote the entire &lt;code&gt;.env&lt;/code&gt; file&lt;/strong&gt;, silently destroying the &lt;code&gt;OPENROUTER_API_KEY&lt;/code&gt; that had been there for a month.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No error. No warning. &lt;strong&gt;Just gone.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I found the original key buried deep in &lt;code&gt;.bash_history&lt;/code&gt; and restored it:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Found the original onboarding command in history
$ history | grep openrouter
openclaw onboard --auth-choice apiKey --token-provider openrouter --token "sk-or-v1-..."

# Restored it (with &amp;gt;&amp;gt; this time!)
$ echo 'OPENROUTER_API_KEY=sk-or-v1-...' &amp;gt;&amp;gt; ~/.openclaw/.env
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;
&lt;h3&gt;
  
  
  Direct API test
&lt;/h3&gt;

&lt;p&gt;To verify the key was valid, I bypassed OpenClaw entirely:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-v1-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "stepfun/step-3.5-flash:free",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "choices": [{
    "message": { "content": "Hello! How can I help you today?" }
  }]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The API worked perfectly.&lt;/strong&gt; 🎉 Key valid. OpenRouter up. Model alive and responding.&lt;/p&gt;

&lt;p&gt;But OpenClaw &lt;em&gt;still&lt;/em&gt; said &lt;strong&gt;"Unknown model."&lt;/strong&gt; 💀&lt;/p&gt;

&lt;p&gt;The API worked. The config had the model. The key was valid. But OpenClaw couldn't see it. This is the moment I realized the problem was deeper than a missing key or a typo.&lt;/p&gt;


&lt;h2&gt;
  
  
  🔬 Act III: The Two-Layer Architecture
&lt;/h2&gt;

&lt;p&gt;I went full forensics. I downloaded &lt;strong&gt;everything&lt;/strong&gt; from the server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📄 &lt;strong&gt;28 backup config files&lt;/strong&gt; spanning a month&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;12MB of gateway logs&lt;/strong&gt; (4 days)&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Memory files, soul files, identity files&lt;/strong&gt; — the AI assistant's persistent state&lt;/li&gt;
&lt;li&gt;📝 &lt;strong&gt;Configuration change reports&lt;/strong&gt; — auto-generated docs from previous changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And after two hours of diffing JSON files, I found the problem.&lt;/p&gt;
&lt;h3&gt;
  
  
  OpenClaw resolves models through TWO config layers
&lt;/h3&gt;

&lt;p&gt;Most documentation focuses on the global config file. But OpenClaw actually has &lt;strong&gt;two layers&lt;/strong&gt; of model configuration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/openclaw.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Global config&lt;/strong&gt; — model names, aliases, fallbacks, per-agent assignments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Layer 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;~/.openclaw/agents/&amp;lt;id&amp;gt;/agent/models.json&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Provider definitions&lt;/strong&gt; — maps provider names → base URLs, API keys, explicit model schemas&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The critical behavior:&lt;/strong&gt; When Layer 2 defines a provider (like &lt;code&gt;openrouter&lt;/code&gt;), its model definitions &lt;strong&gt;shadow&lt;/strong&gt; (override) the built-in registry for that provider. Only models explicitly listed in that provider's &lt;code&gt;models[]&lt;/code&gt; array will be recognized.&lt;/p&gt;
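&lt;p&gt;My mental model of that merge rule, sketched in a few lines (a simplification, not OpenClaw's actual source):&lt;/p&gt;

```javascript
// Mental model: a provider block in models.json shadows the entire
// built-in model catalog for that provider name.
function resolveModels(builtinCatalog, modelsJsonProviders) {
  const resolved = { ...builtinCatalog };
  for (const [name, def] of Object.entries(modelsJsonProviders)) {
    // The custom definition wins wholesale; built-in models for this
    // provider name are no longer consulted.
    resolved[name] = def.models.map((m) => m.id);
  }
  return resolved;
}

const builtin = { openrouter: ['stepfun/step-3.5-flash:free', 'google/gemini-2.5-pro'] };
const custom = { openrouter: { models: [{ id: 'google/gemini-2.5-pro' }] } };

const models = resolveModels(builtin, custom);
console.log(models.openrouter.includes('stepfun/step-3.5-flash:free')); // false
```

&lt;p&gt;Name the custom provider &lt;code&gt;openrouter-custom&lt;/code&gt; instead and the built-in catalog survives intact.&lt;/p&gt;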

&lt;p&gt;My stepfun model was in &lt;strong&gt;Layer 1&lt;/strong&gt; ✅ but not in &lt;strong&gt;Layer 2&lt;/strong&gt; ❌.&lt;/p&gt;


&lt;h2&gt;
  
  
  🕰️ Act IV: Where Did This File Come From?
&lt;/h2&gt;

&lt;p&gt;Here's the part that makes this story truly interesting. I diffed the backup files to reconstruct exactly how &lt;code&gt;models.json&lt;/code&gt; evolved:&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 1: The innocent beginning (early February)
&lt;/h3&gt;

&lt;p&gt;My AI assistant (&lt;strong&gt;Elara&lt;/strong&gt;) needed to connect to a custom model (&lt;code&gt;dolphin-mistral&lt;/code&gt; via OpenRouter) that wasn't in OpenClaw's built-in registry. So she created &lt;code&gt;models.json&lt;/code&gt; with a custom provider called &lt;code&gt;openrouter-custom&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "providers": {
    "openrouter-custom": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "sk-or-v1-...",
      "models": [
        { "id": "cognitivecomputations/dolphin-mistral-24b-venice-edition:free" }
      ]
    },
    "google": {
      "models": [{ "id": "gemini-3-pro-preview" }]
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size: 1.3KB.&lt;/strong&gt; Two providers, two models. Harmless.&lt;/p&gt;

&lt;p&gt;At this point, &lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt; was still working perfectly — resolved through OpenClaw's &lt;strong&gt;built-in OpenRouter registry&lt;/strong&gt;, no &lt;code&gt;models.json&lt;/code&gt; entry needed. The provider name &lt;code&gt;openrouter-custom&lt;/code&gt; was smart — it's a custom name that &lt;strong&gt;doesn't clash&lt;/strong&gt; with the built-in &lt;code&gt;openrouter&lt;/code&gt; provider.&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 2: Adding Nvidia models (February 22)
&lt;/h3&gt;

&lt;p&gt;I asked Elara to configure &lt;strong&gt;Kimi K2.5&lt;/strong&gt; via Nvidia's API. She added a new &lt;code&gt;nvidia-custom&lt;/code&gt; provider to &lt;code&gt;models.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"nvidia-custom": {
  "baseUrl": "https://integrate.api.nvidia.com/v1",
  "apiKey": "nvapi-...",
  "models": [
    { "id": "moonshotai/kimi-k2.5" },
    { "id": "deepseek-ai/deepseek-v3.2" },
    { "id": "mistralai/mistral-large-3-675b-instruct-2512" }
    // ... 8 models total
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size grew to 4.7KB.&lt;/strong&gt; Three providers, 11 models. Still harmless — &lt;code&gt;nvidia-custom&lt;/code&gt; is a truly custom provider that doesn't shadow any built-in. Stepfun still worked fine.&lt;/p&gt;
&lt;h3&gt;
  
  
  Stage 3: The fatal addition (late February)
&lt;/h3&gt;

&lt;p&gt;At some point between Feb 22 and Mar 1, during a configuration session where I asked Elara to add Google models via OpenRouter, a new provider block was added to &lt;code&gt;models.json&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;"openrouter": {
  "baseUrl": "https://openrouter.ai/api/v1",
  "apiKey": "sk-or-v1-...",
  "models": [
    { "id": "google/gemini-2.0-flash-001" },
    { "id": "google/gemini-2.5-flash" },
    { "id": "google/gemini-2.5-pro" }
    // ... 13 Google models total via OpenRouter
    //
    // But where's stepfun?
    // 🦗 *crickets* 🦗
  ]
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;File size ballooned to 11KB.&lt;/strong&gt; And this single block was the killer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why did this break everything?&lt;/strong&gt; Because unlike &lt;code&gt;openrouter-custom&lt;/code&gt; in Stage 1, this provider was named just &lt;strong&gt;&lt;code&gt;openrouter&lt;/code&gt;&lt;/strong&gt; — which &lt;strong&gt;exactly matches&lt;/strong&gt; OpenClaw's built-in OpenRouter provider name. Per OpenClaw's merge rules, when &lt;code&gt;models.json&lt;/code&gt; defines a provider, non-empty values &lt;strong&gt;take precedence over&lt;/strong&gt; the built-in registry. The explicit &lt;code&gt;openrouter&lt;/code&gt; block with only 13 Google models &lt;strong&gt;completely replaced&lt;/strong&gt; the built-in OpenRouter model catalog — which previously included hundreds of models, stepfun among them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stepfun was never added to this custom &lt;code&gt;openrouter&lt;/code&gt; block because it was already working&lt;/strong&gt; through the built-in registry. Nobody knew they needed to add it. The built-in registry was handling it silently. But the moment the custom &lt;code&gt;openrouter&lt;/code&gt; block appeared, it overwrote that silent handling, and stepfun became invisible.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Analogy:&lt;/strong&gt; Imagine your phone contacts are stored in iCloud. One day, a friend sets up a "Google Contacts" sync for you with only work contacts. Your phone switches to Google as the primary source and suddenly all your personal contacts vanish — they're still in iCloud, but it's no longer being consulted.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  ✅ The Fix: Two Approaches, One Revelation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🔧 The initial fix: Patching the symptom
&lt;/h3&gt;

&lt;p&gt;Having identified that the &lt;code&gt;openrouter&lt;/code&gt; provider block in &lt;code&gt;models.json&lt;/code&gt; was missing stepfun, my first instinct was to &lt;strong&gt;add the missing model definition&lt;/strong&gt;. This felt like the right approach — the file exists, it lists models, my model isn't in the list, so add it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Understanding the required schema&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each model in the provider's &lt;code&gt;models[]&lt;/code&gt; array needs a specific structure. You can't just add the model name — you need the full definition. I found the schema by looking at existing entries in the file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;// Every model in models.json needs these fields:
{
  "id": "...",           // Model slug (from the provider)
  "name": "...",         // Human-readable display name
  "reasoning": false,    // Does it support chain-of-thought?
  "input": ["text"],     // Input types: "text", "image", etc.
  "cost": {              // Per-token pricing
    "input": 0, "output": 0,
    "cacheRead": 0, "cacheWrite": 0
  },
  "contextWindow": ...,  // Max input tokens
  "maxTokens": ...       // Max output tokens
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
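Before hand-assembling an entry, it's worth sanity-checking it against that field list. Here's a small Python validator based only on the fields shown above — a sketch, not an official OpenClaw schema:

```python
# Sanity-check a models.json entry against the field list above.
# Illustrative only: derived from this post, not from OpenClaw's schema.
REQUIRED_FIELDS = {"id", "name", "reasoning", "input", "cost",
                   "contextWindow", "maxTokens"}
COST_FIELDS = {"input", "output", "cacheRead", "cacheWrite"}

def validate_model_entry(entry: dict) -> list:
    """Return a list of problems with a models.json entry (empty = OK)."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS - entry.keys()]
    cost = entry.get("cost", {})
    problems += [f"missing cost field: {f}" for f in COST_FIELDS - cost.keys()]
    return problems

entry = {
    "id": "stepfun/step-3.5-flash:free",
    "name": "Step 3.5 Flash (Free)",
    "reasoning": False,
    "input": ["text"],
    "cost": {"input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0},
    "contextWindow": 250000,
    "maxTokens": 8192,
}
print(validate_model_entry(entry))  # []
```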

&lt;p&gt;&lt;strong&gt;Step 2: Finding the right values for stepfun&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I checked the &lt;a href="https://openrouter.ai/models" rel="noopener noreferrer"&gt;OpenRouter model page&lt;/a&gt; for &lt;code&gt;stepfun/step-3.5-flash:free&lt;/code&gt; to get the specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context window: &lt;strong&gt;250,000 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Max output: &lt;strong&gt;8,192 tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Input: text only (no image support)&lt;/li&gt;
&lt;li&gt;Cost: free (&lt;code&gt;0&lt;/code&gt; for all price fields)&lt;/li&gt;
&lt;li&gt;Reasoning: no&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Writing a Node.js script to safely modify the JSON&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn't want to hand-edit an 11KB JSON file through SSH — one misplaced comma and the whole config breaks. So I wrote a script:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const fs = require('fs');
const path = process.env.HOME + '/.openclaw/agents/main/agent/models.json';
const config = JSON.parse(fs.readFileSync(path));

const newModel = {
  id: 'stepfun/step-3.5-flash:free',
  name: 'Step 3.5 Flash (Free)',
  reasoning: false,
  input: ['text'],
  cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
  contextWindow: 250000,
  maxTokens: 8192
};

// Check if it already exists
const exists = config.providers.openrouter.models.some(
  m =&amp;gt; m.id === newModel.id
);

if (!exists) {
  config.providers.openrouter.models.push(newModel);
  fs.writeFileSync(path, JSON.stringify(config, null, 2));
  console.log('✅ Added stepfun to openrouter provider');
} else {
  console.log('Model already exists');
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Apply and verify&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Run the script
$ node add_stepfun.js
✅ Added stepfun to openrouter provider

# Restart the gateway to load the new config
$ systemctl --user restart openclaw-gateway

# Wait for startup
$ sleep 5

# Check status
$ openclaw models list | grep stepfun
openrouter/stepfun/step-3.5-flash:free     text   250k   yes   configured ✅

# Test in TUI
$ openclaw tui --message "Hello? Are you there?"
🌸 Hello! I'm here and ready to help!
 agent main | openrouter/stepfun/step-3.5-flash:free | tokens 54k/250k (22%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;It worked!&lt;/strong&gt; 🎉 The model was back. Status changed from &lt;code&gt;configured,missing&lt;/code&gt; to &lt;code&gt;configured&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But something nagged at me.&lt;/p&gt;
&lt;h3&gt;
  
  
  🤔 The nagging question
&lt;/h3&gt;

&lt;p&gt;I stared at &lt;code&gt;models.json&lt;/code&gt; — now 11.3KB — and asked myself: &lt;strong&gt;why does this file need to exist at all?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;OpenClaw has a &lt;strong&gt;built-in model registry&lt;/strong&gt;. It already knows about every OpenRouter model, every Google model, every Anthropic model. That's how stepfun was working for &lt;strong&gt;weeks&lt;/strong&gt; — through the built-in registry, with no &lt;code&gt;models.json&lt;/code&gt; needed.&lt;/p&gt;

&lt;p&gt;The only reason &lt;code&gt;models.json&lt;/code&gt; existed was for &lt;strong&gt;truly custom providers&lt;/strong&gt; like &lt;code&gt;nvidia-custom&lt;/code&gt; (an Nvidia API endpoint that OpenClaw doesn't know about natively) and &lt;code&gt;openrouter-custom&lt;/code&gt; (a non-standard name for testing). Those make sense.&lt;/p&gt;

&lt;p&gt;But the &lt;code&gt;openrouter&lt;/code&gt; block? That was just a &lt;strong&gt;duplicate of something OpenClaw already knows&lt;/strong&gt;. Worse — it was an &lt;em&gt;incomplete&lt;/em&gt; duplicate that was shadowing the complete built-in version.&lt;/p&gt;

&lt;p&gt;What if I just… removed the file?&lt;/p&gt;
&lt;h3&gt;
  
  
  🎯 The real fix: Removing what shouldn't be there
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Back up the file&lt;/strong&gt; (I'd learned my lesson about backups by this point):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ cp ~/.openclaw/agents/main/agent/models.json \
     ~/.openclaw/agents/main/agent/models.json.backup.$(date +%Y%m%d-%H%M%S)
echo "Backup saved. Restore with:"
echo "  cp models.json.backup.TIMESTAMP models.json"
echo "  systemctl --user restart openclaw-gateway"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Disable &lt;code&gt;models.json&lt;/code&gt;&lt;/strong&gt; by renaming it (safer than deleting — I can reverse this instantly):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ mv ~/.openclaw/agents/main/agent/models.json \
     ~/.openclaw/agents/main/agent/models.json.disabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Restart the gateway:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ systemctl --user restart openclaw-gateway
$ sleep 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 4: Check if the gateway starts without errors:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ journalctl --user -u openclaw-gateway -n 20 --no-pager | grep -i error
# (no output — no errors!) ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Check ALL models:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model                                      Input      Ctx      Auth  Tags
google/gemini-3-flash-preview              text+image 1024k    yes   configured ✅
google/gemini-1.5-flash                    text+image 977k     yes   configured ✅
google/gemini-1.5-pro                      text+image 977k     yes   configured ✅
google/gemini-2.0-flash                    text+image 1024k    yes   configured ✅
google/gemini-2.5-flash                    text+image 1024k    yes   configured ✅
google/gemini-2.5-pro                      text+image 1024k    yes   configured ✅
google/gemini-3-pro-preview                text+image 977k     yes   configured ✅
openrouter/stepfun/step-3.5-flash:free     text       250k     yes   configured ✅
openrouter/meta-llama/llama-3.3-70b-ins... text       128k     yes   configured ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Every. Single. Model.&lt;/strong&gt; &lt;code&gt;configured&lt;/code&gt;. Not a single &lt;code&gt;missing&lt;/code&gt;. ✅&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 6: Test the models in TUI:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ openclaw tui --message "Hello! Which model are you?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Hello! 🌸 I'm Elara, running on openrouter/stepfun/step-3.5-flash:free.
 agent main | openrouter/stepfun/step-3.5-flash:free | tokens 54k/250k (22%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I verified the Google models too by checking the gateway logs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ tail -20 /tmp/openclaw/openclaw-*.log | grep "embedded run done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lane=session:agent:main:test-google durationMs=16949 active=0 queued=0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Google model completed a run in 16.9 seconds. No errors. ✅&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 7: Confirm &lt;code&gt;models.json&lt;/code&gt; was NOT regenerated:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ ls ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
# "No such file or directory" — it was NOT regenerated ✅
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This appeared to confirm that OpenClaw does &lt;strong&gt;not&lt;/strong&gt; auto-regenerate &lt;code&gt;models.json&lt;/code&gt;. When the file doesn't exist, the gateway falls back entirely to its built-in registry.&lt;/p&gt;
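Conceptually, the lookup order the gateway appeared to follow looks like this. A hypothetical Python sketch mirroring the observed behavior, not OpenClaw's source:

```python
import json
import os

def load_model_catalog(agent_dir: str, built_in_registry: dict) -> dict:
    """Hypothetical sketch of the fallback behavior observed above:
    providers defined in models.json shadow the built-in ones wholesale;
    with no file present, the built-in registry handles everything."""
    path = os.path.join(agent_dir, "models.json")
    if not os.path.exists(path):
        return built_in_registry  # no file: built-in registry wins
    with open(path) as f:
        file_providers = json.load(f).get("providers", {})
    # per-provider replacement, same as the shadowing that hid stepfun
    return {**built_in_registry, **file_providers}

built_in = {"openrouter": {"models": ["stepfun/step-3.5-flash:free"]}}
print(load_model_catalog("/nonexistent", built_in) is built_in)  # True
```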

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;March 2026 Update:&lt;/strong&gt; Further testing revealed this is &lt;strong&gt;not always true&lt;/strong&gt;. On newer OpenClaw versions (2026.2.2+), &lt;code&gt;models.json&lt;/code&gt; &lt;strong&gt;is regenerated&lt;/strong&gt; from &lt;code&gt;models.providers&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt; on gateway restart and &lt;code&gt;openclaw doctor&lt;/code&gt; runs. The proper permanent fix is to manage model entries via &lt;code&gt;models.providers&lt;/code&gt; in the main config — not by deleting the agent-level &lt;code&gt;models.json&lt;/code&gt;. See the &lt;a href="https://docs.openclaw.ai/concepts/models#models-registry-models-json" rel="noopener noreferrer"&gt;official docs&lt;/a&gt; for details.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  📊 Comparing the two fixes
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Initial Fix&lt;/th&gt;
&lt;th&gt;Real Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Added stepfun to &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Removed &lt;code&gt;models.json&lt;/code&gt; entirely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Effort&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Write a script, figure out the schema, find the right values&lt;/td&gt;
&lt;td&gt;One &lt;code&gt;mv&lt;/code&gt; command&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Models fixed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only stepfun&lt;/td&gt;
&lt;td&gt;All current + all future models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Future risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Every new OpenRouter model needs manual addition&lt;/td&gt;
&lt;td&gt;No maintenance needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Root cause&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Patched → still shadowing&lt;/td&gt;
&lt;td&gt;Eliminated the shadow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The initial fix treated the symptom. The real fix treated the disease — &lt;strong&gt;but only temporarily&lt;/strong&gt; (see update above).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The best permanent fix&lt;/strong&gt; is to manage custom providers through &lt;code&gt;models.providers&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt;. Use a custom provider name (like &lt;code&gt;openrouter-custom&lt;/code&gt;) for models not in the built-in catalog, and let the built-in provider handle everything else.&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  🛠️ How to Check the Built-in Catalog
&lt;/h2&gt;

&lt;p&gt;Before creating custom providers, check whether your model is already in OpenClaw's built-in catalog. If it is, you don't need &lt;code&gt;models.json&lt;/code&gt; or &lt;code&gt;models.providers&lt;/code&gt; at all; just add it to the allowlist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List ALL models in the built-in catalog for a provider&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter

&lt;span class="c"&gt;# Check if a specific model exists&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter | &lt;span class="nb"&gt;grep &lt;/span&gt;dolphin
&lt;span class="c"&gt;# No results = model is NOT built-in = needs openrouter-custom&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; openrouter | &lt;span class="nb"&gt;grep &lt;/span&gt;stepfun
openrouter/stepfun/step-3.5-flash:free     text   250k   &lt;span class="nb"&gt;yes&lt;/span&gt;
&lt;span class="c"&gt;# Found = model IS built-in = just add to allowlist, no custom provider needed&lt;/span&gt;

&lt;span class="nv"&gt;$ &lt;/span&gt;openclaw models list &lt;span class="nt"&gt;--all&lt;/span&gt; &lt;span class="nt"&gt;--provider&lt;/span&gt; google
&lt;span class="c"&gt;# Shows all built-in Google models&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Rule of thumb:&lt;/strong&gt; If &lt;code&gt;openclaw models list --all --provider &amp;lt;name&amp;gt;&lt;/code&gt; shows your model, just add it to &lt;code&gt;agents.defaults.models&lt;/code&gt; in &lt;code&gt;openclaw.json&lt;/code&gt;. If it doesn't show up, you need a custom provider block in &lt;code&gt;models.providers&lt;/code&gt; (use a name like &lt;code&gt;openrouter-custom&lt;/code&gt; to avoid shadowing the built-in).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;At time of writing, the built-in OpenRouter catalog has &lt;strong&gt;230+ models&lt;/strong&gt;, including every major provider (OpenAI, Anthropic, Google, Meta, Mistral, DeepSeek, Qwen, etc.) but &lt;strong&gt;not&lt;/strong&gt; community/niche models like &lt;code&gt;cognitivecomputations/dolphin-mistral*&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  404: No Endpoints Found That Support Tool Use
&lt;/h2&gt;

&lt;p&gt;If you set a Dolphin (or other community) model as your primary and see:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;404 No endpoints found that support tool use
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means the model does not support function calling/tools, and OpenRouter has no endpoint to handle a request that includes tool definitions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it happens:&lt;/strong&gt; OpenClaw sends tool definitions (web search, exec, etc.) with every request. If the model does not support tools, OpenRouter rejects with 404.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Add &lt;code&gt;params.tools: false&lt;/code&gt; in the model's allowlist entry in &lt;code&gt;openclaw.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"openrouter-custom/cognitivecomputations/dolphin-mistral-24b-venice-edition:free"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alias"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dolphin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tools"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: Even with &lt;code&gt;tools: false&lt;/code&gt;, free-tier models may still get 429 rate-limited. Configure fallbacks to ensure graceful failover:&lt;/p&gt;


&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openrouter-custom/.../dolphin-mistral:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"openrouter/stepfun/step-3.5-flash:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-3-flash-preview"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/blockquote&gt;

&lt;p&gt;You can check if your model supports tools via the OpenRouter API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://openrouter.ai/api/v1/models | python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
import json, sys
for m in json.load(sys.stdin)['data']:
    if 'dolphin' in m['id']:
        print(m['id'], 'tools:', 'tools' in m.get('supported_parameters', []))
"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How to Fix This Yourself
&lt;/h2&gt;

&lt;p&gt;If you're hitting &lt;code&gt;Unknown model&lt;/code&gt; or &lt;code&gt;configured,missing&lt;/code&gt; in OpenClaw, here's the diagnostic playbook:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Check if you have an agent-level &lt;code&gt;models.json&lt;/code&gt;
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ls -la ~/.openclaw/agents/main/agent/models.json 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If this file &lt;strong&gt;exists&lt;/strong&gt; and you're only using standard providers (OpenRouter, Google, Anthropic, OpenAI), this file is probably unnecessary and might be shadowing the built-in registry.&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 2: Check what's in it
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cat ~/.openclaw/agents/main/agent/models.json | python3 -m json.tool | grep -E '"id"'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you see a provider name that matches a built-in provider (&lt;code&gt;openrouter&lt;/code&gt;, &lt;code&gt;google&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, etc.), that block is &lt;strong&gt;overriding&lt;/strong&gt; the built-in model catalog. Only models explicitly listed will be recognized.&lt;/p&gt;
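To spot that collision quickly, a few lines of Python can flag offending provider names. This assumes the `providers` key layout used by the fix script earlier; the built-in name set is illustrative, not exhaustive:

```python
# Flag models.json provider names that collide with (and therefore shadow)
# built-in providers. The BUILT_IN set is illustrative, not exhaustive.
BUILT_IN = {"openrouter", "google", "anthropic", "openai"}

def find_shadowing_providers(config: dict) -> list:
    """Return provider names in a models.json config that collide with
    built-in provider names."""
    return sorted(name for name in config.get("providers", {})
                  if name in BUILT_IN)

# Example: the problematic layout from this incident
config = {"providers": {
    "nvidia-custom": {"models": []},   # fine: truly custom name
    "openrouter": {"models": []},      # collides: shadows the built-in catalog
}}
print(find_shadowing_providers(config))  # ['openrouter']
```

An empty result means your file only defines truly custom providers and isn't shadowing anything.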
&lt;h3&gt;
  
  
  Step 3: Try disabling it
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Backup first!
cp ~/.openclaw/agents/main/agent/models.json \
   ~/.openclaw/agents/main/agent/models.json.bak.$(date +%Y%m%d-%H%M%S)

# Rename to disable
mv ~/.openclaw/agents/main/agent/models.json \
   ~/.openclaw/agents/main/agent/models.json.disabled

# Restart
systemctl --user restart openclaw-gateway

# Check
openclaw models list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If all models now show &lt;code&gt;configured&lt;/code&gt; — &lt;strong&gt;the file was the problem.&lt;/strong&gt; Delete it permanently (or keep the &lt;code&gt;.disabled&lt;/code&gt; backup just in case).&lt;/p&gt;
&lt;h3&gt;
  
  
  Step 4: If you DO need custom providers
&lt;/h3&gt;

&lt;p&gt;If you have truly custom providers (not built-in), such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nvidia API&lt;/strong&gt; (&lt;code&gt;integrate.api.nvidia.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Custom self-hosted endpoints&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Non-standard API providers&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you need &lt;code&gt;models.json&lt;/code&gt;, but be very careful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Don't use provider names that match built-in providers&lt;/strong&gt; (e.g., use &lt;code&gt;openrouter-custom&lt;/code&gt; instead of &lt;code&gt;openrouter&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Only define the custom providers, let the built-in registry handle the standard ones&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Quick diagnostic cheat sheet
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Likely cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;configured,missing&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Custom &lt;code&gt;models.json&lt;/code&gt; is shadowing built-in registry&lt;/td&gt;
&lt;td&gt;Rename/remove &lt;code&gt;models.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;Unknown model&lt;/code&gt; in logs&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;td&gt;Same as above&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;401 Unauthorized&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API key missing from &lt;code&gt;.env&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Check &lt;code&gt;.env&lt;/code&gt; (and never use &lt;code&gt;&amp;gt;&lt;/code&gt;!)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model works via &lt;code&gt;curl&lt;/code&gt; but not OpenClaw&lt;/td&gt;
&lt;td&gt;Provider block in &lt;code&gt;models.json&lt;/code&gt; doesn't list the model&lt;/td&gt;
&lt;td&gt;Remove the shadowing provider block&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;models scan&lt;/code&gt; doesn't find a model&lt;/td&gt;
&lt;td&gt;Model doesn't support tool-calling&lt;/td&gt;
&lt;td&gt;Add manually via &lt;code&gt;openclaw models set&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  📚 What I Learned
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1️⃣ &lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; can destroy your entire config
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;echo "KEY=value" &amp;gt;  .env   # ❌ REPLACES the file — destroys everything else
echo "KEY=value" &amp;gt;&amp;gt; .env   # ✅ APPENDS to the file — safe
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Always use &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;&lt;/strong&gt; when adding to environment files. Or better: use the app's CLI to manage keys.&lt;/p&gt;
&lt;h3&gt;
  
  
  2️⃣ "Unknown model" doesn't mean what you think
&lt;/h3&gt;

&lt;p&gt;It doesn't mean you misspelled the model name. It means the runtime &lt;strong&gt;can't resolve the name to a provider endpoint&lt;/strong&gt; — and that resolution path might go through a file you didn't know existed.&lt;/p&gt;
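In other words, resolution is roughly "take the prefix before the first slash as the provider, then look the rest up in that provider's catalog". A hypothetical sketch, not OpenClaw's actual resolver:

```python
def resolve_model(model_id: str, catalog: dict):
    """Sketch of model-id resolution: provider prefix before the first '/',
    model slug after it. Fails the same way the 'Unknown model' error does
    when the catalog (possibly shadowed by models.json) lacks the entry."""
    provider, _, slug = model_id.partition("/")
    models = catalog.get(provider, {}).get("models", [])
    if slug not in models:
        raise LookupError(f"Unknown model: {model_id}")
    return provider, slug

# A shadowed catalog: the 13-model subset, stepfun absent
catalog = {"openrouter": {"models": ["google/gemini-2.5-pro"]}}

try:
    resolve_model("openrouter/stepfun/step-3.5-flash:free", catalog)
except LookupError as e:
    print(e)  # Unknown model: openrouter/stepfun/step-3.5-flash:free
```

The spelling is perfect; the catalog the resolver consults simply no longer contains the entry.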
&lt;h3&gt;
  
  
  3️⃣ Custom config files can shadow built-in behavior
&lt;/h3&gt;

&lt;p&gt;This is the core lesson. My AI assistant created &lt;code&gt;models.json&lt;/code&gt; for a legitimate reason (custom Nvidia provider). But when it added an &lt;code&gt;openrouter&lt;/code&gt; block to the same file, it accidentally &lt;strong&gt;replaced&lt;/strong&gt; the entire built-in OpenRouter catalog with its 13-model subset. Everything not in that subset — including stepfun — became invisible.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;If your tool has a built-in registry, a custom config that matches its namespace will override it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  4️⃣ AI agents optimise for the task at hand
&lt;/h3&gt;

&lt;p&gt;Elara added Google models when I asked for Google models. She didn't know that creating an &lt;code&gt;openrouter&lt;/code&gt; provider block would shadow the built-in one and break stepfun. &lt;strong&gt;AI agents don't preserve context they weren't told about.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  5️⃣ Backup everything, always 💾
&lt;/h3&gt;

&lt;p&gt;I had &lt;strong&gt;28 backup files&lt;/strong&gt; spanning a month. They let me reconstruct the exact state of every config file at every point in time. I now run a daily cron job:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 2 AM UTC daily, 30-day retention
0 2 * * * ~/openclaw_daily_backup.sh &amp;gt;&amp;gt; /tmp/openclaw/backup.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
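The backup script itself isn't reproduced here, but the copy-plus-retention idea can be sketched in Python. This is a hypothetical equivalent (the `backup_config` helper and its paths are mine, not the real script):

```python
import os
import shutil
import time

def backup_config(src: str, backup_dir: str, retention_days: int = 30) -> str:
    """Copy src into backup_dir with a timestamp suffix, then prune backups
    older than retention_days. Hypothetical sketch of a daily config backup;
    adapt the paths to your own OpenClaw layout."""
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(backup_dir, os.path.basename(src) + ".bak." + stamp)
    shutil.copy2(src, dest)  # copy2 preserves timestamps/metadata

    # Prune: delete backups older than the retention window
    cutoff = time.time() - retention_days * 86400
    for name in os.listdir(backup_dir):
        if ".bak." not in name:
            continue
        path = os.path.join(backup_dir, name)
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
    return dest
```

Pointing it at `openclaw.json` (and `models.json`, while it existed) from a daily cron entry is all the archaeology insurance this incident required.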


&lt;h2&gt;
  
  
  🎯 The Takeaway
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Infrastructure debugging is archaeology.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You're not fixing bugs — you're reconstructing what a system looked like at a moment when it worked, and comparing it to the moment it stopped.&lt;/p&gt;

&lt;p&gt;The difference is usually:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✏️ &lt;strong&gt;One character&lt;/strong&gt; (&lt;code&gt;&amp;gt;&lt;/code&gt; vs &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;📄 &lt;strong&gt;One file&lt;/strong&gt; that's shadowing a built-in registry&lt;/li&gt;
&lt;li&gt;🤖 &lt;strong&gt;One good-faith change&lt;/strong&gt; by an AI agent that had unintended side effects&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And the real fix isn't always adding what's missing — sometimes it's &lt;strong&gt;removing what shouldn't be there&lt;/strong&gt;.&lt;/p&gt;



&lt;p&gt;&lt;em&gt;If you've ever stared at &lt;code&gt;configured,missing&lt;/code&gt; and felt your sanity slipping — now you know exactly where to look.&lt;/em&gt; 🦞&lt;/p&gt;
&lt;h2&gt;
  
  
  Update: openrouter-custom Provider Removed (March 2026)
&lt;/h2&gt;

&lt;p&gt;After further testing, we found that &lt;code&gt;openrouter-custom&lt;/code&gt; models (community/niche models like Dolphin-Mistral) always fail with &lt;code&gt;404 No endpoints found that support tool use&lt;/code&gt; when used with OpenClaw agents. This happens because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;OpenClaw agents always include tool definitions in the API request body&lt;/li&gt;
&lt;li&gt;Dolphin-Mistral has &lt;strong&gt;zero&lt;/strong&gt; tool-supporting endpoints on OpenRouter&lt;/li&gt;
&lt;li&gt;OpenClaw has no config option to suppress tool definitions at the API payload level for custom providers (&lt;code&gt;tools.deny: ["*"]&lt;/code&gt; is agent-side only)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Final decision:&lt;/strong&gt; Removed the &lt;code&gt;openrouter-custom&lt;/code&gt; provider entirely. Created a dedicated &lt;code&gt;dolphin&lt;/code&gt; agent bound to a separate Telegram bot, currently running on StepFun as primary — ready to switch to dolphin when OpenRouter adds tool-supporting endpoints for it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clean model setup that works (9/9 TUI tests passed):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agents"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"defaults"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openrouter/stepfun/step-3.5-flash:free"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"fallbacks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-3-flash-preview"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>ai</category>
      <category>debugging</category>
      <category>opensource</category>
      <category>openclaw</category>
    </item>
  </channel>
</rss>
