<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Conan</title>
    <description>The latest articles on Forem by Conan (@conanttu).</description>
    <link>https://forem.com/conanttu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3950931%2Ffc3d4878-ce13-4c48-b50e-3098d282ef41.png</url>
      <title>Forem: Conan</title>
      <link>https://forem.com/conanttu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/conanttu"/>
    <language>en</language>
    <item>
      <title>Introducing PlanCollab: AI-Powered Cross-Agent Code Planning &amp; Review</title>
      <dc:creator>Conan</dc:creator>
      <pubDate>Tue, 26 May 2026 15:29:32 +0000</pubDate>
      <link>https://forem.com/conanttu/introducing-plancollab-ai-powered-cross-agent-code-planning-review-597m</link>
      <guid>https://forem.com/conanttu/introducing-plancollab-ai-powered-cross-agent-code-planning-review-597m</guid>
      <description>&lt;p&gt;Ever wished you could get a second AI opinion on your implementation plans before writing code? &lt;strong&gt;PlanCollab&lt;/strong&gt; makes this possible by orchestrating collaborative planning sessions between Claude Code 🐶 and Codex 🐱.&lt;/p&gt;

&lt;h2&gt;
  
  
  🎯 What is PlanCollab?
&lt;/h2&gt;

&lt;p&gt;PlanCollab is an agent skill that enables &lt;strong&gt;adversarial collaboration&lt;/strong&gt; between two different AI model families. One agent creates an implementation plan, the other reviews it, and they iterate automatically until they reach consensus or escalate the remaining disagreements to you for a final decision.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Two AIs Are Better Than One
&lt;/h3&gt;

&lt;p&gt;A single AI has blind spots: it rarely challenges its own decisions. Bringing in a second AI from a different model family gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Alternative approaches&lt;/strong&gt; that weren't initially considered&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missing steps&lt;/strong&gt; or incorrect dependency ordering caught early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk assessment&lt;/strong&gt; from a fresh perspective&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test coverage gaps&lt;/strong&gt; identified before coding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The more the two model families differ (for example, Claude vs GPT), the more likely the reviewer is to catch what the planner missed.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;claude&lt;/code&gt; CLI (Claude Code)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codex&lt;/code&gt; CLI (for cross-agent communication)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Basic Usage
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Start a collaborative planning session&lt;/span&gt;
/plancollab implement user authentication module

&lt;span class="c"&gt;# List all sessions&lt;/span&gt;
/plancollab list

&lt;span class="c"&gt;# Check current status&lt;/span&gt;
/plancollab status

&lt;span class="c"&gt;# Resume an interrupted session&lt;/span&gt;
/plancollab resume

&lt;span class="c"&gt;# Delete sessions&lt;/span&gt;
/plancollab delete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Auto-Review Mode
&lt;/h3&gt;

&lt;p&gt;When you generate a complex implementation plan in conversation, PlanCollab can automatically ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I just generated an implementation plan. Would you like the other agent to review it?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Choose &lt;strong&gt;"Yes, always"&lt;/strong&gt; to enable auto-review for all future plans.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔄 How It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User provides task
    ↓
Baseline check (first-time project scan, reused later)
    ↓
┌─ 🐶 Creates plan ──────────────────────────────────┐
│    ↓                                               │
│ 🐱 Reviews (automatic)                             │
│    ↓                                               │
│ Agreed sections locked, disputed sections continue │
│    ↓                                               │
│ Communication sends only disputed sections         │
│    ↓                                               │
│ Next round focuses on disagreements ───────────────┘
    ↓
Result: Consensus reached / Conflicts escalated to user
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
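
&lt;p&gt;The loop in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PlanCollab's actual orchestration: &lt;code&gt;runRounds&lt;/code&gt; is a hypothetical name, and &lt;code&gt;planner&lt;/code&gt; and &lt;code&gt;reviewer&lt;/code&gt; are plain functions standing in for the two agents.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical sketch: planner(round, disputed) returns a plan,
// reviewer(round, plan) returns { disputed: [...] }.
function runRounds(planner, reviewer, maxRounds) {
  // Guard against values that would make the loop below run forever.
  if (!Number.isInteger(maxRounds) || Math.abs(maxRounds) !== maxRounds) {
    throw new RangeError('maxRounds must be a non-negative integer');
  }
  let disputed = [];
  let round = 0;
  while (round !== maxRounds) {
    round += 1;
    const plan = planner(round, disputed);
    const review = reviewer(round, plan);
    if (review.disputed.length === 0) {
      // Nothing left to argue about: consensus reached.
      return { result: 'consensus', rounds: round, plan: plan };
    }
    // Only the disputed sections carry over into the next round.
    disputed = review.disputed;
  }
  // Round budget exhausted: hand the open points to the user.
  return { result: 'escalated', rounds: round, disputed: disputed };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a budget of 3 rounds, a reviewer that stops objecting in round 2 yields a &lt;code&gt;consensus&lt;/code&gt; result, while one that keeps objecting yields &lt;code&gt;escalated&lt;/code&gt; along with the still-disputed sections.&lt;/p&gt;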



&lt;h3&gt;
  
  
  Key Design Principles
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Projection Communication&lt;/strong&gt;: Each round sends a "projection" to the other agent - agreed sections become one-liners, only disputed sections are sent in full. As consensus grows, communication shrinks, preventing context overflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consensus Tracking&lt;/strong&gt;: After each review, a consensus file records which sections are agreed and which are still disputed. This drives the projection and conflict resolution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Discussion Log&lt;/strong&gt;: Tracks what was discussed each round, what was resolved, and what remains. The reviewer uses this to avoid re-raising already-addressed issues.&lt;/p&gt;
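
&lt;p&gt;The projection idea can be sketched in a few lines of JavaScript. This is a minimal illustration, not PlanCollab's actual code: the &lt;code&gt;buildProjection&lt;/code&gt; function and the section fields (&lt;code&gt;title&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;summary&lt;/code&gt;, &lt;code&gt;body&lt;/code&gt;) are assumptions made up for the example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical section shape: { title, status, summary, body }.
function buildProjection(sections) {
  return sections
    .map(function (section) {
      if (section.status === 'agreed') {
        // Agreed sections shrink to a one-line reminder.
        return '[agreed] ' + section.title + ': ' + section.summary;
      }
      // Disputed sections are sent in full so the reviewer can respond.
      return '[disputed] ' + section.title + '\n' + section.body;
    })
    .join('\n\n');
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;As more sections move to &lt;code&gt;agreed&lt;/code&gt;, the returned text gets shorter, which is the property that keeps later rounds from overflowing the context.&lt;/p&gt;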

&lt;h2&gt;
  
  
  ⚡ Conflict Resolution
&lt;/h2&gt;

&lt;p&gt;If consensus isn't reached after 3 rounds (default), PlanCollab:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Extracts specific disagreement points&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shows both positions&lt;/strong&gt; in a structured table&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lists reasoning&lt;/strong&gt; from both agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;References source files&lt;/strong&gt; for verification&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Example conflict summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🐶 ⚡ 🐱 3 rounds without full consensus. Your decision needed.

1. Token Storage Method

|            | 🐶 Claude              | 🐱 Codex                      |
|------------|------------------------|-------------------------------|
| Position   | httpOnly cookie        | localStorage                  |
| Concession | Round 2 added SameSite | Round 3 acknowledged security |

🐶 Reasoning:
1. XSS cannot read cookies, natural protection
2. Combined with SameSite=Strict prevents CSRF
3. Server-controlled expiry, reliable forced logout

🐱 Reasoning:
1. Simpler implementation, no backend coordination needed
2. Cross-domain scenarios have SameSite restrictions
3. Short expiry + refresh token makes risk manageable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Decide each point&lt;/strong&gt; - Make a call on each disagreement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue discussion&lt;/strong&gt; - Let them negotiate more rounds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accept current version&lt;/strong&gt; - Use the latest plan as-is&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🔧 Bidirectional Support
&lt;/h2&gt;

&lt;p&gt;PlanCollab works from both Claude Code and Codex:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Default Role&lt;/th&gt;
&lt;th&gt;Calls Other Agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;In Claude Code&lt;/td&gt;
&lt;td&gt;🐶 plans, 🐱 reviews&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codex exec&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;In Codex&lt;/td&gt;
&lt;td&gt;🐱 plans, 🐶 reviews&lt;/td&gt;
&lt;td&gt;&lt;code&gt;claude -p&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The skill auto-detects the environment via the &lt;code&gt;$CLAUDECODE&lt;/code&gt; environment variable. Roles can be swapped at any time by saying "swap roles" or "let the other agent plan".&lt;/p&gt;
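
&lt;p&gt;A minimal sketch of that detection, assuming that any non-empty &lt;code&gt;CLAUDECODE&lt;/code&gt; value means the skill is running inside Claude Code. The &lt;code&gt;detectRoles&lt;/code&gt; function and the shape of its return value are hypothetical. Only the variable name and the two commands come from the table above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Assumption: a non-empty CLAUDECODE value means "inside Claude Code".
function detectRoles(env) {
  const inClaudeCode = Boolean(env.CLAUDECODE);
  if (inClaudeCode) {
    // Claude plans, Codex reviews via `codex exec`.
    return { planner: 'claude', reviewer: 'codex', reviewerCommand: 'codex exec' };
  }
  // Otherwise assume Codex: Codex plans, Claude reviews via `claude -p`.
  return { planner: 'codex', reviewer: 'claude', reviewerCommand: 'claude -p' };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;detectRoles(process.env)&lt;/code&gt; would then return the default roles for whichever environment the skill was started from.&lt;/p&gt;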

&lt;h2&gt;
  
  
  📊 Review Criteria
&lt;/h2&gt;

&lt;p&gt;Plans are evaluated on 6 dimensions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Completeness&lt;/strong&gt; - Covers all aspects of the task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correctness&lt;/strong&gt; - Technically sound for this codebase&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feasibility&lt;/strong&gt; - Each step is implementable as described&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step ordering&lt;/strong&gt; - Correct dependency order&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Risk coverage&lt;/strong&gt; - Edge cases addressed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testing&lt;/strong&gt; - Adequate verification strategy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Verdict: &lt;strong&gt;APPROVED&lt;/strong&gt; (no critical/major issues) or &lt;strong&gt;NEEDS_REVISION&lt;/strong&gt; (any critical/major issue).&lt;/p&gt;
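
&lt;p&gt;The verdict rule is simple enough to write down directly. This sketch assumes each review issue carries a &lt;code&gt;severity&lt;/code&gt; field. The field name and any label other than &lt;code&gt;critical&lt;/code&gt; and &lt;code&gt;major&lt;/code&gt; are assumptions for illustration.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical issue shape: { severity: 'critical' | 'major' | ... }.
function verdict(issues) {
  const blocking = issues.filter(function (issue) {
    return issue.severity === 'critical' || issue.severity === 'major';
  });
  // Any critical or major issue blocks approval; everything else passes.
  return blocking.length === 0 ? 'APPROVED' : 'NEEDS_REVISION';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;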

&lt;h2&gt;
  
  
  📁 File Structure
&lt;/h2&gt;

&lt;p&gt;Each collaboration creates a session directory with complete history:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.plancollab/
├── config.json                    # User preferences (auto_review)
├── baseline.md                    # Project architecture baseline
├── 2026-04-25-lru-cache/          # A collaboration session
│   ├── state.json                 # State (rounds, consensus, timestamps)
│   ├── plan.md                    # Final approved plan
│   ├── plans/
│   │   ├── round-1-cc.md          # 🐶 Claude round 1 plan
│   │   └── round-2-cc.md          # 🐶 Claude round 2 plan (revised)
│   └── reviews/
│       ├── round-1-cx.md          # 🐱 Codex round 1 review
│       ├── round-1-consensus-cx.md # Consensus: agreed/disputed
│       └── round-2-cx.md
└── 2026-04-25-auth/               # Another session
    └── ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;File naming: &lt;code&gt;-cc&lt;/code&gt; = Claude created, &lt;code&gt;-cx&lt;/code&gt; = Codex created.&lt;/p&gt;
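
&lt;p&gt;The naming convention can be expressed as a small helper. This is a sketch built only from the paths shown in the tree above. The &lt;code&gt;roundFile&lt;/code&gt; function and its &lt;code&gt;kind&lt;/code&gt; values are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// -cc = created by Claude, -cx = created by Codex.
const SUFFIX = { claude: 'cc', codex: 'cx' };

// kind is 'plan', 'review', or 'consensus' (hypothetical labels).
function roundFile(kind, round, agent) {
  const suffix = SUFFIX[agent];
  if (suffix === undefined) {
    throw new Error('unknown agent: ' + agent);
  }
  if (kind === 'plan') {
    return 'plans/round-' + round + '-' + suffix + '.md';
  }
  if (kind === 'review') {
    return 'reviews/round-' + round + '-' + suffix + '.md';
  }
  if (kind === 'consensus') {
    return 'reviews/round-' + round + '-consensus-' + suffix + '.md';
  }
  throw new Error('unknown kind: ' + kind);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;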

&lt;h2&gt;
  
  
  🎨 Identity System
&lt;/h2&gt;

&lt;p&gt;Clear visual markers throughout:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🐶 = Claude (always)&lt;/li&gt;
&lt;li&gt;🐱 = Codex (always)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Session output examples:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🐶 🐱 PlanCollab Started
Task:     Implement user authentication module
Planner:  🐶 Claude
Reviewer: 🐱 Codex
Rounds:   max 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🐶 🐱 PlanCollab Ended
Result:  ✅ approved
Rounds:  2/3
Final:   .plancollab/2026-04-25-auth/plan.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  📦 Installation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using npx skills (Recommended)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add conanttu/skills/plancollab &lt;span class="nt"&gt;-g&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Manual Installation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the skills repository&lt;/span&gt;
git clone https://github.com/conanttu/skills.git
&lt;span class="nb"&gt;cd &lt;/span&gt;skills

&lt;span class="c"&gt;# Symlink to Claude Code&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/plancollab ~/.claude/skills/plancollab

&lt;span class="c"&gt;# Also symlink to Codex for bidirectional support&lt;/span&gt;
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/plancollab ~/.agents/skills/plancollab
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;: Both &lt;code&gt;codex&lt;/code&gt; CLI and &lt;code&gt;claude&lt;/code&gt; CLI must be installed and in PATH.&lt;/p&gt;

&lt;h2&gt;
  
  
  🆚 vs Traditional Code Review
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional Code Review&lt;/th&gt;
&lt;th&gt;PlanCollab&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Review Target&lt;/td&gt;
&lt;td&gt;Code&lt;/td&gt;
&lt;td&gt;Implementation plan (before code)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviewer&lt;/td&gt;
&lt;td&gt;Human&lt;/td&gt;
&lt;td&gt;Another AI Agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Iteration&lt;/td&gt;
&lt;td&gt;Manual back-and-forth&lt;/td&gt;
&lt;td&gt;Automatic multi-round discussion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consensus Management&lt;/td&gt;
&lt;td&gt;Mental notes&lt;/td&gt;
&lt;td&gt;File-based consensus tracking&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Conflict Resolution&lt;/td&gt;
&lt;td&gt;Offline communication&lt;/td&gt;
&lt;td&gt;Structured display + user arbitration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History&lt;/td&gt;
&lt;td&gt;In PR comments&lt;/td&gt;
&lt;td&gt;Independent session directory, fully preserved&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🌟 Use Cases
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Perfect for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex features with multiple implementation approaches&lt;/li&gt;
&lt;li&gt;Architectural decisions that need validation&lt;/li&gt;
&lt;li&gt;Refactoring plans that affect many files&lt;/li&gt;
&lt;li&gt;Security-sensitive implementations&lt;/li&gt;
&lt;li&gt;Performance-critical code paths&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Not ideal for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple bug fixes&lt;/li&gt;
&lt;li&gt;Single-file changes&lt;/li&gt;
&lt;li&gt;Well-established patterns in your codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💡 Tips for Best Results
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Be specific in your task description&lt;/strong&gt; - The clearer the task, the better the plan&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep baseline updated&lt;/strong&gt; - Rescan after major architectural changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review conflict summaries carefully&lt;/strong&gt; - Both agents often have valid points&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use auto-review for non-trivial plans&lt;/strong&gt; - Catches issues early&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Archive approved plans&lt;/strong&gt; - They serve as implementation documentation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  📚 Learn More
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/conanttu/skills" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/conanttu/skills/tree/main/plancollab" rel="noopener noreferrer"&gt;Full Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/conanttu/skills/blob/main/plancollab/docs/product-guide-zh.md" rel="noopener noreferrer"&gt;Chinese Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Ready to level up your implementation planning?&lt;/strong&gt; Try PlanCollab today and experience the power of AI-to-AI collaboration!&lt;/p&gt;

&lt;p&gt;🐶 🐱 Happy Collaborating!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;What are your thoughts on AI-to-AI collaboration? Have you tried similar approaches? Share your experiences in the comments below!&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>codex</category>
      <category>collaboration</category>
    </item>
    <item>
      <title>skill-insp: A Skill That Scores Other Skills</title>
      <dc:creator>Conan</dc:creator>
      <pubDate>Mon, 25 May 2026 16:48:54 +0000</pubDate>
      <link>https://forem.com/conanttu/skill-insp-a-skill-that-scores-other-skills-3gga</link>
      <guid>https://forem.com/conanttu/skill-insp-a-skill-that-scores-other-skills-3gga</guid>
      <description>&lt;h1&gt;
  
  
  skill-insp: A Skill That Scores Other Skills
&lt;/h1&gt;

&lt;p&gt;If you've been building &lt;a href="https://claude.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; skills for a while, you've probably noticed a pattern: every skill author makes the same mistakes the first few times. Vague descriptions that fail to trigger. Workflows that don't say what to do when files are missing. &lt;code&gt;allowed-tools&lt;/code&gt; that ask for &lt;code&gt;Bash&lt;/code&gt; with no glob restriction. No eval scenarios, so you have no idea if the skill actually works.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;skill-insp&lt;/strong&gt; to catch those mistakes automatically. It's a skill that inspects other skills, scores them across 8 dimensions, and tells you what to fix.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Source: &lt;a href="https://github.com/conanttu/skills" rel="noopener noreferrer"&gt;github.com/conanttu/skills&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What It Does
&lt;/h2&gt;

&lt;p&gt;You point skill-insp at a folder containing a &lt;code&gt;SKILL.md&lt;/code&gt; and it gives you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A &lt;strong&gt;0–100 score&lt;/strong&gt; across 8 weighted dimensions (Structure, Triggering, Usability, Completeness, Progressive Disclosure, Testability, Maintainability, Safety &amp;amp; Trust)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;prioritized list of findings&lt;/strong&gt; (High / Medium / Low) with file:line references&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;HTML report&lt;/strong&gt; you can open in a browser&lt;/li&gt;
&lt;li&gt;The ability to &lt;strong&gt;apply&lt;/strong&gt; high-priority fixes with automatic backup and &lt;strong&gt;revert&lt;/strong&gt; with SHA-256 hash verification&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;functional eval runner&lt;/strong&gt; that spawns sub-agents against fixture skills to verify the skill's own logic
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✨ skill-insp ✨

Overall: my-skill scores 66/100 — risk low, readiness usable-with-improvements.

Key strengths
- Minimal, appropriate permissions (Read and Write only)
- Clear 3-step workflow
- Clean YAML frontmatter with version tracking

Scorecard
| Dimension              | Score  |
|------------------------|--------|
| Structure              |   8/10 |
| Triggering             |   9/15 |
| Usability              |   9/15 |
| Completeness           |   7/15 |
| Progressive Disclosure |   7/10 |
| Testability            |   4/10 |
| Maintainability        |   8/10 |
| Safety &amp;amp; Trust         |  14/15 |
| Total                  | 66/100 |

Recommendations
  Medium  Add error-handling for missing or unreadable files.
  Medium  Create evals/ with at least one input and expected output.
  Low     Expand README with usage examples or remove the placeholder.

HTML report: /abs/path/to/cache/my-skill/latest.html
✨ skill-insp ✨
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Scoring Rubric
&lt;/h2&gt;

&lt;p&gt;The 100 points are weighted toward what actually breaks skills in practice:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Max&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Structure&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Frontmatter parses, folder layout makes sense&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triggering&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Description gets the skill invoked in the right contexts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Usability&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Workflow steps are concrete and runnable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Completeness&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Edge cases, inputs/outputs, failure handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Progressive Disclosure&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;SKILL.md stays lean; details live in references&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testability&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Evals or success criteria exist&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintainability&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;No duplication, no stale placeholders&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Safety &amp;amp; Trust&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;Permissions scoped, no hidden network, no destructive ops&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
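
&lt;p&gt;The maximums in the table add up to 100, so the overall score is just the sum of the per-dimension scores. A minimal sketch of that arithmetic follows. The &lt;code&gt;totalScore&lt;/code&gt; function and the snake_case keys are assumptions for illustration, and skill-insp itself lets the model do the scoring rather than a script like this.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Maximum points per dimension, copied from the rubric table.
const MAX_POINTS = {
  structure: 10,
  triggering: 15,
  usability: 15,
  completeness: 15,
  progressive_disclosure: 10,
  testability: 10,
  maintainability: 10,
  safety_trust: 15
};

function totalScore(scores) {
  let total = 0;
  for (const name of Object.keys(MAX_POINTS)) {
    const value = scores[name];
    // A value is in range only if clamping it to [0, max] leaves it unchanged.
    const clamped = Math.max(0, Math.min(value, MAX_POINTS[name]));
    if (!Number.isInteger(value) || clamped !== value) {
      throw new RangeError('invalid score for ' + name + ': ' + value);
    }
    total += value;
  }
  return total;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Feeding in the sample scorecard shown earlier (8, 9, 9, 7, 7, 4, 8, 14) gives the same 66 that the report prints.&lt;/p&gt;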

&lt;p&gt;&lt;strong&gt;Safety &amp;amp; Trust&lt;/strong&gt; is the dimension where pattern-matching tools fall apart, so skill-insp does semantic analysis here. It distinguishes between documentation and executable code: a SKILL.md that says "check for &lt;code&gt;rm -rf&lt;/code&gt; usage" in a safety checklist is &lt;em&gt;not&lt;/em&gt; a destructive operation. A &lt;code&gt;scripts/cleanup.sh&lt;/code&gt; that actually runs &lt;code&gt;rm -rf "$TEMP_DIR"&lt;/code&gt; &lt;em&gt;is&lt;/em&gt;, and gets flagged with the file:line reference.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It's Built
&lt;/h2&gt;

&lt;p&gt;The skill follows the "model is the analyzer" pattern. There's no Python or Node script that parses YAML and counts characters. Instead, the layout looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;skill-insp/
├── SKILL.md                       # Workflow + the 4 modes (default, detailed, apply, revert) + Run Evals
├── README.md
├── references/
│   ├── rubric.md                  # Scoring dimensions and what good looks like
│   └── output-format.md           # JSON schema for analysis.json
├── scripts/
│   ├── render-html.js             # analysis.json → latest.html
│   └── run-evals.js               # fixture setup + sub-agent prompt generation
├── assets/
│   └── report_template.html       # Self-contained HTML template
└── evals/
    └── evals.json                 # 8 functional eval scenarios
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two deterministic scripts handle the parts that should be deterministic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;render-html.js&lt;/code&gt;&lt;/strong&gt; turns a JSON analysis into a self-contained HTML report. The template uses CSS custom properties for theming, conic-gradient score rings, and an auto-hidden eval results section.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;run-evals.js&lt;/code&gt;&lt;/strong&gt; creates a fixture skill in &lt;code&gt;cache/_fixtures/&amp;lt;id&amp;gt;/&lt;/code&gt;, copies a snapshot of skill-insp itself into &lt;code&gt;_skill_home/&lt;/code&gt; so sub-agents can resolve &lt;code&gt;&amp;lt;this-skill&amp;gt;&lt;/code&gt; references, and prints a self-contained sub-agent prompt.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else (reading files, parsing YAML, evaluating the rubric, distinguishing documentation from code) is done by the model. That is slower than a parser, but a parser alone can't make the last distinction: a regex doesn't know that &lt;code&gt;rm -rf&lt;/code&gt; inside a markdown code fence labeled "examples to flag" is not the same as &lt;code&gt;rm -rf&lt;/code&gt; inside an executable script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Progressive Disclosure in Practice
&lt;/h2&gt;

&lt;p&gt;Earlier versions of skill-insp had a 300-line SKILL.md with the entire rubric inline. That hit the context budget hard and made the skill harder to edit. The current layout pushes details to references:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SKILL.md&lt;/strong&gt; holds the workflow and mode entry points. ~150 lines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rubric.md&lt;/strong&gt; holds the scoring dimensions, priority levels, compactness rules, and safety guidance. Only loaded when the skill actually scores something.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;output-format.md&lt;/strong&gt; holds the JSON schema. Only loaded when writing &lt;code&gt;analysis.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: SKILL.md is small enough to read in one sitting, and the model only loads the rubric when it's about to score.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Eval Runner
&lt;/h2&gt;

&lt;p&gt;This is the part that took the most iteration. The idea is simple: skill-insp ships with 8 eval scenarios in &lt;code&gt;evals/evals.json&lt;/code&gt;, each describing a user prompt, fixture files to create, and expectations to verify. To run them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;node scripts/run-evals.js &amp;lt;skill-path&amp;gt; list
node scripts/run-evals.js &amp;lt;skill-path&amp;gt; setup &amp;lt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;setup&lt;/code&gt; creates a fixture directory, writes the fixture files, copies skill-insp's own resources into &lt;code&gt;_skill_home/&lt;/code&gt;, and prints a JSON payload that includes a &lt;code&gt;sub_agent_prompt&lt;/code&gt;. The parent agent reads that JSON, spawns a sub-agent with the prompt, and after the sub-agent finishes, checks each expectation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;File expectations&lt;/strong&gt; ("analysis.json is written") → &lt;code&gt;find&lt;/code&gt; over the fixture directory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content expectations&lt;/strong&gt; ("a high-priority finding is raised") → read the output files.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral expectations&lt;/strong&gt; ("the model reports an error and stops") → judge from the sub-agent's text output.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Sandbox Gotcha
&lt;/h3&gt;

&lt;p&gt;In the first version, fixtures were created under &lt;code&gt;os.tmpdir()&lt;/code&gt;. This worked when I ran it manually, but sub-agents spawned by the harness were sandboxed to the project root — they got "permission denied" on every &lt;code&gt;Read&lt;/code&gt; and &lt;code&gt;Bash&lt;/code&gt; call against &lt;code&gt;/var/folders/.../T/eval-skill-insp-*&lt;/code&gt;. Three out of eight evals failed for sandbox reasons that had nothing to do with the skill's logic.&lt;/p&gt;

&lt;p&gt;The fix was a one-line change: move fixtures into &lt;code&gt;cache/_fixtures/&amp;lt;id&amp;gt;/&lt;/code&gt; inside the project. Now sub-agents inherit the project's filesystem permissions, and the cache directory is &lt;code&gt;.gitignore&lt;/code&gt;d so it doesn't pollute commits. After the change, all 8 evals run cleanly.&lt;/p&gt;

&lt;p&gt;Lesson worth remembering: &lt;strong&gt;if you're going to spawn sub-agents, keep their working directory inside the parent's sandbox&lt;/strong&gt;. Temp directories outside the project tree look like a clean choice but break under tighter permission policies.&lt;/p&gt;
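
&lt;p&gt;A minimal sketch of the idea, assuming the project root is known. The &lt;code&gt;fixtureDir&lt;/code&gt; and &lt;code&gt;isInside&lt;/code&gt; helpers are hypothetical names, not functions taken from &lt;code&gt;run-evals.js&lt;/code&gt;. Only the &lt;code&gt;cache/_fixtures/&lt;/code&gt; layout comes from the text above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;const path = require('node:path');

// Build the fixture path inside the project instead of under os.tmpdir().
function fixtureDir(projectRoot, id) {
  return path.join(projectRoot, 'cache', '_fixtures', id);
}

// True when target is root itself or lives somewhere below it.
function isInside(root, target) {
  const rel = path.relative(root, target);
  if (rel === '..' || rel.startsWith('..' + path.sep)) {
    return false; // target is above or beside the root
  }
  // An absolute result means a different drive on Windows.
  return !path.isAbsolute(rel);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A check like &lt;code&gt;isInside(projectRoot, dir)&lt;/code&gt; before spawning a sub-agent would catch the temp-directory mistake early instead of surfacing it as "permission denied" in the middle of an eval.&lt;/p&gt;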

&lt;h2&gt;
  
  
  Apply and Revert
&lt;/h2&gt;

&lt;p&gt;When you say "apply recommendations", skill-insp:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Re-reads the target files (state may have changed since the inspection).&lt;/li&gt;
&lt;li&gt;Copies each file it's about to modify into &lt;code&gt;&amp;lt;cache_dir&amp;gt;/last-apply/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Computes the SHA-256 hash of each file before editing.&lt;/li&gt;
&lt;li&gt;Makes the minimal edits, then computes each file's hash again and records both hashes in &lt;code&gt;manifest.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Re-runs the analysis so you see the new score.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The manifest looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"applied_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-25T23:16:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendations_applied"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dimension"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"triggering"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Replace vague description..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dimension"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"usability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add a ## Workflow section..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"priority"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"dimension"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completeness"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Add allowed-tools list..."&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"files"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"relative_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"SKILL.md"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"before_sha256"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"e698920f94613f8fc335cd0e941938e0990bedd72cea66e52a6b956d4ff47845"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"after_sha256"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="s2"&gt;"10c1fafffbfd2b0089a85e72aafc43432374ef627cb0b41602f8396083fa2800"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Revert is the inverse: read the manifest, verify that each file's current hash matches the recorded &lt;code&gt;after_sha256&lt;/code&gt; (so edits made after the apply are not blown away), then restore from backup. If a hash doesn't match, skill-insp reports the conflict instead of overwriting. It never falls back to &lt;code&gt;git reset --hard&lt;/code&gt; or &lt;code&gt;git checkout --&lt;/code&gt;; those operations have too wide a blast radius to belong in a recovery path.&lt;/p&gt;
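
&lt;p&gt;The revert check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not skill-insp's actual code: the function names, the backup layout (a directory mirroring each &lt;code&gt;relative_path&lt;/code&gt;), and the all-or-nothing behavior are hypothetical. Only the manifest fields come from the example above.&lt;/p&gt;

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def revert(manifest, skill_dir, backup_dir):
    """Restore every file in the manifest, or report conflicts and touch nothing.

    Assumption: backup_dir mirrors the relative paths recorded in the manifest.
    Returns the list of conflicting relative paths (empty when the revert ran).
    """
    skill_dir, backup_dir = Path(skill_dir), Path(backup_dir)
    # A file conflicts when its current hash differs from the post-apply hash,
    # which means someone edited it after the apply.
    conflicts = [
        entry["relative_path"]
        for entry in manifest["files"]
        if sha256_of(skill_dir / entry["relative_path"]) != entry["after_sha256"]
    ]
    if conflicts:
        return conflicts  # report only, never overwrite
    for entry in manifest["files"]:
        rel = entry["relative_path"]
        shutil.copyfile(backup_dir / rel, skill_dir / rel)
    return []
```

&lt;p&gt;Checking every hash before restoring anything is a design choice in this sketch: a conflict in one file leaves the whole skill untouched rather than half-reverted.&lt;/p&gt;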

&lt;h2&gt;
  
  
  Self-Inspection
&lt;/h2&gt;

&lt;p&gt;Because skill-insp is itself a skill, it can score itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You: 评估 .claude/skills/skill-insp
Claude: ✨ skill-insp ✨
        Overall: skill-insp scores 94/100 — risk low, readiness ready.
        ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first time I did this, the report flagged things I'd already half-noticed but not bothered to fix: the cache slug derivation rule was dense without an example, the description didn't mention "fix" or "improve" as triggers, and there was no Node.js version floor documented. All of these became Low/Medium recommendations, which I then applied — and the score went up.&lt;/p&gt;

&lt;p&gt;This is the most useful feedback loop I've found for skill authoring: write the skill, run skill-insp against it, apply the high-priority recommendations, repeat. The eval suite then verifies the workflow still works end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  When NOT to Use It
&lt;/h2&gt;

&lt;p&gt;skill-insp's description explicitly says &lt;strong&gt;"Not for general code review."&lt;/strong&gt; It's not a linter for arbitrary Python or TypeScript. It's specifically tuned to the structure and conventions of Claude Code skills:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It expects &lt;code&gt;SKILL.md&lt;/code&gt; with YAML frontmatter.&lt;/li&gt;
&lt;li&gt;It scores against a skill-specific rubric.&lt;/li&gt;
&lt;li&gt;Its safety analysis is calibrated to skills (permission scoping, undisclosed network access in scripts, prompt injection in instructions).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you point it at a normal source tree, it'll refuse to score because there's no &lt;code&gt;SKILL.md&lt;/code&gt; — that's the intended behavior, not a bug.&lt;/p&gt;
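
&lt;p&gt;A precondition like this is easy to picture. The Python sketch below is hypothetical (the function name and the exact frontmatter test are assumptions, not skill-insp's implementation); it only shows the kind of check that makes an ordinary source tree fail fast.&lt;/p&gt;

```python
from pathlib import Path


def has_skill_manifest(target_dir):
    """Return True if target_dir holds a SKILL.md that opens with YAML frontmatter."""
    skill_md = Path(target_dir) / "SKILL.md"
    if not skill_md.is_file():
        return False
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    # Frontmatter is a block fenced by '---' lines at the very top of the file.
    if not lines or lines[0].strip() != "---":
        return False
    # Require a closing '---' somewhere after the opening one.
    return any(line.strip() == "---" for line in lines[1:])
```

&lt;p&gt;A real inspector would go on to parse the frontmatter; this sketch stops at detecting that it exists.&lt;/p&gt;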

&lt;h2&gt;
  
  
  Trying It
&lt;/h2&gt;

&lt;p&gt;Clone the repo and symlink the &lt;code&gt;skill-insp&lt;/code&gt; folder into your Claude Code skills directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/conanttu/skills.git
mkdir &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills
&lt;span class="nb"&gt;ln&lt;/span&gt; &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;/skills/skill-insp"&lt;/span&gt; ~/.claude/skills/skill-insp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in any Claude Code session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;inspect the skill at ./my-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in Chinese:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;评估 ./my-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trigger phrases are listed in the skill's description, so Claude Code invokes the skill automatically when it sees one. After the inspection, follow the numbered prompts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;detailed mode&lt;/code&gt; to expand evidence&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;apply recommendations&lt;/code&gt; to auto-fix high-priority findings&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;run evals&lt;/code&gt; to verify the skill with eval scenarios&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;revert&lt;/code&gt; to undo the last apply&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The current version is 1.0.0. A few things I'd like to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Diff view in the HTML report&lt;/strong&gt; showing what &lt;code&gt;apply&lt;/code&gt; actually changed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-skill consistency checks&lt;/strong&gt; for repos that ship multiple skills.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optional schema validation&lt;/strong&gt; for &lt;code&gt;evals.json&lt;/code&gt; and &lt;code&gt;analysis.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you build skills regularly, give it a try and let me know what breaks. The eval scenarios in &lt;code&gt;evals/evals.json&lt;/code&gt; are a good place to start if you want to extend it: adding a new scenario just means adding a JSON entry with a prompt, fixture files, and expectations.&lt;/p&gt;
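
&lt;p&gt;As a rough picture of what such an entry might contain, here is a Python sketch that appends one scenario to a list of scenarios. The field names (&lt;code&gt;prompt&lt;/code&gt;, &lt;code&gt;files&lt;/code&gt;, &lt;code&gt;expectations&lt;/code&gt;), the sample values, and the assumption that the file is a top-level JSON array are all hypothetical; check the real &lt;code&gt;evals/evals.json&lt;/code&gt; for the actual schema.&lt;/p&gt;

```python
import json

# Hypothetical field names and values; the real schema may differ.
new_scenario = {
    "prompt": "inspect the skill at ./my-skill",
    "files": {
        "my-skill/SKILL.md": "---\nname: my-skill\ndescription: Example skill.\n---\n",
    },
    "expectations": ["The report includes an overall score out of 100."],
}


def add_scenario(evals_json_text, scenario):
    """Append a scenario to a JSON array of scenarios and return the new text.

    Assumption: the eval file is a top-level JSON array.
    """
    scenarios = json.loads(evals_json_text)
    scenarios.append(scenario)
    # ensure_ascii=False keeps non-English trigger phrases readable in the file.
    return json.dumps(scenarios, indent=2, ensure_ascii=False)
```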

</description>
      <category>claudecode</category>
      <category>ai</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
