<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: xzawed</title>
    <description>The latest articles on Forem by xzawed (@xzawed).</description>
    <link>https://forem.com/xzawed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896824%2F58a4e9be-19f4-4cae-ae5a-23efa15ec65a.png</url>
      <title>Forem: xzawed</title>
      <link>https://forem.com/xzawed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xzawed"/>
    <language>en</language>
    <item>
      <title>I built a little harness around Claude Code 🛠️</title>
      <dc:creator>xzawed</dc:creator>
      <pubDate>Sun, 03 May 2026 15:47:07 +0000</pubDate>
      <link>https://forem.com/xzawed/i-built-a-little-harness-around-claude-code-5ae0</link>
      <guid>https://forem.com/xzawed/i-built-a-little-harness-around-claude-code-5ae0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhci216hnoodbuy1p137c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhci216hnoodbuy1p137c.jpg" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Hey 👋&lt;/p&gt;

&lt;p&gt;So I've been using Claude Code for a couple months now and somewhere along the way I ended up building this little... setup? framework? "harness" feels about right. Basically a pile of files and conventions that make Claude behave consistently across my projects.&lt;/p&gt;

&lt;p&gt;Gonna share what's in mine. Not because I think it's optimal — lol I'm like 2 months in — but I almost never see people post their actual setup. Maybe a piece of this is useful to you. Maybe you'll laugh at me. Either way 🤷&lt;/p&gt;
&lt;h2&gt;
  
  
  My (slightly weird?) philosophy 🤝
&lt;/h2&gt;

&lt;p&gt;Three words: &lt;strong&gt;equality, coexistence, cooperation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I know I know, that sounds dramatic for a dev tool 😅. What it actually means in practice: Claude does the writing/editing/deleting. I do the directing and decision-making. But — and I think this is the part most people miss — since Claude is the one &lt;em&gt;doing&lt;/em&gt; the work, the environment has to be designed for Claude to work well in. Not just for me to skim.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;CLAUDE.md&lt;/code&gt; is unclear, that's &lt;em&gt;my&lt;/em&gt; problem to fix. Not Claude's problem to silently work around.&lt;/p&gt;

&lt;p&gt;So my whole methodology fits in one sentence: &lt;strong&gt;make a comfortable workspace for the executor.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What's in the harness 📦
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. &lt;code&gt;CLAUDE.md&lt;/code&gt; — the brain dump
&lt;/h3&gt;

&lt;p&gt;Every project has one. It's got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the project is and where it's at&lt;/li&gt;
&lt;li&gt;Architecture decisions (lightweight ADR vibes)&lt;/li&gt;
&lt;li&gt;Conventions — commit style, branch rules, when to test&lt;/li&gt;
&lt;li&gt;Stuff Claude should &lt;em&gt;not&lt;/em&gt; do automatically&lt;/li&gt;
&lt;li&gt;Schemas, security boundaries, weird gotchas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the first thing Claude reads. If a session keeps going "wait what is this project again?" — that's me failing at writing this file, not Claude failing at reading it.&lt;/p&gt;
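&lt;p&gt;For shape, here's roughly the skeleton mine follow (section names are just my conventions, nothing Claude Code requires):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CLAUDE.md

## What this is + current status
One paragraph: what the project does and where it's at.

## Architecture decisions
- Decision, date, one-line rationale (lightweight ADR style)

## Conventions
- Commit style, branch rules, when tests must run

## Do NOT do automatically
- e.g. no schema migrations, no dependency upgrades without asking

## Gotchas / boundaries
- Schemas, security boundaries, anything surprising
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;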
&lt;h3&gt;
  
  
  2. The &lt;code&gt;.claude/&lt;/code&gt; folder 📂
&lt;/h3&gt;

&lt;p&gt;This is where the harness actually lives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/
├── settings.json     # permissions, env, hook routing
├── hooks/            # shell scripts for lifecycle stuff
├── agents/           # sub-agents for narrow tasks
└── commands/         # slash commands I keep retyping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hooks are 🐐.&lt;/strong&gt; Highest-leverage piece for me, by a lot. My standard set across projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session-start hook that injects fresh context (latest &lt;code&gt;CLAUDE.md&lt;/code&gt; + recent commits)&lt;/li&gt;
&lt;li&gt;TDD runner (tests run before certain edits go through)&lt;/li&gt;
&lt;li&gt;Type check + lint gate&lt;/li&gt;
&lt;li&gt;Pre-push security scan&lt;/li&gt;
&lt;li&gt;Commit message validator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They just... run. I never have to remember.&lt;/p&gt;
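&lt;p&gt;For the curious, here's a sketch of the session-start one. As I understand it, Claude Code adds a SessionStart hook's stdout to the session context, so the script just prints what I want Claude to see. The paths and the ten-commit window are my choices, not defaults:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
# Sketch of a session-start hook: whatever this prints to stdout ends up
# in the session as extra context (wired up under SessionStart in
# .claude/settings.json).
import pathlib
import subprocess

claude_md = pathlib.Path("CLAUDE.md")
if claude_md.exists():
    print("## Project context (CLAUDE.md)")
    print(claude_md.read_text(encoding="utf-8"))

# Recent commits so Claude knows what changed since the last session.
log = subprocess.run(
    ["git", "log", "--oneline", "-10"],
    capture_output=True, text=True, check=False,
)
if log.returncode == 0:
    print("## Recent commits")
    print(log.stdout)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;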

&lt;p&gt;&lt;strong&gt;Sub-agents&lt;/strong&gt; I keep sparse. Honestly I made a bunch and deleted most of them because they weren't pulling their weight. The ones that survived have very narrow jobs — like a security-review agent that &lt;em&gt;only&lt;/em&gt; sees the diff and a checklist. Nothing else.&lt;/p&gt;
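&lt;p&gt;The shape of that security-review agent, roughly (a simplified sketch, not the actual file; agents in &lt;code&gt;.claude/agents/&lt;/code&gt; are markdown files with a bit of frontmatter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: security-review
description: Reviews a diff against a short security checklist. Nothing else.
tools: Read, Grep
---
You are a security reviewer. You get a diff and the checklist below.
Flag only concrete issues in the diff. Do not comment on style.

Checklist: secrets in code, injection, missing auth checks on new routes,
unvalidated user input, unsafe deserialization.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;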

&lt;p&gt;&lt;strong&gt;Slash commands&lt;/strong&gt; are literally just "prompts I'm tired of typing."&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Same stack, every project 🧱
&lt;/h3&gt;

&lt;p&gt;I have four active repos. They all share basically the same stack: Next.js + TypeScript strict + Tailwind + Zustand + Supabase/PostgreSQL + Railway + Playwright.&lt;/p&gt;

&lt;p&gt;Is this the &lt;em&gt;best&lt;/em&gt; stack? No clue tbh 🙃 But because it's consistent, my hooks and &lt;code&gt;CLAUDE.md&lt;/code&gt; patterns just... work in any of them. No rewriting from scratch every time.&lt;/p&gt;

&lt;p&gt;It's all implicit copy-paste though. Nothing extracted to a real template yet. Def on the todo list.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The watcher 👁️
&lt;/h3&gt;

&lt;p&gt;One of those four repos is a thing I built that watches the others. Runs static analysis on my commits, does an AI code review pass, scores the code on a few axes, and pings me on Telegram if something's bad. Can block merges if the score tanks.&lt;/p&gt;

&lt;p&gt;Why is this part of the harness? Because without it I have zero signal on whether the harness is making my code &lt;em&gt;better&lt;/em&gt; over time, or just &lt;em&gt;faster&lt;/em&gt;. Might honestly be the most important piece.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stuff I've figured out (so far) 💭
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Building the harness IS the work.&lt;/strong&gt; I really underestimated how much time goes into tweaking &lt;code&gt;CLAUDE.md&lt;/code&gt;, tuning hooks, killing sub-agents that turned out useless. It's not config-tweaking, it's actual engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude reflects whatever I give it.&lt;/strong&gt; Garbage &lt;code&gt;CLAUDE.md&lt;/code&gt; → garbage output. Hooks don't run → quality gate doesn't exist. The system is exactly as good as the harness around it. Full stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions stay with me.&lt;/strong&gt; I don't auto-accept Claude's suggestions wholesale. The harness exists to make the executor execute well — not to outsource my judgment. That line is way more important than any specific tool choice imo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take all this with salt 🧂.&lt;/strong&gt; I'm a couple months in. One person, one set of projects. Your harness will probably look different. Use this as a starting list, not a recipe.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's broken / missing 🚧
&lt;/h2&gt;

&lt;p&gt;A lot tbh:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No real template extracted — every repo got hand-tuned&lt;/li&gt;
&lt;li&gt;Sub-agents are barely documented, I'm running on memory more than I should&lt;/li&gt;
&lt;li&gt;I don't actually measure "harness ROI" — just vibes + the watcher's scores&lt;/li&gt;
&lt;li&gt;Hook failure modes aren't really tested. If a hook silently fails I probably won't notice for a while 💀&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've solved any of this in your own setup I would genuinely love to know how 🙏&lt;/p&gt;




&lt;p&gt;Anyway that's where I'm at. If your harness looks totally different, drop it in the comments — I'm in the phase where every other person's setup teaches me something.&lt;/p&gt;

&lt;p&gt;Peace ✌️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a tool that runs static analysis + Claude AI review on every GitHub Push/PR — SCAManager</title>
      <dc:creator>xzawed</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:44:00 +0000</pubDate>
      <link>https://forem.com/xzawed/i-built-a-tool-that-runs-static-analysis-claude-ai-review-on-every-github-pushpr-scamanager-252m</link>
      <guid>https://forem.com/xzawed/i-built-a-tool-that-runs-static-analysis-claude-ai-review-on-every-github-pushpr-scamanager-252m</guid>
      <description>&lt;p&gt;It started as a small annoyance&lt;br&gt;
PR reviews are always a chore. On a small team — or a side project I run alone — the "someone has to look at this" person is always me. And if you're pushing straight to main, code review effectively disappears.&lt;br&gt;
I started by stacking pylint and flake8 on top of GitHub Actions. But those don't answer the questions that actually matter: did this change do what I meant it to? Or does the commit message actually describe what changed? Static analysis catches grammar and style. It can't read intent.&lt;br&gt;
So I asked Claude to review the same diffs, fused both signals together, scored them out of 100, and pushed the result to Telegram. That became SCAManager.&lt;br&gt;
GitHub: &lt;a href="https://github.com/xzawed/SCAManager" rel="noopener noreferrer"&gt;https://github.com/xzawed/SCAManager&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;When a GitHub Webhook fires for a Push or PR event, the following runs in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static analysis — pylint, flake8, bandit&lt;/li&gt;
&lt;li&gt;AI code review — Claude Haiku 4.5&lt;/li&gt;
&lt;li&gt;Commit message evaluation — Claude AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results map to a 100-point score and an A–F grade, then ship to whichever of the nine channels you've configured: Telegram, GitHub PR Comment, GitHub Commit Comment, GitHub Issue, Discord, Slack, Email, Generic Webhook, n8n.&lt;/p&gt;

&lt;p&gt;For PRs, the score drives the gate automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto mode — above threshold → GitHub APPROVE. Below → REQUEST_CHANGES.&lt;/li&gt;
&lt;li&gt;Semi-auto mode — inline buttons in Telegram for manual approval.&lt;/li&gt;
&lt;li&gt;Auto-merge — above a separate threshold → squash merge.&lt;/li&gt;
&lt;/ul&gt;
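&lt;p&gt;In auto mode the decision itself is tiny. A sketch (the threshold numbers here are made-up examples; the real ones are configurable and the logic lives in run_gate_check()):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simplified sketch of the auto-mode gate. Example thresholds only.
APPROVE_THRESHOLD = 70      # below this, request changes
AUTO_MERGE_THRESHOLD = 90   # separate, stricter bar for squash merge

def gate_decision(score: int) -&gt; dict:
    return {
        "review_event": "APPROVE" if score &gt;= APPROVE_THRESHOLD else "REQUEST_CHANGES",
        "auto_merge": score &gt;= AUTO_MERGE_THRESHOLD,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;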

&lt;h2&gt;The scoring system — why these weights&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Item&lt;/th&gt;&lt;th&gt;Points&lt;/th&gt;&lt;th&gt;Evaluator&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Code quality&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;pylint + flake8&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Security&lt;/td&gt;&lt;td&gt;20&lt;/td&gt;&lt;td&gt;bandit&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Commit message&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Implementation direction&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Test coverage&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt;100&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Things machines see well go to machines (pylint, bandit). Things that need human judgment go to AI. AI evaluations come back on a 0–10 or 0–20 scale, then get re-weighted into the final score.&lt;/p&gt;

&lt;p&gt;If ANTHROPIC_API_KEY isn't set, the AI items default to a neutral middle, and static analysis alone can still hit 89 points (B grade) at most. The tool isn't useless without API spend.&lt;/p&gt;
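&lt;p&gt;The re-weighting is just scaling each raw score into its slice of the 100 points. A sketch with illustrative names (not the real calculate_score(); the letter cutoffs are a guess consistent with the 89-point B above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WEIGHTS = {
    "code_quality": 25,    # pylint + flake8
    "security": 20,        # bandit
    "commit_message": 15,  # Claude AI
    "direction": 25,       # Claude AI
    "test_coverage": 15,   # Claude AI
}

def weighted_total(raw: dict, scales: dict) -&gt; float:
    # raw[k] is the evaluator's score, scales[k] is that evaluator's max
    # (e.g. 10 for a 0-10 AI rating); each item contributes up to WEIGHTS[k] points.
    return sum(WEIGHTS[k] * raw[k] / scales[k] for k in WEIGHTS)

def grade(total: float) -&gt; str:
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total &gt;= cutoff:
            return letter
    return "F"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;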

&lt;h2&gt;Architecture — the parts that were interesting to build&lt;/h2&gt;

&lt;h3&gt;1. asyncio.gather() for parallelism&lt;/h3&gt;

&lt;p&gt;Running static analysis and AI review serially makes per-PR analysis time miserable. Wrapping them in asyncio.gather() collapses total wall-clock time to whatever the slowest task takes.&lt;/p&gt;

&lt;p&gt;I use asyncio.gather(return_exceptions=True) for the nine notification channels too — but here the goal is isolation, not speed. If Telegram is down, that shouldn't block Slack.&lt;/p&gt;

&lt;h3&gt;2. Idempotency — same SHA, no double work&lt;/h3&gt;

&lt;p&gt;GitHub Webhooks get retransmitted (response timeouts, retries, etc.). Running the same commit SHA twice costs money and produces no new information, so I dedupe by SHA at the DB layer.&lt;/p&gt;
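&lt;p&gt;Here's a self-contained toy version of both patterns together. The task bodies are stand-ins, but the structure matches the real pipeline (the real function names are in the diagram below):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio

# Stand-ins for analyze_file() / review_code(): they just sleep to show the pattern.
async def static_analysis():
    await asyncio.sleep(2)          # pylint / flake8 / bandit
    return {"static": "ok"}

async def ai_review():
    await asyncio.sleep(3)          # Claude review
    return {"ai": "ok"}

async def notify(channel: str, payload: dict):
    if channel == "telegram":
        raise RuntimeError("pretend Telegram is down")
    return f"sent {payload} to {channel}"

async def pipeline(seen_shas: set, sha: str):
    # Idempotency: skip SHAs we've already analyzed (DB-backed in the real tool).
    if sha in seen_shas:
        return
    seen_shas.add(sha)

    # Both analyses run concurrently; wall-clock is roughly the slower of the two.
    static_result, ai_result = await asyncio.gather(static_analysis(), ai_review())

    # return_exceptions=True: one failing channel doesn't block the rest.
    results = await asyncio.gather(
        notify("telegram", ai_result),
        notify("slack", ai_result),
        return_exceptions=True,
    )
    print(static_result, ai_result, results)

asyncio.run(pipeline(set(), "abc123"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;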

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub Push/PR
  └─ POST /webhooks/github  (HMAC-SHA256 verification)
       └─ BackgroundTask: run_analysis_pipeline()
            ├─ Repo register · SHA dedup (idempotency)
            ├─ asyncio.gather() ── parallel
            │    ├─ analyze_file() × N  (pylint · flake8 · bandit)
            │    └─ review_code()       (Claude AI)
            ├─ calculate_score() → grade
            ├─ run_gate_check()  [PR only]
            └─ asyncio.gather(return_exceptions=True) → notification channels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;3. Two ways to use the AI&lt;/h3&gt;

&lt;p&gt;Same review, two call paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server mode — Anthropic API. Needs ANTHROPIC_API_KEY. Costs money.&lt;/li&gt;
&lt;li&gt;Local hook mode — Claude Code CLI (claude -p). Runs locally, no API key needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Local hook mode runs as a pre-push git hook. Output goes to the terminal and to the dashboard. Environments without the CLI (Codespaces, mobile) silently skip the hook — exit 0 always, never blocks the push.&lt;/p&gt;
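&lt;p&gt;The property that matters in local hook mode is the exit code. A sketch of the shape (not the shipped hook; it assumes a main branch and skips the dashboard part):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
# Sketch of the pre-push hook behavior: best-effort review, never block the push.
import shutil
import subprocess
import sys

def main() -&gt; int:
    if shutil.which("claude") is None:
        return 0  # no CLI here (Codespaces, mobile, CI): silently skip

    diff = subprocess.run(
        ["git", "diff", "origin/main...HEAD"],   # assumes a 'main' default branch
        capture_output=True, text=True, check=False,
    ).stdout
    if not diff:
        return 0

    # claude -p runs a one-shot prompt and prints the review to the terminal.
    subprocess.run(["claude", "-p", "Review this diff briefly:\n" + diff], check=False)
    return 0  # always exit 0; the hook must never block the push

if __name__ == "__main__":
    sys.exit(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;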

&lt;h3&gt;4. DB Failover&lt;/h3&gt;

&lt;p&gt;I built a FailoverSessionFactory that switches over to a fallback PostgreSQL instance when the primary dies. /health reports which DB is currently active.&lt;/p&gt;

&lt;p&gt;Honestly, this is probably over-engineered. Whether a small side project actually needs failover is a separate question — building it was largely a learning exercise.&lt;/p&gt;
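&lt;p&gt;For flavor, a rough approximation of the idea (not the actual FailoverSessionFactory; this sketch uses SQLAlchemy directly):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: probe the active database, fall back to the other one if it's unreachable.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

class FailoverSessionFactory:
    def __init__(self, primary_url: str, fallback_url: str):
        self.engines = {
            "primary": create_engine(primary_url, pool_pre_ping=True),
            "fallback": create_engine(fallback_url, pool_pre_ping=True),
        }
        self.active = "primary"

    def get_session(self):
        other = "fallback" if self.active == "primary" else "primary"
        for name in (self.active, other):
            try:
                # Cheap liveness probe before handing out a session.
                with self.engines[name].connect() as conn:
                    conn.execute(text("SELECT 1"))
                self.active = name   # /health can report this
                return sessionmaker(bind=self.engines[name])()
            except Exception:
                continue
        raise RuntimeError("both databases are unreachable")

# Usage: factory = FailoverSessionFactory(primary_dsn, fallback_dsn)
#        session = factory.get_session()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;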

&lt;h2&gt;Limits and trade-offs&lt;/h2&gt;

&lt;p&gt;This tool isn't going to fit every team. Being honest about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python-only — static analysis is pylint/flake8/bandit. For non-Python repos, only the AI review piece gives you value.&lt;/li&gt;
&lt;li&gt;AI score consistency — LLM output isn't 100% deterministic. The score is for spotting trends, not a hard, trustworthy number.&lt;/li&gt;
&lt;li&gt;API cost — teams shipping big PRs frequently can rack up Claude API spend fast. File filters and thresholds give you some control, but it's a real cost line.&lt;/li&gt;
&lt;li&gt;Auto-merge risk — score-driven squash merge is convenient and dangerous. Validate your threshold settings before turning it on. Start in semi-auto mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;If you want to try it&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/xzawed/SCAManager" rel="noopener noreferrer"&gt;https://github.com/xzawed/SCAManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;License: MIT&lt;/li&gt;
&lt;li&gt;Required: Python 3.13 · PostgreSQL · GitHub OAuth App&lt;/li&gt;
&lt;li&gt;Optional: ANTHROPIC_API_KEY · Telegram Bot Token · SMTP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Easiest deploy: Railway with the PostgreSQL plugin and your env vars filled in. For on-prem, uvicorn + nginx + systemd works fine.&lt;/p&gt;

&lt;p&gt;Feedback, issues, and "wait, is this actually how it should behave?" reports are all welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>python</category>
    </item>
  </channel>
</rss>
