<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: xzawed</title>
    <description>The latest articles on Forem by xzawed (@xzawed).</description>
    <link>https://forem.com/xzawed</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3896824%2F58a4e9be-19f4-4cae-ae5a-23efa15ec65a.png</url>
      <title>Forem: xzawed</title>
      <link>https://forem.com/xzawed</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/xzawed"/>
    <language>en</language>
    <item>
      <title>I built a little harness around Claude Code 🛠️</title>
      <dc:creator>xzawed</dc:creator>
      <pubDate>Sun, 03 May 2026 15:47:07 +0000</pubDate>
      <link>https://forem.com/xzawed/i-built-a-little-harness-around-claude-code-5ae0</link>
      <guid>https://forem.com/xzawed/i-built-a-little-harness-around-claude-code-5ae0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhci216hnoodbuy1p137c.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhci216hnoodbuy1p137c.jpg" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Hey 👋&lt;/p&gt;

&lt;p&gt;So I've been using Claude Code for a couple months now and somewhere along the way I ended up building this little... setup? framework? "harness" feels about right. Basically a pile of files and conventions that make Claude behave consistently across my projects.&lt;/p&gt;

&lt;p&gt;Gonna share what's in mine. Not because I think it's optimal — lol I'm like 2 months in — but I almost never see people post their actual setup. Maybe a piece of this is useful to you. Maybe you'll laugh at me. Either way 🤷&lt;/p&gt;
&lt;h2&gt;
  
  
  My (slightly weird?) philosophy 🤝
&lt;/h2&gt;

&lt;p&gt;Three words: &lt;strong&gt;equality, coexistence, cooperation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I know I know, that sounds dramatic for a dev tool 😅. What it actually means in practice: Claude does the writing/editing/deleting. I do the directing and decision-making. But — and I think this is the part most people miss — since Claude is the one &lt;em&gt;doing&lt;/em&gt; the work, the environment has to be designed for Claude to work well in. Not just for me to skim.&lt;/p&gt;

&lt;p&gt;If &lt;code&gt;CLAUDE.md&lt;/code&gt; is unclear, that's &lt;em&gt;my&lt;/em&gt; problem to fix. Not Claude's problem to silently work around.&lt;/p&gt;

&lt;p&gt;So my whole methodology fits in one sentence: &lt;strong&gt;make a comfortable workspace for the executor.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  What's in the harness 📦
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. &lt;code&gt;CLAUDE.md&lt;/code&gt; — the brain dump
&lt;/h3&gt;

&lt;p&gt;Every project has one. It's got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What the project is and where it's at&lt;/li&gt;
&lt;li&gt;Architecture decisions (lightweight ADR vibes)&lt;/li&gt;
&lt;li&gt;Conventions — commit style, branch rules, when to test&lt;/li&gt;
&lt;li&gt;Stuff Claude should &lt;em&gt;not&lt;/em&gt; do automatically&lt;/li&gt;
&lt;li&gt;Schemas, security boundaries, weird gotchas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the first thing Claude reads. If a session keeps going "wait what is this project again?" — that's me failing at writing this file, not Claude failing at reading it.&lt;/p&gt;
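&lt;p&gt;For shape, here's roughly the skeleton mine follow (section names are just my conventions, nothing Claude Code requires):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CLAUDE.md

## What this is + current status
One paragraph: what the project does and where it's at.

## Architecture decisions
- Decision, date, one-line rationale (lightweight ADR style)

## Conventions
- Commit style, branch rules, when tests must run

## Do NOT do automatically
- e.g. no schema migrations, no dependency upgrades without asking

## Gotchas / boundaries
- Schemas, security boundaries, anything surprising
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;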
&lt;h3&gt;
  
  
  2. The &lt;code&gt;.claude/&lt;/code&gt; folder 📂
&lt;/h3&gt;

&lt;p&gt;This is where the harness actually lives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/
├── settings.json     # permissions, env, hook routing
├── hooks/            # shell scripts for lifecycle stuff
├── agents/           # sub-agents for narrow tasks
└── commands/         # slash commands I keep retyping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Hooks are 🐐.&lt;/strong&gt; Highest-leverage piece for me, by a lot. My standard set across projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Session-start hook that injects fresh context (latest &lt;code&gt;CLAUDE.md&lt;/code&gt; + recent commits)&lt;/li&gt;
&lt;li&gt;TDD runner (tests run before certain edits go through)&lt;/li&gt;
&lt;li&gt;Type check + lint gate&lt;/li&gt;
&lt;li&gt;Pre-push security scan&lt;/li&gt;
&lt;li&gt;Commit message validator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They just... run. I never have to remember.&lt;/p&gt;
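&lt;p&gt;For the curious, here's a sketch of the session-start one. As I understand it, Claude Code adds a SessionStart hook's stdout to the session context, so the script just prints what I want Claude to see. The paths and the ten-commit window are my choices, not defaults:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
# Sketch of a session-start hook: whatever this prints to stdout ends up
# in the session as extra context (wired up under SessionStart in
# .claude/settings.json).
import pathlib
import subprocess

claude_md = pathlib.Path("CLAUDE.md")
if claude_md.exists():
    print("## Project context (CLAUDE.md)")
    print(claude_md.read_text(encoding="utf-8"))

# Recent commits so Claude knows what changed since the last session.
log = subprocess.run(
    ["git", "log", "--oneline", "-10"],
    capture_output=True, text=True, check=False,
)
if log.returncode == 0:
    print("## Recent commits")
    print(log.stdout)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;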

&lt;p&gt;&lt;strong&gt;Sub-agents&lt;/strong&gt; I keep sparse. Honestly I made a bunch and deleted most of them because they weren't pulling their weight. The ones that survived have very narrow jobs — like a security-review agent that &lt;em&gt;only&lt;/em&gt; sees the diff and a checklist. Nothing else.&lt;/p&gt;
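&lt;p&gt;The shape of that security-review agent, roughly (a simplified sketch, not the actual file; agents in &lt;code&gt;.claude/agents/&lt;/code&gt; are markdown files with a bit of frontmatter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;---
name: security-review
description: Reviews a diff against a short security checklist. Nothing else.
tools: Read, Grep
---
You are a security reviewer. You get a diff and the checklist below.
Flag only concrete issues in the diff. Do not comment on style.

Checklist: secrets in code, injection, missing auth checks on new routes,
unvalidated user input, unsafe deserialization.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;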

&lt;p&gt;&lt;strong&gt;Slash commands&lt;/strong&gt; are literally just "prompts I'm tired of typing."&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Same stack, every project 🧱
&lt;/h3&gt;

&lt;p&gt;I have four active repos. They all share basically the same stack: Next.js + TypeScript strict + Tailwind + Zustand + Supabase/PostgreSQL + Railway + Playwright.&lt;/p&gt;

&lt;p&gt;Is this the &lt;em&gt;best&lt;/em&gt; stack? No clue tbh 🙃 But because it's consistent, my hooks and &lt;code&gt;CLAUDE.md&lt;/code&gt; patterns just... work in any of them. No rewriting from scratch every time.&lt;/p&gt;

&lt;p&gt;It's all implicit copy-paste though. Nothing extracted to a real template yet. Def on the todo list.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The watcher 👁️
&lt;/h3&gt;

&lt;p&gt;One of those four repos is a thing I built that watches the others. Runs static analysis on my commits, does an AI code review pass, scores the code on a few axes, and pings me on Telegram if something's bad. Can block merges if the score tanks.&lt;/p&gt;

&lt;p&gt;Why is this part of the harness? Because without it I have zero signal on whether the harness is making my code &lt;em&gt;better&lt;/em&gt; over time, or just &lt;em&gt;faster&lt;/em&gt;. Might honestly be the most important piece.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stuff I've figured out (so far) 💭
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Building the harness IS the work.&lt;/strong&gt; I really underestimated how much time goes into tweaking &lt;code&gt;CLAUDE.md&lt;/code&gt;, tuning hooks, killing sub-agents that turned out useless. It's not config-tweaking, it's actual engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude reflects whatever I give it.&lt;/strong&gt; Garbage &lt;code&gt;CLAUDE.md&lt;/code&gt; → garbage output. Hooks don't run → quality gate doesn't exist. The system is exactly as good as the harness around it. Full stop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decisions stay with me.&lt;/strong&gt; I don't auto-accept Claude's suggestions wholesale. The harness exists to make the executor execute well — not to outsource my judgment. That line is way more important than any specific tool choice imo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Take all this with salt 🧂.&lt;/strong&gt; I'm a couple months in. One person, one set of projects. Your harness will probably look different. Use this as a starting list, not a recipe.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's broken / missing 🚧
&lt;/h2&gt;

&lt;p&gt;A lot tbh:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No real template extracted — every repo got hand-tuned&lt;/li&gt;
&lt;li&gt;Sub-agents are barely documented, I'm running on memory more than I should&lt;/li&gt;
&lt;li&gt;I don't actually measure "harness ROI" — just vibes + the watcher's scores&lt;/li&gt;
&lt;li&gt;Hook failure modes aren't really tested. If a hook silently fails I probably won't notice for a while 💀&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you've solved any of this in your own setup I would genuinely love to know how 🙏&lt;/p&gt;




&lt;p&gt;Anyway that's where I'm at. If your harness looks totally different, drop it in the comments — I'm in the phase where every other person's setup teaches me something.&lt;/p&gt;

&lt;p&gt;Peace ✌️&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I built a tool that runs static analysis + Claude AI review on every GitHub Push/PR — SCAManager</title>
      <dc:creator>xzawed</dc:creator>
      <pubDate>Sat, 25 Apr 2026 13:44:00 +0000</pubDate>
      <link>https://forem.com/xzawed/i-built-a-tool-that-runs-static-analysis-claude-ai-review-on-every-github-pushpr-scamanager-252m</link>
      <guid>https://forem.com/xzawed/i-built-a-tool-that-runs-static-analysis-claude-ai-review-on-every-github-pushpr-scamanager-252m</guid>
      <description>&lt;p&gt;It started as a small annoyance&lt;br&gt;
PR reviews are always a chore. On a small team — or a side project I run alone — the "someone has to look at this" person is always me. And if you're pushing straight to main, code review effectively disappears.&lt;br&gt;
I started by stacking pylint and flake8 on top of GitHub Actions. But those don't answer the questions that actually matter: did this change do what I meant it to? Or does the commit message actually describe what changed? Static analysis catches grammar and style. It can't read intent.&lt;br&gt;
So I asked Claude to review the same diffs, fused both signals together, scored them out of 100, and pushed the result to Telegram. That became SCAManager.&lt;br&gt;
GitHub: &lt;a href="https://github.com/xzawed/SCAManager" rel="noopener noreferrer"&gt;https://github.com/xzawed/SCAManager&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;When a GitHub Webhook fires for a Push or PR event, the following runs in parallel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Static analysis — pylint, flake8, bandit&lt;/li&gt;
&lt;li&gt;AI code review — Claude Haiku 4.5&lt;/li&gt;
&lt;li&gt;Commit message evaluation — Claude AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Results map to a 100-point score and an A–F grade, then ship to whichever of the nine channels you've configured: Telegram, GitHub PR Comment, GitHub Commit Comment, GitHub Issue, Discord, Slack, Email, Generic Webhook, n8n.&lt;/p&gt;

&lt;p&gt;For PRs, the score drives the gate automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto mode — above threshold → GitHub APPROVE. Below → REQUEST_CHANGES.&lt;/li&gt;
&lt;li&gt;Semi-auto mode — inline buttons in Telegram for manual approval.&lt;/li&gt;
&lt;li&gt;Auto-merge — above a separate threshold → squash merge.&lt;/li&gt;
&lt;/ul&gt;
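&lt;p&gt;In auto mode the decision itself is tiny. A sketch (the threshold numbers here are made-up examples; the real ones are configurable and the logic lives in run_gate_check()):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simplified sketch of the auto-mode gate. Example thresholds only.
APPROVE_THRESHOLD = 70      # below this, request changes
AUTO_MERGE_THRESHOLD = 90   # separate, stricter bar for squash merge

def gate_decision(score: int) -&gt; dict:
    return {
        "review_event": "APPROVE" if score &gt;= APPROVE_THRESHOLD else "REQUEST_CHANGES",
        "auto_merge": score &gt;= AUTO_MERGE_THRESHOLD,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;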

&lt;h2&gt;The scoring system — why these weights&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Item&lt;/th&gt;&lt;th&gt;Points&lt;/th&gt;&lt;th&gt;Evaluator&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Code quality&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;pylint + flake8&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Security&lt;/td&gt;&lt;td&gt;20&lt;/td&gt;&lt;td&gt;bandit&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Commit message&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Implementation direction&lt;/td&gt;&lt;td&gt;25&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Test coverage&lt;/td&gt;&lt;td&gt;15&lt;/td&gt;&lt;td&gt;Claude AI&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Total&lt;/td&gt;&lt;td&gt;100&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Things machines see well go to machines (pylint, bandit). Things that need human judgment go to AI. AI evaluations come back on a 0–10 or 0–20 scale, then get re-weighted into the final score.&lt;/p&gt;

&lt;p&gt;If ANTHROPIC_API_KEY isn't set, the AI items default to a neutral middle, and static analysis alone can still hit 89 points (B grade) at most. The tool isn't useless without API spend.&lt;/p&gt;
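&lt;p&gt;The re-weighting is just scaling each raw score into its slice of the 100 points. A sketch with illustrative names (not the real calculate_score(); the letter cutoffs are a guess consistent with the 89-point B above):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WEIGHTS = {
    "code_quality": 25,    # pylint + flake8
    "security": 20,        # bandit
    "commit_message": 15,  # Claude AI
    "direction": 25,       # Claude AI
    "test_coverage": 15,   # Claude AI
}

def weighted_total(raw: dict, scales: dict) -&gt; float:
    # raw[k] is the evaluator's score, scales[k] is that evaluator's max
    # (e.g. 10 for a 0-10 AI rating); each item contributes up to WEIGHTS[k] points.
    return sum(WEIGHTS[k] * raw[k] / scales[k] for k in WEIGHTS)

def grade(total: float) -&gt; str:
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if total &gt;= cutoff:
            return letter
    return "F"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;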

&lt;h2&gt;Architecture — the parts that were interesting to build&lt;/h2&gt;

&lt;h3&gt;1. asyncio.gather() for parallelism&lt;/h3&gt;

&lt;p&gt;Running static analysis and AI review serially makes per-PR analysis time miserable. Wrapping them in asyncio.gather() collapses total wall-clock time to whatever the slowest task takes.&lt;/p&gt;

&lt;p&gt;I use asyncio.gather(return_exceptions=True) for the nine notification channels too — but here the goal is isolation, not speed. If Telegram is down, that shouldn't block Slack.&lt;/p&gt;

&lt;h3&gt;2. Idempotency — same SHA, no double work&lt;/h3&gt;

&lt;p&gt;GitHub Webhooks get retransmitted (response timeouts, retries, etc.). Running the same commit SHA twice costs money and produces no new information, so I dedupe by SHA at the DB layer.&lt;/p&gt;
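&lt;p&gt;Here's a self-contained toy version of both patterns together. The task bodies are stand-ins, but the structure matches the real pipeline (the real function names are in the diagram below):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import asyncio

# Stand-ins for analyze_file() / review_code(): they just sleep to show the pattern.
async def static_analysis():
    await asyncio.sleep(2)          # pylint / flake8 / bandit
    return {"static": "ok"}

async def ai_review():
    await asyncio.sleep(3)          # Claude review
    return {"ai": "ok"}

async def notify(channel: str, payload: dict):
    if channel == "telegram":
        raise RuntimeError("pretend Telegram is down")
    return f"sent {payload} to {channel}"

async def pipeline(seen_shas: set, sha: str):
    # Idempotency: skip SHAs we've already analyzed (DB-backed in the real tool).
    if sha in seen_shas:
        return
    seen_shas.add(sha)

    # Both analyses run concurrently; wall-clock is roughly the slower of the two.
    static_result, ai_result = await asyncio.gather(static_analysis(), ai_review())

    # return_exceptions=True: one failing channel doesn't block the rest.
    results = await asyncio.gather(
        notify("telegram", ai_result),
        notify("slack", ai_result),
        return_exceptions=True,
    )
    print(static_result, ai_result, results)

asyncio.run(pipeline(set(), "abc123"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;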

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GitHub Push/PR
  └─ POST /webhooks/github  (HMAC-SHA256 verification)
       └─ BackgroundTask: run_analysis_pipeline()
            ├─ Repo register · SHA dedup (idempotency)
            ├─ asyncio.gather() ── parallel
            │    ├─ analyze_file() × N  (pylint · flake8 · bandit)
            │    └─ review_code()       (Claude AI)
            ├─ calculate_score() → grade
            ├─ run_gate_check()  [PR only]
            └─ asyncio.gather(return_exceptions=True) → notification channels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h3&gt;3. Two ways to use the AI&lt;/h3&gt;

&lt;p&gt;Same review, two call paths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Server mode — Anthropic API. Needs ANTHROPIC_API_KEY. Costs money.&lt;/li&gt;
&lt;li&gt;Local hook mode — Claude Code CLI (claude -p). Runs locally, no API key needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Local hook mode runs as a pre-push git hook. Output goes to the terminal and to the dashboard. Environments without the CLI (Codespaces, mobile) silently skip the hook — exit 0 always, never blocks the push.&lt;/p&gt;
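&lt;p&gt;The property that matters in local hook mode is the exit code. A sketch of the shape (not the shipped hook; it assumes a main branch and skips the dashboard part):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#!/usr/bin/env python3
# Sketch of the pre-push hook behavior: best-effort review, never block the push.
import shutil
import subprocess
import sys

def main() -&gt; int:
    if shutil.which("claude") is None:
        return 0  # no CLI here (Codespaces, mobile, CI): silently skip

    diff = subprocess.run(
        ["git", "diff", "origin/main...HEAD"],   # assumes a 'main' default branch
        capture_output=True, text=True, check=False,
    ).stdout
    if not diff:
        return 0

    # claude -p runs a one-shot prompt and prints the review to the terminal.
    subprocess.run(["claude", "-p", "Review this diff briefly:\n" + diff], check=False)
    return 0  # always exit 0; the hook must never block the push

if __name__ == "__main__":
    sys.exit(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;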

&lt;h3&gt;4. DB Failover&lt;/h3&gt;

&lt;p&gt;I built a FailoverSessionFactory that switches over to a fallback PostgreSQL instance when the primary dies. /health reports which DB is currently active.&lt;/p&gt;

&lt;p&gt;Honestly, this is probably over-engineered. Whether a small side project actually needs failover is a separate question — building it was largely a learning exercise.&lt;/p&gt;
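&lt;p&gt;For flavor, a rough approximation of the idea (not the actual FailoverSessionFactory; this sketch uses SQLAlchemy directly):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: probe the active database, fall back to the other one if it's unreachable.
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

class FailoverSessionFactory:
    def __init__(self, primary_url: str, fallback_url: str):
        self.engines = {
            "primary": create_engine(primary_url, pool_pre_ping=True),
            "fallback": create_engine(fallback_url, pool_pre_ping=True),
        }
        self.active = "primary"

    def get_session(self):
        other = "fallback" if self.active == "primary" else "primary"
        for name in (self.active, other):
            try:
                # Cheap liveness probe before handing out a session.
                with self.engines[name].connect() as conn:
                    conn.execute(text("SELECT 1"))
                self.active = name   # /health can report this
                return sessionmaker(bind=self.engines[name])()
            except Exception:
                continue
        raise RuntimeError("both databases are unreachable")

# Usage: factory = FailoverSessionFactory(primary_dsn, fallback_dsn)
#        session = factory.get_session()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;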

&lt;h2&gt;Limits and trade-offs&lt;/h2&gt;

&lt;p&gt;This tool isn't going to fit every team. Being honest about it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python-only — static analysis is pylint/flake8/bandit. For non-Python repos, only the AI review piece gives you value.&lt;/li&gt;
&lt;li&gt;AI score consistency — LLM output isn't 100% deterministic. The score is for spotting trends, not a hard, trustworthy number.&lt;/li&gt;
&lt;li&gt;API cost — teams shipping big PRs frequently can rack up Claude API spend fast. File filters and thresholds give you some control, but it's a real cost line.&lt;/li&gt;
&lt;li&gt;Auto-merge risk — score-driven squash merge is convenient and dangerous. Validate your threshold settings before turning it on. Start in semi-auto mode.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;If you want to try it&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Repo: &lt;a href="https://github.com/xzawed/SCAManager" rel="noopener noreferrer"&gt;https://github.com/xzawed/SCAManager&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;License: MIT&lt;/li&gt;
&lt;li&gt;Required: Python 3.13 · PostgreSQL · GitHub OAuth App&lt;/li&gt;
&lt;li&gt;Optional: ANTHROPIC_API_KEY · Telegram Bot Token · SMTP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Easiest deploy: Railway with the PostgreSQL plugin and your env vars filled in. For on-prem, uvicorn + nginx + systemd works fine.&lt;/p&gt;

&lt;p&gt;Feedback, issues, and "wait, is this actually how it should behave?" reports are all welcome.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>claude</category>
      <category>python</category>
    </item>
  </channel>
</rss>
