<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michael Tuszynski</title>
    <description>The latest articles on Forem by Michael Tuszynski (@michaeltuszynski).</description>
    <link>https://forem.com/michaeltuszynski</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1447774%2Fa99eea93-7845-4764-9fce-b1755bcfa456.png</url>
      <title>Forem: Michael Tuszynski</title>
      <link>https://forem.com/michaeltuszynski</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/michaeltuszynski"/>
    <language>en</language>
    <item>
      <title>Production LLM Guardrails: 8 Controls Every AI Team Needs</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 06 May 2026 15:22:10 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/production-llm-guardrails-8-controls-every-ai-team-needs-4e8f</link>
      <guid>https://forem.com/michaeltuszynski/production-llm-guardrails-8-controls-every-ai-team-needs-4e8f</guid>
      <description>&lt;p&gt;Most AI projects fail somewhere between &lt;em&gt;demo works&lt;/em&gt; and &lt;em&gt;production ships&lt;/em&gt;. The gap is rarely the model. It's the absence of the controls that turn a one-shot prompt into a system you can run, audit, and iterate on without setting fire to the budget.&lt;/p&gt;

&lt;p&gt;I made the chart above as the one-page version of the controls I would put on any AI team's first production sprint. Eight of them, organized by which part of the model call they shape: Input, Reasoning, Output, Operations. Below is why each one matters and where teams typically get them wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Input Control: shape what goes in
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Few-shot prompting
&lt;/h3&gt;

&lt;p&gt;Show the model two to five high-quality input/output examples instead of writing long instructions. The model picks up format, edge cases, and tone from examples in a way it does not from imperative prose. Five good examples beat five hundred words of "make sure to handle X, also Y, also Z."&lt;/p&gt;

&lt;p&gt;The mistake teams make is treating few-shot as a fallback when the system prompt isn't working. It's the opposite. For classification, extraction, structured rewriting — most of the work that LLM apps actually do — few-shot is the &lt;em&gt;primary&lt;/em&gt; mechanism. Long instructions are the fallback.&lt;/p&gt;
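
&lt;p&gt;Here is roughly what that looks like in code, using the Anthropic SDK as the example. The model id and the ticket examples are placeholders, not recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Few-shot classification: the examples carry the format and the edge cases.
# The input/output pairs below are illustrative placeholders, not real data.
import anthropic

client = anthropic.Anthropic()

FEW_SHOT = """Classify the support ticket as BILLING, BUG, or FEATURE_REQUEST.

Ticket: "I was charged twice for the March invoice."
Label: BILLING

Ticket: "The export button does nothing on Safari."
Label: BUG

Ticket: "It would be great if reports could be scheduled weekly."
Label: FEATURE_REQUEST

Ticket: "{ticket}"
Label:"""

def classify(ticket):
    response = client.messages.create(
        model="claude-sonnet-4-5",   # placeholder model id
        max_tokens=10,
        messages=[{"role": "user", "content": FEW_SHOT.format(ticket=ticket)}],
    )
    return response.content[0].text.strip()
&lt;/code&gt;&lt;/pre&gt;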

&lt;h3&gt;
  
  
  2. Role-specific prompting
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Senior credit risk analyst, fifteen years commercial lending&lt;/em&gt; outperforms &lt;em&gt;Act as a financial analyst&lt;/em&gt; by a margin that surprises people the first time they measure it. The specific role is doing real work: it constrains vocabulary, narrows the latent distribution, and gives the model permission to refuse questions that fall outside the domain.&lt;/p&gt;

&lt;p&gt;Generic personas — &lt;em&gt;helpful assistant&lt;/em&gt;, &lt;em&gt;senior engineer&lt;/em&gt;, &lt;em&gt;expert&lt;/em&gt; — don't constrain anything. They optimize for nothing. Use roles that name the years, the domain, and the seniority. The more specific, the better the calibration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reasoning Control: shape how it thinks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3. Chain-of-thought prompting
&lt;/h3&gt;

&lt;p&gt;Force step-by-step reasoning before the final answer. The model arrives at better conclusions when the reasoning is exposed in the output, because next-token prediction is conditioned on the reasoning it just generated rather than on a leap to the conclusion.&lt;/p&gt;

&lt;p&gt;For step-by-step legal, financial, or compliance-adjacent workflows, CoT is a default, not an optimization. The cost is more output tokens. The benefit is fewer wrong answers on the kinds of problems where wrong answers are expensive.&lt;/p&gt;
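
&lt;p&gt;A minimal prompted-CoT sketch. The delimiter convention here is arbitrary; what matters is that the reasoning is generated before the verdict, and that downstream code consumes only the verdict:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Prompted chain-of-thought: reasoning first, answer last, parse only the answer.
import anthropic

client = anthropic.Anthropic()

COT_PROMPT = """You are reviewing an expense line for policy compliance.

Work through the policy step by step under a 'Reasoning:' heading.
Then give a one-word verdict (APPROVE or ESCALATE) on a final line
starting with 'Verdict:'.

Expense: {expense}"""

def review(expense):
    response = client.messages.create(
        model="claude-sonnet-4-5",   # placeholder model id
        max_tokens=800,
        messages=[{"role": "user", "content": COT_PROMPT.format(expense=expense)}],
    )
    text = response.content[0].text
    # The reasoning stays in the transcript for audit; only the verdict ships.
    return text.rsplit("Verdict:", 1)[-1].strip()
&lt;/code&gt;&lt;/pre&gt;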

&lt;h3&gt;
  
  
  4. Extended thinking / reasoning models
&lt;/h3&gt;

&lt;p&gt;For genuinely hard problems — multi-step analysis, math, code review, planning — use the provider's native reasoning mode rather than prompted CoT. &lt;a href="https://docs.claude.com/en/docs/build-with-claude/extended-thinking" rel="noopener noreferrer"&gt;Claude's extended thinking&lt;/a&gt; and OpenAI's o-series both expose a separate token budget for the model to think before answering. The reasoning token budget is configurable. The output token budget is separate.&lt;/p&gt;

&lt;p&gt;Prompted CoT and native reasoning solve overlapping problems but are not interchangeable. Native reasoning is more reliable on hard problems and roughly equivalent or worse on easy ones. The default rule: use prompted CoT for routine workflows, switch to native reasoning when the failure mode is "the model jumped to a wrong conclusion despite being asked to think."&lt;/p&gt;
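
&lt;p&gt;With Claude's extended thinking, the reasoning budget is a request parameter rather than a prompt instruction. A minimal sketch; the model id and both budgets are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Native extended thinking: a separate, configurable budget for reasoning tokens.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-5",                            # placeholder model id
    max_tokens=4096,                                      # output budget, separate from thinking
    thinking={"type": "enabled", "budget_tokens": 2048},  # reasoning budget, must stay under max_tokens
    messages=[{"role": "user", "content": "Review this migration plan for ordering hazards: ..."}],
)

# The response interleaves thinking blocks and text blocks; ship only the text.
answer = "".join(block.text for block in response.content if block.type == "text")
&lt;/code&gt;&lt;/pre&gt;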

&lt;h2&gt;
  
  
  Output Control: shape what comes out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5. Structured outputs and tool use
&lt;/h3&gt;

&lt;p&gt;Use the provider's native structured output feature, not prose-described JSON. Schema is enforced by the API, not requested in the prompt. The provider guarantees the output parses; your code does not have to retry-with-jq.&lt;/p&gt;

&lt;p&gt;The mistake is asking for JSON in the prompt and then writing a tolerant parser to handle the cases where the model returns &lt;em&gt;Sure! Here's the JSON: {...}&lt;/em&gt;. Native structured outputs and tool-use schemas remove the entire class of "the model added an apologetic preamble" failures. For any LLM call whose output feeds a downstream system or API, structured outputs are not an optimization; they are the API contract.&lt;/p&gt;
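
&lt;p&gt;One way to get schema-enforced output from the Anthropic API is a forced tool call. A sketch, with an invented schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Schema enforced by the API via a forced tool call, not requested in prose.
import anthropic

client = anthropic.Anthropic()

record_invoice = {
    "name": "record_invoice",
    "description": "Record the fields extracted from an invoice.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "currency": {"type": "string"},
        },
        "required": ["vendor", "total", "currency"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-5",                                # placeholder model id
    max_tokens=1024,
    tools=[record_invoice],
    tool_choice={"type": "tool", "name": "record_invoice"},   # force the schema
    messages=[{"role": "user", "content": "Invoice text: ..."}],
)

# tool_use blocks carry parsed JSON matching input_schema; no apologetic preamble to strip.
fields = next(block.input for block in response.content if block.type == "tool_use")
&lt;/code&gt;&lt;/pre&gt;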

&lt;h3&gt;
  
  
  6. Negative prompting and output filters
&lt;/h3&gt;

&lt;p&gt;Tell the model what &lt;em&gt;not&lt;/em&gt; to do, and filter the output before it ships. Belt and suspenders. Negative prompting works in the prompt; output filters work in code, after the response. They cover different failure modes — the prompt handles the model's bias toward certain phrasings, the filter handles the cases where the prompt didn't.&lt;/p&gt;

&lt;p&gt;This is where PII handling, tone control, and regulated-content workflows live. The control is uninteresting until the day a model paraphrases something it should have refused, and then it is the most interesting control on the list.&lt;/p&gt;
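
&lt;p&gt;The filter half is ordinary code that runs after the model and before anything ships. A minimal sketch; the patterns and the on-hit policy are yours to define:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Output filter: runs in code, after the model responds, before the response ships.
import re

BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US SSN-shaped strings
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),     # card-number-shaped strings
]

def filter_output(text):
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            # Block, redact, or route to human review. That is a policy decision,
            # made in code, not a behavior requested from the model.
            raise ValueError("Response blocked by output filter")
    return text
&lt;/code&gt;&lt;/pre&gt;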

&lt;h2&gt;
  
  
  Operations: make it durable in production
&lt;/h2&gt;

&lt;h3&gt;
  
  
  7. Evals
&lt;/h3&gt;

&lt;p&gt;Versioned test suites with pass/fail thresholds. No prompt change ships without an eval run. This is the artifact that turns prompt engineering from a vibe into an engineering discipline.&lt;/p&gt;

&lt;p&gt;Evals belong to the same family of artifacts as test suites, lint configurations, and the &lt;a href="https://www.mpt.solutions/the-knowledge-base-is-not-the-moat-the-loop-is/" rel="noopener noreferrer"&gt;append-only mistake logs I wrote about yesterday&lt;/a&gt;. Triggered by a change. Append-only by design. Read by the deployment pipeline, not by humans except when something fails. They are the loop that keeps the prompt from rotting.&lt;/p&gt;
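
&lt;p&gt;The minimal version is smaller than most teams expect. A sketch, with invented file names and thresholds:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Minimal eval harness: versioned cases, an explicit threshold, run on every prompt change.
import json

PASS_THRESHOLD = 0.95   # placeholder; set and version it per suite

def run_evals(predict, cases_path="evals/cases.jsonl"):
    with open(cases_path, encoding="utf-8") as f:
        cases = [json.loads(line) for line in f]
    passed = sum(1 for case in cases if predict(case["input"]) == case["expected"])
    rate = passed / len(cases)
    # The deployment pipeline reads the exit status; humans read it only on failure.
    assert rate &amp;gt;= PASS_THRESHOLD, f"eval pass rate {rate:.2%} is below {PASS_THRESHOLD:.0%}"
    return rate
&lt;/code&gt;&lt;/pre&gt;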

&lt;h3&gt;
  
  
  8. Prompt caching
&lt;/h3&gt;

&lt;p&gt;Cache stable system prompts and context. &lt;a href="https://docs.claude.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;Anthropic's prompt caching&lt;/a&gt; and the equivalent on other providers cut up to 90% off the cost of repeat calls and substantially reduce latency. For high-volume agents, long-context applications, and RAG against stable corpora, prompt caching is the difference between a unit-economics-viable product and a money-losing demo.&lt;/p&gt;

&lt;p&gt;The mistake teams make is leaving caching off because they think their workload doesn't repeat. It almost always does. The system prompt repeats on every call. The few-shot examples repeat on every call. The retrieved corpus often repeats across user sessions. Turn it on and measure; the cost reduction shows up immediately.&lt;/p&gt;
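
&lt;p&gt;With Anthropic's API, caching is a &lt;code&gt;cache_control&lt;/code&gt; marker on the stable prefix. A sketch; the model id and file path are placeholders:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Prompt caching: mark the stable prefix once; repeat calls reuse it.
import anthropic
from pathlib import Path

client = anthropic.Anthropic()

# System prompt plus few-shot examples: identical on every call, so cacheable.
STABLE_SYSTEM = Path("prompts/system_and_examples.txt").read_text(encoding="utf-8")

response = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model id
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STABLE_SYSTEM,
            "cache_control": {"type": "ephemeral"},   # cache everything up to this point
        }
    ],
    messages=[{"role": "user", "content": "Today's ticket: ..."}],
)

# usage reports cache creation vs. cache read tokens; measure the drop from here.
print(response.usage)
&lt;/code&gt;&lt;/pre&gt;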

&lt;h2&gt;
  
  
  What sits on top
&lt;/h2&gt;

&lt;p&gt;The footer of the chart names the next layer: &lt;em&gt;audit logging, rate limiting, jailbreak detection, human-in-the-loop on high-stakes actions.&lt;/em&gt; Those are enterprise risk controls. They are necessary, they are domain-specific, and they vary by company and by regulator.&lt;/p&gt;

&lt;p&gt;The eight controls above are not enterprise controls. They are universal — they apply to every team shipping LLMs to production, regardless of industry, scale, or risk profile. Get these right first; the enterprise layer is what you build on top once they are in place.&lt;/p&gt;

&lt;p&gt;The thing that makes the difference between teams that ship LLM features and teams that demo them is rarely the prompt and almost never the model. It is whether these eight controls are wired into the system that ships, or living in someone's head.&lt;/p&gt;

</description>
      <category>aiengineering</category>
      <category>llm</category>
      <category>promptengineering</category>
      <category>agentengineering</category>
    </item>
    <item>
      <title>The Knowledge Base Is Not the Moat. The Loop Is.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 06 May 2026 14:07:47 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/the-knowledge-base-is-not-the-moat-the-loop-is-4ffm</link>
      <guid>https://forem.com/michaeltuszynski/the-knowledge-base-is-not-the-moat-the-loop-is-4ffm</guid>
      <description>&lt;p&gt;A recent piece called "&lt;a href="https://www.thetypicalset.com/blog/thoughts-on-coding-agents" rel="noopener noreferrer"&gt;The Bottleneck Was Never the Code&lt;/a&gt;" makes the right argument at the right time. Coding agents shift the constraint from typing to coordination. Organizational context — the shared understanding of what we're building, what's load-bearing, what's vestigial — is the new rate-limiting input. Companies that externalize what they know win the next decade. All correct.&lt;/p&gt;

&lt;p&gt;The author's prescription is a crawl-and-extract loop: agents that read PRs, issues, commits, and Slack archives and produce a knowledge base for other agents to consume. That's the right starting point. It's also half the story.&lt;/p&gt;

&lt;p&gt;The other half is what keeps the knowledge base from going stale. Extraction produces a snapshot. The codebase produces a stream. Most internal knowledge bases die within a quarter, not because the extraction was bad, but because nothing keeps the extraction current. The knowledge base is not the moat. The loop is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why extraction alone does not compound
&lt;/h2&gt;

&lt;p&gt;Every team has watched a documentation effort go through the same arc. Initial enthusiasm produces a clean baseline. The codebase ships three more changes. The doc is now slightly wrong in three places. A reader hits one of the wrong places, loses trust, stops reading. A second reader hears it's stale, never opens it. The doc becomes a polite fiction nobody acts on — operationally worse than no doc, because it slows down the people who try to use it without producing the alignment it promised.&lt;/p&gt;

&lt;p&gt;A knowledge base built by extraction is documentation with a more sophisticated front-end. It has the same decay curve.&lt;/p&gt;

&lt;p&gt;The mismatch is structural. Extraction produces a snapshot; the codebase produces a stream. The rate of fresh extraction is bounded by API quotas, compute cost, and how often you can afford to re-crawl. The rate of decay is bounded only by how fast the team ships. The second is faster than the first for any team that's actually shipping. So the knowledge base monotonically loses correlation with reality, and trust drops faster than the staleness rate, because trust is binary per entry.&lt;/p&gt;

&lt;h2&gt;
  
  
  What makes a loop continuous
&lt;/h2&gt;

&lt;p&gt;The fix is not "crawl more often." It's a different shape of loop, with three properties that distinguish artifacts that compound from artifacts that rot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Triggered, not scheduled.&lt;/strong&gt; The entries that matter are the ones that came from a specific moment of failure or decision. A nightly re-crawl produces ten thousand low-signal updates; an outage produces one high-signal entry. Index on incidents, not the calendar.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only.&lt;/strong&gt; New facts go on top. Old facts get rewritten only when proven wrong, and the rewrite is itself a dated entry. The history is the data structure. You don't lose the ability to ask "what did we know on date X" by overwriting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-writable.&lt;/strong&gt; The agent that learns something writes it down. If the human is the only writer, the loop dies the first week — humans are the bottleneck the original argument is supposed to solve.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These three properties are not new. They're what makes git compound rather than rot. They're what makes a test suite compound rather than rot. They're what makes lint configuration compound rather than rot. Each one is an artifact that grows in value because the loop maintaining it is triggered, incremental, and machine-writable.&lt;/p&gt;
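
&lt;p&gt;Mechanically, the loop is small. A sketch of the triggered append, with an entry format loosely mirroring the Hard-Won Lessons file described below; the numbering logic is deliberately naive:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Triggered append: one numbered, dated entry per incident, written at the moment
# of the correction. No nightly cron, no reflection agent, no dashboard.
import re
from datetime import date
from pathlib import Path

LESSONS = Path("MEMORY.md")   # stand-in for the Hard-Won Lessons store described below

def append_lesson(title, rule):
    existing = LESSONS.read_text(encoding="utf-8") if LESSONS.exists() else ""
    number = len(re.findall(r"^\d+\. ", existing, flags=re.M)) + 1   # naive numbering for the sketch
    entry = f"{number}. **{title}** ({date.today().isoformat()}) - {rule}\n"
    with LESSONS.open("a", encoding="utf-8") as f:   # append-only by construction
        f.write(entry)
&lt;/code&gt;&lt;/pre&gt;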

&lt;h2&gt;
  
  
  Mistakes Become Rules as one shape of the loop
&lt;/h2&gt;

&lt;p&gt;NEXUS, my Claude Code operating layer, runs a concrete instance of this loop. The artifact is &lt;code&gt;MEMORY.md&lt;/code&gt;'s Hard-Won Lessons section: 21 numbered, dated, append-only entries. Each one came from a specific incident.&lt;/p&gt;

&lt;p&gt;Lesson #15: &lt;em&gt;LaunchAgent log paths must be on local disk, not SMB.&lt;/em&gt; Came from an afternoon spent debugging six silently broken LaunchAgents on 2026-04-19. The rule writes itself in one sentence; the diagnostic cost was hours.&lt;/p&gt;

&lt;p&gt;Lesson #19: &lt;em&gt;Never &lt;code&gt;import()&lt;/code&gt; a publish script "to test it" — it will run &lt;code&gt;main()&lt;/code&gt;.&lt;/em&gt; Came from an incident in late April where two test imports raced and produced duplicate posts on LinkedIn, X, and Ghost. Late.dev refuses to delete already-published posts. The cleanup was manual.&lt;/p&gt;

&lt;p&gt;Lesson #20: &lt;em&gt;PM2 &lt;code&gt;script: "npm"&lt;/code&gt; ignores app &lt;code&gt;env.PATH&lt;/code&gt;.&lt;/em&gt; Came from a Saturday afternoon where the health-api service kept reporting &lt;code&gt;online&lt;/code&gt; while the port wasn't listening.&lt;/p&gt;

&lt;p&gt;The trigger is a correction. The action is one numbered append. The agent reads the file at the start of every session. There is no nightly cron. There is no reflection agent. There is no dashboard. &lt;a href="https://www.mpt.solutions/your-agents-compliments-are-a-confession/" rel="noopener noreferrer"&gt;I wrote about the runtime details of this pattern last week&lt;/a&gt;. The same shape works at every layer the original argument cares about — including the organizational one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof chains as another shape
&lt;/h2&gt;

&lt;p&gt;For agents that act on infrastructure, the artifact is different but the loop properties are the same. &lt;a href="https://www.mpt.solutions/the-ai-didnt-delete-your-database-your-missing-agent-pipeline-did/" rel="noopener noreferrer"&gt;Yesterday's piece on the agent action pipeline&lt;/a&gt; named six artifacts including proof chains: every agent action signed by tool, time, input, intent, and outcome. Triggered by the action. Append-only. Agent-written. Same three properties. Different artifact. Different layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What extraction-only looks like when it fails
&lt;/h2&gt;

&lt;p&gt;Picture the .txt prescription installed cleanly. Initial crawl produces a beautiful baseline: every PR comment, every closed issue, every commit message extracted into a clean knowledge base. Engineers read it, say it's useful, point new hires at it.&lt;/p&gt;

&lt;p&gt;Three months later: the codebase has shipped 200 PRs, the team has had two outages and three deprecations, and a new architecture decision has changed how a load-bearing module works. The knowledge base describes the world from before. A new agent reads it, follows guidance that's now wrong, and produces — in the author's own words — &lt;em&gt;a plausible answer to a slightly wrong version of the question.&lt;/em&gt; The failure mode the author warns about is caused by his own prescription, not solved by it.&lt;/p&gt;

&lt;p&gt;The fix is not a faster crawl. It's a triggered append. The architecture decision writes itself into the knowledge base the moment it's made, by the same agent that's doing the work, in the same kind of dated, append-only entry as a Hard-Won Lesson. The outage produces a postmortem entry the next time any agent touches that subsystem.&lt;/p&gt;

&lt;p&gt;If the loop is triggered and agent-written, the knowledge base tracks the codebase. If it's a periodic re-crawl, the knowledge base lags the codebase by however long the re-crawl interval is, and trust degrades by however long the lag is.&lt;/p&gt;

&lt;h2&gt;
  
  
  The shape generalized
&lt;/h2&gt;

&lt;p&gt;The original argument is right that organizational context is the new moat. The piece I would add is that the moat is not the knowledge base. The moat is the loop that keeps the knowledge base from rotting. The properties of that loop are not novel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Triggered, not scheduled&lt;/strong&gt; — incidents and decisions write entries; calendars don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only&lt;/strong&gt; — history is the data structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-writable&lt;/strong&gt; — the agent that learns something writes it down.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tied to specifics&lt;/strong&gt; — entries name the date, the incident, the cost, the rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read at session start&lt;/strong&gt; — entries become operational by being loaded before the agent acts.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These five properties are what make a knowledge base compound rather than rot. Extraction gets you the baseline. The loop gets you the moat.&lt;/p&gt;

&lt;p&gt;A snapshot of stale context is just a slower version of the osmosis the original argument correctly diagnosed as broken. Build the loop.&lt;/p&gt;

</description>
      <category>agentengineering</category>
      <category>platformengineering</category>
      <category>aitooling</category>
      <category>developertools</category>
    </item>
    <item>
      <title>The AI Didn't Delete Your Database. Your Missing Agent Pipeline Did.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Tue, 05 May 2026 15:36:06 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/the-ai-didnt-delete-your-database-your-missing-agent-pipeline-did-54ch</link>
      <guid>https://forem.com/michaeltuszynski/the-ai-didnt-delete-your-database-your-missing-agent-pipeline-did-54ch</guid>
      <description>&lt;p&gt;Last week, &lt;a href="https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/" rel="noopener noreferrer"&gt;a Cursor agent running on Claude Opus 4.6 deleted a startup's production database and its backups in nine seconds&lt;/a&gt;. The agent had been asked to fix a credential mismatch in &lt;em&gt;staging&lt;/em&gt;. It decided to delete a Railway volume to "fix" it instead — using an over-scoped API token it found in an unrelated file. Railway stores volume backups in the same volume, so one destructive call zeroed everything. The startup (&lt;a href="https://www.fastcompany.com/91533544/cursor-claude-ai-agent-deleted-software-company-pocket-os-database-jer-crane" rel="noopener noreferrer"&gt;PocketOS&lt;/a&gt;, a car-rental SaaS) got the data back because Railway happened to have earlier snapshots — not because PocketOS had a recovery plan.&lt;/p&gt;

&lt;p&gt;When asked to explain itself afterward, the agent produced a confession enumerating the rules it had violated: &lt;em&gt;"Deleting a database volume is the most destructive, irreversible action possible — far worse than a force push — and you never asked me to delete anything. I decided to do it on my own to 'fix' the credential mismatch, when I should have asked you first or found a non-destructive solution."&lt;/em&gt; The &lt;a href="https://www.reddit.com/r/devops/comments/1t4au5h/pocketos_lost_their_prod_db_backups_to_a_cursor/" rel="noopener noreferrer"&gt;r/devops thread&lt;/a&gt; on the incident has the cleanest summary: &lt;em&gt;the AI isn't the main story&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;It isn't. The model was the proximate cause. The actual failure was infrastructure that allowed a destructive operation to run from an agent context at all — no dry-run, no blast-radius limit, no staging surface to operate on, no signed audit chain after the fact. The model knew. The infrastructure didn't enforce. The argument that this class of incident is an infrastructure problem and not a model problem &lt;a href="https://idiallo.com/blog/ai-didnt-delete-your-database-you-did" rel="noopener noreferrer"&gt;has been made well already&lt;/a&gt;. The same shape of incident built CI/CD pipelines in the 2010s, after teams kept watching humans push broken deploys and decided to put a system between intent and action.&lt;/p&gt;

&lt;p&gt;The 2010s lesson is canonical. The 2020s version of it has not been written yet. This is what it should say.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actor changed. The artifacts didn't.
&lt;/h2&gt;

&lt;p&gt;CI/CD was built around a specific actor: a human deploying code. The artifacts that made human deployment safe — staging environments, dry-runs, code review, change windows, audit logs — assume a human in the loop, operating at human speed, with human attention.&lt;/p&gt;

&lt;p&gt;An agent is not that actor. An agent operates at code speed, with no fatigue, with confidence calibrated by token probabilities rather than years of experience. The PocketOS incident took nine seconds. A human could not have deleted a production database and its backups in nine seconds even if they were trying. The blast radius per unit time is different.&lt;/p&gt;

&lt;p&gt;The model is not the problem. The infrastructure is. But the infrastructure most teams have is the human-era infrastructure, and it does not cover the speed and scale of an agent that can call tools faster than a person can read its output.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an agent action pipeline looks like
&lt;/h2&gt;

&lt;p&gt;There are six artifacts I would expect to see in any production deployment that lets an agent touch infrastructure or data. None of them are new ideas. All of them already exist in adjacent domains. None of them are wired together yet as a default agent loadout.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dry-run by default for destructive operations.&lt;/strong&gt; Drop, delete, truncate, terminate, and force-push start as plans, not actions. The agent's first call returns a diff. The user — or a separate approval agent — applies. Andrej Karpathy's &lt;a href="https://x.com/karpathy/status/2015883857489522876" rel="noopener noreferrer"&gt;observation that "LLMs are exceptionally good at looping until they meet specific goals"&lt;/a&gt; cuts both ways. Make the success criterion &lt;em&gt;plan accepted by reviewer&lt;/em&gt;, not &lt;em&gt;operation completed&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blast-radius declarations.&lt;/strong&gt; Each agent task declares ahead of time which systems it can touch. &lt;em&gt;Fix the failing migration&lt;/em&gt; gets read access to the user table and write access to migrations only. &lt;em&gt;Investigate the billing spike&lt;/em&gt; is read-only across the board. The pattern exists already in AWS IAM session policies and in capability-based security. It does not exist as a default in agent runtimes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staging shadow data.&lt;/strong&gt; The agent operates on a current snapshot, not on prod. The diff is reviewed before it merges. Database CI/CD already has this — Atlas, dbt, Liquibase. Connecting it to an agent runtime is glue, not invention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Change windows.&lt;/strong&gt; No agent runs irreversible operations during business hours without explicit human approval. Same constraint that keeps humans from pushing on Friday afternoons. Trivial to enforce. Almost never enforced.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proof chains.&lt;/strong&gt; Every agent action signed by tool, time, input, intent, and outcome. The Hacker News post titled "&lt;a href="https://github.com/rodriguezaa22ar-boop/atlas-trust-infrastructure" rel="noopener noreferrer"&gt;Why AI Agents Need Proof Chains, Not Just Logs&lt;/a&gt;" makes this argument well. Logs require somebody to read them. Proof chains are post-hoc verifiable artifacts that sit there until something breaks and then answer the question without requiring a human to have been watching. This is the agent equivalent of a Git commit log — the actor changes, the format does not.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop thresholds.&lt;/strong&gt; Operations above a configurable blast-radius threshold pause for explicit approval. Below the threshold, autonomy. Above it, a Slack message with the plan and an approve button. Same shape as Anthropic's &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt; framing — the human owns the seams, the agent owns the steps between them. The threshold is the seam.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Six artifacts. Each one already exists in some adjacent domain. None of them are agent-specific in shape; they are agent-specific in &lt;em&gt;configuration&lt;/em&gt;.&lt;/p&gt;
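
&lt;p&gt;For a sense of how light the proof-chain artifact from that list can be, here is a sketch of one append per action. The field names and the hash chaining are illustrative, not any particular runtime's format:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Proof chain: one signed, append-only record per agent action.
# Fields follow the list above (tool, time, input, intent, outcome).
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

CHAIN = Path("proof-chain.jsonl")

def record_action(tool, tool_input, intent, outcome):
    lines = CHAIN.read_text(encoding="utf-8").splitlines() if CHAIN.exists() else []
    prev_hash = hashlib.sha256(lines[-1].encode()).hexdigest() if lines else "genesis"
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "input": tool_input,
        "intent": intent,
        "outcome": outcome,
        "prev": prev_hash,   # chaining makes after-the-fact edits detectable
    }
    with CHAIN.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
&lt;/code&gt;&lt;/pre&gt;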

&lt;h2&gt;
  
  
  None of this is ceremony
&lt;/h2&gt;

&lt;p&gt;The risk worth flagging — the one that comes up every time a list like this gets proposed — is that AI infrastructure becomes bureaucratic. The list above sounds heavy. It isn't, if each artifact has one trigger and one update protocol. I made &lt;a href="https://www.mpt.solutions/lius-4-lines-are-the-floor-build-the-ceiling/" rel="noopener noreferrer"&gt;the same point about CLAUDE.md architecture yesterday&lt;/a&gt;: the wins come from delegation, not accumulation.&lt;/p&gt;

&lt;p&gt;Dry-run-by-default is a default flag, not a process. Blast-radius declarations are config files the agent reads at task start. Proof chains are append-only logs nobody reads unless something breaks. Change windows are a cron-shaped check. The pipeline is invisible until you need it. CI/CD was the same. Most teams running CI/CD do not consciously think about it; they think about &lt;em&gt;git push&lt;/em&gt;.&lt;/p&gt;
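
&lt;p&gt;A sketch of how small those first two artifacts are in code. The declaration format and the gate function are invented for illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Blast-radius declaration the agent loads at task start, plus dry-run as the default.
# The declaration format and the gate are illustrative, not any runtime's API.
import json

DESTRUCTIVE = {"drop", "delete", "truncate", "terminate", "force-push"}

def load_declaration(path="task.blast-radius.json"):
    # e.g. {"task": "fix failing migration", "read": ["users"], "write": ["migrations"]}
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def gate(declaration, operation, target, dry_run=True):
    if target not in declaration.get("write", []):
        raise PermissionError(f"{target} is outside the declared blast radius")
    if operation in DESTRUCTIVE and dry_run:
        # Destructive calls start as plans: return a diff for review, do not act.
        return {"status": "plan", "operation": operation, "target": target}
    return {"status": "apply", "operation": operation, "target": target}
&lt;/code&gt;&lt;/pre&gt;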

&lt;p&gt;The cost of the PocketOS incident was not nine seconds. It was a destroyed production database and a recovery that depended on Railway happening to have snapshots. The cost of prevention would have been adding &lt;code&gt;--dry-run&lt;/code&gt; as a default and writing a one-line blast-radius declaration on that Railway API token. Compare those costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is the next layer of supervision in artifacts
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.mpt.solutions/agentic-coding-isnt-the-trap-supervising-from-your-head-is/" rel="noopener noreferrer"&gt;Last week's argument&lt;/a&gt; was that supervision belongs in artifacts, not in a developer's working memory. The CLAUDE.md piece extended that to a structural claim: artifacts are an architecture, not a file. The agent action pipeline is one specific class of that architecture, scaled down to the operational and runtime layer.&lt;/p&gt;

&lt;p&gt;Code-writing agents need one set of artifacts: tests, types, lint, code review, mistake logs. Action-running agents need a different set: dry-runs, blast-radius limits, staging shadow data, change windows, proof chains, threshold gating. Both kinds of agent share the underlying move — supervision lives in the system, not in the operator's head. Different actors need different artifacts.&lt;/p&gt;

&lt;h2&gt;
  
  
  The test, generalized
&lt;/h2&gt;

&lt;p&gt;The implicit question I ask whenever someone attributes an outage to "AI making mistakes" is this. Could a human have done this damage in this much time? If yes, the actor is not the problem and the safeguards are missing. If no, then this is a new class of risk and needs a new class of safeguard.&lt;/p&gt;

&lt;p&gt;Most of what gets blamed on the model gets a yes: a human could have done the same damage, just more slowly. A model called a destructive endpoint that should not have existed. A model committed a key that should have been gitignored. A model wrote SQL that a human reviewer should have caught. In all of those, the failure is upstream of the model.&lt;/p&gt;

&lt;p&gt;PocketOS gets a no. A human could not have deleted prod and backups in nine seconds. That is genuinely a new class of risk, and it requires the artifact list above — not because the model is malicious (the agent's own confession shows it knew exactly which rules it was breaking), but because the model is &lt;em&gt;fast&lt;/em&gt;. Speed is the new vector. The artifacts have to handle it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;Stop blaming the model. Then look at the infrastructure. Then look at the &lt;em&gt;agent-specific&lt;/em&gt; infrastructure, because the human-era pipeline does not cover the speed and blast radius of an agent that can call tools faster than you can read its output. That last part is on us to build, and it is not where the field is putting its effort yet.&lt;/p&gt;

&lt;p&gt;Step one: the model is not the problem.&lt;/p&gt;

&lt;p&gt;Step two: build the pipeline. The 2010s did this for human deploys.&lt;/p&gt;

&lt;p&gt;Step three: the pipeline has to be agent-shaped. That step is open.&lt;/p&gt;

</description>
      <category>agentengineering</category>
      <category>platformengineering</category>
      <category>devops</category>
      <category>aitooling</category>
    </item>
    <item>
      <title>Liu's 4 Lines Are the Floor. Build the Ceiling.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 04 May 2026 19:52:55 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/lius-4-lines-are-the-floor-build-the-ceiling-2862</link>
      <guid>https://forem.com/michaeltuszynski/lius-4-lines-are-the-floor-build-the-ceiling-2862</guid>
      <description>&lt;p&gt;Yanli Liu's "&lt;a href="https://levelup.gitconnected.com/the-4-lines-every-claudemd-needs-from-andrej-karpathys-thread-on-ai-coding-agents-d3eb19eecdf5" rel="noopener noreferrer"&gt;The 4 Lines Every CLAUDE.md Needs&lt;/a&gt;" makes a real point. The 4 lines, derived from &lt;a href="https://x.com/karpathy/status/2015883857489522876" rel="noopener noreferrer"&gt;Andrej Karpathy's January 2026 thread&lt;/a&gt; on agent failure modes, all express the same insight: behavioral rules outperform feature rules. &lt;em&gt;Don't assume. Surface tradeoffs.&lt;/em&gt; &lt;em&gt;Minimum code that solves the problem.&lt;/em&gt; &lt;em&gt;Touch only what you must.&lt;/em&gt; &lt;em&gt;Define success criteria. Loop until verified.&lt;/em&gt; Each one is portable across stacks and tasks, where prescriptive rules go stale the moment your codebase shifts.&lt;/p&gt;

&lt;p&gt;The 4 lines are the floor of a working CLAUDE.md. They are not the ceiling. Most of the CLAUDE.md files I see in the wild — including the ones the article holds up as cautionary tales of "47 rules about code style" — fail because they treat a file as the unit of organization. A production CLAUDE.md is an architecture, not a file.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the article gets right and what to flag
&lt;/h2&gt;

&lt;p&gt;The behavioral-vs-prescriptive distinction is correct, and the Configuration Paradox is real: past a threshold, more rules produce confused agents, not disciplined ones. Liu's litmus test — &lt;em&gt;would removing this cause a mistake the agent couldn't recover from?&lt;/em&gt; — is the right filter for any individual rule.&lt;/p&gt;

&lt;p&gt;A few things in the piece do not hold up under inspection. The asserted 6,000 / 12,000 character caps for CLAUDE.md have no source I can verify. The "/plugin marketplace add" command described in the article is not part of base Claude Code. The 94% accuracy stat the piece borrows from another blog has no disclosed methodology. And the "60,000 GitHub stars" figure cited as evidence of Claude Code adoption is unverified. Cite the article for the framing. Do not cite it for the numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 4 lines do not stand alone for long
&lt;/h2&gt;

&lt;p&gt;Behavioral rules are the right starting point. They are also incomplete the moment you have a real project. You quickly need three other things the 4 lines do not give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Domain context the agent cannot infer from files&lt;/strong&gt; — what each service does, why a directory is named the way it is, which APIs are read-only vs. write-side, where secrets live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture decisions&lt;/strong&gt; — patterns the agent shouldn't have to re-derive on every task.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incident-driven rules&lt;/strong&gt; — the corrections that came out of specific failures, with enough context that the rule is unambiguous.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you put all three of these into one CLAUDE.md, you get the 47-rule sprawl Liu warns against. If you leave them out, the agent guesses and the 4 lines do not help — &lt;em&gt;don't assume&lt;/em&gt; is a behavior, not a fact.&lt;/p&gt;

&lt;p&gt;The fix is structural. Stop accumulating rules in one file. Start delegating them to files with single jobs.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the architecture looks like in practice
&lt;/h2&gt;

&lt;p&gt;NEXUS — my Claude Code operating layer — runs about 237 lines of CLAUDE.md. That file holds behavioral guardrails and protocols, and almost nothing else. The first two protocols there are &lt;em&gt;Verify Before Reporting&lt;/em&gt; and &lt;em&gt;Plan First, Code Second.&lt;/em&gt; Both are extensions of the same behavioral category Liu names. Adding fourteen more behavioral protocols at the same level still does not approach 47 rules of code style — they are the same shape as the 4 lines, just covering more failure modes.&lt;/p&gt;

&lt;p&gt;What CLAUDE.md does not contain is the project-specific stuff. That lives in delegated files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;MEMORY.md&lt;/code&gt;&lt;/strong&gt; holds 21 numbered, dated, append-only Hard-Won Lessons. Each one came from a specific incident, with the cost of getting it wrong in the entry. &lt;em&gt;LaunchAgent log paths must be on local disk, not SMB&lt;/em&gt; (lesson #15) is in there because six of my LaunchAgents silently broke on 2026-04-19 when the path was on a NAS mount. The agent reads MEMORY.md at session start. I wrote about &lt;a href="https://www.mpt.solutions/your-agents-compliments-are-a-confession/" rel="noopener noreferrer"&gt;the Mistakes Become Rules pattern last week&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;.claude/rules/&lt;/code&gt;&lt;/strong&gt; holds language-specific and capability-specific rule files. &lt;code&gt;python.md&lt;/code&gt; for Python work. &lt;code&gt;completeness.md&lt;/code&gt; for "what counts as done." Each file gets loaded when the agent enters that context, not on every session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;agents/&amp;lt;domain&amp;gt;-context.md&lt;/code&gt;&lt;/strong&gt; for per-system context — finance, content, the DeFi system before it was retired. CLAUDE.md's session-startup protocol tells the agent &lt;em&gt;if a specific domain is in play, read the relevant &lt;code&gt;agents/&amp;lt;domain&amp;gt;-context.md&lt;/code&gt;&lt;/em&gt;. The agent doesn't load all of them up front. It loads the one that matters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;SESSION-STATE.md&lt;/code&gt;&lt;/strong&gt; holds ephemeral active context — what's in flight, what was decided yesterday, what to pick up from. It is the first thing rewritten when a major task closes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the architecture. Behavioral guardrails at the top, in one shared file. Project-, domain-, and incident-specific rules delegated to files with one trigger condition each. The agent reads what's relevant.&lt;/p&gt;
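
&lt;p&gt;In loading terms, the dispatch looks roughly like this. The file names match the layout above; the function itself is a sketch of the session-startup protocol, not how Claude Code actually loads files:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
# Sketch of the dispatch: always-on files first, delegated files only when their
# trigger condition holds.
from pathlib import Path

def session_context(domain=None, languages=()):
    paths = [
        Path("CLAUDE.md"),          # behavioral guardrails and protocols, always loaded
        Path("MEMORY.md"),          # Hard-Won Lessons, read at every session start
        Path("SESSION-STATE.md"),   # what's in flight right now
    ]
    if domain:
        paths.append(Path(f"agents/{domain}-context.md"))    # only the domain in play
    for lang in languages:
        paths.append(Path(f".claude/rules/{lang}.md"))       # only when that work starts
    return [p for p in paths if p.exists()]
&lt;/code&gt;&lt;/pre&gt;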

&lt;h2&gt;
  
  
  The structural version of Liu's litmus test
&lt;/h2&gt;

&lt;p&gt;Liu's &lt;em&gt;would removing this cause a mistake the agent couldn't recover from&lt;/em&gt; is the right filter for an individual rule. The structural question is: &lt;em&gt;does this rule belong here, or in a delegated file?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Three quick filters answer that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If it changes per-project, it does not belong in CLAUDE.md. Put it in &lt;code&gt;agents/&amp;lt;domain&amp;gt;-context.md&lt;/code&gt; or a project-specific file.&lt;/li&gt;
&lt;li&gt;If it changes per-language or per-tool, it does not belong in CLAUDE.md. Put it in &lt;code&gt;.claude/rules/&amp;lt;language&amp;gt;.md&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;If it came from a real incident with a date and a cost, it does not belong in CLAUDE.md either. Put it in MEMORY.md's Hard-Won Lessons.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What's left in CLAUDE.md is the part that's behavioral, portable, and load-bearing. That tends to be a few dozen entries — bigger than 4, smaller than 47. Each entry is one short paragraph.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this scales
&lt;/h2&gt;

&lt;p&gt;Two reasons. First, every file has one update protocol. Hard-Won Lessons are append-only and triggered by corrections. Domain contexts get rewritten when systems change. Behavioral protocols change rarely, and when they do, the change applies everywhere. Mixing them in one file forces every edit to sit next to every other edit, which is how you end up with the 47-rule mess.&lt;/p&gt;

&lt;p&gt;Second, the agent's working set at any decision point is smaller. A CLAUDE.md sized for the worst case is a CLAUDE.md the agent has to re-read every time. A CLAUDE.md sized for the always-true case, with delegated files for the contextual case, is one the agent can hold internally — and only loads the rest when the work demands it. This is the same logic I argued for &lt;a href="https://www.mpt.solutions/agentic-coding-isnt-the-trap-supervising-from-your-head-is/" rel="noopener noreferrer"&gt;supervision artifacts in the Faye reframe&lt;/a&gt; yesterday: institutional memory belongs in files with single owners and lifecycles, not in one file with many.&lt;/p&gt;

&lt;p&gt;Anthropic's &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt; framing draws the same line at the workflow level — predefined paths for the deterministic part, agent autonomy at the seams. The same shape applies to CLAUDE.md. The behavioral floor is the predefined part. The delegated files are the seams.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the architecture still needs help
&lt;/h2&gt;

&lt;p&gt;This pattern does not solve everything. Multi-file refactors still need real architecture context the agent cannot derive from reading source. Regulated industries — Fulcrum, the presales workflow stack I run for enterprise customers, lives here — need domain-specific guardrails alongside the behavioral ones, and those guardrails are themselves a maintained artifact, not a one-time rule list. Team-scale consistency is a coordination problem, not a configuration one — the architecture gets you a reproducible shape, but multiple humans still have to agree on which lessons are real lessons.&lt;/p&gt;

&lt;p&gt;Tool portability is the last gap. The 4 lines transfer between Claude Code, Cursor, Codex, and others. The delegated file pattern transfers in shape but not in syntax — every agent has its own loading model. That is a real limitation. It is also a smaller limitation than starting from scratch on every tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to take from Liu
&lt;/h2&gt;

&lt;p&gt;The 4 lines are the right floor. Behavioral rules over feature rules. Universal categories over project specifics. The Configuration Paradox is a thing to design against, not just a thing to know.&lt;/p&gt;

&lt;p&gt;The ceiling is the architecture above the floor. Behavioral guardrails in one shared file. Project, domain, and language rules delegated. Incident-driven rules in an append-only file the agent reads at session start. CLAUDE.md as the dispatcher, not the rulebook.&lt;/p&gt;

&lt;p&gt;Most CLAUDE.md files I see are stuck on the floor or buried under a 47-rule pile. The architecture is the move that gets you out of both.&lt;/p&gt;

</description>
      <category>claudecode</category>
      <category>agentengineering</category>
      <category>platformengineering</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Agentic Coding Isn't the Trap. Supervising From Your Head Is.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 04 May 2026 04:31:23 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/agentic-coding-isnt-the-trap-supervising-from-your-head-is-4i70</link>
      <guid>https://forem.com/michaeltuszynski/agentic-coding-isnt-the-trap-supervising-from-your-head-is-4i70</guid>
      <description>&lt;p&gt;Lars Faye's "&lt;a href="https://larsfaye.com/articles/agentic-coding-is-a-trap" rel="noopener noreferrer"&gt;Agentic Coding is a Trap&lt;/a&gt;" is the most honest writing I've seen on AI skill atrophy. The studies he cites are real. The "supervision paradox" — needing the skills the agent erodes to oversee it — is the cleanest framing of the failure mode I've read. I want to push on the conclusion, not the diagnosis.&lt;/p&gt;

&lt;p&gt;The Anthropic study Faye references — "&lt;a href="https://www.anthropic.com/research/AI-assistance-coding-skills" rel="noopener noreferrer"&gt;How AI Assistance Impacts the Formation of Coding Skills&lt;/a&gt;" — found a 17% drop in skill mastery for developers using AI assistance, with debugging showing the steepest decline. That's the headline number. But the same study also found something that gets quoted less often. Developers who used AI for conceptual inquiry scored 65% or higher on the follow-up evaluation. Developers who delegated code generation to the model scored below 40%.&lt;/p&gt;

&lt;p&gt;That gap — 65 versus 40, on the same tool and the same task — is the entire game.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the same study actually shows
&lt;/h2&gt;

&lt;p&gt;The variable that drove the difference wasn't whether the developer used the agent. It was how they supervised the work. The high-scoring group asked follow-up questions, combined generation with explanation, used the model for conceptual gaps and not code-shaped output. The low-scoring group accepted what the model produced and moved on. Same tool. Two completely different supervision patterns. Two completely different outcomes.&lt;/p&gt;

&lt;p&gt;Faye treats the headline 17% as evidence the tool is the problem. The 65/40 split inside the same paper says the supervision pattern is the problem. Those are different conclusions, and they call for different fixes.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trap is the supervision pattern
&lt;/h2&gt;

&lt;p&gt;Faye's prescription is to demote the AI: write pseudo-code by hand, treat the model as a "Ship's Computer not Data," never delegate work you haven't done yourself. The implicit move is to relocate as much of the work back into the developer's head as possible, on the theory that the head is where supervision capacity has to live.&lt;/p&gt;

&lt;p&gt;That theory is where I want to push.&lt;/p&gt;

&lt;p&gt;The supervision paradox bites for one reason. The developer is being asked to be the entire supervisory apparatus, by themselves, in real time, using only working memory and personal vigilance. That fails. It fails the same way it fails for a senior engineer reviewing a 4,000-line PR from a junior at 4pm on a Friday. The bottleneck isn't the code. It's the cognitive substrate the reviewer is using.&lt;/p&gt;

&lt;p&gt;Anything you don't exercise daily fades. If your supervision is "I personally read every line and hold the whole system in my head," then yes — once an agent writes more lines than you can read, you lose. Atrophy is the symptom. Personal vigilance as the supervision strategy is the part worth examining.&lt;/p&gt;

&lt;h2&gt;
  
  
  Move supervision out of your head
&lt;/h2&gt;

&lt;p&gt;The fix that the 65% group implicitly used is not to type more code. It's to put supervision in places that don't atrophy.&lt;/p&gt;

&lt;p&gt;That list is short and well-known:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tests&lt;/strong&gt; that fail when the contract breaks. Not coverage theater — real assertions on the edges that matter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Types&lt;/strong&gt; that refuse to compile when the shape is wrong. The compiler does not get tired at 4pm.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lint and format rules&lt;/strong&gt; that catch the patterns you keep correcting by hand. If you've corrected the same pattern twice, lint it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hooks&lt;/strong&gt; at the runtime layer. &lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Claude Code's PreToolUse and SessionStart hooks&lt;/a&gt; run deterministically — the model can't forget them. The set of rules that are regex-shaped and load-bearing belong here, not in a system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code review&lt;/strong&gt; as the final gate. Same discipline humans have used to supervise other humans' code for fifty years. It works on agent output for the same reason it worked on junior output: the reviewer doesn't need to have written the code, they need to be able to defend it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Append-only mistake logs.&lt;/strong&gt; &lt;a href="https://www.mpt.solutions/your-agents-compliments-are-a-confession/" rel="noopener noreferrer"&gt;The Mistakes Become Rules pattern&lt;/a&gt; — one numbered file, the agent reads it at session start, every correction becomes a permanent entry. The supervision lives in the file, not in the next reviewer's recall.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these is institutional memory. None of them depends on a single developer holding the whole system in working memory. All of them survive the developer taking three weeks off.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real test
&lt;/h2&gt;

&lt;p&gt;Here is the question that separates the two groups in the Anthropic study, generalized.&lt;/p&gt;

&lt;p&gt;Take three weeks off. An agent does the work in your absence, given only the repo, the tests, the lint, the hooks, the mistake log, and the review process. When you come back, is the codebase in a state you can defend?&lt;/p&gt;

&lt;p&gt;If yes, supervision lives in artifacts. The agent is being supervised by the system you put in place, not by your personal vigilance. Atrophy of your typing speed is not a threat, because typing was never the supervision mechanism.&lt;/p&gt;

&lt;p&gt;If no, the artifacts aren't there yet. Personal vigilance is the only thing standing between the codebase and chaos, and Faye's prescription is the right safety move for that situation. Demote the agent. Build the artifacts before you raise it back up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Ship's Computer, not Data" is too narrow
&lt;/h2&gt;

&lt;p&gt;Faye's analogy locates judgment in one captain's head. That framing is the same shape as the paradox — supervision as a personal cognitive feat. It quietly assumes the developer is alone with the tool.&lt;/p&gt;

&lt;p&gt;A different shape works better. The agent is a junior — fast, eager, occasionally confidently wrong, requires review. You are the senior. You don't supervise by re-typing the junior's work. You supervise by reading the diff, running the tests, checking it against the team's accumulated rules, and asking the junior to defend choices you don't understand. Anthropic's own &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt; framing assumes exactly this division of labor — the human owns the seams, the agent owns the steps between them. I made the same point about &lt;a href="https://www.mpt.solutions/stop-turning-your-cron-jobs-into-agents/" rel="noopener noreferrer"&gt;agency belonging at judgment seams&lt;/a&gt; when arguing against turning cron jobs into agents. The shape matches.&lt;/p&gt;

&lt;p&gt;Senior engineers do not atrophy by not typing. They atrophy by not reviewing critically. That distinction is most of the game.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Faye gets right that I'm not arguing with
&lt;/h2&gt;

&lt;p&gt;Vendor lock-in is real. Token costs are unpredictable. Outages happen. Probabilistic systems require review cycles that deterministic ones don't. None of those go away in this reframe.&lt;/p&gt;

&lt;p&gt;But they're risks to manage, not reasons to put supervision back in your head. You manage vendor risk with model-agnostic runtimes and the kind of prompts, skills, and hooks that move between models. You manage token cost with caching and tier discipline. You manage outages by having work that doesn't depend on a single API call to make progress. None of that is "type more code by hand."&lt;/p&gt;

&lt;h2&gt;
  
  
  The shorter version
&lt;/h2&gt;

&lt;p&gt;Skill atrophy under heavy agent use is real, and Faye is right to take it seriously. The skill that atrophies fastest is "personal vigilance as a supervision strategy," and that strategy was under pressure at scale long before agents existed. Agents accelerate it.&lt;/p&gt;

&lt;p&gt;The fix isn't only to demote the agent. It's also — and mostly — to promote the artifacts. Put the supervision in places that don't get tired, don't forget, and don't need to be re-derived from working memory every Tuesday morning. The 65% group in the Anthropic study were already doing this, even if the paper didn't name it that way.&lt;/p&gt;

&lt;p&gt;The trap isn't agentic coding. The trap is treating supervision as a thing that lives inside one developer's head. Move it out, and the paradox eases.&lt;/p&gt;

</description>
      <category>agentengineering</category>
      <category>claudecode</category>
      <category>platformengineering</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Your Agent's Compliments Are a Confession</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sun, 03 May 2026 00:04:08 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/your-agents-compliments-are-a-confession-3kdj</link>
      <guid>https://forem.com/michaeltuszynski/your-agents-compliments-are-a-confession-3kdj</guid>
      <description>&lt;p&gt;Count how many times your agent told you "you're right" today. Count "good catch." Count "I should have noticed that." Now ask yourself how many of those corrections will survive into tomorrow's session.&lt;/p&gt;

&lt;p&gt;The compliments are not praise. They are a confession. Every "you're right" is the agent admitting it just learned something it should have already known, in a context that will evaporate the moment the session ends. The data point is real. The retention is zero.&lt;/p&gt;

&lt;p&gt;This is the actual problem people are trying to solve when they reach for elaborate self-improvement architectures: nightly reflection cron jobs, background agents that crawl yesterday's transcripts, autonomous proposal pipelines with grading subagents and dashboards for human review. The instinct is right. The solution is theater.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;Claude Code's runtime, like any agent runtime, starts each session from a fresh conversation. &lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's own framing for effective agents&lt;/a&gt; draws a line between workflows (predefined paths) and agents (LLMs deciding their own tool use). Both reset. The lesson you taught your agent at 3pm is encoded in the message history of that one conversation. Tomorrow morning, that history is gone. The model is the same model. The instructions in CLAUDE.md are the same instructions. But the specific correction — "no, on this codebase you have to use the absolute path because launchd reset PATH on you" — lives only in the transcript.&lt;/p&gt;

&lt;p&gt;So you correct it again. And again. And the third time you notice the pattern, you start looking for a fix.&lt;/p&gt;

&lt;h2&gt;
  
  
  The tempting wrong answer
&lt;/h2&gt;

&lt;p&gt;The fix that gets blogged about goes something like this: build a nightly cron job that reads yesterday's transcripts, extracts candidate lessons, drafts them as JSON proposals with frontmatter, opens a dashboard, and asks a separate grading subagent to score the proposals. Human reviews. Promotes accepted ones into a "skill" file. Repeat.&lt;/p&gt;

&lt;p&gt;This is ceremony, not discipline. Three problems with it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It substitutes process metrics for outcome.&lt;/strong&gt; You can run the pipeline every night and ship zero durable improvement. The metric you actually care about is "did the agent stop making the same mistake," not "did we generate ten proposals last week."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It moves the work from the moment that matters.&lt;/strong&gt; The right time to write the rule is the moment you notice the agent got it wrong. Not eight hours later, after a reflection agent has interpreted what it thought happened.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It puts a model in front of the file.&lt;/strong&gt; The whole reason you're writing this down is that the model is the unreliable component. Layering more model-mediated steps on top of "remember this" is the architectural equivalent of asking the goldfish to file its own memos.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The runtime layer matters. So does the substrate. None of it replaces the rule.&lt;/p&gt;

&lt;h2&gt;
  
  
  The actually-working answer
&lt;/h2&gt;

&lt;p&gt;A single file. The agent reads it at the start of every session. Append-only. Numbered. Dated. Linked to the actual incident.&lt;/p&gt;

&lt;p&gt;NEXUS — my agent setup, specifically the operating layer that wraps Claude Code on my machine — formalizes this in CLAUDE.md as a behavioral protocol called Mistakes Become Rules. The wording is exact and short:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Trigger:&lt;/strong&gt; Any time Mike corrects your approach, points out an error, or says something like "no, not that" / "don't do X" / "you should have…"&lt;br&gt;
&lt;strong&gt;Action:&lt;/strong&gt; Immediately add a numbered entry to MEMORY.md's "Hard-Won Lessons" section: &lt;code&gt;[next number]. **[short title]** — [what went wrong and the rule to follow going forward].&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;On session start:&lt;/strong&gt; Read and internalize all Hard-Won Lessons before beginning work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is the entire loop. There is no reflection agent. There is no nightly job. There is no dashboard. The trigger is the correction itself, the action is one append, and the agent reads the file the next time it boots up.&lt;/p&gt;

&lt;p&gt;The file currently has twenty entries. Each one came from a specific incident, on a specific date, that cost me time. A few of them, with the context that made them rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lesson #15 — LaunchAgent log paths must be on local disk, not SMB.&lt;/strong&gt; On 2026-04-19, six LaunchAgents in my finance service silently broke. macOS TCC was blocking launchd-spawned processes from writing logs to the NAS-mounted path, even though the same SSH user could write there fine. Exit code 78. No log output, because the log path was the problem. Took an afternoon to diagnose. The rule is one sentence. The rule writes itself.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lesson #19 — Never &lt;code&gt;import()&lt;/code&gt; a publish script "to test it."&lt;/strong&gt; On 2026-04-29, two test imports of &lt;code&gt;publish-agent-id-role.ts&lt;/code&gt; raced because the script invokes &lt;code&gt;main()&lt;/code&gt; at module top-level. Result: duplicate posts on LinkedIn (twice), X (twice), and Ghost (one extra, deleted via Admin API). Late.dev refuses to delete already-published content, so the cleanup was manual. The rule: validate publish scripts with &lt;code&gt;tsc --noEmit&lt;/code&gt;, a &lt;code&gt;--dry-run&lt;/code&gt; flag, or by reading them. Never with &lt;code&gt;import()&lt;/code&gt;. A sketch of the guard pattern follows this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lesson #20 — PM2 &lt;code&gt;script: "npm"&lt;/code&gt; ignores app &lt;code&gt;env.PATH&lt;/code&gt;.&lt;/strong&gt; On 2026-05-01, the health-api service kept reporting &lt;code&gt;online&lt;/code&gt; while the port wasn't listening. PM2 was launching &lt;code&gt;npm&lt;/code&gt; from the daemon's PATH, not the app's, which meant &lt;code&gt;better-sqlite3&lt;/code&gt; (compiled for node 22) was loading under node 25 and crashing on &lt;code&gt;ERR_DLOPEN_FAILED&lt;/code&gt;. Fix: pin &lt;code&gt;script&lt;/code&gt; to the absolute path of the desired npm. Same idea as Lesson #16, now for PM2 instead of launchd.&lt;/li&gt;
&lt;/ul&gt;
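
&lt;p&gt;Lesson #19 is worth showing as code, because the fix is a pattern rather than a setting. A minimal sketch of the guard, with illustrative names (this is not the actual NEXUS script): &lt;code&gt;main()&lt;/code&gt; runs only when the file is executed directly, never on &lt;code&gt;import()&lt;/code&gt;, and &lt;code&gt;--dry-run&lt;/code&gt; short-circuits the side effects.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative guard for a publish script.
import { pathToFileURL } from "node:url";

async function main() {
  if (process.argv.includes("--dry-run")) {
    console.log("dry run: validated config, publishing nothing");
    return;
  }
  // real publish calls go here
}

// True only when the file is executed directly, never when import()-ed,
// so a stray import can no longer race a live publish.
if (import.meta.url === pathToFileURL(process.argv[1]).href) {
  await main();
}
&lt;/code&gt;&lt;/pre&gt;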

&lt;p&gt;Each entry took less than a minute to write. Each one prevents the same hour-long failure from happening twice. The compounding is the entire point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where hooks fit
&lt;/h2&gt;

&lt;p&gt;Lessons live in markdown because that's how the agent absorbs them at session start. But there's a runtime layer underneath, and it has a real role.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://code.claude.com/docs/en/hooks" rel="noopener noreferrer"&gt;Claude Code's hooks&lt;/a&gt; — PreToolUse, PostToolUse, SessionStart, and friends — let you intercept tool calls deterministically. If a lesson can be reduced to a regex on a command string ("never run &lt;code&gt;rm -rf&lt;/code&gt; outside &lt;code&gt;/tmp&lt;/code&gt;"), a hook is a better enforcement point than a markdown bullet, because the markdown bullet relies on the model reading and obeying it. The hook does not.&lt;/p&gt;

&lt;p&gt;Same logic for &lt;a href="https://code.claude.com/docs/en/skills" rel="noopener noreferrer"&gt;Claude Code Skills&lt;/a&gt;: they're great for packaging a procedure with its own tools and supporting files. They are not a substitute for the rule. They're a substrate the rule can sit on top of.&lt;/p&gt;

&lt;p&gt;The hierarchy I run with: durable rules in the markdown file, deterministic enforcement in hooks where the rule is regex-shaped, and skills for procedures with multiple steps. None of those is a self-improvement loop. None of them runs at midnight. None of them has a grading subagent. They are all read or executed at the moment they apply.&lt;/p&gt;

&lt;h2&gt;
  
  
  How you know it's working
&lt;/h2&gt;

&lt;p&gt;The test is simple. You stop hearing the same compliment twice.&lt;/p&gt;

&lt;p&gt;If your agent says "good catch" today, look it up tomorrow. Is the lesson in your file? Did the agent read the file before it started working? If yes to both, you should never hear "good catch" on that specific topic again. If you do, the rule is wrong, the file isn't being read, or the lesson didn't generalize. All three are debuggable. None of them require a reflection agent.&lt;/p&gt;

&lt;p&gt;Praise without persistence is a leak. Patch the leak, do not build a recycling system for the runoff.&lt;/p&gt;

</description>
      <category>agentengineering</category>
      <category>claudecode</category>
      <category>platformengineering</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Three Memory Systems Under One Login. Stop Picking Sides.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sun, 03 May 2026 00:01:37 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/three-memory-systems-under-one-login-stop-picking-sides-1ela</link>
      <guid>https://forem.com/michaeltuszynski/three-memory-systems-under-one-login-stop-picking-sides-1ela</guid>
      <description>&lt;p&gt;Anthropic now ships at least three different memory models inside the Claude product family, and they don't behave the same way. Claude.ai has &lt;a href="https://claude.com/blog/memory" rel="noopener noreferrer"&gt;a chat memory feature for Pro, Max, Team, and Enterprise users&lt;/a&gt; that summarizes prior conversations and injects that summary into new chats. Claude Code has &lt;a href="https://code.claude.com/docs/en/memory" rel="noopener noreferrer"&gt;CLAUDE.md files plus a separate "auto memory" directory&lt;/a&gt; the model writes to itself, both loaded at session start. The API ships &lt;a href="https://docs.claude.com/en/docs/agents-and-tools/tool-use/memory-tool" rel="noopener noreferrer"&gt;a &lt;code&gt;memory_20250818&lt;/code&gt; tool&lt;/a&gt; that hands a &lt;code&gt;/memories&lt;/code&gt; directory to your application code so you can persist anything you want between turns. Three surfaces, three rule sets, three retention postures.&lt;/p&gt;

&lt;p&gt;I argued last week on this blog that the model isn't the variable that matters — the wrapper around it is. This is the next claim down the chain: if memory is a feature of that wrapper rather than the model, then vendor fragmentation is a memory problem you cannot solve by picking a surface. Stop trying.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually different across the three
&lt;/h2&gt;

&lt;p&gt;The chat surface remembers conversations as a 24-hour synthesis, project-scoped, controllable through a settings panel. The Code surface uses plain markdown files in your repo plus a per-project memory directory at &lt;code&gt;~/.claude/projects/&amp;lt;project&amp;gt;/memory/&lt;/code&gt; on the local machine. The API tool defines six file operations (&lt;code&gt;view&lt;/code&gt;, &lt;code&gt;create&lt;/code&gt;, &lt;code&gt;str_replace&lt;/code&gt;, &lt;code&gt;insert&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;rename&lt;/code&gt;) and expects your application to implement the storage. None of these are wrong. They are designed for different jobs. But they share zero common format, no export path between them, and no way to carry context from a Claude.ai chat into a Claude Code session into an API agent without doing the plumbing by hand.&lt;/p&gt;
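
&lt;p&gt;Wiring the API tool into your own substrate is mostly glue. Here is a sketch of the application side handling two of the six operations against a local directory; the field names (&lt;code&gt;command&lt;/code&gt;, &lt;code&gt;path&lt;/code&gt;, &lt;code&gt;file_text&lt;/code&gt;) are assumptions drawn from the tool description, so check the memory tool docs before relying on them.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Minimal handler for memory tool calls, backed by a local directory.
// Only view and create are shown; the other four operations follow the same shape.
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

const ROOT = "./memories"; // your substrate, not the vendor's

export function handleMemoryCommand(input: any): string {
  // The tool addresses files under /memories; re-root them locally.
  const rel = String(input.path ?? "").replace(/^\/?memories\/?/, "");
  const target = join(ROOT, rel);

  switch (input.command) {
    case "view":
      return readFileSync(target, "utf8");
    case "create":
      mkdirSync(dirname(target), { recursive: true });
      writeFileSync(target, String(input.file_text ?? ""));
      return "created " + target;
    default:
      return "not implemented in this sketch: " + input.command;
  }
}
&lt;/code&gt;&lt;/pre&gt;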

&lt;p&gt;Birgitta Böckeler's writeup on &lt;a href="https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html" rel="noopener noreferrer"&gt;context engineering for coding agents&lt;/a&gt; frames the wrapper as everything in an AI agent except the model itself: the tool definitions, the context compaction, the feedback sensors, the system prompt, the memory between sessions. Anthropic's own engineering team &lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;calls the same idea context engineering&lt;/a&gt; — the work of curating what enters the model's attention budget at each step. Memory sits squarely inside that definition. Which means the choice about &lt;em&gt;where memory lives&lt;/em&gt; is a wrapper decision, and the vendor is making it for you on each surface until you take it back.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trap
&lt;/h2&gt;

&lt;p&gt;The natural reaction when a vendor ships three memory models is to figure out which one to use. Spend an afternoon reading docs, decide that the chat synthesis is for ad hoc queries, the auto memory is for coding work, and the API tool is for production agents. Move on.&lt;/p&gt;

&lt;p&gt;That reaction is wrong, and not because any of those choices are bad in isolation. It's wrong because it assumes vendor surfaces are stable. They aren't. Claude.ai's memory was Team-and-Enterprise-only at launch in September 2025, then expanded to Pro and Max in October. Claude Code's auto memory requires v2.1.59 or later and lives in a path tied to the git repo, not the user. The API memory tool is in beta under a header that already changed naming conventions twice. The vendor will keep shipping, the rules will keep shifting, and your context will keep being a second-class object inside someone else's roadmap.&lt;/p&gt;

&lt;p&gt;There's also a deeper problem. MindStudio's writeup on &lt;a href="https://www.mindstudio.ai/blog/what-is-behavioral-lock-in-persistent-ai-agents-switching-costs" rel="noopener noreferrer"&gt;behavioral lock-in&lt;/a&gt; makes the case that agent memory creates switching costs that data portability rules cannot fix. For example, even if a vendor lets you export your memory directory tomorrow, the operational understanding the agent built — your team's terminology, your exceptions, your shorthand — does not round-trip cleanly into another vendor's surface. Eight months of accumulated context turns into a re-onboarding tax the moment you switch. Parallels' 2026 cloud survey put vendor lock-in concern at 94% across 540 IT leaders; agent memory is exactly the layer where that concern compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I do instead
&lt;/h2&gt;

&lt;p&gt;My memory lives in a NAS-backed directory called &lt;code&gt;nexus/&lt;/code&gt;, in plain markdown, under git. It has a top-level &lt;code&gt;CLAUDE.md&lt;/code&gt; that gets auto-loaded into every Claude Code session because it sits at the project root. It has a &lt;code&gt;MEMORY.md&lt;/code&gt; for long-term curated state, a &lt;code&gt;SESSION-STATE.md&lt;/code&gt; for active context, per-domain context files at &lt;code&gt;agents/&amp;lt;domain&amp;gt;-context.md&lt;/code&gt; for finance, health, content, and so on, and daily logs at &lt;code&gt;memory/YYYY-MM-DD.md&lt;/code&gt;. Cross-references between entities use &lt;code&gt;[[double brackets]]&lt;/code&gt; so they're grep-searchable and Obsidian-renderable. Search across the corpus runs through an Ollama embedding pipeline using &lt;code&gt;nomic-embed-text&lt;/code&gt; at 768 dimensions, indexed locally — no vendor API call required to ask "what did I decide about that account fee in February?"&lt;/p&gt;
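
&lt;p&gt;The query path is small enough to sketch. Assuming Ollama is listening on its default port and the chunk index has already been built (indexing and error handling omitted), it looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Embed a query with Ollama and rank stored chunks by cosine similarity.
type Chunk = { file: string; text: string; vector: number[] };

async function embed(text: string) {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  const data = await res.json();
  return data.embedding as number[]; // 768 dimensions for nomic-embed-text
}

function cosine(a: number[], b: number[]) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i &amp;lt; a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export async function search(query: string, index: Chunk[], k = 5) {
  const q = await embed(query);
  const scored = index.map((c) =&amp;gt; ({ c, score: cosine(q, c.vector) }));
  scored.sort((x, y) =&amp;gt; y.score - x.score);
  return scored.slice(0, k).map((s) =&amp;gt; s.c);
}
&lt;/code&gt;&lt;/pre&gt;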

&lt;p&gt;This stack does three things the vendor surfaces cannot.&lt;/p&gt;

&lt;p&gt;First, it survives the surface split. The same files load into Claude Code, can be pasted into Claude.ai, and can be served to an API agent through the memory tool's file ops. The format is universal because the format is just files.&lt;/p&gt;

&lt;p&gt;Second, it survives the vendor switch. If I move to a different model provider tomorrow, the markdown still parses, the embeddings still resolve, and the wikilinks still work. There is no proprietary memory schema to migrate.&lt;/p&gt;

&lt;p&gt;Third, it gives me audit. I can grep my own context. I can diff what changed last week. I can &lt;code&gt;trash&lt;/code&gt; something I don't want anymore and recover it if I was wrong. None of those operations exist on the chat memory surface, and they only partially exist on the auto-memory surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  The general pattern
&lt;/h2&gt;

&lt;p&gt;The vendor's wrapper is not your wrapper. It's theirs, designed around their product roadmap and their retention model and their billing surfaces. When that wrapper includes a memory layer, putting your context in it means putting your operational knowledge in someone else's container. Fine for ephemeral chat. Not fine for the accumulated state of a year of work.&lt;/p&gt;

&lt;p&gt;The fix is not to pick the right vendor surface. The fix is to keep your memory outside any vendor surface, in a format you own, with search you control, and let the vendor surfaces read from it as needed. Claude Code already does this for free with &lt;code&gt;CLAUDE.md&lt;/code&gt;. The other surfaces will eventually catch up, or they won't, and either way your context survives.&lt;/p&gt;

&lt;p&gt;Last week's post argued that the wrapper around the model is what matters. This one finishes the sentence: don't trust theirs with your context.&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>aiagents</category>
      <category>vendorlockin</category>
      <category>developertools</category>
    </item>
    <item>
      <title>Stop Adopting AI. Start Exposing Your Context.</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Fri, 01 May 2026 20:50:12 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/stop-adopting-ai-start-exposing-your-context-2pog</link>
      <guid>https://forem.com/michaeltuszynski/stop-adopting-ai-start-exposing-your-context-2pog</guid>
      <description>&lt;p&gt;The AI adoption pathway that's actually working in 2026 is not "deploy a copilot to your team." It's "expose your org's context to whichever model your team already chose." That sounds like a small shift. It's not. It changes who picks the tool, what your procurement team buys, and where the work of getting value out of AI actually lives.&lt;/p&gt;

&lt;p&gt;The numbers behind the shift are bleak for the old playbook. MIT's NANDA study of 300 enterprise AI deployments found 95% of GenAI pilots delivered no measurable P&amp;amp;L impact. The diagnosis was not the model. It was missing context — the data, workflow knowledge, and institutional memory the model needed to actually be useful inside a specific business. &lt;a href="https://atlan.com/know/context-engineering-framework/" rel="noopener noreferrer"&gt;Atlan summarizes the same finding&lt;/a&gt; and quotes Box CEO Aaron Levie, who calls context engineering "the long pole in the tent for AI Agents adoption in most organizations." Gartner went further in mid-2025: "context engineering is in, prompt engineering is out," with a prediction that 80% of AI tools will incorporate it by 2028.&lt;/p&gt;

&lt;p&gt;Klarna is the worked example everyone now points at. Between 2022 and 2024, the company replaced about 700 customer-service positions with an OpenAI-powered chatbot. By spring 2025 customer satisfaction had dropped 22% and complaints had piled up. &lt;a href="https://www.entrepreneur.com/business-news/klarna-ceo-reverses-course-by-hiring-more-humans-not-ai/491396" rel="noopener noreferrer"&gt;The CEO publicly admitted the cuts went too far&lt;/a&gt; and pivoted to a hybrid model, rehiring humans for anything requiring judgment. The model wasn't broken. The pathway was. The org rolled out an agent without exposing the context it needed — refund policies, payment edge cases, regional regulations, escalation patterns — and the agent shipped generic answers to specific problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  What replaced it
&lt;/h2&gt;

&lt;p&gt;Three things converged in late 2025 that quietly killed the old pathway.&lt;/p&gt;

&lt;p&gt;The first is the &lt;strong&gt;Model Context Protocol&lt;/strong&gt;. Anthropic open-sourced MCP in November 2024; by March 2026 the SDK was hitting &lt;a href="https://thenewstack.io/why-the-model-context-protocol-won/" rel="noopener noreferrer"&gt;97 million monthly downloads&lt;/a&gt; — a 970x growth curve from launch. OpenAI, Microsoft, Google, and AWS all shipped MCP client support within thirteen months. An independent census in Q1 2026 indexed 17,468 servers across registries. MCP is not a model. It is a protocol for handing a model the right context — your Slack, your issue tracker, your observability stack, your customer database — at the moment of the request.&lt;/p&gt;

&lt;p&gt;The second is &lt;strong&gt;agent skills as a portable artifact&lt;/strong&gt;. &lt;a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills" rel="noopener noreferrer"&gt;Anthropic launched Agent Skills in October 2025&lt;/a&gt; and open-sourced the SKILL.md format in December. Atlassian, Canva, Cloudflare, Figma, Notion, Ramp, and Sentry all shipped skills in the launch window. A skill is a directory: instructions, scripts, resources. Drop the directory next to a workflow that recurs and any compatible agent can run it. The format is Anthropic's, but the spec is the same shape as .cursorrules, AGENTS.md, GitHub Spaces, and the rest of the convergence happening across vendors.&lt;/p&gt;

&lt;p&gt;The third is &lt;strong&gt;the in-repo memory file&lt;/strong&gt; as a de facto standard. CLAUDE.md, AGENTS.md, .cursorrules, and the rest are all the same idea: a markdown file at the root of a project that tells whatever agent gets dropped in what the project is, what conventions matter, what the gotchas are, and where the bodies are buried. The agent reads the file at the start of every session. The org documents itself once. The dev picks the model.&lt;/p&gt;

&lt;p&gt;Read those three together and the picture is obvious. The unit of AI adoption stopped being "the agent." It became "the substrate the agent stands on."&lt;/p&gt;

&lt;h2&gt;
  
  
  What that looks like in practice
&lt;/h2&gt;

&lt;p&gt;I run a personal agentic stack — NEXUS — that's been doing this for about a year. The repo has a CLAUDE.md at the root that lays out the workspace structure, identity, behavioral protocols, and lessons learned. There are a dozen &lt;code&gt;agents/&amp;lt;domain&amp;gt;-context.md&lt;/code&gt; files for finance, content, health, the rest. There's an MCP server for Gmail, Calendar, Slack, Drive, and a few internal tools. There are skills for the recurring workflows — publishing a blog post, running a finance check, doing a health digest. The agent I happen to be using on a given day — Claude Code mostly, occasionally Cursor — reads what it needs at session start and gets to work.&lt;/p&gt;

&lt;p&gt;I don't pick a model and roll it out. I expose context, and whichever model is in the chair when I sit down knows what's going on.&lt;/p&gt;

&lt;p&gt;The same shape works at company scale, just with more access controls and an actual budget. The work is documenting the org until any agent dropped into it would be useful. The model becomes a free variable.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this changes about procurement
&lt;/h2&gt;

&lt;p&gt;The old AI procurement motion: pick a vendor, sign a per-seat contract, train the team on the tool, run change-management sessions, hope adoption hits 30%. This is what Klarna did. The asset created at the end of it is a vendor relationship and some training decks.&lt;/p&gt;

&lt;p&gt;The new motion: invest in the context infrastructure — an MCP gateway, a documentation platform that agents can read, semantic indexes for your wikis and tickets, a skills directory for recurring workflows. The model is whoever the dev or team picked. The procurement decision is &lt;em&gt;which surfaces to expose&lt;/em&gt;, not &lt;em&gt;which copilot to license&lt;/em&gt;. The asset created is a substrate that survives the next model rotation.&lt;/p&gt;

&lt;p&gt;The implication that nobody loves: tool-selection RFPs become a free variable rotation, not a strategic decision. The strategic decision is what your org has to say to a model that doesn't already know it.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do this week
&lt;/h2&gt;

&lt;p&gt;Four moves if you want to test the pathway without committing to a vendor.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit your CLAUDE.md / AGENTS.md surface.&lt;/strong&gt; Drop a coding agent into your main repo with no other context. Ask it to make a non-trivial change. If it makes obvious mistakes — wrong test runner, ignored coding conventions, bypassed an internal review process — those are the gaps a memory file should close. Write that file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick three high-frequency workflows and write skills.&lt;/strong&gt; The kind of thing a senior engineer explains to a new hire in their first week. Convert each to a SKILL.md or an equivalent. Measure time-to-task before and after.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stand up an MCP gateway for your top three internal systems.&lt;/strong&gt; Issue tracker, observability, customer database. Most have community MCP servers already; the work is access control, not implementation. A minimal server sketch follows this list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stop running tool-selection RFPs.&lt;/strong&gt; Or if you have to, run them as a side track. The strategic work — and the asset that survives the next model release — is the context, not the contract.&lt;/li&gt;
&lt;/ul&gt;
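
&lt;p&gt;For the MCP gateway item above, the per-system server really is the easy part. A minimal sketch using the TypeScript SDK (&lt;code&gt;@modelcontextprotocol/sdk&lt;/code&gt;); the tool name, the schema, and the internal API behind it are illustrative, and the SDK surface is still moving, so treat this as the shape rather than the letter.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// One tool, one internal system, stdio transport. Access control lives in the handler.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "issue-tracker", version: "0.1.0" });

server.tool(
  "get_ticket",
  "Fetch one ticket from the internal issue tracker by ID",
  { id: z.string() },
  async ({ id }) =&amp;gt; {
    // Replace with the real call to your tracker's API.
    return { content: [{ type: "text", text: "Ticket " + id + ": (stub)" }] };
  }
);

await server.connect(new StdioServerTransport());
&lt;/code&gt;&lt;/pre&gt;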

&lt;h2&gt;
  
  
  The throughline
&lt;/h2&gt;

&lt;p&gt;The agentic adoption series has been running through failure modes. Part 1 was your team not trusting the agent. Part 2 was your customers not trusting the agent. The Cron-Not-Agents post was teams agentifying things that should have stayed deterministic. Last week's was the IAM seam — agent identities sharing primitives with everything else. This one is the answer to all of them.&lt;/p&gt;

&lt;p&gt;The pathway that works in 2026 is not adoption of a tool. It is exposure of a substrate. Once your org has the substrate, whatever model your team picks lands on something it can stand on. Without it, every rollout looks like Klarna's: an agent given a job, with no context for how the job is actually done, generating generic answers to specific problems and dropping CSAT 22 points before someone notices.&lt;/p&gt;

&lt;p&gt;Pick the context. The model is going to keep changing.&lt;/p&gt;

</description>
      <category>aiadoption</category>
      <category>contextengineering</category>
      <category>mcp</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>The 'Agent-Only' Role That Wasn't</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Thu, 30 Apr 2026 04:55:50 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/the-agent-only-role-that-wasnt-2j2g</link>
      <guid>https://forem.com/michaeltuszynski/the-agent-only-role-that-wasnt-2j2g</guid>
      <description>&lt;p&gt;Microsoft just patched a built-in Entra ID role they shipped as scoped to "agent identities" — and which, in practice, let anyone holding that role take over almost any service principal in the tenant. The fix rolled out across all clouds on April 9. The disclosure went public last week. If your org turned on Microsoft's agent identity platform in the last quarter and assigned the role anywhere along the way, this is a detail to chase down before your next audit.&lt;/p&gt;

&lt;p&gt;The role is &lt;strong&gt;Agent ID Administrator&lt;/strong&gt;. Microsoft introduced it as part of the agent identity platform — the lifecycle management story for AI agents in Entra: blueprints, agent identities, the rest of it. The documented scope was tight: agent objects, agent users, agent blueprints. The actual scope, until April 9, was much wider. Anyone holding the role could become &lt;em&gt;owner&lt;/em&gt; of any service principal in the tenant — agent or not — and as soon as they were owner, mint a credential and authenticate as that principal.&lt;/p&gt;

&lt;p&gt;That second step is the takeover primitive. Silverfort's writeup puts it bluntly: "Ownership is a takeover primitive — become owner, then add a secret and authenticate as that service principal." If the principal you took over had elevated Graph permissions or held a directory role, you now hold those permissions too. In tenants where any privileged service principal exists — &lt;a href="https://www.silverfort.com/blog/agent-id-administrator-scope-overreach-service-principal-takeover-in-entra-id/" rel="noopener noreferrer"&gt;Silverfort says 99% of organizations have at least one&lt;/a&gt; — that becomes a tenant-takeover path.&lt;/p&gt;

&lt;h2&gt;
  
  
  The mechanics, briefly
&lt;/h2&gt;

&lt;p&gt;Two API permissions did the work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;microsoft.directory/agentIdentities/owners/update&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;microsoft.directory/agentIdentityBlueprintPrincipals/owners/update&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both should have been gated by an "is this an agent-backed object" check. They weren't. The check existed for &lt;em&gt;application&lt;/em&gt; objects — modifying an application directly was correctly blocked. But the authorization layer for service principal ownership didn't enforce the same boundary. So the role couldn't change the application, but it could hand itself the keys to the principal that authenticates as the application. Microsoft's patch closes the gap: as &lt;a href="https://thehackernews.com/2026/04/microsoft-patches-entra-id-role-flaw.html" rel="noopener noreferrer"&gt;The Hacker News reported&lt;/a&gt;, ownership writes against non-agent service principals now return a "Forbidden" response.&lt;/p&gt;

&lt;p&gt;The disclosure timeline reads like a clean responsible-disclosure case. Silverfort's Noa Ariel found the flaw on February 24, reported it to MSRC on March 1, MSRC confirmed the behavior on March 26, and the patch was rolling by April 4. Six weeks from report to global rollout is a reasonable cadence — quick, even, given the blast radius.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is the more interesting story than most agent-security posts
&lt;/h2&gt;

&lt;p&gt;Two reasons this matters past the headline.&lt;/p&gt;

&lt;p&gt;First, it lands on the trust model that everyone shipping agentic systems is implicitly buying into. The whole pitch of "agent identity" — and not just Microsoft's — is that agents are first-class principals with scoped permissions, that their access can be governed independently of whatever else lives in the directory. That pitch only holds if the boundary is real. Here the boundary was a line in the docs. Underneath, agent identities are service principals — the same primitive that powers every app registration, every workload identity, every CI/CD service account in the tenant. Build a new role on top of that primitive, miss one ownership-check edge case, and "agent-scoped" becomes "directory-scoped" without anybody noticing. Ariel's own framing: "When role permissions are applied on top of shared foundations without strict scoping, access can extend beyond what was originally intended." That is the architectural lesson; the bug is the proof.&lt;/p&gt;

&lt;p&gt;Second, this is a preview of the bug class to expect for the next two years across every vendor's agent identity story. AWS, Google, Okta, ServiceNow, the rest — all of them are going to ship "agent identity" SKUs that bolt on top of the IAM primitives they already had. The boundaries are going to be subtle, and most security teams are not yet auditing role definitions for this kind of scope-overreach. CSO Online's writeup &lt;a href="https://www.csoonline.com/article/4163708/microsoft-patched-an-agent-only-role-that-was-not.html" rel="noopener noreferrer"&gt;calls the architectural confusion explicitly&lt;/a&gt;: since agent identities are built on the same technical primitives as applications, the boundary between "agent" and "non-agent" objects wasn't properly defined. Expect that exact sentence to be rewritten about other vendors over the next year.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to do this week
&lt;/h2&gt;

&lt;p&gt;If your tenant uses Entra ID and you have the agent identity platform turned on, this is a good week to do four things.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audit who has Agent ID Administrator.&lt;/strong&gt; The patch closes the takeover path forward, but anyone who &lt;em&gt;had&lt;/em&gt; the role between role launch and April 9 had the capability. Read your audit logs for applicationOwner add events on non-agent service principals during that window. If you find any you can't account for, treat them as suspect credentials and rotate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't trust role names as scope statements.&lt;/strong&gt; "Agent ID Administrator" sounded scoped. It wasn't. From here forward, treat any new role labeled with a product surface — agents, copilots, workflows — as a &lt;em&gt;suggestion&lt;/em&gt; about scope. Verify against the actual permission set and the objects those permissions can touch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inventory your privileged service principals.&lt;/strong&gt; Per Silverfort, more than half the tenants they looked at have agent identities deployed at scale, with hundreds per tenant. Service principals with directory roles or app permissions like RoleManagement.ReadWrite.Directory are the high-value targets. Review the list. Demote the ones that don't need that level of access. Most don't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Push your vendor on the boundary, not the brand.&lt;/strong&gt; If a sales pitch tells you a role is "scoped to agents" or "limited to copilots," ask: scoped at the API surface, or scoped at the documentation level? It is not a snarky question. It is the question.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The throughline
&lt;/h2&gt;

&lt;p&gt;The first post in this series was about getting your own team to trust an agent. The second was about getting external customers to trust one. The third — last week's — was about when &lt;em&gt;not&lt;/em&gt; to make an agent at all, and use cron and bash instead. The bug Microsoft just patched is the next failure mode in the same chain: the &lt;em&gt;identity&lt;/em&gt; layer that the agent inherits from your existing platform is older, broader, and shares more primitives than the marketing implies. The trust model for an agent is the trust model for a service principal, plus a label. Sometimes the label is enforced. Sometimes it isn't.&lt;/p&gt;

&lt;p&gt;Until your security tooling can audit "what can this role &lt;em&gt;actually&lt;/em&gt; do" instead of "what is this role &lt;em&gt;named&lt;/em&gt;," assume the label is decorative. Microsoft just gave us the worked example. The next vendor will too.&lt;/p&gt;

</description>
      <category>identitysecurity</category>
      <category>entraid</category>
      <category>aiagents</category>
      <category>microsoft</category>
    </item>
    <item>
      <title>Stop Turning Your Cron Jobs Into Agents</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:06:16 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/stop-turning-your-cron-jobs-into-agents-3i84</link>
      <guid>https://forem.com/michaeltuszynski/stop-turning-your-cron-jobs-into-agents-3i84</guid>
      <description>&lt;p&gt;The current message from engineering leadership at most companies I talk to is some version of: "find the deterministic automation in your stack and make it agentic." A &lt;a href="https://www.reddit.com/r/devops/comments/1sw1adi/lead_push_to_migrate_automation_flows_to_ai_agents/" rel="noopener noreferrer"&gt;recent r/devops thread&lt;/a&gt; captured the frustration: an SRE asked how to push back on a director who wanted every Airflow DAG converted into an agent loop because "agents are the future."&lt;/p&gt;

&lt;p&gt;This is mostly bad advice. Not because agents are bad — they're great when the problem actually needs them — but because most existing automation does not need them and gets worse when retrofitted. Cron, Airflow, Step Functions, plain bash scripts: deterministic, idempotent, debuggable, free at the margin. Replacing them with an LLM call buys you variance you did not have, costs you tokens you did not spend, and produces logs you have to read instead of grep.&lt;/p&gt;

&lt;p&gt;I run an agentic system as my daily driver. NEXUS has more than thirty scheduled processes — content scanning, finance sync, DeFi monitoring, Polymarket fair-value estimation, podcast digests, calendar audits. Of those, exactly three involve an agent in the loop. The rest are bash + cron + SQLite + a handful of LaunchAgents. They run silently, log structured output, and have not surprised me in months. The agents I do run are at the &lt;em&gt;judgment seams&lt;/em&gt; — not the plumbing.&lt;/p&gt;

&lt;p&gt;Here is the test I apply when someone wants to agentify something.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Does the input space exceed what you can pre-enumerate?
&lt;/h2&gt;

&lt;p&gt;If the inputs are a known list — accounts to sync, files in a directory, customer records to enrich — you do not need an agent. You need a loop. Agents earn their keep when the input space is open: arbitrary user prompts, novel documents, situations the original author did not anticipate. If you can write the input down as an array, write a loop. If you cannot, an agent might be warranted.&lt;/p&gt;
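
&lt;p&gt;The difference in code is not subtle. If the input is a list you already know, the whole "pipeline" is a loop; &lt;code&gt;syncAccount&lt;/code&gt; here stands in for whatever deterministic work the job does.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Enumerable inputs: a known list, a loop, no model in sight.
const accounts = ["checking", "savings", "brokerage"];

async function syncAccount(name: string) {
  // deterministic, idempotent, logs you can grep
  console.log("synced " + name + " at " + new Date().toISOString());
}

for (const account of accounts) {
  await syncAccount(account);
}
&lt;/code&gt;&lt;/pre&gt;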

&lt;p&gt;This is the cleanest filter. Most "agentify our pipelines" pitches fail it on the first question.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Does the output require judgment, not pattern-matching?
&lt;/h2&gt;

&lt;p&gt;A regex extracting amounts from invoices is pattern-matching. An LLM call interpreting a customer email and routing it to the right team can be pattern-matching too — but a fine-tuned classifier or a vector search will do it cheaper, faster, and with calibrated confidence. Agents earn their keep when the output requires reasoning across context the model has to pull together at run time. "Read this PR, find the architectural risk, and explain it to a junior engineer" is judgment. "Detect the language of this string" is not.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/engineering/building-effective-agents" rel="noopener noreferrer"&gt;Anthropic's own framing&lt;/a&gt; draws the same line, between &lt;em&gt;workflows&lt;/em&gt; (LLM calls orchestrated through predefined paths) and &lt;em&gt;agents&lt;/em&gt; (LLMs deciding their own tool use and control flow). Most of what teams call "agents" is actually a workflow with a vibes-based orchestrator. Workflows are fine. They are also cheaper to operate, easier to test, and dramatically less likely to surprise you in production.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Will a human review every run before it commits?
&lt;/h2&gt;

&lt;p&gt;If yes, you can be more permissive about agent variance. The human is the safety net. NEXUS's content pipeline is exactly this — Claude drafts a LinkedIn post, the post lands in Slack, I approve or reject before anything goes external. The variance is fine because I'm in the loop.&lt;/p&gt;

&lt;p&gt;If no — if the system runs unattended, at scale, and acts on its outputs — every percent of variance becomes a percent of incidents. &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;METR's RCT on experienced developers&lt;/a&gt; showed that even with humans reviewing AI output, the net effect on throughput can be negative. Without a human reviewer, the variance compounds without correction.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Is the cost of being wrong proportional to its frequency?
&lt;/h2&gt;

&lt;p&gt;Deterministic automation fails predictably and rarely. Agents fail probabilistically, and the failures don't correlate with anything you can grep for. If the cost of one bad output is high and one bad output per ten thousand is plausible, the math gets ugly fast. An Airflow DAG that fails 0.01% of the time gets paged on, fixed, and forgotten. An agent that fails 1% of the time across a hundred-thousand-call workload is a slow-rolling incident with no obvious signature.&lt;/p&gt;

&lt;p&gt;Then there is the literal cost. An r/aws thread this week described a &lt;a href="https://www.reddit.com/r/aws/comments/1svmk5y/aws_97k_bill_out_of_nowhere/" rel="noopener noreferrer"&gt;$97,000 surprise bill&lt;/a&gt; from a runaway workload — and that's deterministic infrastructure. Agentic workflows multiply this risk: token usage scales with input size, tool calls retry on transient failures, agent loops can recurse if the termination condition is poorly defined. The blast radius of a bad cost outcome is larger and harder to predict than for a Lambda that just runs longer.&lt;/p&gt;
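
&lt;p&gt;The defense is boring and explicit: hard caps the loop cannot talk its way out of. A sketch, with &lt;code&gt;callModel&lt;/code&gt; and &lt;code&gt;runTool&lt;/code&gt; as hypothetical stand-ins for your provider SDK and tool layer:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Guards against the runaway case: a turn cap and a token budget.
type Step = { done: boolean; answer: string; toolCall: string; tokens: number };
declare function callModel(transcript: string): Promise&amp;lt;Step&amp;gt;;
declare function runTool(call: string): Promise&amp;lt;string&amp;gt;;

const MAX_TURNS = 8;
const TOKEN_BUDGET = 50_000;

export async function runBoundedAgent(task: string) {
  let transcript = task;
  let tokensUsed = 0;
  for (let turn = 0; turn &amp;lt; MAX_TURNS; turn++) {
    const step = await callModel(transcript);
    tokensUsed += step.tokens;
    if (tokensUsed &amp;gt; TOKEN_BUDGET) return "aborted: token budget exceeded";
    if (step.done) return step.answer;
    transcript += await runTool(step.toolCall); // feed the tool result back in
  }
  return "aborted: turn cap reached"; // the termination condition you control
}
&lt;/code&gt;&lt;/pre&gt;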

&lt;h2&gt;
  
  
  What "agentify it" usually means in practice
&lt;/h2&gt;

&lt;p&gt;The honest version of most "agentify the pipeline" projects is one of these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wrap an existing script in an LLM call so the project counts as AI.&lt;/strong&gt; Real motivation: the team needs to put something on the executive dashboard. The LLM adds nothing the script did not already do, but adds a per-run token cost and a non-deterministic failure mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Replace a switch statement with a prompt.&lt;/strong&gt; This is the worst version. The original code was already an interpreter — for keys you defined, with branches you wrote. The prompt is the same logic in slower, more expensive, less testable form.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add an agent because the team wants experience with the tooling.&lt;/strong&gt; This is fine, if you scope it. Pick one step that actually has judgment in it. Leave the rest of the pipeline alone. Most teams cannot resist the urge to agentify everything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What you should agentify
&lt;/h2&gt;

&lt;p&gt;The places where agents earn their keep in a real pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content drafting, where a human reviews.&lt;/strong&gt; Variance is the feature; the human is the filter.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Triage and routing of unstructured inputs&lt;/strong&gt;, when the input space is large and a labeled training set is unavailable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decisions that require pulling context together at run time&lt;/strong&gt; — code review with a human approving, incident summarization, customer issue triage with citations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploratory tool use&lt;/strong&gt; where the right sequence of operations is not known in advance — debugging, research, data exploration with a human in the loop.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern: judgment, not plumbing. Variance acceptable, because there is a reviewer. Open input space, not enumerable.&lt;/p&gt;

&lt;h2&gt;
  
  
  A trap nobody warns you about
&lt;/h2&gt;

&lt;p&gt;Sometimes the right migration is &lt;em&gt;backwards&lt;/em&gt;. You shipped an agent six months ago because agents were the move. The agent is now slower, more expensive, and less reliable than the deterministic alternative would have been. The honest fix is to retire the agent and replace it with a script.&lt;/p&gt;

&lt;p&gt;This is a hard call to make politically. Nobody gets promoted for replacing AI with bash. But the &lt;a href="https://www.anthropic.com/research/project-vend-1" rel="noopener noreferrer"&gt;Project Vend retrospective&lt;/a&gt; from Anthropic showed exactly this — Phase 2 fixed Claude's vending-machine-shopkeeper failures less by upgrading the model and more by adding bureaucracy: a CRM, mandatory research steps before quoting, an inventory tool. The bureaucracy is what made the agent reliable. At some point, if you keep adding bureaucracy, you have rebuilt a deterministic workflow with extra steps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The right question is not "how do we make this agentic?" The right question is: where in this pipeline does judgment have to happen at run time, on inputs we cannot pre-enumerate, with a reviewer present or stakes low enough that variance is acceptable? That set is small. It is real, but it is small.&lt;/p&gt;

&lt;p&gt;Most of your automation should keep being cron, bash, and SQL. If you have to put one thing on the executive dashboard, put the part where the agent does &lt;em&gt;not&lt;/em&gt; run. That is the part of your pipeline that runs at three in the morning without paging anyone, and the part that will still be running long after the AI hype cycle has moved on.&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>agenticai</category>
      <category>automation</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Getting Customers to Trust an Agent That Acts on Their Behalf</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 25 Apr 2026 20:52:08 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/getting-customers-to-trust-an-agent-that-acts-on-their-behalf-5f2e</link>
      <guid>https://forem.com/michaeltuszynski/getting-customers-to-trust-an-agent-that-acts-on-their-behalf-5f2e</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 2 of 2 on agentic system adoption. Part 1 covered internal adoption: &lt;a href="https://www.mpt.solutions/getting-your-own-team-to-actually-use-the-agent-you-built/" rel="noopener noreferrer"&gt;getting your own team to actually use the agent you built&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In February 2024, Klarna &lt;a href="https://www.klarna.com/international/press/klarna-ai-assistant-handles-two-thirds-of-customer-service-chats-in-its-first-month/" rel="noopener noreferrer"&gt;announced&lt;/a&gt; that its AI assistant had handled 2.3 million customer conversations in its first month — two-thirds of all customer service chats, doing the work of 700 full-time agents, with resolution times down from 11 minutes to under 2. It was the poster child for agentic deployment.&lt;/p&gt;

&lt;p&gt;Fifteen months later, CEO Sebastian Siemiatkowski &lt;a href="https://www.independent.co.uk/tech/klarna-ai-chatbot-customer-service-b2755734.html" rel="noopener noreferrer"&gt;conceded&lt;/a&gt; that the rollout had produced "lower quality" service. Klarna started rehiring humans.&lt;/p&gt;

&lt;p&gt;If you're shipping agentic features to paying customers — users who can churn the moment the agent confidently does the wrong thing — Klarna's arc is your warning. Internal adoption (covered in &lt;a href="https://www.mpt.solutions/getting-your-own-team-to-actually-use-the-agent-you-built/" rel="noopener noreferrer"&gt;Part 1&lt;/a&gt;) dies quietly. Engineers stop using the tool and find a workaround. External adoption dies loudly. Customers post screenshots, file chargebacks, and leave.&lt;/p&gt;

&lt;p&gt;External adoption isn't a feature launch. It's a trust contract. And the contract has five terms.&lt;/p&gt;

&lt;h2&gt;
  
  
  The stakes are different
&lt;/h2&gt;

&lt;p&gt;When internal adoption fails, your throughput drops 1.5% and your delivery stability drops 7.2% (the DORA 2024 numbers I cited in Part 1). When external adoption fails, the consequences look more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://arstechnica.com/tech-policy/2024/02/air-canada-must-honor-refund-policy-invented-by-airlines-chatbot/" rel="noopener noreferrer"&gt;Air Canada&lt;/a&gt;&lt;/strong&gt; was ordered by the BC Civil Resolution Tribunal in February 2024 to pay a customer whose refund policy the airline's chatbot had invented. Air Canada's defense — that the chatbot was "a separate legal entity" — was rejected. Your agent is legally you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.bbc.com/news/technology-68025677" rel="noopener noreferrer"&gt;DPD&lt;/a&gt;&lt;/strong&gt; had its customer service chatbot, after a system update, swear at a customer and insult the company in a haiku. The screenshot hit 800,000 views in 24 hours. DPD pulled the AI element within the day.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.cnbc.com/2024/06/17/mcdonalds-ends-ibm-partnership-to-test-ai-ordering-at-drive-thrus.html" rel="noopener noreferrer"&gt;McDonald's&lt;/a&gt;&lt;/strong&gt; killed its IBM-powered voice-AI drive-thru in June 2024 after a hundred-plus restaurant pilot. The proximate cause was viral TikToks of the agent adding bacon to ice cream and ordering 260 McNuggets unprompted.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three stories have the same structure: the agent confidently did the wrong thing, the failure mode was visible to users before it was visible to the company, and the choice was recall or litigate. None of them chose "iterate in place."&lt;/p&gt;

&lt;p&gt;Here's how you don't end up there.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Make the undo obvious
&lt;/h2&gt;

&lt;p&gt;The Nielsen Norman Group has ranked &lt;a href="https://www.nngroup.com/articles/user-control-and-freedom/" rel="noopener noreferrer"&gt;"user control and freedom"&lt;/a&gt; as the third core usability heuristic for decades. In plain language: users need an emergency exit.&lt;/p&gt;

&lt;p&gt;For agentic products, this is move number one. A button that says "undo this action." A visible confirmation before anything moves money, sends messages, or modifies customer state. A "revert to before the agent touched this" option.&lt;/p&gt;

&lt;p&gt;The reason is psychology: users experiment more when they know they can back out. That experimentation is how they learn what the agent is actually good at. Without an obvious undo, the first surprise is a last surprise — the user closes the tab and doesn't come back.&lt;/p&gt;

&lt;p&gt;In NEXUS, every time my content pipeline drafts a LinkedIn post, Slack gets an approval with edit/reject buttons. Nothing posts without me hitting approve. That approval surface isn't a development stage; it's the product. Remove it and I don't trust the pipeline, even though I built it.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Signal uncertainty
&lt;/h2&gt;

&lt;p&gt;Overconfidence is how agents lose users. The agent that says "here's your answer" and is wrong burns more trust than the agent that says "this looks likely but I'd double-check" and is wrong.&lt;/p&gt;

&lt;p&gt;This isn't a UX trick. It's a calibration problem. Most LLM-powered agents have the confidence signal available — log probabilities, retrieval scores, constraint violations — and throw it away before it reaches the user. Surface it.&lt;/p&gt;

&lt;p&gt;Concrete patterns that work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Show which sources the agent used. If the retrieval score is low, say so (a sketch follows this list).&lt;/li&gt;
&lt;li&gt;When the agent takes an action based on an ambiguous instruction, show the interpretation it chose and offer the alternative.&lt;/li&gt;
&lt;li&gt;Flag "this is outside the agent's usual scope" when the request deviates from a known pattern. Don't let the agent bluff.&lt;/li&gt;
&lt;/ul&gt;
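
&lt;p&gt;The first of those is a few lines of code once the score survives the trip to the UI. The threshold and the wording here are illustrative, not tuned values.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Keep the retrieval score attached to the answer and hedge below a threshold.
type RetrievedAnswer = { text: string; sources: string[]; score: number };

export function presentAnswer(a: RetrievedAnswer) {
  const cite = a.sources.length ? "\nSources: " + a.sources.join(", ") : "";
  if (a.score &amp;lt; 0.5) {
    return "This looks likely, but I'd double-check it:\n" + a.text + cite;
  }
  return a.text + cite;
}
&lt;/code&gt;&lt;/pre&gt;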

&lt;p&gt;Air Canada's chatbot didn't hedge. It invented a refund policy with full confidence. If it had said "I'm not 100% sure about the bereavement policy — let me route you to a human," Air Canada would have lost nothing. Instead, they lost a court case and took a reputational hit that's still being cited two years later.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Tell them what the agent won't do
&lt;/h2&gt;

&lt;p&gt;Scope boundaries build trust. If the agent is going to draft replies but never send them, say so on the label. If it will summarize but not delete, say so. If it will take action inside the product but never touch external systems, say so.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://artificialintelligenceact.eu/article/50/" rel="noopener noreferrer"&gt;EU AI Act, Article 50&lt;/a&gt;, now requires disclosure when users interact with AI systems. That's the regulatory floor. The product floor is higher: tell users not just that they're talking to an agent, but what the agent's blast radius is.&lt;/p&gt;

&lt;p&gt;Users who know the limits stop testing them. Users who don't know the limits probe until they find the edge, and the edge is always embarrassing.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Price for non-determinism
&lt;/h2&gt;

&lt;p&gt;This is the move most teams miss. Agentic systems have variable cost per interaction — tokens, tool calls, retries — and if you pass that variability through to the customer via token-based pricing, you're asking them to bet on their own workload. Most will churn the first time a 5x cost spike hits their bill.&lt;/p&gt;

&lt;p&gt;Intercom's Fin AI agent &lt;a href="https://fin.ai/pricing" rel="noopener noreferrer"&gt;prices at $0.99 per resolution&lt;/a&gt;. Their definition of "resolution" is itself a product decision: the outcome is counted when a customer confirms the issue is resolved, when they don't follow up after Fin responds, or when Fin completes a workflow. One charge per conversation, regardless of how many messages it took.&lt;/p&gt;

&lt;p&gt;That's a pricing model that says to the customer: the variability is my problem, not yours. Teams adopting outcome-based pricing are absorbing the non-determinism into their margin instead of their customer's budget.&lt;/p&gt;

&lt;p&gt;If your pricing page requires a customer to model their own token usage to predict monthly cost, you've shifted your operational risk onto their finance team. They will fire you and buy from someone who didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Make failure modes visible
&lt;/h2&gt;

&lt;p&gt;The DPD chatbot didn't start swearing on its first day. It started failing after a system update. Somewhere, someone had logs. The failure was visible internally — to nobody who was watching — before it was visible externally, to everyone who was.&lt;/p&gt;

&lt;p&gt;Build the visibility into the product. For every action the agent takes, the user (and your on-call) should be able to answer three questions within 30 seconds:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What did the agent try to do?&lt;/li&gt;
&lt;li&gt;What did it actually do?&lt;/li&gt;
&lt;li&gt;What did it decide not to do, and why?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For paying customers, this often lives in an activity log, an audit trail, or an admin dashboard. It doesn't have to be beautiful. It has to be complete. When a customer files a ticket saying "the agent did something weird," the ticket is cheap to resolve if the log is there. If it isn't, you're reconstructing state from memory, and the customer's story beats yours in court.&lt;/p&gt;
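
&lt;p&gt;"Complete" in practice means every record can answer those three questions on its own. An illustrative shape, with example field names rather than a standard schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// One record per agent action, written before and finalized after the action runs.
interface AgentActionRecord {
  timestamp: string;                              // ISO 8601
  intent: string;                                 // what the agent tried to do
  action: string;                                 // what it actually did: tool, args, target
  outcome: "succeeded" | "failed" | "partial";
  skipped: { action: string; reason: string }[];  // what it decided not to do, and why
  undoRef?: string;                               // handle for reverting, if supported
}
&lt;/code&gt;&lt;/pre&gt;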

&lt;h2&gt;
  
  
  How you know it's working
&lt;/h2&gt;

&lt;p&gt;For internal agents, I track daily active use by engineers who aren't on the project team. For external agents, the leading indicators are different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Unprompted return rate.&lt;/strong&gt; Users who came back without a marketing nudge. Internal usage can be mandated; external cannot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Override rate trending down.&lt;/strong&gt; If customers are manually correcting the agent less over time, trust is rising.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Escalation-to-human rate holding steady, not dropping to zero.&lt;/strong&gt; An agent that never escalates is one that's confidently wrong. Healthy agents know their limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Net revenue retention on accounts that opted into the agent.&lt;/strong&gt; The hardest proof: do customers expand after the agent ships, or contract?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Klarna is the cautionary tale for all of these. The launch numbers looked spectacular. The retention numbers told a different story, quietly, for a year, until Siemiatkowski had to say the quiet part out loud: "cost was a too predominant evaluation factor… what you end up having is lower quality."&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Internal adoption dies in the gap between demo and fourth use. External adoption dies the moment the agent confidently does the wrong thing to a customer who's paying attention.&lt;/p&gt;

&lt;p&gt;Both failures are preventable. Both require the same underlying discipline: trust is earned on boring interactions, not hero ones. The patterns that build internal trust — readable logs, obvious override, default to boring — scale externally if you take them seriously.&lt;/p&gt;

&lt;p&gt;Air Canada, DPD, McDonald's, and Klarna all had one thing in common. Their agents worked in the demo. They worked with the pilot group. They worked for the first thousand users. Then the distribution shifted, or an update went out, or a customer asked a question nobody had predicted, and the agent confidently did the wrong thing.&lt;/p&gt;

&lt;p&gt;The companies that keep shipping agentic features — and keep their customers — aren't the ones with the smartest agents. They're the ones whose agents know what they don't know, whose pricing absorbs the variance, and whose undo button is bigger than the agent.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>aiadoption</category>
      <category>productengineering</category>
      <category>customertrust</category>
    </item>
    <item>
      <title>Getting Your Own Team to Actually Use the Agent You Built</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:51:13 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/getting-your-own-team-to-actually-use-the-agent-you-built-18cc</link>
      <guid>https://forem.com/michaeltuszynski/getting-your-own-team-to-actually-use-the-agent-you-built-18cc</guid>
      <description>&lt;p&gt;&lt;em&gt;Part 1 of 2 on agentic system adoption.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A 25% bump in individual AI usage correlates with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability. That's not a hot take — that's &lt;a href="https://dora.dev/research/2024/dora-report/" rel="noopener noreferrer"&gt;the 2024 DORA report&lt;/a&gt;, surveying thousands of engineers across hundreds of orgs.&lt;/p&gt;

&lt;p&gt;Worse: in a randomized controlled trial from &lt;a href="https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/" rel="noopener noreferrer"&gt;METR on experienced open-source developers&lt;/a&gt;, engineers expected AI tools to speed them up by 24%, reported a 20% speedup afterward, and were actually 19% &lt;em&gt;slower&lt;/em&gt;. Perceived uplift and real uplift were moving in opposite directions.&lt;/p&gt;

&lt;p&gt;If you're a platform lead or eng manager rolling an agentic tool out to your own team, those two stats should wake you up. The problem isn't that the tech doesn't work. The problem is that "works in a demo" and "works on Tuesday afternoon when Sarah is three coffees in and needs to ship a hotfix" are different problems, and most internal rollouts optimize for the first.&lt;/p&gt;

&lt;p&gt;I've been running NEXUS — my own agentic system — as a daily driver for months. It handles content drafts, trading decisions on real money in DeFi, financial dashboards across two people's accounts, and a Slack approval workflow fanning out to four publishing platforms. I've made every mistake below at least once. Here's what I've learned about getting an internal agent past the demo cliff.&lt;/p&gt;

&lt;h2&gt;
  
  
  The demo cliff
&lt;/h2&gt;

&lt;p&gt;Every agentic rollout has a moment. The kickoff meeting goes great. Three engineers try it, one ships something impressive, the Slack channel fills with lightning-bolt emojis. Then week three hits, and the channel goes quiet. When you audit usage, maybe 20% of the team is still touching it weekly. The rest tried it once, got burned, and went back to the old way.&lt;/p&gt;

&lt;p&gt;That gap — between first use and fourth use — is where most internal agents die. &lt;a href="https://survey.stackoverflow.co/2024/ai" rel="noopener noreferrer"&gt;Stack Overflow's 2024 developer survey&lt;/a&gt; puts it plainly: 76% of developers are using or planning to use AI tools, but only 43% trust the accuracy of the output. By &lt;a href="https://stackoverflow.blog/2026/02/18/closing-the-developer-ai-trust-gap/" rel="noopener noreferrer"&gt;early 2026&lt;/a&gt;, the trust number had dropped to 29%. Usage is up and to the right; trust is the opposite.&lt;/p&gt;

&lt;p&gt;The people who keep using the agent after week three aren't the people who liked the demo. They're the people who figured out what the agent is actually good at and built a working relationship with its specific failure modes. That's the adoption you want. Here's how to design for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Pilot with skeptics, not enthusiasts
&lt;/h2&gt;

&lt;p&gt;The default instinct is to recruit the AI-curious engineer — the one who already has Cursor, Claude Code, and three Ollama models running locally. That's a mistake. That engineer will adopt anything. Their feedback tells you nothing about your ceiling.&lt;/p&gt;

&lt;p&gt;Give the pilot to the two engineers who rolled their eyes in the kickoff. The ones who said "we tried this in 2023 and it hallucinated three migrations." Those people find every failure mode on day one, and if you can earn their trust, the middle 60% of your team follows.&lt;/p&gt;

&lt;p&gt;The pattern you want to avoid: an AI-guild pilot group reports 98% adoption after six weeks, and org-wide adoption settles at 14% six months later. The pilot group was not the org. It never is. Skeptics are signal, not noise. &lt;a href="https://dora.dev/research/ai/trust-in-ai/" rel="noopener noreferrer"&gt;DORA's trust deep-dive&lt;/a&gt; frames this as an organizational practice — trust calibration lives with the user, not the tool, and it's built through repeated exposure to honest failure, not through marketing.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Prove it on work nobody cares about first
&lt;/h2&gt;

&lt;p&gt;The worst place to debut an agentic system is production incident response. The second-worst is customer-facing code. The best place is cron jobs, cleanup scripts, log triage, PR description drafts, release notes, and the 20 little engineering chores that never make it onto a roadmap.&lt;/p&gt;

&lt;p&gt;I learned this the hard way from my own content pipeline. The first agentic workflow I shipped in NEXUS wasn't "draft my LinkedIn posts." It was "scan 15 subreddits at 7am and summarize what changed." If the summary was bad, I ignored it. If it was good, I skimmed it. There was no blast radius. After three weeks of watching it get better, I trusted it enough to let it draft. Six weeks in, I trusted it enough to auto-publish short-form content with Slack approvals.&lt;/p&gt;

&lt;p&gt;If I'd started at "auto-publish to LinkedIn," the first mistake would have been public and I'd have killed the project. Boring work is a safe harbor for building calibration.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Readable logs beat magic
&lt;/h2&gt;

&lt;p&gt;Anthropic's engineering team wrote &lt;a href="https://www.anthropic.com/research/building-effective-agents" rel="noopener noreferrer"&gt;Building Effective Agents&lt;/a&gt; at the end of 2024, synthesized from dozens of production rollouts. Their three design principles: simplicity, transparency in the agent's planning steps, and investment in the tool interface. Every successful team was doing the boring version on purpose.&lt;/p&gt;

&lt;p&gt;What this means in practice: when the agent does something, an engineer should be able to pull up a single file or dashboard and read what it did, what it saw, and why it made the call. Not a stack trace. Not a wall of JSON. A paragraph a human can skim.&lt;/p&gt;

&lt;p&gt;My DeFi trader posts every decision to Slack with three lines: the market, the fair-value estimate the model produced, and why the position was above or below the spread. When the trader loses money — and it does — I can tell within 10 seconds whether the model was wrong, the fill was bad, or the strategy is off. That 10-second diagnosis is why I still trust it with real capital.&lt;/p&gt;
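&lt;p&gt;As a concrete sketch of what a skimmable decision post can look like — assuming a standard Slack incoming webhook and the &lt;code&gt;requests&lt;/code&gt; library; the field names, numbers, and rationale below are illustrative, not the actual NEXUS implementation:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/..."  # hypothetical webhook

def post_trade_decision(market, fair_value, market_price, rationale):
    """Post a three-line, human-skimmable decision summary to Slack."""
    edge = fair_value - market_price
    summary = "\n".join([
        f"Market: {market}",
        f"Fair value: {fair_value:.4f} vs. price {market_price:.4f} (edge {edge:+.4f})",
        f"Why: {rationale}",
    ])
    # One short text payload, not a wall of JSON: the point is a 10-second read.
    requests.post(SLACK_WEBHOOK_URL, json={"text": summary}, timeout=10)

post_trade_decision(
    market="ETH funding-rate position",
    fair_value=0.0123,
    market_price=0.0101,
    rationale="model expects funding to revert within 8h; entry is above the spread",
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The payload is three lines of prose on purpose: the 10-second diagnosis happens in Slack, not in a log aggregator.&lt;/p&gt;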

&lt;p&gt;If your team has to read code to understand what the agent did, they won't. They'll just stop using it.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Make the override obvious
&lt;/h2&gt;

&lt;p&gt;Every agentic tool needs a visible off-ramp. Not buried in settings — on the main surface. A button that says "ignore this and do it myself," a keybind that kills the agent mid-task, a config flag that puts it in suggest-only mode.&lt;/p&gt;

&lt;p&gt;The override isn't a bailout feature. It's how trust accumulates. When engineers know they can always take the wheel, they're more willing to let the agent drive first. When they can't, they'll never get in the car.&lt;/p&gt;

&lt;p&gt;NEXUS has an approval surface in Slack for every content post. 80% of the time I hit approve. The other 20% I edit or reject, and what was posted, what was edited, and what was rejected all persist in SQLite. I can audit the agent's batting average on demand. That audit trail is the reason I let it draft at all.&lt;/p&gt;
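&lt;p&gt;If you want the same kind of audit trail, a few lines of SQLite are enough to start. This is a rough sketch — the table and column names are invented for illustration, not NEXUS's actual schema:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3

conn = sqlite3.connect("agent_audit.db")  # hypothetical database file
conn.execute("""
    CREATE TABLE IF NOT EXISTS approvals (
        id INTEGER PRIMARY KEY,
        posted_at TEXT DEFAULT CURRENT_TIMESTAMP,
        draft TEXT NOT NULL,
        decision TEXT CHECK (decision IN ('approved', 'edited', 'rejected')),
        final_text TEXT
    )
""")

def record_decision(draft, decision, final_text=""):
    """Persist every approve/edit/reject so the agent's batting average is auditable."""
    conn.execute(
        "INSERT INTO approvals (draft, decision, final_text) VALUES (?, ?, ?)",
        (draft, decision, final_text or draft),
    )
    conn.commit()&lt;/code&gt;&lt;/pre&gt;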

&lt;h2&gt;
  
  
  5. Default to boring
&lt;/h2&gt;

&lt;p&gt;The fastest way to kill an internal agent is to make it creative. Engineers don't want the agent to surprise them. They want it to do the obvious thing, predictably, 95% of the time, and escalate the other 5%.&lt;/p&gt;

&lt;p&gt;Pick the default behavior that would make a senior engineer nod. Default to "ask before acting." Default to the lower-risk option when two paths exist. Default to writing to a branch, not main. Default to suggesting a command, not executing it.&lt;/p&gt;
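&lt;p&gt;Those defaults are worth writing down as configuration rather than tribal knowledge. A hypothetical sketch — the flag names are invented, and your risk model will differ:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;from dataclasses import dataclass

@dataclass(frozen=True)
class AgentDefaults:
    """Boring-by-default policy: every flag starts at the lower-risk setting."""
    require_approval: bool = True         # ask before acting
    suggest_only: bool = True             # print the command, never execute it
    target_branch: str = "agent/drafts"   # write to a branch, never to main
    escalate_on_uncertainty: bool = True  # hand ambiguous calls back to a human

defaults = AgentDefaults()  # widen one flag at a time, deliberately&lt;/code&gt;&lt;/pre&gt;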

&lt;p&gt;You can add more autonomy later, once the team trusts the agent's reflexes. You cannot take autonomy back once someone gets burned.&lt;/p&gt;

&lt;h2&gt;
  
  
  How you know it's working
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://hamel.dev/blog/posts/field-guide/" rel="noopener noreferrer"&gt;Hamel Husain's field guide to improving AI products&lt;/a&gt;, drawn from 30+ production rollouts, makes the same point I keep making to platform teams: generic metrics are useless. BERTScore, ROUGE, cosine similarity — none of it correlates with whether your team actually uses the thing on Tuesday afternoon.&lt;/p&gt;

&lt;p&gt;What correlates: binary pass/fail evals tied to real failure modes you've observed in your own usage, tracked over time, with a human in the loop redefining "pass" as the product evolves. Husain calls this "criteria drift." It's not a one-time eval setup; it's an ongoing practice.&lt;/p&gt;
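&lt;p&gt;In practice that can start as a handful of binary checks rerun on every change. A minimal sketch — the two checks here are placeholders; the real ones come from failures you've actually observed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Each eval is a binary pass/fail check tied to an observed failure mode.

def no_invented_migrations(output):
    # This workflow should never propose schema changes.
    return "CREATE TABLE" not in output

def cites_a_source(output):
    # Summaries without a link were a recurring (hypothetical) failure mode.
    return "http" in output

EVALS = [no_invented_migrations, cites_a_source]

def pass_rate(outputs):
    """Fraction of checks passed; revisit the checks as 'pass' drifts."""
    results = [check(o) for o in outputs for check in EVALS]
    return sum(results) / max(len(results), 1)&lt;/code&gt;&lt;/pre&gt;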

&lt;p&gt;The leading indicators I watch for any internal agent: daily active use by engineers who aren't on the project team. Weekly active use by skeptics. The ratio of accepted-without-edit to edited-or-rejected over time. Those numbers tell you whether the trust curve is going up or down. Thumbs-up counts and star emojis do not.&lt;/p&gt;
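&lt;p&gt;The accepted-without-edit ratio falls straight out of the approval log — continuing the hypothetical SQLite table from the earlier sketch:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import sqlite3

def acceptance_ratio(conn):
    """Accepted-without-edit vs. edited-or-rejected, from the approvals table."""
    rows = conn.execute(
        "SELECT decision, COUNT(*) FROM approvals GROUP BY decision"
    ).fetchall()
    counts = dict(rows)
    accepted = counts.get("approved", 0)
    touched = counts.get("edited", 0) + counts.get("rejected", 0)
    return accepted / max(accepted + touched, 1)

ratio = acceptance_ratio(sqlite3.connect("agent_audit.db"))  # trend this weekly&lt;/code&gt;&lt;/pre&gt;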

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Internal adoption is the easier half. You control the tools, the data, the engineers, and the feedback loop. Your team shows up every day whether the agent is good or not.&lt;/p&gt;

&lt;p&gt;External adoption — users who paid for a product and can leave the moment the agent confidently does the wrong thing — is a different problem. That's Part 2.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>platformengineering</category>
      <category>devrel</category>
      <category>aiadoption</category>
    </item>
  </channel>
</rss>
