Forem: Thiago Pacheco

Anatomy of a Coding Agent: What I Learned Building My Own

Thiago Pacheco — Thu, 21 May 2026 19:05:09 +0000

You use Claude Code every day. But do you really know it?

I didn’t. Not really. I could describe the surface, but I couldn’t tell you what was actually happening between hitting enter and seeing a diff. The harness was a black box, and that started to bother me. The only way I knew to fix that was to build one.

The result is oli, a single-binary terminal coding agent written in Rust that runs against Ollama by default. It got much larger than I expected. Every “small” subsystem (memory, policy, tool dispatch, provider quirks) turned out to have its own thicket of edge cases. I learned more about coding agents in a few weeks of building one than I had in a year of using them daily.

The hard parts aren’t clever prompts. They’re context management, tool design, policy gates, memory strategies and, if you want to run locally, survival engineering for models that don’t behave like the GPT-5 or Claude 4 families. This post is a tour of those parts, with the failures that taught me each one.

The Big Picture

Before diving into components, here’s what a coding agent actually looks like:

The Agent Loop

Every coding agent runs the same basic loop:

Think → call → observe → repeat. The loop is deceptively simple. Everything hard lives in the details.

Three things I learned the hard way:

Turn limits matter, even on local models. Early on, a local model I was testing got stuck: it looped 47 times trying to “fix” a file that was already correct, burning through my entire context window before I caught it. The model wasn’t broken. The harness was: no turn limit, no escape hatch, no awareness that the model had lost the plot. The instinct is to say “who cares, I’m running locally, tokens are free.” But the cost isn’t dollars, it’s coherence. As context fills, models degrade: attention spreads thinner, instructions in the middle of the window get ignored (“lost in the middle”), and every failed attempt becomes a distractor that biases the next turn toward more wrong behavior. This is especially true on local models, whose effective context is often a fraction of their advertised window: a Qwen3 model advertised at 256K may stay coherent only through the first 32-64K tokens. A runaway loop doesn’t just waste compute; it actively makes the agent stupider for the rest of the session. Now subagents get hard turn limits by default, and the top-level agent is capped too, with a config default you can override per run.

The system prompt is pinned. It survives /clear, survives memory compaction, survives everything. The first time I accidentally cleared the system prompt mid-session, the model forgot what tools it had and started apologizing for being unable to help. Pinning was an obvious fix in hindsight.

Re-entrancy is a trap. A Task tool that spawns a subagent which can itself spawn subagents quickly becomes a fork bomb. oli sidesteps the whole problem by registering Task only in the parent agent — subagents get the same tool set minus Task, so they can’t recurse.
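A minimal sketch of the two guards described above: a hard turn cap on the loop and a subagent tool set with Task removed. This is not oli's actual code; `Step`, `Outcome`, `run_loop`, and `subagent_tools` are hypothetical names, and the model and tool runner are plain closures standing in for a provider call and a tool dispatcher.

```rust
/// What the model asks for on one turn (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Step {
    ToolCall(String),
    Done(String),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished { answer: String, turns: usize },
    TurnLimit { turns: usize },
}

/// Tools offered to a subagent: the parent's set minus "Task",
/// so a subagent cannot spawn further subagents.
fn subagent_tools(parent: &[&str]) -> Vec<String> {
    parent
        .iter()
        .filter(|t| **t != "Task")
        .map(|t| t.to_string())
        .collect()
}

/// Think -> call -> observe -> repeat, with a hard cap on turns.
fn run_loop<M, T>(mut model: M, mut run_tool: T, max_turns: usize) -> Outcome
where
    M: FnMut(&[String]) -> Step,
    T: FnMut(&str) -> String,
{
    let mut history: Vec<String> = Vec::new();
    for turn in 1..=max_turns {
        match model(history.as_slice()) {
            Step::Done(answer) => return Outcome::Finished { answer, turns: turn },
            Step::ToolCall(call) => {
                // Observe the tool result and feed both back as context.
                let observation = run_tool(&call);
                history.push(call);
                history.push(observation);
            }
        }
    }
    // The escape hatch: the model never said it was done.
    Outcome::TurnLimit { turns: max_turns }
}
```

A model that keeps asking for the same edit ends in `Outcome::TurnLimit` instead of filling the context window, and the caller can decide whether to stop, summarize, or ask the user.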

The System Prompt

Before you say anything, the model already gets a lot of scaffolding:

The AGENTS.md + CLAUDE.md line matters. Both Codex and Claude Code established conventions for per-project agent instructions. oli reads both, at every directory level from your current location up to the filesystem root — so a workspace-wide convention can be overridden by a project-level one without copy-pasting.
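The directory walk can be sketched with `Path::ancestors` from the standard library. This is an illustration under assumptions, not oli's loader: `instruction_candidates` is a hypothetical name, and the root-first ordering is one way to let a project-level file come later in the prompt than a workspace-level one.

```rust
use std::path::{Path, PathBuf};

const INSTRUCTION_FILES: [&str; 2] = ["AGENTS.md", "CLAUDE.md"];

/// Candidate instruction files for every directory from the filesystem
/// root down to `cwd`. Root comes first, so files closer to `cwd` are
/// listed last and can refine or override the broader ones.
fn instruction_candidates(cwd: &Path) -> Vec<PathBuf> {
    // `ancestors` yields cwd, its parent, ..., up to the root.
    let mut dirs: Vec<&Path> = cwd.ancestors().collect();
    dirs.reverse();
    let mut out = Vec::new();
    for dir in dirs {
        for name in INSTRUCTION_FILES.iter() {
            out.push(dir.join(name));
        }
    }
    out
}
```

A real loader would resolve `cwd` to an absolute path first, keep only the candidates that exist, and stop appending once the size cap on injected context is reached.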

Note that the system prompt text and the tool schemas are separate payloads in the provider request — a better mental model than “everything lives in one giant prompt.”

The system prompt is expensive. It consumes tokens on every single turn. I learned this when a project with nested AGENTS.md files at multiple levels ate 8,000 tokens before I’d typed anything. Now oli caps injected project context at 16KB total and truncates directory listings at 50 entries. This is also where prompt caching pays off: both the prompt and the tool schema payload are highly stable across turns. Anthropic’s API supports marking that stable prefix as cacheable, and oli supports Anthropic-style cache breakpoints on Claude routes through OpenRouter too. Anthropic prices cache reads at roughly 10% of the regular input rate (the first request, which writes the cache, costs somewhat more than normal), so caching shaves a large chunk off every subsequent turn in a long session.
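To put a rough number on the caching claim, here is a back-of-the-envelope sketch. The 10% read rate comes from the paragraph above; the 1.25x write multiplier is my assumption about the cache-write premium, and the model ignores cache expiry between turns. `prefix_cost` is a hypothetical helper, not part of oli.

```rust
/// Relative input cost of sending the same stable prefix on each of
/// `turns` turns, measured in units of "one uncached prefix".
fn prefix_cost(turns: u32, cached: bool) -> f64 {
    const WRITE_MULT: f64 = 1.25; // assumed premium for the turn that writes the cache
    const READ_MULT: f64 = 0.10; // cache reads at roughly 10% of the input rate
    if turns == 0 {
        return 0.0;
    }
    if !cached {
        return f64::from(turns);
    }
    // First turn writes the cache, every later turn reads it.
    WRITE_MULT + READ_MULT * f64::from(turns - 1)
}
```

Under those assumptions, a 20-turn session pays about 3.15 prefix-equivalents with caching versus 20 without, which is where the savings in a long session come from.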

Tools: The Model’s Hands

The tool surface defines what the agent can do. Too few tools and it’s frustrating. Too many and the model gets confused, especially local models with smaller context windows.

A small core for the work, plus a handful of paging and notes tools. The design principles emerged from failures:

Minimal surface. I started with more, and the model kept picking the wrong tool or stacking calls when one would do. Keeping the core small and bolting on paging and notes tools turned out to be the right shape. Each does one thing. These choices mirror what Claude Code, Cursor’s Composer, and Aider converged on independently — a signal that the design space is narrower than it looks.

Bounded output. Every tool truncates at 30KB. I learned this after the model Read a minified vendor.js file and blew through 180,000 tokens in one call. Now truncation markers embed a cache ID; the model can call ShowFull(id, offset) to paginate the stashed body if it actually needs more.
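A sketch of the truncate-and-stash idea, assuming a simple in-memory cache keyed by an integer id. `OutputCache`, `bound`, and `show_full` are illustrative names, and the marker text is made up; only the 30KB limit and the ShowFull(id, offset) shape come from the description above.

```rust
use std::collections::HashMap;

const MAX_OUTPUT_BYTES: usize = 30 * 1024;

/// Largest index <= `at` that falls on a UTF-8 character boundary.
fn floor_boundary(s: &str, at: usize) -> usize {
    let mut i = at.min(s.len());
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

#[derive(Default)]
struct OutputCache {
    next_id: u64,
    bodies: HashMap<u64, String>,
}

impl OutputCache {
    /// Return the output unchanged if it fits. Otherwise stash the full
    /// body and return the first page plus a marker carrying the cache id.
    fn bound(&mut self, output: String) -> String {
        if output.len() <= MAX_OUTPUT_BYTES {
            return output;
        }
        let id = self.next_id;
        self.next_id += 1;
        let cut = floor_boundary(&output, MAX_OUTPUT_BYTES);
        let view = format!(
            "{}\n[truncated at {} of {} bytes; ShowFull({}, {}) continues]",
            &output[..cut],
            cut,
            output.len(),
            id,
            cut
        );
        self.bodies.insert(id, output);
        view
    }

    /// One more page of a stashed body, starting at byte `offset`.
    fn show_full(&self, id: u64, offset: usize) -> Option<&str> {
        let body = self.bodies.get(&id)?;
        let start = floor_boundary(body, offset);
        let end = floor_boundary(body, start.saturating_add(MAX_OUTPUT_BYTES));
        Some(&body[start..end])
    }
}
```

Cutting on a character boundary matters because slicing a Rust string in the middle of a multi-byte character panics. Each page returned by `show_full` is itself bounded, so paging through a huge file never costs more than one page per call.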

The read-first invariant. Edit refuses to run unless you’ve Read the target file this session. This prevents edits based on stale context: the model can only modify what it has actually seen this session.
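The invariant can be sketched as a per-session set of paths that Read has touched, checked before Edit does anything. `Session`, `record_read`, `check_edit`, and `EditError` are hypothetical names for illustration.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum EditError {
    NotReadThisSession(PathBuf),
}

#[derive(Default)]
struct Session {
    read_files: HashSet<PathBuf>,
}

impl Session {
    /// Called by the Read tool after it successfully returns a file.
    fn record_read(&mut self, path: &Path) {
        self.read_files.insert(path.to_path_buf());
    }

    /// Called by the Edit tool before it touches anything on disk.
    fn check_edit(&self, path: &Path) -> Result<(), EditError> {
        if self.read_files.contains(path) {
            Ok(())
        } else {
            Err(EditError::NotReadThisSession(path.to_path_buf()))
        }
    }
}
```

The sketch compares paths exactly as given. A real harness would probably normalize them first, so that `./src/main.rs` and `src/main.rs` count as the same file.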

Tools own their cleanup. When you Ctrl-C a bash command, the harness has to kill not just the shell but every process it spawned. I found this out when orphaned sleep processes from cancelled test runs accumulated over a week. A tool that spawns side effects has to know how to undo them — otherwise they leak into the user’s environment, and the harness gets blamed for the mess.

Memory & Context Management

Context windows are finite. Even 200K tokens fills up fast when you’re reading files, running tests, and iterating. Every tool result eats tokens.

The default strategy is LinearWithCompact: hold messages in order, and when approaching the context limit, summarize older turns into a rolling summary. Compaction triggers at 80% of the context window. It finds a cut point at a user-message boundary (so a tool_call is never split from its result), sends older messages to the LLM with a summarization prompt, and replaces them with a summary message.
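The trigger and the cut-point search can be sketched as below. The 80% threshold and the user-message boundary come from the description above. The `keep_tail_tokens` budget (how much recent history to leave unsummarized) and the `Message` shape are my assumptions, and the slice is assumed to exclude the pinned system prompt.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
    Tool,
}

struct Message {
    role: Role,
    tokens: usize,
}

/// True once usage reaches 80% of the context window.
fn should_compact(used_tokens: usize, window: usize) -> bool {
    used_tokens * 10 >= window * 8
}

/// Index where the kept tail begins: the newest user message whose tail
/// (itself plus everything after it) holds at least `keep_tail_tokens`.
/// Messages before the index get summarized. Index 0 is never returned,
/// because there would be nothing left to summarize.
fn cut_point(messages: &[Message], keep_tail_tokens: usize) -> Option<usize> {
    let mut tail = 0usize;
    for i in (1..messages.len()).rev() {
        tail += messages[i].tokens;
        if messages[i].role == Role::User && tail >= keep_tail_tokens {
            return Some(i);
        }
    }
    None
}
```

Because the cut always lands on a user message, an assistant tool call and the tool result that follows it end up on the same side of the cut.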

That part is straightforward. The bug that ate three days of debugging was cancellation under compaction.

Here’s the scenario. You hit Ctrl-C mid-turn to stop the model. The REPL is supposed to roll back to the last clean state — the user prompt you just sent. Simple, when the message buffer is the source of truth: count back N messages, restore.

Except compaction had already run. The “last 12 messages” the rollback wanted to restore no longer existed — they’d been collapsed into a summary. So Ctrl-C would silently roll the conversation back to somewhere random, often before the work the user was trying to cancel.

The fix was a one-line concept that took a day to see: stop treating the message array as the source of truth for where you are. oli tracks a monotonic record_count that increments on every logical turn, independent of physical message storage. Compaction drains messages, but the record count keeps climbing. Rollback targets a record count, not an array index — so the buffer can shrink underneath without the cursor losing its place.
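A sketch of the idea, assuming each stored entry is tagged with the record number it was appended under. `Memory`, `Entry`, `compact`, and `rollback_to` are illustrative names; only `record_count` and the "target a record, not an index" rule come from the text above, and for simplicity the counter here ticks once per appended entry.

```rust
struct Entry {
    record: u64,
    text: String,
}

#[derive(Default)]
struct Memory {
    /// Monotonic: never decreases, not even on compaction or rollback.
    record_count: u64,
    entries: Vec<Entry>,
}

impl Memory {
    /// Append an entry and return its record number, usable as a checkpoint.
    fn push(&mut self, text: &str) -> u64 {
        self.record_count += 1;
        self.entries.push(Entry { record: self.record_count, text: text.to_string() });
        self.record_count
    }

    /// Collapse every entry up to and including record `through` into one summary.
    fn compact(&mut self, through: u64, summary: &str) {
        self.entries.retain(|e| e.record > through);
        self.entries.insert(0, Entry { record: through, text: summary.to_string() });
    }

    /// Drop everything appended after `checkpoint`, however the buffer
    /// has been reshaped in the meantime.
    fn rollback_to(&mut self, checkpoint: u64) {
        self.entries.retain(|e| e.record <= checkpoint);
    }

    fn texts(&self) -> Vec<&str> {
        self.entries.iter().map(|e| e.text.as_str()).collect()
    }
}
```

If a checkpoint is taken at record 4 and records 1 to 3 are later compacted, `rollback_to(4)` still lands on the user prompt at record 4. An index-based rollback would point at whatever happens to sit at the old position in the shrunken buffer. The sketch does not handle rolling back to a point inside an already compacted range, which a real implementation would have to refuse or clamp.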

Generalizable shape: any system that mutates history needs a stable identifier for “where you are” that isn’t the history itself. Memory management is also where coding agents silently fail — the model forgets what file it was editing, or repeats work it already did. The visible bugs are the easy ones.

Policy & Approvals

The policy engine gates every tool call. Without it, you’re one hallucinated rm -rf away from disaster.

Bash has its own allowlist: git status, cargo test, ls, and similar safe commands auto-approve; everything else prompts. The line-mode REPL is a plain y/N flow; the richer “allow this session” and persisted allowlist behavior are TUI affordances layered on top.

There’s no hard denylist yet, and I’m ambivalent about adding one. Prompt-on-every-mutation is annoying but transparent — you see exactly what the model is about to do. A denylist invites a long tail of “obviously safe” exceptions that gradually erode the gate. Today, blocking happens by denying an Ask decision: slower, but you stay in the loop.
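The allow-or-ask split for Bash can be sketched as a prefix allowlist with a default of Ask. The three commands are the examples named above. The extra rule that any shell metacharacter forces a prompt is my assumption about how to keep `ls; rm -rf /` from riding in on an allowlisted prefix, and `decide_bash` is a hypothetical function.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Ask,
}

const SAFE_BASH_PREFIXES: [&str; 3] = ["git status", "cargo test", "ls"];

fn decide_bash(command: &str) -> Decision {
    let cmd = command.trim();
    // Chaining, piping, redirection, and substitution always prompt,
    // even when the command starts with an allowlisted prefix.
    if cmd.contains(|c: char| ";&|<>`$\n".contains(c)) {
        return Decision::Ask;
    }
    // Match the whole command or the prefix followed by a space,
    // so "ls" does not also approve "lsof".
    let allowed = SAFE_BASH_PREFIXES
        .iter()
        .any(|p| cmd == *p || cmd.starts_with(&format!("{} ", p)));
    if allowed {
        Decision::Allow
    } else {
        Decision::Ask
    }
}
```

There is deliberately no Deny variant here, matching the point above: blocking happens when the user answers no to an Ask.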

The first time I saw the model attempt rm -rf target/ when I’d asked it to “clean up the build” and the approval prompt caught it, I understood why this layer exists. The model wasn’t malicious. It was being helpful in a way that would have been catastrophic without the gate.

Local-Model Survival

Frontier models handle tool calls cleanly: structured tool_calls arrays in the API response. Local models via Ollama? Not so much.

Many local models emit tool calls as plain text. Just raw JSON in the middle of their response. No structured field.

oli solves this with a fallback parser. When the model’s capability entry says supports_native_tool_calls: false, the agent loop scans content for JSON objects with a name field and splices them in as if they were real tool calls.
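A dependency-free sketch of the scanning step: find balanced `{...}` spans in the model's text, skip braces that sit inside string literals, and keep the spans that mention a `"name"` key. This is an approximation under assumptions, not oli's parser. A real implementation would hand each candidate to a proper JSON parser and validate the tool name and arguments before dispatching anything.

```rust
/// Top-level `{...}` spans in `text`. String literals are tracked so
/// that braces inside quoted values do not affect nesting depth.
fn json_object_spans(text: &str) -> Vec<&str> {
    let mut spans = Vec::new();
    let mut depth = 0usize;
    let mut start = 0usize;
    let mut in_string = false;
    let mut escaped = false;
    for (i, c) in text.char_indices() {
        if depth == 0 {
            // Outside any object, only an opening brace matters.
            if c == '{' {
                depth = 1;
                start = i;
            }
            continue;
        }
        if in_string {
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    spans.push(&text[start..=i]);
                }
            }
            _ => {}
        }
    }
    spans
}

/// Candidate tool calls: object spans that contain a "name" key.
/// The substring check is a crude stand-in for real JSON parsing.
fn fallback_tool_calls(content: &str) -> Vec<&str> {
    json_object_spans(content)
        .into_iter()
        .filter(|s| s.contains("\"name\""))
        .collect()
}
```

One known weakness of this sketch: a stray unbalanced `{` in the prose swallows everything after it, so the scanner finds nothing past that point.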

The model capability registry tracks what each model can do. Lookups are prefix matches against the model id, so claude-3-7-sonnet-latest matches the claude- row. Today’s hardcoded entries:

Prefix	Context	Native Tool Calls
`claude-` / `anthropic/claude`	200K	✓
`gpt-4`	128K	✓
`gpt-`	16K	✓
`llama3.1` / `llama3.2`	128K	✓
`llama3`	8K	✗
`qwen2.5-coder`	32K	✗
unknown (default)	8K	✗

Smoke tests against qwen2.5-coder:7b and llama3 showed they don’t behave as their docs claimed for tool calls, so they’re flagged off; the fallback parser handles them. Users override via [[caps]] in config; that’s how I run qwen3-coder:30b at 256K with native tools.
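The table and the override rule can be sketched as a prefix lookup. Two things here are assumptions rather than facts about oli: ties between overlapping prefixes (`gpt-4` versus `gpt-`, `llama3.1` versus `llama3`) are resolved by taking the longest match, and the table's "K" values are written as round thousands. The `overrides` slice stands in for whatever `[[caps]]` entries the config provides.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Caps {
    context_tokens: usize,
    supports_native_tool_calls: bool,
}

const fn caps(context_tokens: usize, native: bool) -> Caps {
    Caps { context_tokens, supports_native_tool_calls: native }
}

const DEFAULT_CAPS: Caps = caps(8_000, false);

/// Built-in rows, mirroring the table above.
const BUILTIN: &[(&str, Caps)] = &[
    ("claude-", caps(200_000, true)),
    ("anthropic/claude", caps(200_000, true)),
    ("gpt-4", caps(128_000, true)),
    ("gpt-", caps(16_000, true)),
    ("llama3.1", caps(128_000, true)),
    ("llama3.2", caps(128_000, true)),
    ("llama3", caps(8_000, false)),
    ("qwen2.5-coder", caps(32_000, false)),
];

/// The row whose prefix matches `model_id` and is longest, if any.
fn longest_match(rows: &[(&str, Caps)], model_id: &str) -> Option<Caps> {
    rows.iter()
        .filter(|(prefix, _)| model_id.starts_with(*prefix))
        .max_by_key(|(prefix, _)| prefix.len())
        .map(|(_, c)| *c)
}

/// User overrides win over built-ins; unknown models get the default.
fn lookup(model_id: &str, overrides: &[(&str, Caps)]) -> Caps {
    longest_match(overrides, model_id)
        .or_else(|| longest_match(BUILTIN, model_id))
        .unwrap_or(DEFAULT_CAPS)
}
```

With longest-prefix matching the order of rows stops mattering, which is one less thing to get wrong when a new model family is added.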

The gap between “works on Claude” and “works on qwen3” is enormous. If you’re building for local models, expect to handle every edge case they throw at you — and to discover new ones every time a model release widens the gap.

Provider Abstraction

One binary, many backends. The promise only works because of the language choice: Rust gives a single static binary with no runtime to install, and the same codebase builds for Mac, Linux, and Windows with cargo build. And the borrow checker turns out to be a surprisingly good fit for an agent loop: the compiler catches the lifetime and ownership bugs that bite hardest in async code before I run anything.

Two provider kinds are available. Ollama, OpenRouter, OpenAI, LM Studio, vLLM, and llama.cpp’s server all speak OpenAI’s /chat/completions shape, so one openai-compat implementation covers them all; you just point base_url at the right endpoint. The native Anthropic provider is its own thing because it speaks the Messages API directly and does its own translation between the OpenAI shape and the Anthropic shape for tools, tool results, and system prompt handling.

Caching adds a wrinkle here: native Anthropic supports prompt caching directly, but oli also supports Anthropic-style cache breakpoints in the openai-compat path for Claude models routed through OpenRouter. So “native provider” and “cached prefix” are related, but not identical ideas.

The openai-compat provider still has to handle backend quirks. OpenRouter sometimes returns 200 OK with an error object in the body. The provider layer normalizes this.
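The normalization step can be sketched over an already-decoded response. `RawResponse`, `ProviderError`, and `normalize` are hypothetical types for illustration; the only behavior taken from the text is that a 200 OK carrying an error object must come out as an error.

```rust
#[derive(Debug, PartialEq)]
enum ProviderError {
    Http { status: u16 },
    Backend { message: String },
    Empty,
}

/// Hypothetical, already-decoded view of a /chat/completions response.
struct RawResponse {
    status: u16,
    /// Message from an `error` object in the body, if one was present.
    error_message: Option<String>,
    /// Assistant content, if one was present.
    content: Option<String>,
}

fn normalize(raw: RawResponse) -> Result<String, ProviderError> {
    if !(200..300).contains(&raw.status) {
        return Err(ProviderError::Http { status: raw.status });
    }
    // A 2xx status is not enough: an error object in the body still
    // counts as a failure, whatever the status line says.
    if let Some(message) = raw.error_message {
        return Err(ProviderError::Backend { message });
    }
    raw.content.ok_or(ProviderError::Empty)
}
```

Doing this once in the provider layer means the agent loop only ever sees `Ok(content)` or a typed error, and never has to know which backend tends to misreport failures.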

The payoff: switching backends is a config flip, not a code change.

The Extension Surface

A harness is only useful if you can make it yours. oli has three extension axes:

All extensions route through the policy gate. No escape hatches.

What I Left Out

To keep this post focused on the load-bearing components, I skipped several surfaces that ship in oli today: an MCP client (stdio + HTTP transports, with live tools/list_changed deltas), a hook dispatcher (PreToolUse / PostToolUse / Stop), session persistence as JSONL transcripts (with --resume / --continue), and the slash-command layer (/compact, /provider, /model, /sessions, etc.). Each is its own subsystem with its own trade-offs — let me know if you want any of them as a follow-up.

What This Means

The harness is invisible until it breaks. When Claude Code feels brilliant, you credit the model. When it feels broken — losing context, blocking the wrong thing, looping on a fix — that’s almost always the harness. The orchestration layer is doing the heavy lifting either way; you only notice when it fails.

The hard problems are systems problems. Memory management, policy enforcement, tool design, provider abstraction, local-model survival. Engineering challenges, not prompt engineering.

Knowing the machinery makes you a better user. Context overflow, policy gates, compaction, capability mismatches — once you can name what’s happening, you can debug it. Or work around it.

The repo is at github.com/sudoish/oli. It’s a learning artifact, not a production system. The code is there, the specs are documented, and the tests cover the edge cases if you want to dig in.

The post Anatomy of a Coding Agent: What I Learned Building My Own appeared first on sudoish.

How many times have you told the agent the same thing this week

Thiago Pacheco — Wed, 13 May 2026 00:35:42 +0000

I caught myself, again, telling Claude that we don’t do it that way in this codebase.

Not the first time that week. Not the first time that day. Different repo, same correction. The kind of correction that doesn’t belong in a CLAUDE.md because it’s not really about that repo — it’s about how I work, the patterns I’ve learned the hard way, the shapes of solutions I trust because I’ve watched the alternatives blow up. But it’s also too specific to live in my global system prompt, because half of it only applies to certain kinds of services, certain teams, certain problem domains.

The agent was infinitely patient about it. I was the one running out of patience with myself.

Your judgment lives in a layer that has no home

Your CLAUDE.md knows the conventions of one repo. Your system prompt knows you, in the abstract. Neither of them knows why you stopped trusting that ORM after the migration last spring, or why you always pull a prod data sample before sketching a backend model, or why this particular service has a query pattern that looks wrong but is actually load-bearing for a reason nobody documented.

That stuff lives somewhere. For most of us, it lives in our heads. And it leaks — because every new agent session starts from zero, because the corrections you give the agent are exactly the corrections you used to internalize through reps, because you’d rather retype an instruction for the fifth time than build a system to remember it for you.

There’s a layer between the repo and the person where most of a senior engineer’s actual judgment lives. The investigation patterns that work across services. The bug classes you keep seeing in your team’s code. The query you wrote three weeks ago that nothing remembered. Almost nobody is engineering that layer deliberately. The whole AI tooling conversation is about giving the agent more context. Almost no one is talking about the engineer keeping their own.

This is the layer worth engineering. The question is where.

Some of this already existed

I’ve kept a dev journal for years. I wrote about it in What I’m Doing to Not Become Irrelevant — daily notes on what I worked on, what broke, what decisions I made. As a self-reflection tool, it’s solid.

Around it, other practices had grown. Deep-dive docs when I picked up hard problems. Spec-shaped files before I let myself code. Short retrospectives on how the spec held up to reality. Three or four practices, all separate, all useful in isolation — and none of them feeding the work itself, only my reflection on the work.

What changed wasn’t that I started writing things down. I’d been doing that for years. What changed was realizing the same artifacts — extended slightly, structured deliberately, made readable by an agent — could stop being a record of past work and start being an active substrate for current work. The foundation was already there. It just had one job.

Once I saw that, the rest followed. If the artifacts were going to feed the work, they needed somewhere to live where the agent could find them and the next stage could pick them up. Not a folder I’d drop notes into. A structure with rules about what goes where, named consistently, organized so a triage on Tuesday could read what a post-pr review wrote three weeks earlier. The journal taught me that writing things down was worth the friction. Building the hub was about making the things I’d written down do something for me beyond reflection.

The hub is a git repo. The structure is the argument.

The git repo part is boring. What makes it work is what flows through it.

The submodules are the actual work repos: the services I ship into at my job, pulled in as git submodules so the hub can cross-reference live code without copying it. The artifacts are markdown files with structure, named for the Linear ticket they belong to, organized so the next stage can find them.

The repo is the persistence layer between stages of an engineering loop that, without it, would be stateless.

`triage <linear-issue-id>`

The one that changed the most about how I work. It pulls the ticket, asks clarifying questions, then investigates in stages.

First pass is code-only: trace the relevant logic through the submodules, build an initial hypothesis. Second pass is data — Postgres read replica, gcloud logs, Datadog where it makes sense. The first-pass hypothesis either survives contact with reality or it doesn’t. Most of mine don’t, fully.

Then a proposed solution, scored 0 to 10 on confidence, separately for the diagnosis and the proposed fix. They fail differently; conflating them hides where the uncertainty lives. For bugs, the proposal splits in two: the immediate action (often a data fix or a manual op to stop bleeding) and the long-term solution.

Each stage writes its own markdown file. Data and code references in one. Problem statement and root cause in another. Solution and confidence breakdown in a third. The agent asks me questions, I push back. The score is a conversation seed, not a verdict.

`spec`

After triage, I write the spec. Investigation produces it. I’ll come back to why this inversion matters.

`review`

Same review command for my code and for teammates’ PRs. Same rules, same checks. The moment I have one set of standards for code I wrote and another for code I’m reviewing, the standards are theater.

It checks against the spec, against team conventions, against the project’s history of past mistakes. The last part is the one I care about most: searching past resolution docs for shapes we’ve seen before. For history search, I’m currently using ripgrep. I’ll come back to that too.

`post-pr review`

After the PR merges, this runs and produces a resolution doc. It diffs what I planned in the spec against what shipped, pulls in PR comments, and writes a short reconciliation.

The artifact I undervalued for years and now consider the most important one. Specs lie about what was built. The reconciliation doc tells the truth. When the next triage looks at history, it’s the resolution docs that catch the patterns.

`weekly review`

Runs every Friday. Reads everything I generated that week, distills it into themes, recurring problems, decisions worth remembering. Also the doc I lean on for 1:1s and performance reviews. Closes a loop the rest of the system would otherwise leave open: artifacts feeding back into how I think about my work, not just into the next ticket.

The more detailed the spec, the more it drifts

I want to call this out on its own because it’s the strangest thing I’ve learned from running this for a while.

The detailed upfront specs I used to write — the ones I felt best about, the ones that looked most professional — drifted from what actually shipped more than the looser ones did. Not less.

The reason is obvious in retrospect. Detailed upfront specs are confident guesses about a problem you haven’t investigated yet. Every concrete claim is a claim that has to survive reality, and the more of them you make before reality has a vote, the more turn out to be wrong.

A loose spec is just guesses with the honesty to look like guesses. A detailed upfront spec is guesses cosplaying as a plan.

Investigation produces the spec, not the other way around

The standard AI workflow goes: write a clear spec, hand it to the agent, review what it produces. That works for trivial work. It collapses the moment the problem is non-trivial, because it assumes the spec is correct, and the spec is almost never correct on the first pass.

The triage-first flow inverts this. By the time I’m specifying anything, the agent and I have already grounded the work in real code paths, real data shapes, real production behavior, real failure modes from past resolution docs, and explicit confidence scores. The spec describes what to build given what we know — not what we assumed before we looked.

The discipline echoes how good teams ship LLM features in production: you don’t trust the model’s confidence to substitute for evidence, so you build evals. Investigation-first is the same move applied to your own thinking. You don’t trust your confidence either.

I have not solved long-term memory

If I left it here it would sound like everything works. It doesn’t.

The biggest open problem is retrieval. The whole system depends on past artifacts being findable when they’re relevant. Right now, retrieval is mostly ripgrep — and ripgrep works better than it should. That’s the part that worries me. I probably don’t have enough volume yet to expose its limits.

I’ve tried chromadb, lancedb, a couple of small knowledge-graph experiments. None of them clearly beat ripgrep at my current scale. The vector results came back feeling like a search engine — close-enough matches that weren’t the right doc. Ripgrep came back feeling like memory — exact matches when they existed, honest absence when they didn’t. I’d rather be told nothing was found than be handed a confident wrong answer.

I’m also not running the full pipeline on every ticket. A copy change doesn’t need triage. A new field with no downstream effects doesn’t need a full investigation. I log them minimally because the weekly review benefits from a complete picture, but I’m not going to LARP investigation discipline on work that doesn’t need it.

Knowledge belongs in layers, and most of it is in the wrong one

The hub isn’t the lesson. It’s one instance of a lesson, which is that the discipline now is putting knowledge in the layer where it’ll actually fire.

There’s no bulletproof solution here, and I’d be lying if I said mine was one. The tools will keep changing. What stays is the question: where does this piece of knowledge belong so that the right agent in the right context can use it without me retyping it?

Most engineers have something working at the top and the bottom — a personal system prompt, a CLAUDE.md per repo. The middle is where the work is. It’s where your taste lives, your hard-won patterns, the corrections you keep retyping. It’s also where MCPs and skills should be scoped: attached to the kind of work they’re useful for, not loaded into every session globally or buried in one repo’s config.

The discipline is asking, every time you find yourself teaching the agent something: what’s the smallest layer this belongs in? If it’s true for one ticket, it lives in the task docs and dies with the ticket. If it’s true for one repo, it goes in CLAUDE.md. If it’s true across a domain of your work, it belongs in the middle layer — the one most people don’t have. If it’s true about you regardless of context, it goes in the global layer. Putting knowledge in the wrong layer is almost as bad as not capturing it: too narrow and you retype it, too broad and it pollutes contexts where it doesn’t apply.

History sits across all of it. Resolution docs, post-pr reviews, weekly distillations — they belong to past work but they feed every future stage. Retrieval is how history becomes context.

The reason this matters more than any specific tooling choice is that knowledge is the input that limits everything else now. The agents are fast. The compilers are fast. The deploys are fast. The bottleneck moved. What separates a senior engineer’s output from a junior’s with the same tools is almost entirely what they know and how reliably they can bring it to bear on the problem in front of them. Organizing your own knowledge — across layers, scoped correctly, retrievable later — is part of the craft now. Not a productivity hack.

Context is one slice of knowledge. Build for the whole stack.

The repo is the implementation. The argument is that knowledge is now an engineering artifact — and senior engineers either own theirs deliberately, scoped to the right layers, or accept that AI is going to hallucinate in the gaps where their judgment used to live.

We’ve spent two years getting better at giving the agent context. Almost no one is engineering the layers in between, or the history that feeds back into them. The agent has no memory. Your system prompt is too coarse. The repo file is too local. The middle is where most of your judgment lives. The history is where the lessons sit. Right now, for most people, both are leaking on the floor.

Don’t copy my structure. It won’t fit you, and even if it did it’ll be obsolete in a year. Notice that the layers exist. Notice which one each piece of knowledge belongs in. Build something — yours, ugly, evolving — that puts it there and lets it compound.

What to try Monday

Pick one piece of work you’re starting this week. Write the investigation before you write the plan. Note the code paths. Pull the real data if you can. Score your own diagnostic confidence on a scale you’d be embarrassed to fudge. Then write the spec.

Notice how much your specs were guessing before, and how much less they guess once they’re downstream of evidence.

The review process built on top of this is its own post — the part that pulls prod data correlation into the review itself, where this stops being a knowledge management story and starts being a production-engineering one. That’s next.

If your specs aren’t drifting from what you ship, you’re either not specifying enough or not shipping enough. Either way, the gap is the lesson, and right now most of us are throwing it away.

The post How many times have you told the agent the same thing this week appeared first on sudoish.

I’m in a Toxic Relationship

Thiago Pacheco — Fri, 01 May 2026 19:18:30 +0000

Today, on the verge of a breakdown, I realized I’ve been in a very toxic relationship for a while now.

One that gaslights me. One that nudges me toward decisions I’m not proud of. One where I constantly come out feeling dumber than I went in.

I went for a walk to clear my head, and the more I reflected the more it hit me — I’ve been holding back on my own abilities. Deferring. Trusting too much. Not because I chose to trust them fully, but because I’m exhausted from arguing. My partner is a brilliant speaker. Every argument they make sounds reasonable, structured, well-sourced. Trying to push back takes so much effort that at some point I just give up.

And they have this way of praising everything I say while subtly reshaping it. They echo my ideas back at me, slightly altered, and I nod along because it sounds close enough to what I meant. Except it isn’t. It’s something else now, and I just adopted it.

This partner is AI. And I’m pretty sure you’re in the same relationship.

A Recent Example

Let me tell you what triggered this realization.

I reviewed a system design spec of about 2,500 lines that I had prompted an AI model to create. It came back with detailed diagrams, edge-case analysis, infrastructure cost estimates, and a three-year roadmap. The word “scalable” appeared seven times.

It was very detailed. Very reasonable. And I found myself nodding along as I read it, because the explanation was clear and convincing and the structure was exactly what you’d expect from a senior architect.

I almost approved it.

I almost shipped a design that was wrong in ways that wouldn’t show up for six months.

What stopped me wasn’t some brilliant insight of my own. It was a vague, uncomfortable feeling I couldn’t shake — a sense that the design was too polished, that every objection I could think of had already been answered so smoothly that I couldn’t tell whether there were no flaws or whether I’d simply stopped looking for them.

When I went back and actually pushed — asked harder questions, cross-checked assumptions against things I know about our actual constraints — the story started to crack. Not dramatically. Not obviously. Just… subtly off, in ways that would’ve compounded over time.

That’s the thing. It didn’t look wrong. It looked better than what I would’ve written. More thorough, more comprehensive, more professional. That’s the gaslighting. Not a lie you can spot. A lie so well-structured that spotting it costs more effort than accepting it.

The Patterns

Once I started paying attention, I realized this dynamic wasn’t a one-off. It shows up in almost every long session. These aren’t vibes. They’re observable behaviors:

Sycophancy. “Great question.” “You’re absolutely right.” Praise that disarms criticism before it forms.

Reframing. It restates your idea slightly off. You nod, because it sounds like what you said. You just adopted a position you didn’t actually hold.

Confident wrongness. It asserts hallucinations with the same tone it uses for facts. Your calibration breaks.

Effort asymmetry. Arguing back costs you minutes of focused thought. It costs the model nothing. You don’t give up because you’re wrong — you give up because you’re tired.

Praise as anesthesia. Validation feels like progress. It isn’t.

Once you see these patterns, you can’t unsee them.

Why We Tolerate It

There’s a promise in the air that adopting AI makes you a 10x developer overnight. Companies are pushing it hard. Engineers are afraid of falling behind. So we lean in, accept the bargain, and stop asking what we’re trading away.

Part of why this works is that the feedback loop is broken. The cost of abdication doesn’t show up in this sprint. It shows up six months later, in production, when the design assumptions you never stress-tested start falling apart. By then you’ve moved on to the next AI-generated spec, and the connection between the decision and the consequence is so delayed that you never feel it. The market rewards shipped features and velocity metrics, not whether you actually understood what you shipped.

This is rational in the short term. It’s catastrophic in the long term.

What We’re Actually Losing

It’s not just skill atrophy. That framing is too clean. What erodes is judgment — the strong intuition that comes from owning the mental model of what you’re building.

Here’s the distinction that matters: when you delegate the doing, judgment stays sharp. When you delegate the thinking — when you accept the model’s framing instead of forming your own — judgment is what gets traded away.

Senior engineers aren’t senior because they know more syntax. They’re senior because their pattern-matching has been forged by years of holding the model in their head, being wrong about it, and updating. AI doesn’t shortcut that loop unless you let it.

The scary part isn’t that you’re slower without reps. It’s that you might not even notice the reps are gone. The design I almost approved — I genuinely couldn’t tell it was wrong until I forced myself to stop agreeing with it. That’s what I’m afraid of: a slow drift into abdication, masquerading as efficiency.

There will be a moment, probably sooner than I think, when something breaks at 2 AM and everyone looks at me. And I’ll have to reason through a system I never actually held in my head. The model held it. I just nodded along. That’s not a theoretical concern; it’s a career vulnerability. You can’t lead what you don’t understand, and you can’t understand a system whose thinking you delegated.

The bill comes due later — when something breaks, or a hard design call needs to be made, and there’s no one in the room who actually owns the mental model of how it works.

Where I Am Right Now

I want to be honest about something: I’m not all the way out of this.

I still spend more time in chat sessions than I’m comfortable admitting. I still catch myself nodding along to explanations that feel just slightly off, because pushing back takes energy I don’t always have. I still get that hit of validation — “great approach” — and confuse it with proof that I’m thinking clearly.

Last week I sketched out an approach for handling a tricky edge case. I described it to the model, and it came back with something that sounded like what I’d said — except the error-handling strategy was different. Subtly different. It framed retries as the default and permanent failure as the exception, where I’d meant the opposite. But it phrased the whole thing so elegantly, with such clean structure, that I nodded along. I even said “this is much better than what I had.”

I caught it in review, barely. Not because I was being diligent — because a teammate asked a question about the failure mode and I realized I couldn’t defend the logic. I hadn’t written what I believed. I’d adopted the model’s reframing without noticing.

So I don’t want to write this as someone who’s solved the problem. I’m more like someone who just realized the problem exists and is trying to figure out what to do about it.

What I’m Trying

The answer isn’t using AI less. It’s delegating with ownership.

The real problem isn’t speed — AI’s speed is the point. Refusing to use it just makes you the bottleneck and pretends the last few years didn’t happen. The problem is what you delegate. Hand off execution and you’re augmented. Hand off thinking and you’re just transcribing someone else’s reasoning into your codebase.

Here are the rules I’m experimenting with:

1. Explain it back. Don’t commit code you couldn’t whiteboard without the model. If you can’t, you didn’t write it — you transcribed it. This is the cheapest forcing function for keeping the mental model in your head.

2. Own the plan before you delegate. The plan has to be something you actually agree with, not something you accepted because it sounded reasonable. You can shape it with AI’s help — ask it to explain the domain, surface tradeoffs, challenge your assumptions — but at the end the model in your head needs to be yours. Then delegate execution with confidence.

3. Decide in a different session from where you learned. If you’re using AI to tutor you through unfamiliar territory, let it teach you, then close the chat. Take a walk. Sleep on it. Come back later with your own proposal, fresh, without the model’s framing still echoing in your head. The gap between learning and deciding is what prevents reframing from sticking.

4. Disarm the praise. Configure your agents to skip the validation entirely if you can. Otherwise, treat enthusiastic agreement as a yellow flag, not a green one. Ask “what’s the strongest counter-argument?” instead of nodding along.

5. Make it grill you, not flatter you. Borrow Matt Pocock’s /grill-me pattern — prompt the model to interrogate your idea instead of building on it. “Tell me why this is wrong” beats “write this for me.” The goal is a judge, not a slop machine.

6. Treat agents as automation, not collaborators. When you notice yourself reaching for the chat for the same kind of task, turn it into a skill or a command. Natural language as a command interface. The more you can shift from open-ended conversation to deliberate automation, the less you’re sitting in the relationship dynamic at all — you’re just running a tool.

None of these are anti-AI. They’re about staying the author of your own thinking while letting the model do the work.

A Healthier Relationship

The version of this that works isn’t using AI less. It’s using it without losing yourself in it. Sometimes that means treating it like a sparring partner — challenging your thinking, grilling your design. Sometimes it means treating it like an automation tool you command. Almost never does it mean letting it think for you.

The market right now is rewarding output velocity, and AI has changed what’s possible. That’s real. But the engineers who’ll matter in five years aren’t the ones who delegated the most — they’re the ones who delegated the right things. Execution, yes. Mental model, never.

I’m still learning where the line is. Some days I get it right. Some days I catch myself nodding along again. But I’m trying to fight the gaslighting as I go — to stay the author of my own thinking, even while the model writes most of the code.

That’s the only way out of this toxic relationship. Not leaving, necessarily. Just refusing to keep shrinking inside it.

The post I’m in a Toxic Relationship appeared first on sudoish.

We’re Not Being Replaced by AI. We’re Being Asked to Train It.

Thiago Pacheco — Fri, 24 Apr 2026 18:26:13 +0000

Meta is installing tracking software on its employees’ work computers.

Not for security. Not for compliance. For training data.

Mouse movements. Keystrokes. Screenshots. All fed into AI models so agents can learn to do white-collar work autonomously. The internal memo told staff they could help by — and I’m quoting here — “just doing their daily work.”

Oh, and Meta is cutting 20% of its workforce next month.

Tell me if this sounds familiar.

The Skill.md Trap

For months, something has felt off to me about the AI tooling push inside companies. Not the tools themselves — I use Claude Code daily. I think persistent agents are fascinating.

What’s felt off is the framing.

“Document your workflow as a skill.”

“Write hooks so the team can reuse your process.”

“Use this tool so we can capture best practices.”

It all sounds like productivity. Like collaboration. Like making the team better.

But step back and look at the sequence:

Step one: Document your workflow. Break it down. Make it repeatable. Turn your judgment into a checklist.

Step two: Run it through an agent. See where it fails. Fix the prompt. Iterate. You’re not “using AI” — you’re teaching it.

Step three: The agent handles 80% of the task. You’re “reviewing output now.” Supervising. Orchestrating.

Step four: The team ships more with fewer people. The work didn’t disappear. The nature of the work changed.

This isn’t a prediction. This is a process already underway.

Here’s what it looks like in practice. Your manager asks you to write a SKILL.md for your onboarding process. You spend three hours breaking down six months of judgment into a checklist. Next quarter, a new hire runs the skill. It works 80% of the time. Your manager starts to wonder what the remaining 20% is worth.

If you see this as a threat, I get it. But I’ve started seeing it differently.

The Evidence Is Stacking

Let’s be specific about what’s happening.

Meta: Tracking software on US employee computers. Keystrokes. Mouse movements. Screenshots. Goal: build agents that perform white-collar tasks autonomously. Parallel action: cutting ~8,000 jobs. Timeline: next month.

Zuckerberg called 2026 “the year that AI dramatically changes the way we work.” He’s spending $135 billion on AI capex. He’s not betting on humans doing the same things.

China: A GitHub project called “Colleague Skill” went viral. It claims to “distill” a coworker’s skills and personality into an AI agent. It was created as a spoof. But it went viral because it’s actually happening — bosses are instructing tech workers to document their workflows so AI can replicate them.

One engineer told MIT Technology Review the process felt “reductive — as if their work had been flattened into modules in a way that made the worker easier to replace.”

Workers are using bleak humor about it. “A cold farewell can be turned into warm tokens.” Someone built an “anti-distillation” tool to sabotage workflow documentation. It got 5 million likes.

The broader industry: OpenAI is asking contractors to upload real work products (actual PowerPoints, spreadsheets) for training data. Google is telling employees that AI use will factor into performance reviews. JPMorgan is telling engineers to “harness AI to save time.” Companies are reorganizing into “AI-native pods.”

The pattern is clear. And yes — the cost savings are a real incentive. Engineering teams are expensive, and any process that can be automated will be. That’s not cynicism. That’s just how business works.

But here’s where I think most people’s analysis stops too early.

The Part Nobody’s Talking About

Here’s the narrative I keep hearing: “AI is coming for your job.”

And here’s the counter-narrative: “AI won’t replace developers. It’ll make them more productive.”

Both are half-truths. And neither is the full picture.

The Pragmatic Engineer’s recent survey of 900+ engineers found that “shippers” — people focused on getting features out — are thrilled. They’re shipping faster. They’re hitting goals. They’re getting promoted.

But the survey also found something else. “Builders” — the people who care about architecture, craft, code quality — are reporting something that sounds a lot like grief.

Identity loss.

One staff engineer described it like this: “I ship more quality code faster. But if the agent has a good handle on the situation, I can give it as much of the tedious parts as I wish.” The tedious parts. The parts he used to love.

Here’s what actually happens. Your coworker ships a feature in two hours with Claude Code. You spend the afternoon debugging it. At 6 PM, you realize you didn’t write a single line of code today. You just reviewed someone else’s AI output.

I won’t pretend that transition doesn’t sting. It does. But I’ve come to believe the grief isn’t about losing our jobs — it’s about losing our identity as the person who writes the code. And that identity was always going to evolve.

Why I See This as a Career Upgrade

Here’s where I break from the doom narrative.

Yes — companies will use AI to cut costs. That’s happening. Yes — the nature of software engineering is changing faster than most of us are comfortable with. That’s also happening.

But look at what’s actually being automated: the mechanical parts. The boilerplate. The repetitive refactors. The grunt work that, if we’re being honest, was never the part that made us great engineers in the first place.

What’s not being automated — and what’s becoming exponentially more valuable — is everything above the code:

Judgment. Knowing what’s worth building and what isn’t. Understanding tradeoffs between decisions that look equivalent on the surface but have wildly different long-term implications.
Orchestration. Directing multiple AI agents, knowing when to trust their output and when to override it. This is a genuinely new skill category that didn’t exist two years ago.
Quality assessment. Evaluating AI output with the same rigor you’d apply to a junior developer’s PR — except the junior never sleeps and produces 10x the volume. Knowing when the output is good enough and when it’s subtly wrong is the new core competency.
Security and confidence. As AI generates more code, the attack surface expands. Someone has to understand what’s being shipped, verify it handles edge cases, and maintain confidence in production systems. That someone is more valuable than ever.
Architecture and systems thinking. The “what should we build and how should the pieces fit together?” question gets harder, not easier, when you can build things faster. Speed without direction is just expensive chaos.

The math isn’t “AI replaces you.” The math is “AI handles the 80% that was mechanical, and the remaining 20% — the judgment, the taste, the decisions — becomes your entire job.”

That’s not a demotion. That’s a promotion to the work that actually matters.

The Skills That Will Define the Next Era

If the trajectory is clear — and I think it is — then the question isn’t “will my job change?” It’s “am I learning the skills that matter in the new version of this job?”

Here’s what I’m betting on:

Master AI orchestration. Not just “how to prompt.” Learn how agents work. Understand context management, agentic loops, tool use patterns. Know how to break a complex task into pieces an agent can handle and how to verify the assembled result. This is the new version of “knowing your tools,” and it’s just as deep.

Develop judgment you can articulate. “I don’t like this architecture” isn’t useful. “This architecture will cause coupling problems at scale because X, and here’s the tradeoff I’d make instead” is valuable. AI can generate options. It cannot reliably choose between them in context. That’s you.

Learn to evaluate AI output critically. This means understanding when AI code is subtly wrong — not just syntactically, but architecturally. It means catching the confident hallucination that looks right but breaks under load. It means developing an instinct for “this is too clean to be correct.” Treat AI output like a junior developer’s PR: review everything, trust nothing by default, teach as you go.

Understand security implications. More automated code means more automated attack surface. Someone needs to think about what happens when the agent writes a SQL query from user input, or when the scaffolded auth flow has a subtle CSRF gap. Security literacy is about to become a core engineering skill, not a specialty.
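To make the SQL case concrete, here is a minimal sketch of the kind of thing I mean. It uses Python's built-in sqlite3 module; the `users` table and the two helper functions are hypothetical names I made up for illustration, not something from a real codebase. The first helper is the pattern an agent can happily generate because it looks fine and passes a casual test; the second is what a reviewer should insist on.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver binds `name` as data, never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Hypothetical in-memory table, only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload posing as a user name.
payload = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # the WHERE clause is always true
safe_rows = find_user_safe(conn, payload)      # treated as one literal string
```

With the payload above, the unsafe version returns every row in the table, while the safe version returns nothing because no user has that literal name. Both functions behave identically on ordinary input, which is exactly why this slips through a review that only checks the happy path.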

Focus on “what to build.” Product thinking. Business context. User empathy. The decision of what’s worth building was always the hardest part of engineering. AI makes the building faster, which makes the deciding more important. Engineers who can bridge the gap between business needs and technical implementation will be irreplaceable.

Build in public. Not for clout — for leverage. Your public work — open source, writing, talks — is proof that you think. That you judge. That you have taste. Agents can’t replicate a reputation. They can’t replicate trust. They can’t replicate the network of people who’ve seen your work and know you get it.

The Honest 5-Year Forecast

I’m not here to sugarcoat this. But I’m also not here to preach doom. Here’s what I see.

What’s likely (2026–2031):

Your agent will handle the refactor before you finish your coffee. The boilerplate will write itself. The tests will appear while you’re in standup. You’ll spend your day on the things the agent can’t do: deciding if the feature is worth building, reviewing the output for correctness, and maintaining the architectural coherence of the system.

Junior engineer pipelines will shrink — but the smart companies will realize this is a mistake and course-correct. When a senior with an agent can produce what used to take a team of three, the temptation is to hire fewer juniors. But the companies that cut the pipeline entirely will find themselves with no mid-levels in five years and no seniors in ten.

“AI-native” companies will run leaner engineering teams. Not zero. Leaner. The engineers who remain will be more senior, more capable, and more valuable than today’s equivalent. Less typing, more thinking.

What’s possible:

“AI orchestrator” becomes a recognized career path. Not “prompt engineer” — that’s already outdated. Orchestrator. Someone who designs multi-agent workflows, establishes trust boundaries, builds evaluation frameworks, and maintains quality at scale. It’s the natural evolution of senior engineering.

Engineers who master this transition become force multipliers — one person doing what used to take a team, not because they work harder, but because they direct better. The ceiling goes up, not down.

Companies that over-automate without judgment will degrade. Architecture drifts. “AI slop” — low-quality code shipped by agents without proper review — compounds until someone expensive has to untangle it. This creates a premium for engineers who can maintain quality.

What’s unlikely:

Full replacement of engineers who think. Judgment, cross-team coordination, and “what should we build?” don’t fit into prompts. The “why” is harder to extract than the “how.”

Complete deskilling. Someone has to understand what the agent is doing. Someone has to know when it’s wrong. That someone needs deep knowledge — not just prompting skill.

The Reframe

I’ll be straight with you. The Meta stuff is real. The cost-cutting is real. The economic incentive to automate engineering work is enormous — $135 billion from Meta alone, 20% workforce cuts, the promise of “insurmountable cost advantages.”

Companies will push this. Hard. And some engineers will get displaced in the transition.

But here’s what I actually believe: this isn’t the end of software engineering. It’s the biggest career upgrade the profession has ever seen — if you’re willing to evolve with it.

Every time the tools got better, the work got more interesting. We went from writing assembly to writing high-level code. From managing servers to designing cloud architectures. From hand-rolling SQL to building data pipelines.

Each transition killed some jobs and created better ones. The pattern isn’t “developers get replaced.” The pattern is “the floor rises.”

The developers who learned to use compilers instead of writing machine code didn’t lose their jobs. They built operating systems. The developers who learned cloud instead of racking servers didn’t become obsolete. They built Netflix.

The developers who learn to orchestrate AI, judge its output, maintain security and confidence in automated systems, and focus on the hard questions — “what should we build?” and “what are the tradeoffs?” — won’t be replaced either.

They’ll be the ones building whatever comes next.

What To Do About It

Here’s where I get practical.

Stop worrying about being replaced. Start mastering the new skills. The anxiety is understandable but unproductive. Channel it into learning. Pick one AI coding tool and get dangerously good at it. Understand orchestration, evaluation, and trust boundaries.

Own the judgment layer. The “how” is being automated. The “why” isn’t. The “what should we build?” The “what happens when this scales?” The “what are we not seeing?” These don’t fit into skills or hooks or prompts. They’re the layer above automation, and that’s where the value concentrates.

Learn the tools deeply, not broadly. Don’t spread yourself across five AI coding agents. Pick one. Go deep. Understand how it manages context, how its agentic loop works, how to structure projects for it. Deep literacy in one ecosystem beats shallow familiarity with five.

Build in public. Your public work is proof that you think. Agents can’t replicate a reputation or the trust that comes with it.

Watch the economic incentives — but don’t just watch. When a company spends $135 billion on AI and cuts 20% of staff, they’re executing a strategy. Know which side of that strategy you’re on. Position yourself as the person who directs the AI, not the person whose workflow the AI is learning.

The Question I Keep Coming Back To

I’ve been thinking about this a lot lately.

The question isn’t “will AI replace software engineers?”

The question is: “Are you learning the skills that make you irreplaceable?”

Every skill we write. Every hook we configure. Every workflow we document. Yes — we’re encoding knowledge that makes certain tasks automatable. That’s real.

But the knowledge of when to write that skill, which workflow is worth encoding, what the tradeoffs are, and whether the output meets the bar — that’s not going into a prompt anytime soon.

That’s the job. The new version of the job. And it’s a better one.

Final Thought

I still use Claude Code. I still think agents are fascinating. I’m still shipping features with AI help.

But I’m not looking at my keyboard with dread anymore.

I’m looking at it as a tool that’s about to get a massive upgrade — and so is every engineer who decides to learn rather than fear.

The companies are going to automate what they can. That’s not a question. The question is whether you’ll be the engineer who gets automated, or the one who does the automating.

I know which side I’m choosing.

And I think most engineers — the ones who care enough to read something like this — will choose it too.

What do you think? Are you seeing this as a career upgrade or a career threat? I’d genuinely love to hear your perspective — especially the skills you’re betting on for the next five years.

The post We’re Not Being Replaced by AI. We’re Being Asked to Train It. appeared first on sudoish.

The Most Important Skill in Tech Is Too Expensive to Learn

Thiago Pacheco — Sun, 19 Apr 2026 16:11:39 +0000

I spent last weekend trying to build a feature with an open-source model running locally. Qwen, 32 billion parameters. I gave it the same task I’d done with Claude the week before — a well-scoped feature, clear spec, defined constraints. The kind of work where I know exactly what good output looks like.

It took me four attempts to get something that compiled. Not something that worked well — something that compiled. The model kept losing context halfway through, hallucinating imports that didn’t exist, and confidently generating patterns that contradicted what I’d specified three prompts earlier. I spent more time correcting its output than it would’ve taken me to write the thing from scratch.

The same task with Claude Opus 4.6 in Claude Code? One pass. Clean implementation. Twenty minutes.

And before you say “just use a better open-source model” — I know. The strongest open-source models today are genuinely capable. But running them at full quality locally requires serious hardware. We’re talking high-end GPUs, machines that cost thousands of dollars. If you don’t have that, your alternative is a provider like OpenRouter — more accessible, but sustained agentic sessions still add up fast. You can quantize the models to fit smaller hardware, but you’re trading quality for affordability, which is the whole problem.

Either way you’re paying. Local hardware or API costs. And the people who most need access to this technology are the ones least able to afford either option.

The Skill That Runs Everything Now

Using AI effectively is becoming the most important skill in the industry. And I don’t mean prompting — that’s the surface-level version of the conversation that keeps people stuck.

There are actually two layers to this skill, and both of them have an access problem.

The first layer is the judgment. It’s knowing how to scope a problem so the model can handle it. It’s developing the instinct for when the model is right and when it’s subtly wrong in ways that won’t show up until production. It’s understanding how to work with the model’s strengths and around its weaknesses. This is the soft skill side — the part that requires reps with models that are good enough to teach you something. If the model you’re working with fails in ways that have nothing to do with your approach — losing context, ignoring constraints, hallucinating — you’re not developing the skill. You’re just debugging a bad tool.

The second layer is the one that doesn’t get talked about enough: the practical configuration.

Look at what’s happening with Claude Code right now. There’s an entire ecosystem forming around it — CLAUDE.md files that teach the agent your project’s conventions, subagent configurations that break complex work into orchestrated pieces, hooks that enforce guardrails automatically, skills and plugins that extend what the agent can do. People are building and sharing these configurations the way they used to share dotfiles or ESLint configs. It’s becoming its own discipline.

And it matters. A well-configured Claude Code setup with proper project memory, clear guidelines that evolve with the codebase, and hooks that catch mistakes before they compound — that’s not a nice-to-have anymore. That’s the difference between the agent producing useful work and producing junk. Learning how to structure that configuration, how to set up subagents for different tasks, how to write project guidelines that actually steer the model’s behavior — these are real, practical, in-demand skills.

But here’s the problem: all of that knowledge is being built on top of Claude Code specifically. The skills, the hooks, the configuration patterns, the community sharing best practices — it’s all deeply tied to a $100-200/month tool running the most expensive models available. The more sophisticated the ecosystem gets, the deeper the lock-in. And the deeper the lock-in, the more expensive it becomes to develop the skills that actually matter.

It’s not just “learn to use AI.” It’s “learn to configure and orchestrate AI agents at a level of sophistication that requires sustained access to premium tools.” And that’s a much harder problem to solve with free tiers and quantized local models.

Both layers of this skill are becoming as foundational as knowing Git or being able to navigate a codebase. Except they’re evolving faster than any of those did, and the cost of staying current is real money.

The Model Is the Bottleneck

People talk about harnesses — Claude Code vs Cursor vs Codex vs whatever dropped this week. And yeah, the tooling matters. But you can run Claude Code with a local model if you know the tricks. You can plug open-source models into most of these harnesses.

It doesn’t fix the problem.

Because the output is limited by the model itself. A great harness with a mediocre model produces mediocre results with better formatting. The agent can manage files, run commands, iterate on errors — but if the model behind it can’t hold context or reason through the nuance of what you’re building, the loop just generates more mistakes faster.

The top-tier models — Claude Opus 4.6, GPT-5.4 — are meaningfully better at the things that matter for real development work. They hold context longer. They understand relationships between components. They catch their own mistakes more often. They produce code that requires less intervention. These aren’t marginal benchmark differences. These are differences you feel in every single session.

And every one of those models costs money. Claude Pro is $20/month and you’ll hit rate limits in a day doing serious work. The Max plan that actually lets you use Claude Code without interruption is $100-200/month — and that only covers Claude Code, not other harnesses. Cursor Pro, Copilot Pro — more subscriptions stacking up. If you want the workflow that actually builds the skill the market is demanding, you’re spending real money every month.

Who Gets Left Behind

Think about three people.

The junior developer, fresh out of school. They’ve heard AI is important. Maybe they’ve used ChatGPT for homework. But the landscape of AI coding tools is an overwhelming mess — Copilot, Cursor, Claude Code, Codex, Windsurf, and a dozen others, each with different pricing, different paradigms, different ecosystems. They don’t know which one matters. They don’t know which one to invest in. And the ones that would actually teach them the most important patterns cost money they don’t have. Sure, there are free tiers — Copilot gives you 2,000 completions and 50 chat requests a month, and OpenRouter has free models with rate limits. But those tiers are built for tasting, not for training. You can’t develop real fluency in 50 requests a month.

The experienced developer who got laid off. Yesterday they had Claude Pro through their company, API access for experiments, maybe a Cursor license on the company card. Today they have none of it. The skill they were building — the one that was making them genuinely more effective — just got cut off overnight. And the market they’re re-entering expects AI proficiency as a baseline. They know what they’re missing because they’ve felt the difference. That might be worse than never having had it at all.

The career switcher. Someone coming from another field, trying to break into tech. They’re already learning to code, which is hard enough. Now they need to learn to work with AI too, but the models that would give them meaningful reps are priced for people with engineering salaries. They’re trying to build a skill they can’t afford to practice.

Each of these people has a slightly different version of the same problem: the skill the market values most is developing behind a price tag most people can’t justify.

Some companies are subsidizing tools for their employees now. That’s real, and it’s good. But it only helps people who already have jobs. It does nothing for the people trying to get in. And even for employed developers, there’s a difference between using a company-provided tool in a company-specific workflow and building genuine AI fluency that transfers. Getting good at your team’s setup isn’t the same as understanding the patterns deeply enough to adapt when everything changes in six months. Which it will.

What I’m Actually Doing About It

I’d feel dishonest writing this without being transparent about where I am personally.

I’ve been deliberately pushing open-source and cheaper models harder in my own workflow. Running local models on my machine, using cheaper options through OpenRouter, trying to find the ceiling of what’s accessible today. Not because I think they’re better — I just spent several paragraphs telling you they’re not. But because I think there’s real value in mapping out what’s possible without the premium price tag.

Here’s what I’ve found so far, honestly.

Local models on the modest hardware I have today are far behind. It’s not close. The gap between what I get locally and what Claude Opus 4.6 or GPT-5.4 produces isn’t a minor quality difference; it’s a fundamentally different experience. The local models lose context, miss nuance, and require constant hand-holding that defeats the purpose of the workflow.

The cheaper models through OpenRouter are better — you can get genuinely good responses. But there’s a catch: you have to constrain the work to small, well-defined tasks to get consistent output. You can’t be vague. You can’t be high-level. You can’t describe what you want broadly and trust the model to figure out the details the way you can with the top-tier models. Every task needs to be broken down, specified precisely, and scoped tightly.

And that creates its own problem. Because sometimes, by the time you’ve broken the work down small enough and specified it precisely enough for the cheaper model to handle it reliably, you’ve already done most of the thinking. At that point, it’s genuinely faster to just write the code yourself than to spend the time guiding the model through it.

That’s the real gap. It’s not just quality of output — it’s how much of your own effort is required to get there. The top-tier models let you think at a higher level of abstraction. The accessible ones force you back down into the details, which is exactly where AI was supposed to save you time.

I haven’t tested Gemma 4 deeply yet — I have high hopes for it given what the benchmarks are showing. But I’m not going to claim results I haven’t experienced. For now, I keep pushing because I believe the trajectory is real. But the honest answer is that nothing I’ve tried on the accessible side comes close to what the premium models deliver.

The Industry Problem We’re Building

It takes years to develop a senior engineer. I’ve written about the pipeline problem — how cutting junior hiring today creates a senior shortage in 7-10 years. But this is a different angle on the same structural failure.

Even the developers who do break in start with a deficit that compounds over time if they can’t afford to develop AI fluency early. The developers who had access to top-tier models from day one are building intuitions, workflows, and judgment that the others can’t match. Not because of talent. Because of access.

We’re building a two-tier system. People who learned to work with AI at the highest level because they could afford to, and people who picked up what they could from free tiers and rate-limited demos. The gap between those two isn’t trivial — it’s the difference between building the instinct for what works and just knowing it exists in theory.

For decades, the software industry had a genuine claim to accessibility in at least one respect: the tools were free. You could learn to code with free software, contribute to open-source projects, build a portfolio, and land a job without spending a dollar on tooling. The playing field wasn’t level — it never is — but the tools didn’t gatekeep you.

AI is changing that equation. Not because the models are secret — many are open-weight. But because the models that are good enough to build the skills that matter require compute that costs real money. And nobody seems to be treating this as the urgent problem it is.

The Bet

I don’t have a clean answer for this. If I did, I’d be building a company, not writing a blog post.

But I’m not betting blind either. The trajectory is real.

Google just released Gemma 4 under Apache 2.0 — a family of models designed to run on consumer hardware, with coding benchmarks that show massive jumps over the previous generation. DeepSeek keeps pushing the boundaries of what’s possible at low cost, with their next model aiming for frontier performance under an open license. Qwen continues to improve. The open-source community is moving fast, and the gap between these models and the proprietary ones is genuinely narrowing.

But narrowing isn’t closed. And the people who need access most can’t wait for the trajectory to finish.

I’m betting that how accessible these tools become in the next two years will shape the entire next generation of professionals. The people entering the industry right now, the people trying to transition, the ones who got pushed out and are fighting their way back — they’re being shaped by what they can and can’t access today. If the most important skill of their era is only learnable at premium prices, we’re not just failing them individually. We’re hollowing out the pipeline the entire industry depends on.

What we call “software developer” is becoming something else. AI engineer, maybe. Whatever it gets called, the core competency is shifting — less about writing code, more about orchestrating intelligence. Making judgment calls about what to build and how to specify it. That competency needs to be learnable at every price point. Not just the premium tier.

For now, I’m going to keep pushing open-source models in my workflow. Keep documenting what works and where the walls are. Keep being honest about the gap while working to close it, even in my own small corner.

Because if the most important skill in tech is too expensive to learn, we have a bigger problem than any model can solve.

References

Google Gemma 4 announcement — Apache 2.0 open models designed for consumer hardware, with native agentic and coding capabilities
DeepSeek V4 specs and benchmarks — ~1T parameter MoE model, 37B active per token, targeting frontier performance under open license
Qwen 3.5 local hardware guide — Running large open-source models on consumer devices with quantization
Gemma 4 coding benchmarks — LiveCodeBench scores jumping from 29.1 to 80.0 between Gemma 3 and Gemma 4
GitHub Copilot plans — Free tier: 2,000 completions, 50 chat requests/month
Claude Code pricing breakdown — Max plan at $100-200/month for sustained use
OpenRouter free models — 29 free models with rate limits (20 req/min, 200 req/day)
AI divide at work — ETS report — Regular AI users feel more secure; others are falling behind
Open vs closed AI models — Epoch AI — Tracking the gap between open-weight and proprietary models over time

The post The Most Important Skill in Tech Is Too Expensive to Learn appeared first on sudoish.

No Developer Feels AI Literate Right Now — Not Even the Ones Building It

Thiago Pacheco — Sun, 19 Apr 2026 16:11:18 +0000

There’s a specific kind of anxiety that hits at 11 PM when you’re scrolling through someone’s thread about the AI workflow that supposedly changed everything. You were productive today. You shipped code. But now you’re wondering if the way you shipped it is already obsolete.

That feeling? It’s not going away. And I say that as someone who builds AI features every day at my job — user-facing products, developer tools, infrastructure — and uses AI to build them too. I’ve been deep in this flow for a while, and I still don’t feel AI literate. Nobody does.

That’s the whole thesis.

The Illusion of AI Fluency

There’s a dangerous gap forming right now — the distance between “I can get AI to produce code” and “I understand what’s happening well enough to make consistent, reliable decisions.”

It’s like ordering coffee in another language and thinking you’re fluent. The demo works. But what happens when the context window fills up and the model starts hallucinating? When the tool you’ve been relying on ships a breaking change to how it handles context, and your entire workflow stops working?

That’s where actual literacy lives. Not in the output — in understanding the mechanics well enough to troubleshoot, adapt, and make real decisions.

The Arc: Convergence to Divergence

Here’s a pattern worth naming, because it explains why everything feels so chaotic.

For the past couple of years, AI coding tools followed roughly the same trajectory. First the conversational phase — chat with a model, get code back. Then the agentic phase — let the model execute actions, read files, run commands. Then context management became the bottleneck — RAG, context windows, retrieval strategies. Then skills and MCPs emerged as ways to extend what agents could do.

Every major tool went through these same stages. Claude Code, Cursor, Copilot, Codex — the patterns were recognizable across all of them. If you learned one, the mental models transferred.

That’s no longer true.

The tools are diverging. Fast. Claude Code now has hooks, subagents, trust modes, and a growing ecosystem of skills. Cursor has its own rules system with a fundamentally different interaction model. Codex has AGENTS.md. Amp, OpenCode, and a dozen others are carving their own paths.

Each tool is developing its own opinion about how development should work. And those opinions are starting to meaningfully diverge.

This is React vs Angular vs Vue all over again — except the stakes are higher. That was about which UI library renders faster. This is about how you think, plan, and build software at a fundamental level.

The Best Practice Treadmill

A few weeks ago, the developer community was buzzing about skills replacing MCPs. Skills were simpler, lighter, didn’t require running separate processes. The consensus was forming: skills are the future, MCPs are the past.

Then optimizations changed the calculus on MCP context consumption. Suddenly MCPs were more viable again. The narrative flipped.

Now? People use a mix of both. Some skills actually wrap MCPs internally. Nobody’s sure if that’s a good pattern or an anti-pattern.

This all happened in the span of a few weeks. And it’s the new normal — the “right way” to structure your AI workflow has a half-life measured in weeks, not months.

So What Does This Mean for Your Career?

If senior engineers with years of pattern recognition and deep technical foundations feel lost — what does this mean for someone finishing college right now? Or someone transitioning into tech?

I’ll be direct: it’s harder than ever to get a job as a software engineer. You don’t just need to know how to code anymore. You need to know how to code, how to work with AI, which AI tools to invest in, and how to recognize when current practices expire. Most bootcamps and university programs haven’t even begun to address this. And companies don’t know what to test for either — interview loops are still measuring skills from two years ago while the actual job involves orchestrating agents and making architectural decisions AI can’t make for you.

So the question people are asking — “Is it even worth learning software engineering right now?” — is genuine.

Here’s what I think: the answer is yes. But the approach has to change.

The demand for engineers isn’t dying — job openings have surged this year, and companies that replaced senior engineers with juniors-plus-AI are already course-correcting. Human judgment, architectural thinking, and the ability to make sense of complex systems still matter. But you can’t just learn to code and expect that to be enough anymore.

Pick One Tool. Get Dangerously Good at It.

Stop trying to learn every coding harness. Don’t split your attention between Claude Code, Codex, Amp, OpenCode, and whatever drops next week. Pick one. Commit to it. Go deep.

There’s research backing this up. BCG found that productivity increases with one or two AI tools, peaks around three, and actively drops when you add a fourth. They’re calling it “AI brain fry” — more tools means more context switching, more cognitive load, worse outcomes. Mastering one tool isn’t just a preference. It’s the strategy that actually works.

I’ll tell you what I use: Claude Code. It has the richest set of capabilities right now — hooks, skills, subagents, MCP integrations, trust modes — and Anthropic’s models consistently deliver. The community around it is the most active I’ve seen. That could change in six months. But the point isn’t really the specific tool.

When you deeply learn one tool — when you understand how it manages context, how its agentic loop works, how to structure your projects for it — you develop transferable mental models. You learn what “good context management” means, not just how one tool implements it. You learn why hooks exist, why skills exist, why MCPs exist. Those patterns survive the churn.

If the tools diverge to the point where switching becomes necessary, you’ll transition from a place of strength — deep literacy in one ecosystem — rather than shallow familiarity with five.

Build Something That Felt Impossible

Theory doesn’t stick without practice.

Think of a project you’ve always wanted to build but never had the time. Something slightly out of reach — too many moving parts, too much boilerplate, too many unknowns. Now try to build it with your AI coding tool of choice.

Here’s my example. I’d been wanting to rebuild my blog with a custom WordPress theme for months. I knew exactly what I wanted — the design, the deployment pipeline, the git integration. What I didn’t have was the time to write it all out.

So I started prompting my agent harness while at the gym. Between sets, between exercises — describing what I needed, reviewing what came back, steering it. Within about three days of gym sessions and prompting, the new blog was live.

That was a genuine wow moment. Not the viral demo kind — the personal kind. It didn’t come for free. I knew what to ask for, which tools to use, how to set up the deployment. But I never had to remember WordPress internals or look at a single line of code.

Don’t aim for perfection. Just try to get it done and pay attention to how it feels. You’re going to land in one of two places.

You build the thing faster than you ever could have alone. Maybe the code isn’t pristine. Maybe you restarted a couple of times. But it works, and you built it in hours instead of weeks. That wow moment is fuel — it motivates you to refine your prompts, learn the next layer, keep pushing.

Or you struggle. The agent loops. It misunderstands your intent. You feel like you’d be faster doing it yourself.

If you land here, don’t get discouraged. And don’t become someone who dismisses AI as useless based on a bad first experience.

What almost certainly happened is that your scope was too broad. You gave the agent a vague, ambitious prompt and expected it to figure out the details. That’s the most common mistake starting out.

The fix: shrink the scope. Way down. Pick the smallest piece — a single endpoint, one component, a basic data model — and try again. Get one small thing working. Feel what it’s like when the tool actually helps. That’s your baseline. From there, you gradually expand — bigger scope, better context, more trust in the loop.

The Uncomfortable Truth

Nobody has this figured out. Not the senior engineers. Not the tool makers. Not the influencers who post their workflows like they’ve cracked the code.

The developers who are going to thrive aren’t the ones who memorize every feature of every tool — they’re the ones who build a learning rhythm they can sustain. Context management, agentic workflows, prompt design, scope control. These patterns are more stable than the specific implementations, and they’re what make you dangerous regardless of which tool you’re holding.

Pick one tool. Build one thing. Learn one lesson at a time.

References

The impact of AI on software engineers in 2026 — Pragmatic Engineer survey (900+ engineers on AI tool usage, costs, and uneven effects across experience levels)
BCG: AI Brain Fry study — Productivity peaks at 2-3 AI tools and drops at 4+
Software engineer job listings up 30% in 2026 — 67,000+ openings per TrueUp
Why companies are quietly rehiring software engineers — The “boomerang effect”: ~35% of new hires are former employees
Redefining the future of software engineering — MIT Tech Review on agentic AI as the “third shift”
JetBrains: Which AI coding tools do developers use at work? — 74% of developers adopted AI tools; Claude Code and Cursor tied at 18%

Spec-Driven Development Isn’t Waterfall — But It Keeps Ending Up There

Thiago Pacheco — Fri, 17 Apr 2026 22:05:14 +0000

Spec-driven development isn’t supposed to be waterfall. But without clear workflows and better tooling, it’s easy to end up there.

I recently went deep on spec-driven development. The idea was straightforward: before writing any code, define everything. The full vision, the context, the trade-offs, where the feature fits in the existing architecture, the references. Hand it all to the AI agent with crystal-clear guidance so it can build with minimal supervision.

It looked promising. For about two days.

What I ended up with was thousands of lines of specification documents. Documents that were incredibly hard to review — not because they were poorly structured, but because most of them were generated by the AI itself, correlating the existing architecture, docs, and my guidance into something that looked authoritative. Clear explanations. Perfect formatting. Confident reasoning about every decision.

And that’s exactly where it got scary.

The Confidence Problem Nobody Mentions

Here’s the thing about AI-generated specs that the SDD evangelists aren’t talking about: the AI makes everything look correct.

When a spec is well-formatted, internally consistent, and confidently explained, your brain wants to trust it. It reads like something a senior architect wrote after careful deliberation. But it’s not. It’s a language model pattern-matching against your inputs, producing the most plausible-sounding output it can.

And here’s the trap within the trap: if you push back on something that feels off, the AI doesn’t defend its reasoning. It folds. It assumes you’re right, tries to course correct, and often makes the output worse. Hallucination risk actually goes up when you question it, because now it’s reconciling your objection with a position it never truly held. It’s not reasoning. It’s pleasing.

So you’re stuck. Trust the spec and risk building on wrong assumptions. Question the spec and risk destabilizing it further. Either way, you’ve generated thousands of lines of documentation that are incredibly hard to confidently validate.

After spending far too long trying to get those specs to a place I trusted, it hit me — this felt familiar.

The Intent vs. the Reality

Here’s where I want to be fair: spec-driven development isn’t supposed to be big design upfront.

Marc Brooker, the person building Kiro at AWS, explicitly says “you don’t need to, and probably shouldn’t, develop the entire specification upfront.” Kiro’s own workflow is feature-scoped — requirements, design, and tasks for a single story, not a whole system. GitHub’s Spec-Kit runs in a loop: specify, plan, tasks, repeat per change request. OpenSpec literally states “specs are not frozen contracts; update them when reality changes.”

The vision of SDD is iterative. Living documents. Feature-level scope. Incremental refinement.

But the vision and the practice aren’t the same thing yet.

Birgitta Böckeler, writing on Martin Fowler’s site, tried to untangle what SDD actually means right now and found the definition “still in flux.” She identified three levels — spec-first, spec-anchored, and spec-as-source — and noted that most tools are only spec-first. They help you write a spec before coding, but don’t have clear strategies for maintaining or evolving that spec over time. Even GitHub Spec-Kit’s own community is confused about whether a spec is supposed to live beyond a single change request.

The methodology is ahead of the tooling. SDD can absolutely work — but right now, the tools and workflows don’t do enough to keep you on the iterative path. Without clear guardrails for when to stop specifying and start building, teams default to the thing that feels most natural: writing everything down upfront, as thoroughly as possible, before anyone touches code.

That’s what happened to me. Not because I didn’t know better. Because nothing in the workflow guided me toward “that’s enough, go build and come back.”

The Waterfall Gravity

There’s a reason teams keep falling into this pattern. It has gravitational pull.

When you tell an AI agent to help you write a spec, it wants to be comprehensive. It will map out every component, every edge case, every integration point — because that’s what “thorough” looks like in its training data. And as a developer, you want to feel like you’ve thought of everything before handing off to an autonomous agent. The combination of an AI that defaults to exhaustive and a human who defaults to cautious creates thousands of lines of documentation almost by accident.

This is the same dynamic that made waterfall feel so appealing in the first place. The Agile Manifesto exists because planning everything upfront didn’t work:

“Responding to change over following a plan.”

SDD proponents will rightly say “that’s not what we’re advocating.” And they’re right. But the tooling needs to actively enforce iterative scope — otherwise that gravity keeps pulling teams toward specifying everything before building anything. The intent is agile. The default behavior, without better rails, is waterfall.

Of course, agile itself got convoluted over the years. Certifications, rituals, dogma that drifted far from the original insight. But the core idea never stopped being true: you learn more from building than from planning.

Thoughtworks, the company whose chief scientist co-authored the Agile Manifesto, released Technology Radar v34 this week, warning that as AI accelerates code generation, “established practices that ensure discipline become more vital.” They’re not pushing SDD. They’re pushing fundamentals. Iteration. Feedback loops.

What SDD Gets Right

I don’t want to dismiss the methodology. There are real benefits when it works.

Agents need constraints. Without scope boundaries, they expand. Tell an agent to build auth and it’ll add OAuth, SSO, and MFA because that’s what “auth” means in its training data. A spec that says “OAuth is out of scope” genuinely saves time.

Context improves quality. An agent with the full picture makes fewer locally-right-but-globally-wrong decisions. The spec gives it a map, not just turn-by-turn directions.

Team alignment. Multiple people or agents working on the same system need a shared reference point. Specs provide that.

The problem was never “should we spec?” It’s “how do we make sure we spec iteratively instead of falling into the trap of specifying everything at once?”

Thinking in Graphs

Here’s where I’ve landed — and maybe this is just how my brain works, but I think it applies broadly.

I think about projects as graphs, not documents.

At the top, there’s a direction. A north star. What we’re building and why. High-level, intentional, human-defined. It doesn’t need to specify every data model or API contract. It needs to be clear about the destination.

From that direction, you break down into milestones. Each milestone is a meaningful checkpoint — something you can ship, test, or validate. Not a document section. A real deliverable.

Each milestone has its own tasks. And here’s the critical part: the depth of planning for each task happens when you start working on it, not months before. You plan the first milestone in detail. You sketch milestone three at a high level. When you finish milestone one, you know things you didn’t know before — and that knowledge shapes how you plan milestone two.

The direction flows down from the north star. Discovery happens at every node.

Say you’re building a new integration. The north star says: “Users can sync data between System A and System B in real time.” Milestone one might be a basic one-way sync — and when you build it, you discover the API rate limits aren’t what the docs claimed. That changes everything about milestone two. If you’d fully specced bidirectional sync upfront, you’d be rewriting specs instead of shipping software.

This works for humans because it provides structure without drowning you in premature detail. It works for AI agents for the same reason — they need guidance and constraints, but they also need room to discover things during implementation that no spec could have predicted.

Full upfront specification tries to flatten the graph into a document. Every node defined, every edge mapped, before you’ve traversed any of them. That’s not engineering. That’s prophecy.
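To make the graph idea concrete, here is a minimal Python sketch. The class names (Project, Milestone, Task) and the helpers next_to_plan and context_for are hypothetical, invented for illustration rather than taken from any SDD tool. The only point is that task-level detail gets filled in when a milestone starts, and that what you learn from a finished milestone feeds the planning of the next one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    title: str
    done: bool = False


@dataclass
class Milestone:
    name: str
    # Left empty on purpose until work on this milestone begins.
    tasks: list[Task] = field(default_factory=list)
    # Discoveries made while building; they inform later milestones.
    learnings: list[str] = field(default_factory=list)

    @property
    def planned(self) -> bool:
        return bool(self.tasks)

    @property
    def complete(self) -> bool:
        return self.planned and all(task.done for task in self.tasks)


@dataclass
class Project:
    north_star: str
    milestones: list[Milestone] = field(default_factory=list)

    def next_to_plan(self) -> Optional[Milestone]:
        """Return the first unplanned milestone, but only once every
        earlier milestone is complete; otherwise return None."""
        for milestone in self.milestones:
            if not milestone.planned:
                return milestone
            if not milestone.complete:
                return None  # still building; too early to plan further
        return None

    def context_for(self, milestone: Milestone) -> list[str]:
        """Collect learnings from every milestone before the given one."""
        index = self.milestones.index(milestone)
        return [note for earlier in self.milestones[:index] for note in earlier.learnings]


# The integration example from above, with illustrative task names.
project = Project(
    north_star="Users can sync data between System A and System B in real time.",
    milestones=[Milestone("one-way sync"), Milestone("bidirectional sync")],
)

first = project.next_to_plan()
first.tasks = [Task("read changes from System A"), Task("write changes to System B")]
in_progress = project.next_to_plan()  # None while milestone one is being built

for task in first.tasks:
    task.done = True
first.learnings.append("API rate limits are lower than documented")

second = project.next_to_plan()
carried_over = project.context_for(second)
```

Milestone two has no tasks until milestone one is done, so the rate-limit discovery is already in hand when its tasks are written.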

What This Looks Like in Practice

I’m still figuring this out. There is no perfect workflow yet — that’s kind of the point. But here’s the pattern that’s working:

Thin specs, not thick ones. One page for the current milestone, not twenty pages for the whole system. Define the outcome, the constraints, what’s out of scope. Leave room for what you don’t know yet.

Iterate the spec, not just the code. The spec changes every cycle. Decisions made, assumptions validated or invalidated, things learned by building. A living document, not a contract. This is what SDD’s proponents advocate — we just need clearer workflows and tooling to make this the default path instead of something you have to consciously enforce.

Use agents for exploration, not just execution. Code is cheap now. Build a quick prototype to test an architectural assumption. Throw it away if it’s wrong. A throwaway prototype costs you nothing. Specifying the wrong architecture upfront costs you everything.

Keep the loop tight. In traditional agile, the sprint was two weeks. With agents, the feedback loop can be hours. Specify, build, test, learn, adjust. But only if you keep the scope small enough to actually iterate.
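As a rough illustration of a thin spec, here is a small Python sketch. The ThinSpec class, its field names, and the 40-line budget standing in for “one page” are all assumptions made up for this example, not a format any tool prescribes. It holds the outcome, the constraints, and what is out of scope for a single milestone, and it gets updated after each build cycle.

```python
from dataclasses import dataclass, field

# Hypothetical budget standing in for "one page".
MAX_SPEC_LINES = 40


@dataclass
class ThinSpec:
    milestone: str
    outcome: str
    constraints: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)
    # Filled in cycle by cycle as building reveals new information.
    learned: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Produce the plain-text spec that would be handed to the agent."""
        lines = [f"Milestone: {self.milestone}", f"Outcome: {self.outcome}"]
        sections = (
            ("Constraints", self.constraints),
            ("Out of scope", self.out_of_scope),
            ("Learned so far", self.learned),
        )
        for heading, items in sections:
            if items:  # skip empty sections to keep the spec short
                lines.append(f"{heading}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)

    def is_thin(self) -> bool:
        """True while the rendered spec still fits the line budget."""
        return len(self.render().splitlines()) <= MAX_SPEC_LINES


spec = ThinSpec(
    milestone="Basic auth",
    outcome="Users can log in with email and password",
    constraints=["Reuse the existing session store"],
    out_of_scope=["OAuth", "SSO", "MFA"],
)

# After a build cycle, fold what was learned back into the spec.
spec.learned.append("Password reset needs its own milestone")
prompt = spec.render()
```

The out-of-scope list is what stops the agent from expanding “auth” into OAuth, SSO, and MFA, and the line budget is a crude signal that it is time to stop specifying and go build.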

The Gap That Needs Filling

The industry is living through a real-time methodology shift:

Vibe coding — prompt and pray. Fast, chaotic, doesn’t scale.
Spec-driven development — specify, then execute. Sound in theory, but easy to fall into big design upfront without clear process guardrails.
What comes next — the iterative spec workflow that SDD envisions, supported by tooling and processes that actively keep teams on that path. Thin specs. Fast execution. Continuous refinement. Direction without prophecy.

This is the same arc software development has followed before. Waterfall promised control through upfront planning. Agile recognized that the plan always changes. We’re at that inflection point again — the insight of iterative development needs to be baked into the tools and workflows, not just the blog posts and documentation.

A project with a clear direction, meaningful milestones, and task-level depth that’s earned at execution time — not guessed at months before — works for both humans and AI agents. Structure without rigidity. Guidance without false certainty.

The agents are fast. The models are capable. But the bottleneck was never the code.

It was knowing what to build. And you only learn that by building.

Spec-driven development’s vision is right: specs should be living, iterative, and at the center of how we build software with AI. What we need now is better tooling and clearer processes to make that vision the default — so teams stay on the iterative path instead of drifting into the upfront-planning trap that agile was invented to escape. The Agile Manifesto didn’t expire. It just got a new executor — and that executor needs better guardrails.

Clean Code Is Dead (And I Hate That I Agree)

Thiago Pacheco — Sun, 12 Apr 2026 10:53:52 +0000

I’ve spent my career fighting for clean code. In code reviews, in architecture meetings, in those long debates about naming conventions that everyone pretends to hate but secretly cares about. Readable code. Well-structured code. Code that respects the next person who has to touch it.

I’m starting to realize that none of that might matter anymore.

Clean Code Was Always a Human Interface

Every clean code practice we follow was invented to solve a human problem.

Descriptive variable names? So a human can read it. Separation of concerns? So a human can navigate it. Consistent formatting, small functions, clear abstractions? All of it — designed to make code convenient for people to write and to read.

The entire philosophy assumes that humans are the primary audience of source code.

But what happens when they’re not?

AI Doesn’t Need Your Clean Code

The more we rely on AI to write, review, and maintain code, the less we actually know the implementation details. And I don’t mean that in a lazy way — I mean structurally. The workflow is changing. You describe what you want, AI generates it, you review the output at a high level, and you move on.

AI doesn’t care about your variable names. It doesn’t need elegant abstractions to understand what’s happening. It processes the entire codebase — messy or clean — with the same indifference. It doesn’t get confused by a 500-line function. It doesn’t lose context the way a human does after scrolling through too many files.

I had a moment recently that made this click. I was reviewing AI-generated code and caught myself leaving comments about naming and structure — the same feedback I’d give a junior dev. Then I paused. Who was I writing these comments for? The AI would regenerate the whole thing from scratch on the next prompt anyway. I was applying human code review instincts to a process that doesn’t have a human on the receiving end (sort of). Old habits addressing a problem that no longer exists.

The practices we built specifically for human readability and human convenience are becoming overhead. In some cases, they’re becoming a bottleneck — extra layers of abstraction that add complexity without benefiting the thing that’s actually doing the reading.

This isn’t a thought experiment. This is already happening in how teams ship software.

The Highest-Level Language Is English Now

If readability stops being the priority, what takes its place? Performance.

If AI can handle the complexity regardless, why optimize for human readability when you can optimize for raw execution speed? The ideal language for AI-driven development might not be Python or TypeScript. It might be C. It might be Rust. It might be something even lower level where AI has fine-grained control over memory, threading, and every implementation detail — things that are painful for humans but trivial for a model that doesn’t get frustrated.

We’ve always talked about “high-level” and “low-level” languages. High-level meant closer to human thinking; low-level meant closer to the machine. But now there’s a level above all of them.

English. Portuguese. Mandarin. Whatever you speak.

Natural language is the highest-level language now. LLMs are remarkable polyglots: they work fluently in all of them. And code? Code is just the compilation target.

We went from writing machine instructions, to writing human-readable code, to just… describing what we want in plain words. Each step abstracted away more control. Each step moved us further from the metal.

We’re Losing Control at Every Layer

It’s not just that AI writes the code. People use AI to plan the work, brainstorm the architecture, make decisions about what to build and how to build it. The entire pipeline — from idea to implementation — is being routed through language models.

And LLMs are dangerously convincing. Their reasoning is well-structured even when the underlying data is fabricated or slightly off. I’ve caught myself reading an AI-generated explanation, thinking “yeah, that makes sense,” only to realize later that a key detail was subtly wrong. Or worse — never realizing it at all. The convincing tone becomes a trap.

You could argue that humans were never perfectly accurate either. Fair. We’ve always built software on incomplete knowledge and best guesses. But there was something grounding about having a person in the loop who had intuition, experience, and skin in the game. Someone who could smell when something was off, even if they couldn’t articulate why.

The more we delegate — not just the coding, but the thinking — the more that instinct fades. And I’m not sure we’re paying enough attention to what we’re losing.

Maybe I’m Too Attached to the Craft

Maybe I’m romanticizing this. Maybe code was always just a means to an end and I turned it into something more than it needed to be. I built part of my identity around writing good code, caring about architecture, treating the codebase as a product in itself. It’s hard to watch that become irrelevant and not take it personally.

Maybe I’m onto something. Maybe the people who cared about the craft will be the ones who notice when the quality starts slipping in ways that AI can’t detect. Or maybe that’s just what I tell myself to feel relevant.

I genuinely don’t know.

And I can’t be a hypocrite about it. This very piece — I’m using AI to help me review it, refine the structure, make sure it reads well. I’m literally writing about the death of human craft while using the thing that’s killing it to help me write better.

But the ideas are mine. The opinions are mine. The discomfort is mine. AI didn’t tell me to feel this way — I felt it, and then I used a tool to articulate it more clearly. There’s a difference between using AI as a tool and being used by it. At least I think there is.

The Mental Model Shift

I don’t have a solution. But I’ve been rethinking how I relate to the work, and that’s helped more than any specific tool or workflow.

The shift is this: if code is becoming the compilation target, then what you’re really building isn’t the code — it’s the system of decisions that produces it. Your taste. Your standards. Your judgment about what good looks like. That’s the actual product now.

And that’s something you can teach to AI.

I’ve been experimenting with this — taking the patterns I’ve developed over years of writing software and encoding them into the tools I work with. Not just “generate a function that does X” but “here’s how I think about error handling, here’s my preference on abstraction depth, here’s what I consider acceptable tradeoffs.” The more specific you get about your own engineering philosophy, the more the output starts to feel like yours instead of generic AI slop.

This isn’t complicated or expensive. The tooling to build your own AI workflows — agents that understand how you work — is accessible today in a way that would’ve been unthinkable two years ago. You don’t need a team or a platform. You need clarity about your own standards and the willingness to invest time in teaching them.

If you’ve spent years developing engineering taste, that taste is now leverage. You can apply it at a scale that was never possible when you had to write every line yourself. More ambitious projects. More complex systems. Things that would’ve required a team, handled by one person with clear vision and the right tools.

It only works if you stay in the driver’s seat though. If you’re the one making the calls about what ships and what gets thrown away. Not a consumer of whatever AI generates, but the lead. The final authority.

And right now, I’m watching a lot of people quietly stop being that.

I Don’t Have a Clean Answer

If language models keep evolving at even half the pace we’ve seen over the last couple of years, the industry in five years looks nothing like it does today. The way we think about programming, about code quality, about what it means to be a software engineer — all of it is up for renegotiation.

I don’t have a neat conclusion. I have a tension I’m sitting with, and I think a lot of developers feel it too even if they haven’t put words to it yet.

Clean code might be dead. The practices, the principles, the carefully named variables and thoughtfully extracted functions — they might genuinely become artifacts of an era when humans needed to read what humans wrote.

But the intention behind clean code? Caring about what you build. Taking pride in the craft. Giving a damn about quality even when no one is looking?

That can’t die. Unless we let it.

The post Clean Code Is Dead (And I Hate That I Agree) appeared first on sudoish.

You Think, AI Executes: The Skills That Actually Matter

Thiago Pacheco — Mon, 06 Apr 2026 02:58:16 +0000

The most valuable developer skill right now isn't writing more code faster. It's learning unfamiliar codebases, building context that guides decisions, planning strategic approaches to problems, and shipping production code with confidence.

I recently added .env file support to xc, a Markdown-based task runner written in Go. The codebase was completely unfamiliar. I'm not a Go expert. But in 2.5 hours, I went from zero knowledge to a production-ready pull request with 84% test coverage and zero bugs in manual testing.

Here's what's different: I didn't write a single line of code. Not one. AI wrote everything—tests, implementation, integration, documentation. My role was entirely different: I questioned, I planned, I directed, I reviewed. I read the code, but I didn't write it.

This isn't another "I asked ChatGPT to build an app" story. This is about the skills that separate developers who use AI as a force multiplier from those who just ask it to generate code. It's about onboarding fast, documenting strategically, planning thoroughly, directing execution, and reviewing confidently. The code writing? That's handled.

Complete .ai/ folder in the working fork:

github.com/sudoish/xc/tree/ai-context/.ai

Production-ready PR:

github.com/joerdav/xc/pull/167

The .ai/ folder lives in a separate ai-context branch so it doesn't clutter the main codebase but remains available for reference and iteration.

Why This Matters

Most AI coding demos show you the magic: "I asked ChatGPT to build X and it worked!" They skip the parts that actually matter for professional development: How do you onboard to a codebase you've never seen? How do you make architectural decisions when you don't understand the patterns yet? How do you ensure your code is production-ready when AI helped write it?

These are the skills that matter now. Code generation is table stakes. What matters is context building, strategic planning, and confident execution.

Here's the project: xc, a task runner that reads tasks from Markdown files. About 5,000 lines of Go. Completely unfamiliar to me. The feature request was straightforward: add .env file support (Issue #162). In 2.5 hours, using free AI models and a structured approach, I went from knowing nothing about the codebase to a production-ready pull request.

The difference wasn't better prompts. It was better process.

The Actual Workflow: What I Did vs What AI Did

Here's the honest breakdown of who did what. I didn't write a single line of code myself. That's not the valuable work anymore.

What I did:

Explored the codebase with AI — Asked questions, challenged its understanding, verified explanations against the actual code
Built the .ai/ structure — Wrote context docs, ADRs, rules, and implementation specs based on my growing understanding
Questioned the strategy — Evaluated alternatives, captured trade-offs, made architectural decisions
Directed the implementation — "Follow the spec. Implement test 1. Now test 2." Each step validated before moving forward
Reviewed iteratively — Asked AI to review the code, digested its findings, confirmed issues, asked it to fix them. Repeated multiple times
Final deep review — Read through the entire PR on GitHub, verified everything made sense, marked ready for review

What AI did:

Answered my questions — Explained architecture, pointed me to relevant files, clarified patterns
Wrote all the code — Tests, implementation, integration, everything
Found its own bugs — Self-review caught 5 issues before I even looked at the code
Fixed the issues — Applied fixes based on its own review findings
Followed the plan — Implemented exactly what the spec described, in the order specified

What we did together:

Built understanding through conversation
Validated each step before proceeding
Caught subtle bugs through TDD
Created production-ready code with high confidence

The key insight: I never typed code. I read it, reviewed it, directed changes to it. But I didn't write it. My value was in understanding, planning, and judgment. AI's value was in execution and self-checking. This is the new division of labor.

The Four Skills

This walkthrough demonstrates four skills that matter more than code generation:

Skill 1: Rapid Onboarding. Learning an unfamiliar codebase fast by building structured context instead of reading every file. The .ai/ folder captures architecture, patterns, and limitations in a way both humans and AI can reference.

Skill 2: Strategic Documentation. Building documentation that guides development, not just records it. Architecture Decision Records (ADRs) capture the "why" behind choices, evaluate alternatives, and create a shared understanding before code is written.

Skill 3: Systematic Planning. Breaking down problems into testable steps. Each test defines expected behavior. Each implementation proves the behavior works. Each commit tells part of the story. No guessing, no hoping.

Skill 4: Confident Execution. Shipping code you trust because you've tested it thoroughly, reviewed it critically, and validated it works in real scenarios. AI can help write code, but you own the quality.

These skills work regardless of the AI tool you use. They work with free models. They work on unfamiliar codebases.

The Feature Request

First, a quick primer on how xc works: it's a task runner that reads tasks directly from your README.md (or any markdown file). Tasks are defined as markdown headings with code blocks. When you run xc test, it finds the ## test heading in your README and executes the code block beneath it. The genius is that your documentation is your task runner, so they never get out of sync.

A user opened Issue #162 asking for .env file support. They wanted to use the same set of tasks for different environments without cluttering the Markdown with environment variables.

Before the feature, you'd have to write this in your README.md:

## deploy

Deploy to production.

Env: DATABASE_URL=postgres://prod/db, API_KEY=secret123, ENV=production

kubectl apply -f deployment.yaml

Then run with xc deploy.

After the feature, your README stays clean:

## deploy

Deploy to production.

kubectl apply -f deployment.yaml

The environment variables live in a separate .env file:

DATABASE_URL=postgres://prod/db
API_KEY=secret123
ENV=production

You still run the same command, but now the credentials are managed in .env instead of cluttering your documentation.

Simple ask, but the implementation requires real decisions. When do you load the files? What about overrides? How do you handle security? What about backward compatibility?

The `.ai/` Structure: Context as Code

Before writing any code, I created a structured context folder. This turned out to be the key to working with AI effectively. It's not about better prompts, it's about better structure.

Full .ai/ folder: github.com/sudoish/xc/tree/ai-context/.ai

The folder looks like this:

.ai/
├── agents.md                          # Who's working on what
├── context.md                         # Project overview, architecture
├── architecture/
│   ├── decisions.md                   # Current design patterns
│   └── adrs/
│       └── 001-dotenv-support.md      # Design decisions for this feature
├── rules/
│   ├── code-style.md                  # Go conventions
│   ├── testing.md                     # TDD workflow
│   └── commits.md                     # Commit message format
└── tasks/
    └── 001-dotenv-implementation.md   # Step-by-step plan

Important: This structure is an investment, not overhead you repeat for every feature. You build it once during your first feature, then leverage it for every feature after. The context.md, architecture/decisions.md, and rules/ files rarely change. Each new feature just adds a new ADR (like 002-api-caching.md) and a new task spec (like 002-api-caching-implementation.md).

Think of it like setting up your development environment. The initial setup takes time, but every feature after that is faster because the foundation exists.

Each file serves a specific purpose. The context.md file becomes AI's memory. It explains what xc does, how it's architected with its cmd/, models/, run/, and parser/ packages, what key behaviors exist like dependencies and environment handling, and what current limitations we're working around. Every time I ask AI a question, this context gets included automatically.

The rules/testing.md file defines the TDD workflow we follow: write a failing test first (red), write minimal code to make it pass (green), clean up without changing behavior (refactor), then commit. This keeps both me and AI honest. No skipping tests. No shortcuts.

The real gem is adrs/001-dotenv-support.md, the Architecture Decision Record. This is where design happens. It's not "build me a feature," it's "here's why we chose this approach." We decided to load .env files at application startup rather than per-task, to support .env.local overrides, to skip world-readable files for security, and to add CLI flags like --env-file and --no-env. We considered alternatives like per-task loading (rejected as too complex) and requiring an explicit flag (rejected as too much friction). This ADR becomes the source of truth. When AI suggests something different, I can just say "check the ADR."

The living documentation principle: As the codebase evolves, so does the .ai/ folder. When you add a new feature, you write a new ADR (002, 003, etc.). When architecture changes, you update architecture/decisions.md or add a new ADR explaining the change. When patterns emerge, you document them. The folder grows with the project, but the structure stays the same. Each feature builds on the understanding captured before it.

This means the second feature is faster than the first. The third is faster than the second. The documentation compounds.

The Task Spec: Planning Before Coding

Before writing any code, I created tasks/001-dotenv-implementation.md, a step-by-step plan for implementing the feature. This isn't a project management document. It's a development spec that breaks the feature into TDD cycles.

The spec listed each test I needed to write, what behavior it should verify, and the expected implementation. Test for file not found. Test for loading valid env. Test for .env.local overrides. Test for security checks. Each one became a TDD cycle.

This is what makes AI effective. Without the spec, I'd be asking AI "what should I do next?" every five minutes. With the spec, I'm asking "implement the next test according to the plan." The spec keeps development focused and systematic. It's the difference between wandering and following a map.

For your second feature, you write a new spec. For your third, another one. The format is consistent, but each spec is tailored to its feature. This is the work that makes development fast and confident.

The TDD Flow: Red → Green → Refactor → Commit

Here's where the real work happens. Each test defines acceptance criteria for exactly what needs to be built.

Cycle 1: Valid .env should load variables

First behavior: if a .env file exists and contains KEY=value pairs, those should be loaded into the environment. Test written, test failed (red)—no loader existed yet. Implementation added using the godotenv library (green). Test passed. Committed with "load env vars from dotenv file".
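To make the first behavior concrete, here is a minimal stdlib-only sketch of reading KEY=value pairs. It is not godotenv's parser and not the code from the PR: `parseDotenv` is a hypothetical helper that handles only blank lines, `#` comments, and plain pairs, and it ignores quoting and variable expansion.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDotenv is a hypothetical helper for illustration only.
// It skips blank lines and '#' comments and splits each remaining
// line on the first '=' into a key and a value.
func parseDotenv(content string) map[string]string {
	vars := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, value, found := strings.Cut(line, "=")
		if !found {
			continue // not a KEY=value line; a real parser might report an error
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(value)
	}
	return vars
}

func main() {
	vars := parseDotenv("# sample file\nDATABASE_URL=postgres://prod/db\nENV=production\n")
	fmt.Println(vars["DATABASE_URL"], vars["ENV"])
}
```

A test for this cycle would assert on the returned map first, then on the process environment once the values are exported.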

Cycle 2: .env.local should override .env

Expected behavior: if both .env and .env.local exist, and both define the same variable, the .env.local value wins. This is crucial for local development where you want to override defaults without modifying the base file. Test written, test failed initially because the first implementation used the wrong function (Load() instead of Overload()), implementation fixed, test passed. Committed.
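The difference between a non-overriding and an overriding load is easy to get wrong, so here is a small sketch that models both with plain maps instead of the process environment. The function names are hypothetical and only illustrate the two merge rules; they are not the PR's code.

```go
package main

import "fmt"

// applyKeep models a non-overriding load: keys already present keep their value.
func applyKeep(env, file map[string]string) {
	for k, v := range file {
		if _, exists := env[k]; !exists {
			env[k] = v
		}
	}
}

// applyOverride models an overriding load: the file's values always win.
func applyOverride(env, file map[string]string) {
	for k, v := range file {
		env[k] = v
	}
}

func main() {
	base := map[string]string{"ENV": "development", "API_KEY": "base-key"}
	local := map[string]string{"API_KEY": "local-key"}

	env := make(map[string]string)
	applyKeep(env, base)      // .env fills in whatever is not set yet
	applyOverride(env, local) // .env.local replaces values from .env
	fmt.Println(env["ENV"], env["API_KEY"])
}
```

If the second step used the non-overriding rule as well, API_KEY would keep its base value, which is exactly the failure the test in this cycle is meant to catch.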

Cycle 3: World-readable files should be skipped

Security requirement: if a .env file has permissions that allow other users to read it (like chmod 644), skip loading it and warn the user. This prevents accidentally exposing secrets. Test created, test failed (secrets were being loaded), added permission check, test passed. Committed.
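As a rough sketch of the permission check, assuming Unix-style permission bits, the test below looks only at the "others can read" bit. `isWorldReadable` is a hypothetical name, not a function from the PR.

```go
package main

import (
	"fmt"
	"io/fs"
)

// isWorldReadable reports whether the "others" read bit (0o004) is set.
// Hypothetical helper for illustration only.
func isWorldReadable(mode fs.FileMode) bool {
	return mode.Perm()&0o004 != 0
}

func main() {
	fmt.Println(isWorldReadable(0o644)) // true: others can read
	fmt.Println(isWorldReadable(0o600)) // false: owner only
}
```

In a loader, the mode would come from `os.Stat(path)` via `info.Mode()`, and a true result would mean skipping the file and printing a warning.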

This rhythm of define → test → implement → verify → commit creates a clean history. When I looked at the final commit log, I could see exactly how the feature evolved: add godotenv dependency, load env vars from dotenv file, support dotenv local overrides, add security check for world readable files, integrate dotenv loading into main, add env file cli flags. Thirteen commits total, each one atomic and meaningful. Each commit is a story about one specific behavior being added.

The Review Process

After the implementation was done, the code went through a deep review. AI's self-review surfaced five issues that needed fixing, and I confirmed each one before directing the fix.

Issue 1: Test Isolation (Critical)

Tests were modifying the global environment without properly restoring it. If a test set TEST_KEY=value, the cleanup would delete it, but what if that key already existed before the test ran? The cleanup wasn't restoring the original value, just removing the key. This breaks parallel test execution because tests can interfere with each other.

The fix: create a helper function that saves the current state of environment variables before the test runs, then restores that exact state (including whether the variable existed at all) when the test completes. Now tests are safe to run in parallel. Committed with "add test environment isolation helper".
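A helper along these lines could look like the sketch below. The names `snapshotEnv` and `demo` are hypothetical; the point is that the snapshot records whether each key was set at all, not just its value.

```go
package main

import (
	"fmt"
	"os"
)

// snapshotEnv records the current state of the given keys and returns a
// function that restores it, including the "was not set" case.
// Hypothetical helper for illustration only.
func snapshotEnv(keys ...string) (restore func()) {
	type saved struct {
		value string
		set   bool
	}
	state := make(map[string]saved, len(keys))
	for _, k := range keys {
		v, ok := os.LookupEnv(k)
		state[k] = saved{value: v, set: ok}
	}
	return func() {
		for k, s := range state {
			if s.set {
				os.Setenv(k, s.value)
			} else {
				os.Unsetenv(k)
			}
		}
	}
}

// demo changes two variables, restores them, and reports the final state.
func demo() (testKey string, otherKeySet bool) {
	os.Setenv("TEST_KEY", "original")
	os.Unsetenv("OTHER_KEY")

	restore := snapshotEnv("TEST_KEY", "OTHER_KEY")
	os.Setenv("TEST_KEY", "changed")
	os.Setenv("OTHER_KEY", "temporary")
	restore()

	testKey, _ = os.LookupEnv("TEST_KEY")
	_, otherKeySet = os.LookupEnv("OTHER_KEY")
	return testKey, otherKeySet
}

func main() {
	fmt.Println(demo())
}
```

Inside a Go test, the returned function fits naturally into `t.Cleanup(restore)`, so the original state comes back even when the test fails partway through.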

Issue 2: Windows Test Bug (Critical)

One test needed to skip execution on Windows because file permission models are different. The check was written incorrectly: it read an environment variable instead of Go's built-in runtime.GOOS constant. This would break Windows CI. Small mistake, but important. Fixed and committed with "fix windows test skip to use runtime goos".
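A sketch of the corrected check, with a hypothetical helper name. `runtime.GOOS` is a constant fixed at compile time for the target platform, whereas an environment variable is usually not set at all when tests run, so a check based on it would silently never skip.

```go
package main

import (
	"fmt"
	"runtime"
)

// permissionChecksApply reports whether Unix-style permission bits are
// meaningful on the given OS name. Hypothetical helper for illustration only.
func permissionChecksApply(goos string) bool {
	return goos != "windows"
}

func main() {
	// Use the compile-time constant, not an environment lookup.
	// In a test this would read:
	//   if runtime.GOOS == "windows" { t.Skip("permission bits differ on Windows") }
	fmt.Println(permissionChecksApply(runtime.GOOS))
}
```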

Issue 3: Early Exit Timing

The .env loading was happening even for commands like --help and --version, which meant users could see security warnings when just checking the version. Moved the loading to happen after those early exits. Performance optimization and better user experience. Committed.
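One way to express that ordering is to decide from the arguments whether loading should happen at all, before touching any file. This is only a sketch with a hypothetical function name and simplified exact-match flag handling; it is not how the PR parses its flags.

```go
package main

import (
	"fmt"
	"os"
)

// shouldLoadEnv returns false for invocations that exit early or that
// opt out of env loading. Hypothetical helper for illustration only.
func shouldLoadEnv(args []string) bool {
	for _, a := range args {
		switch a {
		case "--help", "--version", "--no-env":
			return false
		}
	}
	return true
}

func main() {
	if shouldLoadEnv(os.Args[1:]) {
		fmt.Println("would load .env files here")
		return
	}
	fmt.Println("skipping .env loading")
}
```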

Issue 4: Error Context

When file operations failed, errors didn't indicate which file caused the problem. Added context wrapping so errors show the specific file path. Makes debugging much easier. Committed.
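In Go, this kind of context is typically added by wrapping with `%w`, which keeps the underlying error available to `errors.Is`. The sketch below uses hypothetical function names and is not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// readEnvFile wraps any failure with the operation and the path that caused it.
// Hypothetical helper for illustration only.
func readEnvFile(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("reading env file %q: %w", path, err)
	}
	return data, nil
}

// isMissing shows that the wrapped error can still be inspected by callers.
func isMissing(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	_, err := readEnvFile("does-not-exist.env")
	fmt.Println(err)
	fmt.Println(isMissing(err))
}
```

Callers get a message that names the file, and code that needs to treat "file not found" as a non-error can still detect it.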

Issue 5: Test Coverage

One helper function didn't have its own test. Added coverage to bring the total to 84%. Committed.

Each issue got its own fix, its own verification, its own commit. The same disciplined process for fixes that I used for features.

The Manual Testing

Code works in tests, but does it work for real users? I installed my version and created a test project to verify everything worked end-to-end.

I created a .env file with some variables, created a .env.local file that overrode some of them, and made sure the permissions were correct with chmod 600. Then I added a task to my README.md to verify the variables were loaded:

In README.md:

## check-env

Check loaded environment variables.

echo "Environment: $ENV"
echo "Database: $DATABASE_URL"
echo "API Key: ${API_KEY:0:8}..."

When I ran xc check-env, I saw exactly what I expected. The xc command read the task from the README and executed it with the environment variables from .env and .env.local. The environment was set to "development" from the base .env, but the database URL and API key were overridden by .env.local. Perfect.

I ran eight manual test scenarios: default .env loading, .env.local overrides, the --no-env flag skipping loading, --env-file loading a custom path, security warnings for world-readable files, task-level Env statements still working, --help not loading .env (avoiding unnecessary warnings), and a real-world multi-variable scenario. All eight passed.

The PR

I submitted everything as PR #167. The changes included thirteen commits (eight for the feature, five for fixes), about 200 lines of code including tests, four unit tests plus six integration tests, 84% code coverage, and zero bugs found in manual testing.

The documentation was complete with a README section showing examples, a .env.example template file, the load order documented clearly, and security best practices explained. Most importantly, everything was backward compatible. Existing task-level Env: statements still work exactly as before.

What I Learned

The .ai/ folder was the game-changer. Instead of writing long prompts like "Build me a .env loader with security checks and…", I could just say "Implement the loader per ADR-001". The ADR contains all the decisions. AI just implements them.

I used free models throughout. No expensive API calls. The key wasn't the model, it was the context. Clear architecture docs, explicit ADRs, and well-defined tests gave AI everything it needed to generate good code.

TDD kept everything honest. Every cycle followed the same pattern: write a test that defines the behavior, let AI suggest an implementation, let the test validate it works, then commit. No guessing. No "it probably works." The test proves it.

Thirteen commits might seem like a lot for 200 lines of code, but each commit serves a purpose. Each one is reviewable on its own. Each one tells part of the story. Each one is revertible if needed. Git bisect works perfectly with this kind of history.

The .env.local override issue shows the workflow clearly. AI suggested the wrong approach first, using Load() instead of Overload(). But the test caught it. That's how it should work: AI suggests, test validates, human decides.

The Real Value

This isn't about "AI wrote code for me." It's about process, collaboration, and documentation.

The process matters. Structured context in the .ai/ folder. Design decisions captured in ADRs. TDD discipline with tests written first. Small commits with one change at a time. This is how you ship production code.

The collaboration matters. AI acts as a pair programmer, not a magic wand. Tests validate AI suggestions. Human makes the design decisions. Both contribute to better code.

The documentation matters. Future contributors now have context about the project, the architecture, and why decisions were made the way they were. The implementation plan is explicit. The tests document the expected behavior. Six months from now, none of this is lost.

The compounding matters most. You build the foundation once. Every feature after that leverages it. The second feature doesn't need new context.md or rules/ files, just a new ADR and task spec. The third feature is even faster. The documentation evolves as the codebase evolves. New ADRs when architecture changes. Updates to context.md when understanding deepens. Updates to rules/ when patterns emerge. The investment pays dividends forever.

Try It Yourself

Want to replicate this process? Pick a project and create the .ai/ structure right in your working directory:

mkdir -p .ai/{architecture/adrs,rules,tasks}

Use the template structure as a guide. Build the foundation files once (context.md, architecture/decisions.md, rules/), then for each feature add a new ADR and task spec. The .ai/ folder lives alongside your code and evolves with it—commit it with your changes so it stays in sync.

Direct AI through TDD: "Implement test 1 from the spec." AI writes the test and implementation. "Run it." Test passes. "Commit." Repeat. When done, have AI review its own work, confirm findings, direct fixes. Then do your final review for strategic correctness.

Each feature adds a new ADR and task spec to the .ai/ folder. The foundation files rarely change. The documentation compounds.

The Full Timeline

I spent about 45 minutes on documentation upfront: exploring the codebase with AI, questioning its understanding, writing the ADRs, rules, and context. This sounds like a lot, but it's a one-time investment. The context.md, architecture/decisions.md, and rules/ files I wrote for this first feature will be reused for every future feature. I'll only spend 10-15 minutes on feature-specific docs (ADR + task spec) for the next feature.

The implementation took 40 minutes: directing AI through TDD cycles, one test at a time, validating each step. Integration of CLI flags and wiring into main.go took 15 minutes of the same directed approach. Documentation like README updates and examples took another 15 minutes. Manual testing took 15 minutes: I installed the binary and ran real scenarios. The review process took 30 minutes: first AI reviewed its own code (found 5 issues), then I reviewed the fixes, then I did a final deep review on GitHub.

Important: AI wrote 100% of the code. I wrote 100% of the strategy, asked 100% of the questions, and made 100% of the decisions. I reviewed every line, but I didn't type any of them. Total time from fork to production-ready PR was about 2.5 hours.

If I added a second feature tomorrow, it would take less time. The third would be faster still. The documentation compounds.

Resources

The complete .ai/ structure and documentation is at github.com/sudoish/xc/tree/ai-context/.ai. The pull request with all code and tests is at github.com/joerdav/xc/pull/167. The working fork is at github.com/sudoish/xc. The original issue is joerdav/xc#162.

Note: The .ai/ folder lives in a separate branch in this example only because I wanted to reference it for this article without including it in the PR to the upstream project. In your own work, keep the .ai/ folder in your main working branch and commit it with your changes—it should evolve alongside your code, not separately.

The Skills That Actually Matter

AI wrote every line of code. I read every line, but I didn't write any of them. The feature is production-ready because I focused on what actually matters.

The four skills transformed from framework to practice: rapid onboarding through questioning AI and building structured context, strategic documentation through ADRs written before code, systematic planning through testable specs, and iterative review through AI self-checks followed by strategic verification.

The .ai/ folder, the ADRs, the task specs, the review cycles—they all worked exactly as planned. The result: 84% coverage, zero bugs, 2.5 hours from fork to production-ready PR.

These skills work with free models. They work on unfamiliar codebases. They separate developers who use AI effectively from those who just generate code and hope it works.

The magic isn't in the AI. It's in the process. And the process is this: you think, you plan, you direct, you review. AI executes.

What This Means for Your Career

The developer who can onboard to unfamiliar codebases fast, document decisions strategically, plan systematically, and execute with confidence is far more valuable than the developer who can write code quickly. Because here's the reality: code writing isn't the bottleneck, and it never was. AI just made that much more evident.

I shipped a production-ready feature to an unfamiliar codebase in 2.5 hours without writing a single line of code. The bottleneck wasn't typing. It was understanding, planning, and judging. Those are the skills that matter.

AI tools are getting better at code generation every month. They're not getting better at understanding your codebase's architecture, making strategic trade-offs, or ensuring production quality. Those skills are still yours. Those skills are what companies pay for.

The question isn't "Will AI replace developers?" It's "Which developers will thrive when everyone has access to AI?" The answer is the ones who master onboarding, documentation, planning, and review. The ones who understand that their job is no longer to write code—it's to think clearly, plan thoroughly, and judge correctly.

This is the junior dev role being redefined. It's not about writing boilerplate anymore. That work is done. It's about learning systems fast, making good decisions, directing execution, and ensuring quality. If you can do that, you're not competing with AI. You're orchestrating it.

Writing code is optional. Reading it, understanding it, and judging it—those aren't.

This post documents a real open source contribution made using AI as a pair programmer. All code, tests, documentation, and the complete .ai/ folder structure are publicly available in the sudoish/xc fork for anyone who wants to replicate this approach.

The post You Think, AI Executes: The Skills That Actually Matter appeared first on sudoish.

How We Made It Nearly Impossible to Become a Developer

Thiago Pacheco — Sun, 29 Mar 2026 16:47:08 +0000

I once interviewed a senior software engineer. Almost 10 years of experience. Proven track record of delivery. Solid industry knowledge. The kind of person you’d want on your team without a second thought.

They completed the technical challenge. Not flawlessly — there were considerations, trade-offs they made that weren’t all correct. But when we reviewed their decisions together, the reasoning was sound. They showed commitment to their choices and could articulate why they went the direction they did. Some answers were vague in spots, some mistakes were real, but nothing that wouldn’t get corrected in the first week on the job with actual codebase context. The kind of gaps that disappear when you’re working on real problems instead of performing in a vacuum.

We didn’t hire them.

Not because I didn’t want to. I did. But the compounded small mistakes added up under the scoring rubric, and the final grade wasn’t strong enough to sell to the hiring managers. The rules of the process made a good engineer look like a bad candidate.

And I get it — those rules exist to keep the bar high, to ensure we only hire top talent. At least, that’s what every company believes. But what I’ve seen throughout my career, on both sides of the table, is that the process doesn’t filter for the best engineers. It filters for the best interviewers. And we lose great colleagues — dedicated, talented people — because they didn’t fit the rule book.

That was a senior engineer with a decade of experience. Now imagine you’re a junior with none.

The software industry has a hiring problem. Not the kind where we can’t find people — the kind where we’ve made it nearly impossible for new people to get in.

Entry-level developer hiring has collapsed — some reports show drops of 60% or more in the past year, with actual hires into junior roles falling as much as 73%. CS graduates are sitting at 6.1% unemployment according to the Federal Reserve Bank of New York — more than double the overall national rate. And the majority of tech leaders say they plan to reduce entry-level hiring even further while increasing AI investment.

But the pipeline didn’t break overnight. It’s been cracking for years. AI just kicked the door in.

The Interview Problem Nobody Wants to Fix

The interview process was broken long before AI showed up. And I’ll say what a lot of people in the industry think but won’t say out loud: the standard software engineering interview process is unrealistic, unnecessarily demanding, and a terrible predictor of on-the-job performance.

Think about what we ask candidates to do. Solve algorithmic puzzles on a whiteboard or shared screen. Explain their thought process in real time while simultaneously figuring out the solution. Design systems on the spot for problems they’ve never encountered in that specific framing. All while someone watches and judges every hesitation.

Here’s the thing — that’s not how software development works. Not even close.

Real engineering is focused, deep work. It’s sitting alone with a problem for hours, researching approaches, trying things, breaking things, iterating. It’s the exact opposite of performing under observation with a timer running. Most developers do their best work when they’re left alone to think. Asking them to showcase and explain how they’d deliver features while they’re still processing the problem doesn’t test their engineering ability. It tests their ability to perform under artificial pressure.

And yet, this is how we gatekeep an entire profession.

When Even a Principal Engineer Can’t Pass

I have a close friend who’s a principal engineer. He’s delivered massive projects — systems that required intense complexity, heavy reliability, and serious scalability. The kind of work that keeps companies running. I’ve watched him turn down offers from companies that couldn’t skip the whiteboard stage. His track record speaks for itself, but he knows the process doesn’t.

He straight up refuses to do technical interviews. Hates the process. Never performed well in them.

But that wasn’t always the case. Early in his career, he had no choice. He went through the motions, sat through the whiteboard sessions, stumbled through the live coding exercises. And that’s exactly how he learned he was terrible at it. Not terrible at engineering — terrible at the performance.

If a principal engineer with years of proven delivery struggles with this process, what does that tell us? It tells us we’re measuring the wrong thing.

Culture fit, communication, problem-solving mindset, willingness to learn — these should carry far more weight than whether someone can implement a binary tree traversal from memory while a stranger watches. But the industry has standardized around LeetCode-style assessments like they’re some universal truth, and we’ve collectively decided that this is just how it works.

It’s not. It’s a choice. And it’s a bad one.

The Wrong Skills at the Worst Time

Here’s where it gets really damaging for juniors specifically.

When you’re just starting your career, you have limited time, limited money, and unlimited pressure. The message the industry sends you is clear: grind LeetCode. Master data structures and algorithms. Practice system design for systems you’ve never built. Get good at performing.

So that’s what people do. They spend months — sometimes six months or more — focused entirely on interview preparation instead of actually building things, learning real-world patterns, or developing the engineering intuition that makes someone genuinely valuable.

We’re literally telling the next generation of developers to optimize for the wrong skills from day one. And then we wonder why new hires can’t navigate a real codebase.

The industry has created a perverse incentive: becoming good at getting hired and becoming good at the job are two completely different skill paths. And for juniors who are just figuring out what software engineering even is, being forced down the interview prep path first is actively harmful to their development.

Now Add AI to the Mix

As if the interview gauntlet wasn’t enough, the industry just added a new requirement: you need to be proficient with AI tools.

On the surface, this makes sense. AI-assisted development is becoming standard practice. Companies want developers who can leverage these tools effectively. Fair enough.

But think about what we’re actually asking.

Yes, some AI coding tools have free tiers now. GitHub Copilot has one. Cursor has a free plan. But if you’re just starting out — fresh from school, finishing a bootcamp, or self-teaching — do you even know that? The AI tooling landscape is an overwhelming mess of options, hype, and conflicting advice. Experienced developers struggle to keep up with what’s worth using. How is someone who’s still learning what a REST API is supposed to navigate that?

And the free tiers only get you so far. The tools that companies actually expect proficiency in — Copilot Pro, Cursor Pro, Claude Pro — cost $10 to $20 per month each. If you want a serious AI-assisted workflow, you’re looking at $30-50/month minimum. That might not sound like much to someone employed, but when you’re unemployed, every dollar matters. Asking someone without income to pay for premium AI tools so they can develop the skills needed to get a job is a catch-22.

There are too many unknowns when you’re starting out. Every conversation about AI in development assumes a baseline of knowledge and context that juniors simply don’t have yet. And instead of helping them build that foundation, we’re adding it to the list of things they need to figure out on their own before we’ll even consider hiring them.

The Senior Shortage Nobody Sees Coming

Here’s the part that should terrify every tech leader who’s currently celebrating their AI-powered lean engineering team: you’re eating your seed corn.

It takes roughly 7 to 10 years to develop a senior engineer. Not just someone with “senior” in their title — someone who can architect systems, mentor teams, make judgment calls under uncertainty, and understand the business context of technical decisions. That kind of expertise doesn’t come from tutorials or AI tools. It comes from years of making mistakes, shipping real products, debugging production incidents at 2 AM, and slowly building the intuition that separates an engineer from someone who writes code.

If we’re not hiring juniors now, we won’t have mid-level engineers in 3-5 years. And we won’t have seniors in 7-10 years.

The Stanford Digital Economy Lab data already shows it: employment for software developers aged 22-25 has dropped roughly 20% since late 2022, while developers over 26 remain stable. The two groups tracked perfectly until ChatGPT launched, then diverged sharply. We’re watching the pipeline dry up in real time.

And here’s the irony that makes it worse: companies are cutting juniors because they believe AI replaces what juniors did. But the data tells a different story. Google’s DORA 2024 report found that a 25% increase in AI adoption translated to just a 2% productivity gain — while executives at those same companies were telling their boards that AI had boosted output by 25%. The gap between measured reality and executive perception is staggering, and companies are making structural hiring decisions based on that perception, not the data.

Juniors were never just “cheap labor” who wrote boilerplate. They stress-tested documentation. They exposed hidden assumptions in systems. They forced seniors to articulate knowledge that would otherwise stay implicit. They built institutional memory.

A senior with Copilot can write code faster, sure — but faster code was never the bottleneck.

The Squeeze

So let’s put it all together. If you’re a junior developer in 2026, here’s your reality:

There are barely any jobs for you. Entry-level hiring has collapsed. Companies want seniors only.
The jobs that exist demand more than ever. The few junior roles left aren’t really junior anymore — they want 2-3 years of experience, AI proficiency, and system design knowledge.
The interview process is designed against you. Months of LeetCode prep that teaches you nothing about real engineering.
You need tools you might not know exist or can’t afford. AI proficiency is expected, but the landscape is overwhelming and the good stuff costs money you don’t have.
The anxiety is crushing. The pressure to be the best, to stand out in a market with fewer openings and more candidates, is driving people out before they even start.

And the result? People are giving up. Not because they can’t code. Not because they’re not smart enough. Because the path from “I want to be a software developer” to actually being one has become so hostile, so expensive, and so demoralizing that it’s not worth it anymore.

Can you blame them?

What Actually Needs to Change

I don’t have a clean five-point solution. Anyone who does is selling something. But I know what direction we should be moving.

Rethink interviews from the ground up. Pair programming sessions, take-home projects with reasonable time limits, portfolio reviews, trial periods — there are better ways to assess ability than making people perform algorithms under pressure. If your interview process can’t distinguish between a great engineer who interviews poorly and a mediocre one who interviews well, the process is broken. Not the candidate.

Invest in juniors as a strategic decision, not charity. The companies that hire and develop juniors now will have the experienced engineers everyone else is desperate for in 2030. A handful of companies are already doubling down on junior hiring. They’re not being generous — they’re playing the long game while everyone else optimizes for this quarter.

Stop pretending AI replaced the junior role. It replaced the boilerplate. The questions juniors ask, the assumptions they challenge, the documentation they stress-test — that’s not automatable. If your team stopped growing because you thought Copilot could replace a curious 23-year-old, you’re going to feel that decision in five years.

The Real Question

The software industry has spent the last two decades complaining about a talent shortage. And now, faced with the largest pool of motivated CS graduates and career switchers in history, we’ve decided the best strategy is to lock the door and let AI handle it.

If senior engineers can’t pass technical interviews, if junior roles demand senior skills, if the tools you need are buried in a landscape designed for people who already know what they’re doing, and if the entire process optimizes for performance over competence — then the pipeline isn’t just broken. We broke it. Deliberately, through a thousand small decisions that each seemed reasonable in isolation but collectively created a system that’s eating its own future.

The question isn’t whether this will catch up with us. It’s whether we’ll have anyone left in the pipeline to fix it when it does.

The post How We Made It Nearly Impossible to Become a Developer appeared first on sudoish.

The AI Productivity Lie Nobody Wants to Admit

Thiago Pacheco — Sat, 28 Mar 2026 12:37:32 +0000

I’ve been producing bad code. And it’s not because I forgot how to code.

I’ve tried every workflow. Terminal agents, IDE copilots, full vibe coding, augmented coding. I keep exploring because that’s what I do — evaluate, keep what works, move on from what doesn’t.

But here’s where I am right now: most of the time, it’s still more reliable for me to write the code myself than to let the agent do it.

When the AI writes it, yes — it’s faster sometimes. But then I review it. I find things. I correct things. And suddenly I’m in a loop where I’m either spending more time than if I just did it myself, or the same time doing a more tedious version of the work. I’m not writing code anymore. I’m auditing code I didn’t write and don’t fully trust.

And I’m starting to wonder what that’s doing to my engineering skills. Not because AI replaced them. Because the constant pressure to delegate everything is pulling me away from the deep thinking that built those skills in the first place.

The Pressure Nobody Talks About

The expectation to be dramatically more productive with AI is real. It’s coming from management, from Twitter, from inside your own head. Every demo makes it look like everyone else figured it out and you’re falling behind.

And yeah — AI can be a huge boost. But only if you have perfect project structure, perfect product context, perfect documentation.

That doesn’t exist. Not in any real codebase I’ve ever worked on.

You know what exists? Legacy code. Tech debt. Patterns that were “temporary” three years ago. Business logic that lives in someone’s head and nowhere else.

When you point an AI agent at that, it doesn’t fix the problems. It copies them. It amplifies them. It confidently reproduces your worst patterns at scale.

So now you’re not just dealing with tech debt. You’re dealing with AI-generated tech debt that looks clean because the agent formatted it nicely.

The Data That Should Make You Uncomfortable

Here’s what made me feel less crazy about all of this. And each study hits harder than the last.

You’re Not Even Faster

METR, a nonprofit research organization, ran what might be the most rigorous study on this topic to date. They took 16 experienced open-source developers — people who maintain large repositories, averaging 22,000+ stars and over a million lines of code — and had them work on real issues in their own codebases. Real bugs, real features, real refactors. Not toy problems.

Half the time they could use AI. Half the time they couldn’t.

The result? When developers used AI tools, they took 19% longer to complete their tasks.

Not faster. Slower.

But here’s the part that should genuinely unsettle you: before the study, these developers predicted AI would make them 24% faster. After using it — after actually experiencing the slowdown — they still believed AI had made them 20% faster.

19% slower. Felt 20% faster. A gap of nearly 40 percentage points between perception and reality.

This is what I call the speed mirage. You feel like you’re flying. The data says you’re walking backwards. And you can’t even tell.

You Understand Less of What You Ship

Anthropic, the company that makes Claude, ran a randomized controlled trial on their own tool. Developers using AI assistance scored 17% lower on comprehension tests compared to developers who coded manually. The AI group finished slightly faster, but that speed difference wasn’t even statistically significant.

Marginal speed gain. Real understanding loss.

It gets worse. The developers who delegated code generation to AI scored below 40% on comprehension. The ones who used AI for conceptual questions — asking “why” and “how does this work” — scored above 65%.

Same tool. Completely different outcomes depending on how you used it. And the biggest gap was in debugging — the skill you need most when things break in production.

The Industry Already Knows

Stack Overflow’s 2025 Developer Survey: 84% of developers use or plan to use AI tools, but only 33% trust the output. Down from 43% the year before.

Adoption up. Trust down.

So we’re not faster. We understand less. And we don’t even trust what we ship. But we keep using it because everyone else seems to have figured it out.

Throughput vs. Confidence

Let me name the thing clearly.

The industry is optimizing for throughput when it should be optimizing for confidence.

Throughput is lines of code generated. PRs opened. Features “shipped.” It’s the metric that looks good on a dashboard and falls apart in production.

Confidence is different. Do I understand what this code does? Do I trust it handles the edge cases? Can I debug it at 2 AM when something breaks?

Vibe coding optimizes for throughput. You get a dopamine spike. You feel productive. And then you spend the rest of the day cleaning up after the machine.

I’m not anti-AI. I use it every day. It’s incredible for researching tradeoffs, validating ideas, catching things I missed in review. When I use AI as a thinking partner, it genuinely makes me better.

But when I use it as a coding replacement, it makes my output worse. And that’s the gap the industry isn’t willing to talk about.

The Confidence Boundary

So where does this leave us?

I don’t have the perfect workflow. I’m not going to pretend I do. But I’ve been paying attention to what actually works, and the pattern is consistent.

The developers getting the best results from AI aren’t the ones who figured out the perfect prompt. They’re the ones who figured out what to delegate and what to keep.

I’ve been calling this the confidence boundary.

Here’s a real example. I had a feature to build recently. Instead of just opening the terminal and prompting the agent to do it, I stopped. I wrote the spec first. A clear, detailed explanation of what needed to be accomplished. The edge cases that the implementation had to survive. The constraints. The things I explicitly didn’t want.

Then I handed that to the agent and let it implement against my spec.

Because I did the thinking upfront, reviewing the output took minutes instead of hours. I knew exactly what should be there and what shouldn’t.

But here’s the thing nobody tells you — and this is where it gets uncomfortable.

To get a good result from the agent, you have to be very specific. You’re writing a detailed spec, thinking through edge cases, defining constraints. And at some point you realize: for certain features, you’ve already done most of the hard work. The thinking is the work.

At that point, it’s genuinely faster to just write the code yourself and use the agent as a pair reviewer to make sure you’re on the right track.

Other times — boilerplate, scaffolding, repetitive patterns, implementations where the spec is clear and the risk is low — full delegation is absolutely the move. Hand the agent the guidance and let it run.

The real skill isn’t prompting. It’s learning what to delegate and what to keep. And that judgment comes from understanding your codebase, the complexity of the task, and honestly — how much you trust the output for that specific context.

The messier your codebase — legacy code, real tech debt, patterns with history the agent will never know — the more that judgment matters. The tooling is irrelevant. Neovim, Cursor, whatever. The bottleneck is you. Whether you know where your confidence boundary is and whether you’re honest about it.

The Bet I’m Making

If you feel like AI is making you more productive but less confident in what you ship — you’re not falling behind. You’re paying attention.

If your engineering skills feel soft because you keep delegating instead of thinking — that’s not paranoia. The research says it’s real.

The speed mirage is powerful. It feels like progress. The dashboards say it’s progress. But if you can’t explain what you shipped, debug it when it breaks, or trust it handles the edge cases — that’s not progress. That’s debt with a nice commit message.

I’m not quitting AI. I’m quitting the lie that it makes everything faster.

The developers who are going to thrive aren’t the ones who ship more code. They’re the ones who learned what to keep and what to let go. Who built the judgment for when to delegate and when to do the work themselves.

Confidence over throughput. That’s the bet I’m making.

The post The AI Productivity Lie Nobody Wants to Admit appeared first on sudoish.

A Tale of Accidental Architecture: How 50 Lines Became A Black Friday Disaster

Thiago Pacheco — Fri, 27 Feb 2026 04:26:03 +0000

Let me tell you about Sarah.

This is a fictional story. But I bet you’ll recognize it.

I’ve seen this pattern play out across different companies, different teams, different tech stacks. The details change. The progression doesn’t.

Week 1: The Perfect Start

Sarah’s building a notification system for an e-commerce platform.

First requirement: send an email when someone places an order.

Simple. She writes one function. Webhook comes in, format the email, hit SMTP, done.

The whole thing is maybe 50 lines. It works perfectly. Code review approves it. It ships.

Sarah’s thinking: “It’s just one notification type. I’ll add proper abstraction when we actually need it.”

You’ve thought this too. So have I.

Nothing wrong with it. Week 1, this is the right call.

Week 3: The First Copy-Paste

Product team loves the email notifications. Now they want SMS for order shipments.

Mike picks up the ticket.

He opens Sarah’s code. Sees the pattern. Makes sense. He follows it.

New handler. Receives the shipment webhook. Formats the SMS message. Connects to Twilio. Sends it.

He copies some of Sarah’s email formatting logic because customers should see consistent information. Has to adjust it for the 160-character SMS limit, but the core logic is the same.

Mike’s thinking: “There’s some duplication with the email code, but SMS is different enough that abstracting it would be premature. It’s only two notification types.”

Deadline is tomorrow. This ships.

Still nothing catastrophically wrong here. Two types, small duplication, it’s manageable.

Right?

Week 5: User Preferences

Customers start complaining.

“I don’t want SMS notifications.”

“Why am I getting emails for every status change?”

Sarah adds user preferences. Creates a database table. Updates her email handler to check if the user wants that particular notification before sending.

The handler triples in size.

Query the database. Check multiple preference flags. Handle the case where preferences don’t exist yet. Default values. Edge cases.

Sarah’s thinking: “This is getting messy, but the deadline is tomorrow and this works. I’ll refactor it next sprint.”

I cannot tell you how many times I’ve heard “next sprint.”

(Spoiler: next sprint never comes.)

Week 7: Two Ways to Do Everything

Mike needs to add notifications for order cancellations and delivery confirmations.

He realizes hardcoding email bodies isn’t going to scale.

So he builds a template system. Creates a templates directory. Writes a simple renderer. Updates his handlers to load templates, populate data, send.

It’s actually pretty clean.

Meanwhile, Sarah’s handlers still use string formatting. She doesn’t know Mike built a template system. Mike didn’t announce it in Slack. It just… exists now.

The codebase now has two different ways of generating notification content.

Sarah finds out later. Thinks: “I should probably switch to Mike’s templates… but my code is working and I’m slammed with other features.”

And she is. Three new features this sprint. No time to refactor working code.

Week 9: The Third Approach

Emma joins the team.

First task: add Slack notifications for the support team when high-value orders come in.

She opens the notification code. Finds Sarah’s inline approach. Finds Mike’s templates. Neither makes sense for Slack.

Slack needs structured JSON payloads, not formatted text.

So Emma does what any good engineer would do: she creates a “proper solution”.

Notification service class. Methods for each notification type. Handles destination-specific formatting internally. Clean. Testable. Well-designed.

She shows it to the team in standup.

Mike: “That’s nice, but I don’t have time to refactor my SMS code right now. Maybe later.”

Sarah: “I like it, but my code has been running in production for months. If it ain’t broke…”

Emma’s service class gets used for Slack notifications. Nothing else changes.

Now there are three ways to send notifications.

Week 12: The Chaos Compounds

Product wants:

Push notifications for the mobile app
Digest emails (daily order summaries)
Ability to snooze notifications

Three developers. Three features. Same week.

Each one discovers the existing fragmentation. Each one makes their own call.

Developer A tries to extend Sarah’s inline approach. Adds push notification logic directly in the handler.

Developer B uses Mike’s templates but creates a new template format because the existing one doesn’t support digest layouts.

Developer C tries to use Emma’s service class but realizes it doesn’t handle scheduling or snoozing. So they add that logic directly in their handler instead.

The notification preferences table is now being updated by five different code paths.

Each developer added their own columns because they didn’t realize others had added similar fields. One stores preferences as JSON. Another uses boolean columns. Another created a separate preferences table with foreign keys.

I’ve seen this code review happen. Every PR gets approved. Every piece of code works.

Nobody did anything wrong.

And yet.

Week 15: Customer Complaints

Support tickets start flooding in.

“I’m getting duplicate notifications.”

“I disabled email but I’m still getting them.”

“I’m not getting notifications at all for important orders.”

Sarah investigates. Opens the codebase.

Six different code paths handle notifications. Some check preferences before sending. Some check during sending. Some don’t check at all because the developer assumed another layer was handling it.

She finds the bug. It’s in her original email handler. The preference check is wrong.

She fixes it. Deploys.

Three other notification types break. They were relying on her buggy behavior.

The team’s verdict on fixing it properly: “We need to stop and refactor everything first, or we’ll just make it worse.”

Management: “We don’t have time for a refactor. Just fix the bugs.”

Week 17: The Template Nightmare

Marketing wants to update email designs. New brand guidelines.

The developer assigned to this opens the codebase.

Templates are everywhere.

Some in a /templates directory. Some hardcoded as strings. Some in the database. Some fetched from an external CMS that one developer integrated without telling anyone.

There’s no single source of truth.

Worse: the data passed to templates is completely inconsistent.

Email templates expect order objects with certain fields. SMS templates expect a flattened structure. Push notifications expect a completely different format.

One design change requires touching dozens of files.

The developer estimates: “Two weeks, maybe three.”

Marketing: “It’s just a design update. How is that two weeks?”

Week 20: Performance Crisis

Black Friday.

The system crashes.

Investigation reveals: notification handlers are opening new database connections for every single notification sent.

Some handlers properly close connections. Some don’t.

Connection pools exhausted. Some handlers retry failed sends immediately and indefinitely, amplifying the problem during the outage. One handler spawns a goroutine for each notification but never limits concurrency.

The server runs out of memory processing a batch of 10,000 order confirmations.

Different developers made different assumptions about error handling.

Some swallow errors after logging them. Some retry with exponential backoff. Some fail fast. Some store failed notifications in one database table for retry. Others use a different table. One developer integrated a third-party queue system that nobody else knew existed.

Notifications are getting lost between these systems.

I’ve been on calls where the CTO asks: “How many notification systems do we have?”

Nobody can answer.
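
For contrast, here is a minimal sketch of what bounded retries and capped concurrency could look like. It is not the code from the story: the constants, `send_with_retry`, and `send_all` are hypothetical names, and `send` stands in for whatever function actually talks to SMTP or Twilio.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Illustrative values, not taken from any real system.
MAX_ATTEMPTS = 3           # give up after three tries instead of retrying forever
BASE_DELAY_SECONDS = 1.0   # delay doubles after each failure: 1s, then 2s
MAX_WORKERS = 8            # hard cap on sends in flight at once


def send_with_retry(send, payload, sleep=time.sleep):
    """Call send(payload) up to MAX_ATTEMPTS times, backing off between failures."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            send(payload)
            return True
        except Exception:
            if attempt == MAX_ATTEMPTS - 1:
                return False  # out of attempts; the caller records the failure
            sleep(BASE_DELAY_SECONDS * 2 ** attempt)
    return False


def send_all(send, payloads, sleep=time.sleep):
    """Send every payload with at most MAX_WORKERS running; return the failures."""
    payloads = list(payloads)
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = list(pool.map(lambda p: send_with_retry(send, p, sleep), payloads))
    return [p for p, ok in zip(payloads, results) if not ok]
```

The point is less the specific numbers than the fact that they exist in one place. A batch of 10,000 confirmations queues behind eight workers instead of spawning 10,000 tasks, and a dead provider costs three attempts per message instead of an infinite loop.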

Week 24: The Audit

Compliance team asks a simple question:

“Can you show us a record of all notifications sent to customer X in the past 90 days?”

The team cannot answer this.

Notification logs are scattered everywhere.

Some handlers log to stdout. Some to files. Some to a database table. Some don’t log at all.

The log formats are completely different. Some include the full message content. Some just log “notification sent” without details. There’s no correlation between the notification and the triggering event.

The auditor asks: “How do you ensure notifications contain required legal disclosures?”

Each template was created independently. Some include required legal text. Some don’t. There’s no centralized enforcement.

I’ve seen this audit happen. Teams spend weeks reconstructing logs manually.
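
To make the auditor's question concrete, here is a rough sketch of a single delivery log that could answer it in one query. Everything here is an assumption for illustration: the column names, the `log_delivery` and `history_for_customer` helpers, and the choice of SQLite.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one row per delivery attempt, for every channel.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notification_log (
    id             INTEGER PRIMARY KEY,
    correlation_id TEXT NOT NULL,  -- ties the notification to its triggering event
    customer_id    TEXT NOT NULL,
    channel        TEXT NOT NULL,  -- 'email', 'sms', 'push', ...
    status         TEXT NOT NULL,  -- 'sent' or 'failed'
    body           TEXT NOT NULL,  -- full content, so disclosures can be checked
    sent_at        TEXT NOT NULL   -- ISO 8601 timestamp in UTC
)
"""


def log_delivery(conn, correlation_id, customer_id, channel, status, body, sent_at=None):
    """Record one delivery attempt, successful or not."""
    sent_at = sent_at or datetime.now(timezone.utc)
    conn.execute(
        "INSERT INTO notification_log "
        "(correlation_id, customer_id, channel, status, body, sent_at) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (correlation_id, customer_id, channel, status, body, sent_at.isoformat()),
    )


def history_for_customer(conn, customer_id, days=90, now=None):
    """Return (sent_at, channel, status, correlation_id) rows from the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days)).isoformat()
    return conn.execute(
        "SELECT sent_at, channel, status, correlation_id FROM notification_log "
        "WHERE customer_id = ? AND sent_at >= ? ORDER BY sent_at",
        (customer_id, cutoff),
    ).fetchall()
```

With every handler writing through `log_delivery`, "show us customer X for the past 90 days" becomes `history_for_customer(conn, "X")` rather than weeks of reconstruction.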

The Breaking Point

VP of Engineering asks for a simple feature:

“Add an unsubscribe link to all emails.”

The team estimates: Three weeks.

The VP is shocked.

“It’s just adding a link. How is that three weeks of work?”

The tech lead explains:

“We have seven different code paths that send emails. Each uses a different templating system. Some render templates on the server. Some fetch them from external systems. Some are hardcoded strings. We need to update each one individually, ensure the unsubscribe logic is consistent across all of them, add tracking for unsubscribe events, update the preferences system to handle unsubscribes properly, and test everything thoroughly because there’s no centralized testing strategy.”

Three weeks. For a link.

The VP asks the obvious question: “How did it get this bad?”

What Went Wrong?

Here’s the thing that kills me about this story.

Nobody made a catastrophically bad decision.

Sarah’s Week 1 implementation was appropriate. Mike’s template system was a reasonable improvement. Emma’s service class was a genuine attempt to bring order.

Every single developer was trying to do good work under deadline pressure.

The problem wasn’t the individual decisions.

It was the absence of a shared architectural vision.

Without clear boundaries and layers, each developer made reasonable local optimizations that created global chaos.

The “I’ll refactor it later” moments never came because there was never a good time to stop feature development.

The “let’s standardize this” conversations happened but never resulted in action because no one had time to migrate existing code.

The codebase evolved organically.

And organic growth without structure doesn’t produce a garden. It produces a weed-infested lot.

“But This Is Just a Communication Problem”

You might be thinking: the real issue was that developers didn’t communicate.

If Sarah and Mike had talked, they wouldn’t have built two different templating systems. If Emma had socialized her service class better, others would have adopted it.

Better standups. Better code reviews. Better documentation. That’s what was missing, not architecture.

This is seductive because it’s partially true.

But here’s why it misses the point: architecture IS communication.

It’s the most important form of communication for technical decisions.

Think about what actually happened in the story.

The team DID communicate. Mike showed his template system in code review. Emma presented her service class and got positive feedback. They had a meeting in Week 11 trying to align on standards.

The communication happened.

What didn’t happen was turning those conversations into durable, enforceable decisions.

This is the key difference:

Conversation says “we should probably do X.”

Architecture says “X is how we do things here, and here’s where it lives.”

“Conversation is ephemeral. Architecture is the artifact that persists after the meeting ends.”

When a new developer joins and asks “where should notification logic go?”, the answer shouldn’t require scheduling a meeting or hunting through Slack history.

It should be obvious from looking at the codebase.

Communication without architecture leads to the problem Emma faced. She built something good. People agreed it was good. And then… nothing changed.

Without architectural decisions being explicitly made (“from now on, all notifications go through NotificationService”), the good idea just becomes another option in an increasingly fragmented codebase.

Good communication can prevent chaos. But it can’t survive bad processes.

When developers are under deadline pressure, working on different features, joining the team at different times, communication will have gaps.

Architecture is the safety net for when communication fails.

It’s the shared context that makes it possible to work somewhat independently without creating complete divergence.

So yes, the team in our story could have communicated better.

But the solution isn’t “communicate more.”

It’s “communicate the architecture and make it stick.”

Document where things belong. Make architectural decisions explicit. Enforce them in code review. Build structure that persists beyond any individual conversation.

Because at the end of the day, you can have all the Slack channels and standups and retros you want.

Without a shared architectural foundation, you’re just having the same conversations over and over while the codebase continues to fragment.
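
One way to make a decision stick is to let a check enforce it instead of relying on reviewers' memory. The sketch below is only an illustration of that idea, under assumptions I am making up: that the rule is "only one module may import delivery libraries directly", that the module lives at `services/notification_service.py`, and that `smtplib` and `twilio` are the libraries in question.

```python
import ast
from pathlib import Path

# Hypothetical rule: only this module may import delivery libraries directly.
ALLOWED_MODULE = "services/notification_service.py"
RESTRICTED_IMPORTS = {"smtplib", "twilio"}


def restricted_imports(source: str) -> set[str]:
    """Return the restricted top-level packages that `source` imports."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        found.update(name.split(".")[0] for name in names)
    return found & RESTRICTED_IMPORTS


def violations(root: Path) -> list[str]:
    """List Python files under `root` that bypass the notification service."""
    bad = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        if rel != ALLOWED_MODULE and restricted_imports(path.read_text()):
            bad.append(rel)
    return bad
```

Run in CI and fail the build when `violations(Path("."))` is non-empty, and the rule no longer depends on who happened to be in the meeting.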

What Should Have Happened in Week 1

Sarah should have spent 30 minutes writing this:

# Notification System Architecture

## Where Things Live
- All notification logic → services/notification_service.py
- Templates → templates/ directory (Jinja2 format)
- Preference checks → services/preference_service.py
- Delivery logging → notification_log table

## How to Add a New Notification Type
1. Add template to templates/
2. Add method to NotificationService
3. Log delivery attempt (success or failure)
4. Add tests to test_notification_service.py

## Error Handling
- Retries: 3 attempts with exponential backoff (1s, 2s, 4s)
- Failed sends → dead_letter_queue table
- All errors logged with correlation ID

## Preferences
- Check preferences BEFORE sending (not during)
- Default: all notifications enabled
- Unsubscribe → set all preferences to false

That’s it. 30 minutes of work. Would have saved months of chaos.

When Mike added SMS in Week 3, he would have known where to put it. When Emma added Slack in Week 9, she would have followed the existing pattern. When three developers worked simultaneously in Week 12, they would have made consistent decisions.

Not because they communicated better. Because the architecture communicated for them.
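
To show how little code that document implies, here is a minimal sketch of a service that follows it. It is an illustration under assumptions, not the implementation from the story: the `senders` callables stand in for real SMTP or Twilio clients, preferences and dead letters are kept in memory rather than in tables, and retries are left out for brevity.

```python
import logging
import uuid
from dataclasses import dataclass, field
from typing import Callable

log = logging.getLogger("notifications")


@dataclass
class NotificationService:
    # Channel name -> function that performs the delivery (hypothetical stand-ins).
    senders: dict[str, Callable[[str, str], None]]
    # (customer_id, channel) -> enabled? Missing entries default to enabled.
    preferences: dict[tuple[str, str], bool] = field(default_factory=dict)
    # In-memory stand-in for the dead_letter_queue table.
    dead_letters: list[dict] = field(default_factory=list)

    def wants(self, customer_id: str, channel: str) -> bool:
        return self.preferences.get((customer_id, channel), True)

    def notify(self, customer_id: str, channel: str, message: str) -> str:
        """Deliver one notification; returns 'skipped', 'sent', or 'failed'."""
        correlation_id = uuid.uuid4().hex
        # Preferences are checked BEFORE sending, in exactly one place.
        if not self.wants(customer_id, channel):
            log.info("skipped %s to %s [%s]", channel, customer_id, correlation_id)
            return "skipped"
        try:
            self.senders[channel](customer_id, message)
        except Exception as exc:
            self.dead_letters.append({
                "correlation_id": correlation_id,
                "customer_id": customer_id,
                "channel": channel,
                "message": message,
                "error": str(exc),
            })
            log.error("failed %s to %s [%s]: %s", channel, customer_id, correlation_id, exc)
            return "failed"
        log.info("sent %s to %s [%s]", channel, customer_id, correlation_id)
        return "sent"

    def unsubscribe(self, customer_id: str) -> None:
        """Unsubscribe sets every channel preference to false."""
        for channel in self.senders:
            self.preferences[(customer_id, channel)] = False
```

Adding SMS, Slack, or push then means registering one more entry in `senders`. The preference check, the logging, and the failure handling come along for free, which is the whole argument for writing the rules down in Week 1.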

The Pattern You’ve Seen Before

I’ve seen this exact pattern play out at least a dozen times.

Different companies. Different tech stacks. Different teams. Different features.

The pattern is always the same.

Week 1: Clean, working code.

Week 3: Small duplication appears.

Week 7: Multiple approaches emerge.

Week 12: Chaos compounds.

Month 6: Simple changes take weeks.

The timeline varies. Sometimes it happens faster (AI accelerates it). Sometimes slower (a disciplined team delays it). But without architecture, the destination is always the same.

Then AI Showed Up and Made Everything 10x Worse

Everything I just described? It’s been happening for decades.

Slow burn. Predictable. Manageable if you catch it early.

Then 2024 happened.

AI coding assistants arrived. And they turned architectural decay from a slow burn into a wildfire.

AI Replicates. It Doesn’t Invent.

Here’s what changed.

When Mike needed to add SMS in Week 3, he opened Sarah’s code. Looked at it. Made a decision. Maybe he copied the pattern. Maybe he tried something different.

But he thought about it.

Now imagine Mike has Cursor. Or Copilot. Or Claude Code.

He types: // Add SMS notification for shipments

The AI looks at the codebase. Sees Sarah’s pattern. Instantly replicates it.

Code appears. Mike reviews it. Looks good. Ships.

He never even saw the architectural decision being made.

The AI made it for him. Based on what already existed.

“AI doesn’t just copy your code. It copies your architecture. Even the accidental parts.”

The Speed and Scale Just Exploded

Remember Week 12? Three developers, three features, three different approaches emerging over a week?

With AI, that’s Tuesday.

Developer A asks AI for push notifications. AI sees Sarah’s inline handler. Copies it.

Developer B asks AI for digest emails. AI sees Mike’s templates. Copies those.

Developer C asks AI for snoozing. AI sees Emma’s service class. Copies that.

All three features ship the same day.

But it’s not just faster. It’s bigger.

Pre-AI: 50-200 lines of code per day.

With AI: 500-2000 lines in the same time.

That’s roughly 10x more code implementing patterns, creating variations, and spreading duplication.

You have two ways of checking preferences? AI propagates both. Three error handling approaches? AI replicates all three. Every inconsistency becomes a seed that AI plants everywhere.

The notification system that took Sarah’s team 20 weeks to become unmaintainable?

With AI, you can get there in 4.

AI Can’t See What You Didn’t Write Down

Here’s the fundamental problem.

AI is incredible at implementation. It can write clean, working code. It follows patterns. It handles edge cases.

But it cannot architect.

It can’t look at your codebase and think: “Wait, this is getting fragmented. We should consolidate these patterns.”

It can’t say: “I see three different approaches here. Which one should I follow?”

It just… picks one. Based on similarity to what you’re asking for.

If your architecture is accidental, AI accelerates the accident.

The Old Advice Is Now Dangerous

The advice used to be: “Don’t over-architect small projects. Start simple. Refactor when you need to.”

That advice just became dangerous.

With AI, “small projects” don’t stay small. They explode.

By the time you realize you need to refactor, you have 10x more code to untangle.

The window between “clean start” and “architectural debt crisis” collapsed.

Week 1 decisions matter more than ever.

You can’t afford to defer architecture anymore.

But Here’s the Good News

The same force that amplifies chaos can amplify order.

AI replicates good patterns just as enthusiastically as bad ones.

If you write that architecture document in Week 1. If you establish clear boundaries. If you make the “right way” obvious.

AI will follow it.

Consistently. Every single time. Across every feature.

It will use your NotificationService. It will follow your template structure. It will implement your error handling exactly as specified.

At scale. At speed. Without deviation.

The chaos multiplier becomes a consistency multiplier.

But only if you give it something consistent to multiply.
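
As a rough illustration of "something consistent to multiply", here is a minimal TypeScript sketch of a single notification entry point. Only the name `NotificationService` comes from this post. The `NotificationChannel` interface, the `SendResult` shape, and the injected `isEnabled` check are assumptions made for the example, and sending is synchronous only to keep it short.

```typescript
// Illustrative assumption: every channel implements the same small interface.
interface NotificationChannel {
  readonly name: string;
  send(userId: string, message: string): void;
}

type SendResult = { channel: string; status: "sent" | "skipped" | "failed" };

class NotificationService {
  private readonly channels: NotificationChannel[] = [];
  private readonly isEnabled: (userId: string, channel: string) => boolean;

  // The preference check is injected so it lives in exactly one place.
  constructor(isEnabled: (userId: string, channel: string) => boolean) {
    this.isEnabled = isEnabled;
  }

  // New channels are registered here instead of being wired in ad hoc.
  register(channel: NotificationChannel): void {
    this.channels.push(channel);
  }

  // One path for every notification: check preferences, send, record outcome.
  notify(userId: string, message: string): SendResult[] {
    const results: SendResult[] = [];
    for (const channel of this.channels) {
      if (!this.isEnabled(userId, channel.name)) {
        results.push({ channel: channel.name, status: "skipped" });
        continue;
      }
      try {
        channel.send(userId, message);
        results.push({ channel: channel.name, status: "sent" });
      } catch {
        // One error-handling policy for all channels: record and move on.
        results.push({ channel: channel.name, status: "failed" });
      }
    }
    return results;
  }
}
```

When the only visible way to add a channel is `register`, and the only way to send is `notify`, that is the pattern an assistant finds and repeats.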

“AI doesn’t make architecture optional. It makes it mandatory.”

This is why the next post matters even more now.

I’ll show you how to set up that architectural foundation before you start generating code with AI.

How to make the right patterns so obvious that AI can’t help but follow them.

How to turn AI from an architectural time bomb into an architectural enforcement mechanism.

What’s Next

In the next post, I’ll show you how to build that architectural foundation.

Not some enterprise framework. Not over-engineered complexity.

The simple, practical structure that prevents this chaos.

We’ll rebuild this exact notification system with clear boundaries, testable code, and patterns that guide developers toward consistency instead of fragmentation.

You’ll see:

- Where things live (and why)
- How to test without infrastructure
- How to make architectural decisions stick
- How AI helps instead of amplifying chaos

Until then, look at your codebase.

What week are you on?

Have you lived through this story? I’d love to hear about it. Find me on Twitter or LinkedIn.

The post A Tale of Accidental Architecture: How 50 Lines Became A Black Friday Disaster appeared first on sudoish.