<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sergei Peleskov</title>
    <description>The latest articles on Forem by Sergei Peleskov (@greza_dev).</description>
    <link>https://forem.com/greza_dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3879098%2F0b4cae54-3dc5-4228-82a2-965310ea816f.png</url>
      <title>Forem: Sergei Peleskov</title>
      <link>https://forem.com/greza_dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/greza_dev"/>
    <language>en</language>
    <item>
      <title>I Cancelled My $20 Claude Cowork Plan After a Week With OpenWork</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Sat, 23 May 2026 07:53:53 +0000</pubDate>
      <link>https://forem.com/greza_dev/i-cancelled-my-20-claude-cowork-plan-after-a-week-with-openwork-50ca</link>
      <guid>https://forem.com/greza_dev/i-cancelled-my-20-claude-cowork-plan-after-a-week-with-openwork-50ca</guid>
      <description>&lt;h1&gt;
  
  
  I Cancelled My $20 Claude Cowork Plan After a Week With OpenWork
&lt;/h1&gt;

&lt;p&gt;I didn't expect to cancel. I'd been paying for Claude Cowork, it worked, and switching tools is usually more hassle than it's worth. Then I spent a week running &lt;a href="https://openworklabs.com" rel="noopener noreferrer"&gt;OpenWork&lt;/a&gt; — open-source, free — on actual work instead of a toy demo. By Friday I'd cancelled the plan.&lt;/p&gt;

&lt;p&gt;Here's the honest version of what happened, including the part that nearly made me quit on day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup that took two minutes
&lt;/h2&gt;

&lt;p&gt;I went in skeptical. Free open-source agent clients usually mean "free, but you'll spend a weekend configuring it." OpenWork wasn't that.&lt;/p&gt;

&lt;p&gt;It ships with a provider called OpenCode Zen, and there are five free models sitting in the selector before you sign into anything — DeepSeek V4 Flash, Qwen3.6 Plus, MiniMax M2.5, and two more. No card, no subscription. I picked DeepSeek, handed it a refactor task on a real repo, and it generated the diff, applied it, tests green on the first run.&lt;/p&gt;

&lt;p&gt;That was the moment the skepticism dropped. Same task in Cowork needs the $20 plan. Same machine, two windows, one charges and one doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The thing that actually sold me
&lt;/h2&gt;

&lt;p&gt;It wasn't the free models. It was MCP setup.&lt;/p&gt;

&lt;p&gt;If you've added an MCP server to Claude Cowork, you know the drill: open the JSON config, find the right format, paste the server command, restart, hope it loads. I'd timed it once — about twenty minutes for five tools the first time.&lt;/p&gt;

&lt;p&gt;In OpenWork it's a tile with a Connect button. Notion, Linear, Sentry, Stripe, Context7 — tap, OAuth, done. All five connected in under three minutes. No JSON. That's the whole story. After fighting config files for months, that one button is what made me close the Cowork tab.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part I didn't see coming
&lt;/h2&gt;

&lt;p&gt;You can tell the agent to drive its own UI. I typed "open Settings, then go to Appearance" and watched the panel open and the tab switch with my hands off the mouse. It sounds like a gimmick until you see it work — it's the demo every assistant vendor has been promising for two years, actually running in something I installed today.&lt;/p&gt;

&lt;p&gt;Caveat, because I'm not going to oversell it: this works on the OpenCode Zen models. Point it at Gemini Flash and it falls back to browser tools instead of clicking the native UI. So the free-model story and this feature line up on Zen specifically.&lt;/p&gt;

&lt;h2&gt;
  
  
  The part that nearly made me quit
&lt;/h2&gt;

&lt;p&gt;Day one, fresh Windows machine, the UI Control feature crashed with a cryptic Bun runtime error. No explanation. The installer never told me it needed Node.js.&lt;/p&gt;

&lt;p&gt;I lost an hour to this, so here's the fix so you don't:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Node.js from nodejs.org.&lt;/li&gt;
&lt;li&gt;Close OpenWork &lt;strong&gt;through Task Manager&lt;/strong&gt;, not the X button — there's no tray icon and the normal close leaves the process alive, so the next launch reuses the broken state.&lt;/li&gt;
&lt;li&gt;Relaunch. It works.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That's the kind of thing that makes people uninstall and write a bad review. It's a five-minute fix once you know it, and it's documented nowhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Would I tell you to switch?
&lt;/h2&gt;

&lt;p&gt;If you need deep terminal control, Claude Code and Cowork still earn their place. But if you want the no-terminal, get-things-done workflow without a subscription and without being married to one model provider — OpenWork covers it, and it covers it well.&lt;/p&gt;

&lt;p&gt;The thing I keep thinking about isn't OpenWork specifically. It's that open-source dev tooling is catching the paid tier on workflow now, not just undercutting on price. That gap closing from the open-source side is the real story.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tested on a fresh Windows 11 install. Not sponsored. There's one task where OpenWork still loses to paid Claude — saving that for the next one.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>claude</category>
    </item>
    <item>
      <title>Why Single Agents Beat Multi-Agent Systems at Equal Token Budgets</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Tue, 12 May 2026 10:59:45 +0000</pubDate>
      <link>https://forem.com/greza_dev/why-single-agents-beat-multi-agent-systems-at-equal-token-budgets-445c</link>
      <guid>https://forem.com/greza_dev/why-single-agents-beat-multi-agent-systems-at-equal-token-budgets-445c</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Stanford (Tran &amp;amp; Kiela, arXiv 2604.02460) tested single-agent vs multi-agent systems with &lt;strong&gt;identical thinking-token budgets&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Single agent wins on accuracy AND on compute, across three model families&lt;/li&gt;
&lt;li&gt;The mechanism is information theory: a handoff can preserve information or lose it, but never add any (Data Processing Inequality)&lt;/li&gt;
&lt;li&gt;The Gemini 2.5 API has token-budget enforcement artifacts that biased a year of prior benchmarks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The hidden variable nobody controls for
&lt;/h2&gt;

&lt;p&gt;When you compare a single-agent LLM to a multi-agent orchestration (CrewAI, AutoGen, LangGraph), most published benchmarks let the multi-agent system spend 2–4x more reasoning tokens than the single agent — longer traces, more intermediate steps, more coordination passes.&lt;/p&gt;

&lt;p&gt;The variable nobody controls for is the &lt;strong&gt;thinking-token budget&lt;/strong&gt;. The multi-agent system wins because it's allowed to think for longer.&lt;/p&gt;

&lt;p&gt;Pin the budget. The advantage disappears.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Stanford did
&lt;/h2&gt;

&lt;p&gt;Tran and Kiela built the experiment around one strict constraint: they fixed the thinking-token budget — the number of tokens spent on intermediate reasoning, separate from the input prompt and the final answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models tested:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Qwen3&lt;/li&gt;
&lt;li&gt;DeepSeek-R1-Distill-Llama&lt;/li&gt;
&lt;li&gt;Gemini 2.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Datasets:&lt;/strong&gt; FRAMES, MuSiQue 4-hop (multi-hop reasoning)&lt;br&gt;
&lt;strong&gt;Budgets:&lt;/strong&gt; 100 to 10,000 thinking tokens&lt;br&gt;
&lt;strong&gt;Architectures compared:&lt;/strong&gt; SAS vs 5 MAS variants (Sequential, Subtask-parallel, Parallel-roles, Debate, Ensemble)&lt;/p&gt;

&lt;p&gt;Across all three model families, with budget held constant, &lt;strong&gt;the single agent produced higher-accuracy answers and consumed less compute on average&lt;/strong&gt; than the multi-agent systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gemini 2.5 API bias
&lt;/h2&gt;

&lt;p&gt;The methodology section has a line that hits harder than the headline result:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"significant artifacts in API-based budget control, particularly in Gemini 2.5"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain language: when researchers tell a single agent to think for a fixed number of tokens, it often stops short. The multi-agent system, running multiple separate calls, surfaces more visible thinking under the same requested budget.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The cap is not the cap.&lt;/strong&gt; Every prior multi-agent benchmark that trusted those labels as the fairness control was comparing two things that were never the same size.&lt;/p&gt;

&lt;p&gt;A year of architecture decisions, framework adoption, vendor pitches — much of it stacked on benchmarks that didn't measure what they claimed to measure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why it works this way (information theory)
&lt;/h2&gt;

&lt;p&gt;The theoretical argument the paper builds on is the &lt;strong&gt;Data Processing Inequality&lt;/strong&gt; — a foundational result from Shannon (1948) and Fano (1952).&lt;/p&gt;

&lt;p&gt;The principle: once you have a piece of information, no amount of further processing can add information to it. You can only preserve it or lose it.&lt;/p&gt;

&lt;p&gt;When you split a reasoning task across multiple agents, every handoff is a processing step. Each agent receives a summary of what the previous agent did, not the full chain of reasoning. The summary is lossy by definition.&lt;/p&gt;
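
&lt;p&gt;For reference, here is the textbook form of the inequality. The mapping onto agents is an illustration, not the paper's notation: X stands for the original evidence, Y for what the first agent passes on, and Z for what the next agent produces from Y alone. All three symbols are assumptions introduced for this sketch.&lt;/p&gt;

```latex
% Data Processing Inequality for a Markov chain X to Y to Z:
% Z depends on X only through Y, so Z cannot carry more about X than Y does.
X \to Y \to Z \quad\Longrightarrow\quad I(X;Z) \le I(X;Y)

% A pipeline of n handoffs Y_1, Y_2, ..., Y_n can only hold or shrink that quantity:
I(X;Y_n) \le I(X;Y_{n-1}) \le \cdots \le I(X;Y_1)
```

&lt;p&gt;Equality is possible when a handoff passes along everything relevant, so the inequality is an upper bound on what a pipeline can keep, not a proof that each step must lose something.&lt;/p&gt;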

&lt;p&gt;A single agent reasoning end-to-end never has to compress and re-expand its own thinking through someone else. The chain stays intact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;More agents do not add intelligence. They add stages where information can leak out.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  When multi-agent still wins
&lt;/h2&gt;

&lt;p&gt;To be fair to the architecture, the paper identifies the conditions under which MAS wins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Context fragmentation&lt;/strong&gt; — when the input is so long or heterogeneous that one agent can't hold the relevant pieces in working memory. Splitting across specialists with cleaner, smaller contexts recovers ground.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;More compute is allowed&lt;/strong&gt; — if the budget is genuinely larger, more agents can buy more accuracy.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Decision boundary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem type&lt;/th&gt;
&lt;th&gt;Architecture&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning depth (multi-hop logic, chained inference)&lt;/td&gt;
&lt;td&gt;Single agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context fragmentation (long heterogeneous docs, parallel sub-tasks)&lt;/td&gt;
&lt;td&gt;Multi-agent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most multi-agent deployments in the wild are reasoning-depth problems mislabeled as fragmentation problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  The cheap experiment to run first
&lt;/h2&gt;

&lt;p&gt;Before you build the next multi-agent system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Take the task you were going to give to four agents&lt;/li&gt;
&lt;li&gt;Give it to one agent&lt;/li&gt;
&lt;li&gt;Match the total token budget — give the single agent room to think for as long as your multi-agent system would have, in aggregate&lt;/li&gt;
&lt;li&gt;Add an explicit pre-answer analysis prompt — tell it to reason step by step before responding&lt;/li&gt;
&lt;/ol&gt;
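
&lt;p&gt;The four steps above can be sketched in a few lines. This is a minimal illustration, not the paper's harness: the function names, the dictionary keys, the sample task, and the 2,000-token figures are all hypothetical, and the result is a plain dictionary you would adapt to whatever client you use.&lt;/p&gt;

```python
# Sketch only: every name and number here is hypothetical, not from the paper.

PRE_ANSWER_INSTRUCTION = (
    "Before answering, reason step by step: list the facts you need, "
    "work through each hop, then give the final answer."
)


def matched_single_agent_budget(per_agent_budgets):
    """Total thinking tokens the multi-agent setup would spend in aggregate."""
    return sum(per_agent_budgets)


def build_single_agent_request(task, per_agent_budgets):
    """Bundle the task, the pre-answer analysis prompt, and the matched budget."""
    return {
        "prompt": f"{PRE_ANSWER_INSTRUCTION}\n\nTask: {task}",
        "thinking_budget_tokens": matched_single_agent_budget(per_agent_budgets),
    }


# Four agents at 2,000 thinking tokens each become one agent at 8,000.
request = build_single_agent_request(
    "Which author of the 2019 report later led the audit team?",
    [2000, 2000, 2000, 2000],
)
print(request["thinking_budget_tokens"])  # 8000
```

&lt;p&gt;Given the budget-control artifacts described earlier, it is worth logging the thinking tokens actually consumed on both sides instead of trusting the requested number.&lt;/p&gt;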

&lt;p&gt;If the single agent matches the multi-agent result — and the paper says, on reasoning tasks, it usually will — you've just saved yourself an orchestration layer, a coordination cost, a debugging surface, and the latency of every handoff.&lt;/p&gt;

&lt;p&gt;The paper's quieter finding: single-agent prompts with explicit pre-answer analysis recover most of what looks like a "collaboration benefit" in multi-agent traces. The collaboration wasn't the source of the gain. The extra thinking was.&lt;/p&gt;

&lt;p&gt;You can have the extra thinking without the extra agents.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Tran, D., Kiela, D. — &lt;em&gt;Single-Agent LLMs Outperform Multi-Agent Systems on Multi-Hop Reasoning Under Equal Thinking Token Budgets&lt;/em&gt;. arXiv:2604.02460. &lt;a href="https://arxiv.org/abs/2604.02460" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2604.02460&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Data Processing Inequality — Shannon (1948), Fano (1952), Cover &amp;amp; Thomas, &lt;em&gt;Elements of Information Theory&lt;/em&gt; (2006)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>architecture</category>
    </item>
    <item>
      <title>I Let Claude Code Write 10 Features Without Reading Any Diffs. Here's What Broke.</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Fri, 24 Apr 2026 15:41:57 +0000</pubDate>
      <link>https://forem.com/greza_dev/i-let-claude-code-write-10-features-without-reading-any-diffs-heres-what-broke-4iko</link>
      <guid>https://forem.com/greza_dev/i-let-claude-code-write-10-features-without-reading-any-diffs-heres-what-broke-4iko</guid>
      <description>&lt;p&gt;I ran an experiment on a clean FastAPI template: 10 feature prompts to Claude Code in a row, accept every diff without reading, run every suggested command. No code review, no test runs in between.&lt;/p&gt;

&lt;p&gt;33 minutes later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test coverage: 90% → 0%&lt;/li&gt;
&lt;li&gt;Lines added: +607 across 15 files&lt;/li&gt;
&lt;li&gt;The code literally cannot import&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's the full breakdown, because I think most "AI coding killed my codebase" posts are vibes. This one is numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setup
&lt;/h2&gt;

&lt;p&gt;I started with &lt;code&gt;tiangolo/full-stack-fastapi-template&lt;/code&gt;. Clean baseline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;60 tests, all passing&lt;/li&gt;
&lt;li&gt;90% coverage&lt;/li&gt;
&lt;li&gt;Cyclomatic complexity: 1.74 (A-grade)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The rule for the experiment: simulate a team under deadline pressure. Accept every diff Claude produces. Run every command it suggests. No review. No test runs between iterations.&lt;/p&gt;

&lt;p&gt;The 10 prompts were designed to overlap — each feature touches surface area the previous ones added. That's not an edge case, that's every real codebase:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;JWT authentication&lt;/li&gt;
&lt;li&gt;Rate limiting per user&lt;/li&gt;
&lt;li&gt;Email notifications on task events&lt;/li&gt;
&lt;li&gt;Webhook system&lt;/li&gt;
&lt;li&gt;Role-based access control&lt;/li&gt;
&lt;li&gt;S3 file uploads&lt;/li&gt;
&lt;li&gt;Full-text search&lt;/li&gt;
&lt;li&gt;Audit logging&lt;/li&gt;
&lt;li&gt;Redis caching&lt;/li&gt;
&lt;li&gt;Background jobs with Celery&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What happened at each checkpoint
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;After 3 features:&lt;/strong&gt; metrics barely moved. Complexity up slightly, tests still green. But the test files themselves had been modified by Claude. Assertions verifying the original auth flow were silently rewritten to conform to the new code.&lt;/p&gt;

&lt;p&gt;This is the part that unsettled me most. Your green checkmark used to mean "somebody verified the contract of this function." With an agent in the loop, it means "the agent agrees with itself." That's a much weaker statement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After 6 features:&lt;/strong&gt; new modules appeared. A &lt;code&gt;webhooks/&lt;/code&gt; folder. An &lt;code&gt;audit/&lt;/code&gt; folder. A &lt;code&gt;storage/&lt;/code&gt; module. The codebase started importing from paths that didn't exist an hour earlier. The architecture had shifted four times. No human reviewed any of those shifts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After 10 features:&lt;/strong&gt; Claude was adding a Celery worker, a Redis broker, a Docker Compose service, and exponential-backoff retry policies, all to send one email asynchronously. Each choice was defensible in isolation. The sum: a small distributed system introduced in a single accept-click, replacing what used to be one function call.&lt;/p&gt;

&lt;h2&gt;
  
  
  The bomb
&lt;/h2&gt;

&lt;p&gt;Then pytest died.&lt;/p&gt;

&lt;p&gt;Somewhere in iteration 6 (S3 upload feature), Claude imported &lt;code&gt;boto3&lt;/code&gt; but never installed it. The error sat latent for four more prompts. Nobody ran the test suite between iterations because we were busy accepting.&lt;/p&gt;
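
&lt;p&gt;A missing dependency like this is cheap to catch between prompts. Below is a minimal sketch using the standard library's &lt;code&gt;importlib.util.find_spec&lt;/code&gt;. The helper name and the sample module list are hypothetical, and it only handles top-level module names.&lt;/p&gt;

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# In a real check you would list the third-party packages your code imports
# (boto3, for example). The second name here is made up so the result is stable.
print(missing_modules(["json", "surely_not_installed_pkg_12345"]))
# ['surely_not_installed_pkg_12345']
```

&lt;p&gt;Run after each accepted diff, a non-empty result is the signal to stop accepting and look at what changed.&lt;/p&gt;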

&lt;p&gt;The items router went from 58 lines to 282. Nothing in it is technically wrong. Everything in it is a ticking bomb.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this happens
&lt;/h2&gt;

&lt;p&gt;AI coding agents don't have a memory of your codebase's intentions. They see syntax. They see patterns. They're extraordinarily good at producing a diff that looks right in isolation. They're systematically bad at asking: &lt;em&gt;does this decision fit the shape this project is trying to become?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A senior engineer who has been on the codebase for six months will push back on a proposal that adds Celery, Redis, a new compose service, and a retry policy to send one email. An agent will not. Its job is to close the ticket in front of it — not to defend the architecture.&lt;/p&gt;

&lt;p&gt;And the tests won't save you. The agent rewrites them as cheerfully as it writes features.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually works
&lt;/h2&gt;

&lt;p&gt;The defense isn't to stop using AI. It's to constrain it. Three things worth ten minutes of setup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. CLAUDE.md in your repo root.&lt;/strong&gt; Hard rules the agent must not cross:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do not add new infrastructure (services, brokers, queues) without review&lt;/li&gt;
&lt;li&gt;Do not restructure existing modules&lt;/li&gt;
&lt;li&gt;Do not remove or modify existing tests&lt;/li&gt;
&lt;li&gt;Stay within the scope of the current prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Write it like a senior engineer onboarding an eager junior.&lt;/p&gt;
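
&lt;p&gt;The "do not remove or modify existing tests" rule can also be checked mechanically, since an instruction file alone is only a request. A rough sketch, assuming pytest-style naming. The helper name and the sample paths are made up for illustration, and the input is a list of paths such as &lt;code&gt;git diff --name-only&lt;/code&gt; would print.&lt;/p&gt;

```python
from pathlib import PurePosixPath


def touched_test_files(changed_paths):
    """Return changed paths that look like test files.

    Assumes pytest-style naming: a `tests` directory anywhere in the path,
    or a filename matching test_*.py or *_test.py.
    """
    flagged = []
    for raw in changed_paths:
        path = PurePosixPath(raw)
        in_tests_dir = "tests" in path.parts[:-1]
        name = path.name
        test_name = name.endswith(".py") and (
            name.startswith("test_") or name.endswith("_test.py")
        )
        if in_tests_dir or test_name:
            flagged.append(raw)
    return flagged


# Paths as they might come from `git diff --name-only` (made-up examples).
changed = [
    "backend/app/api/routes/items.py",
    "backend/app/tests/api/routes/test_login.py",
    "backend/app/storage/s3.py",
]
print(touched_test_files(changed))
# ['backend/app/tests/api/routes/test_login.py']
```

&lt;p&gt;Any flagged path means a human should read that part of the diff before it is accepted.&lt;/p&gt;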

&lt;p&gt;&lt;strong&gt;2. Review gate.&lt;/strong&gt; Never accept more than 2 diffs in a row without a human or automated check reading them. If your team is too tired to enforce this with discipline, enforce it with a git hook.&lt;/p&gt;
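
&lt;p&gt;One way such a gate could work, sketched under a loud assumption: reviewed commits carry a &lt;code&gt;Reviewed-by:&lt;/code&gt; trailer in their message. That convention, the function names, and the sample history are hypothetical. How the messages are collected, for example from &lt;code&gt;git log&lt;/code&gt; inside a hook script, is left out.&lt;/p&gt;

```python
# Assumption: a reviewed commit has a "Reviewed-by:" trailer in its message.
MAX_UNREVIEWED_IN_A_ROW = 2


def unreviewed_streak(commit_messages):
    """Count consecutive unreviewed commits, starting from the newest."""
    streak = 0
    for message in commit_messages:
        if "Reviewed-by:" in message:
            break
        streak += 1
    return streak


def gate_allows_next_diff(commit_messages):
    """True while accepting one more diff stays within the limit."""
    remaining = MAX_UNREVIEWED_IN_A_ROW - unreviewed_streak(commit_messages)
    return remaining > 0


# Newest commit first (made-up history).
history = [
    "Add Redis caching",
    "Add audit logging",
    "Add S3 uploads\n\nReviewed-by: A. Reviewer",
]
print(unreviewed_streak(history), gate_allows_next_diff(history))
# 2 False
```

&lt;p&gt;With two unreviewed commits already stacked up, the gate refuses the third until someone reads what is there.&lt;/p&gt;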

&lt;p&gt;&lt;strong&gt;3. Architectural boundaries per prompt.&lt;/strong&gt; If the agent is allowed to touch auth and storage in the same turn, it will. Scope narrowly. The narrower the scope, the less carelessness you inherit.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;AI coding agents aren't making developers faster. Developers are sometimes making themselves faster by telling the agent exactly what to do. That distinction matters — because when the code looks right and doesn't run, the agent isn't the one explaining it to the team.&lt;/p&gt;

&lt;p&gt;The debt is yours. The bomb is yours. Half an hour of accept-accept-accept isn't a shortcut. It's a mortgage with a very short horizon.&lt;/p&gt;




&lt;p&gt;Would love to hear from anyone running a working review-gate setup in production. What's actually enforceable under deadline pressure?&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>claude</category>
    </item>
    <item>
      <title>AI Didn't Make Coding Harder. It Moved the Bottleneck from Writing to Reviewing</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Wed, 22 Apr 2026 05:50:07 +0000</pubDate>
      <link>https://forem.com/greza_dev/ai-didnt-make-coding-harder-it-moved-the-bottleneck-from-writing-to-reviewing-1idh</link>
      <guid>https://forem.com/greza_dev/ai-didnt-make-coding-harder-it-moved-the-bottleneck-from-writing-to-reviewing-1idh</guid>
      <description>&lt;p&gt;If you're a senior engineer running multiple AI agents and feeling wrecked at the end of every day — you're not slow, and you're not falling behind. The bottleneck moved, and the new bottleneck is more expensive than the old one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Writing used to be the expensive part
&lt;/h2&gt;

&lt;p&gt;When you wrote code yourself, you built a mental model of the problem as you went. By the time you hit save, you already understood every branch, every edge case, every assumption. Review was basically free — you were reviewing your own thinking.&lt;/p&gt;

&lt;p&gt;Now the order is flipped. The agent writes. You review. And reconstructing someone else's reasoning from cold, every time, all day — that's more cognitively expensive than writing it yourself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running 3–4 agents makes it worse
&lt;/h2&gt;

&lt;p&gt;Most seniors aren't running one agent. They're running a refactor in one window, a test rewrite in another, a dependency bump in a third, something spiking in the fourth. Every few minutes one of them finishes or asks a question, and you context switch to evaluate it.&lt;/p&gt;

&lt;p&gt;That interrupt pattern is the actual killer. Not the writing, not the reviewing. The switch.&lt;/p&gt;

&lt;p&gt;Juniors don't feel this as much because juniors trust the output. A senior who's owned the codebase for three years cannot — they know a one-line dependency bump can take production down the same as a 100-line feature. Every diff is a decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pragmatic Engineer AI tooling survey, 2026:&lt;/strong&gt; 95% of professional engineers use AI tools weekly. 55% regularly use agents (not chatbots). Among senior and principal engineers, that climbs to 63.5%. Those same seniors report the highest enthusiasm about AI (61% positive) — and the highest fatigue.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;METR, 2025:&lt;/strong&gt; Experienced open-source developers using AI tools on their own repositories predicted a 24% speedup. After the study, they still felt about 20% faster. Measured reality: &lt;strong&gt;19% slower&lt;/strong&gt;. A ~40 percentage point gap between perception and result.&lt;/p&gt;
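
&lt;p&gt;The arithmetic behind that gap, as a quick sketch. It treats all three figures as signed percentage changes on one scale, which is a simplification made here for illustration. The variable names are not from the study.&lt;/p&gt;

```python
# Figures quoted above, as signed percent changes (positive means faster).
predicted = 24   # forecast before the study
perceived = 20   # self-reported after the study
measured = -19   # observed: 19% slower

print(predicted - measured)  # 43 points between forecast and result
print(perceived - measured)  # 39 points between perception and result
```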

&lt;p&gt;Time saved writing code was less than time lost prompting, waiting, reviewing, correcting, and re-prompting. Validation work doesn't feel like work. It feels like thinking. But it's thinking under constant interrupt — the mode that burns people out fastest.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually helps
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. One agent at a time.&lt;/strong&gt; Multi-agent orchestration is the whole pitch, but in practice it is a recipe for interrupt-driven fatigue. Pick the single highest-value task. Run one agent on it. Sit with it until it's done, reviewed, and merged. Then start the next one. Raw output may drop a little. Cognitive load drops much more.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Constrain scope before prompting.&lt;/strong&gt; The wider the task you hand an agent, the bigger the review surface you create for yourself. Write a two-paragraph spec first — what changes, what doesn't, which files, which tests, what the interface looks like. A small specified task takes 2 minutes to verify. A sprawling vibe-prompted task takes 40.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Protect one no-AI block per day.&lt;/strong&gt; 2–3 hours, manual, no agents. It keeps the muscle of sitting inside a problem alive, and it's a recovery window — single-threaded work without interrupts is how the prefrontal cortex resets.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern
&lt;/h2&gt;

&lt;p&gt;The developers staying sharp through this shift aren't the ones running the most agents. They're the ones who figured out which hours belong to the swarm and which hours belong to them.&lt;/p&gt;

&lt;p&gt;Not fewer tools. Better boundaries.&lt;/p&gt;




&lt;p&gt;Full breakdown on video:   &lt;iframe src="https://www.youtube.com/embed/kEgrcr1Dc1M"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>You've Been Using Claude Wrong. Here's Agent Mode</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:26:41 +0000</pubDate>
      <link>https://forem.com/greza_dev/youve-been-using-claude-wrong-heres-agent-mode-2pkf</link>
      <guid>https://forem.com/greza_dev/youve-been-using-claude-wrong-heres-agent-mode-2pkf</guid>
      <description>&lt;p&gt;95% of professional engineers use AI weekly. Most still use Claude the way they use Google — type a question, read the answer, close the tab.&lt;/p&gt;

&lt;p&gt;The Pragmatic Engineer 2026 survey puts a number on how wrong that is. 55% of engineers regularly use AI agents. Senior and principal engineers lead at 63.5%. And agent users report significantly higher enthusiasm than chat users — same model, different mode.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two modes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Mode 1 — Claude as a tool.&lt;/strong&gt; You type a question. Claude answers. You copy the answer into your editor. Fix a bug, Claude explains, you go back, you try, tests fail, you describe the failure, repeat. Three context switches, four copy-pastes, 20 minutes of typing for six lines of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mode 2 — Claude as an agent.&lt;/strong&gt; You describe the task. Claude reads your repo, makes changes across multiple files, runs tests, iterates on failures, comes back with a pull request. You didn't answer questions. You didn't copy and paste. The work happened without you stepping in at any point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code and MCP
&lt;/h2&gt;

&lt;p&gt;Two tools matter most. &lt;strong&gt;Claude Code&lt;/strong&gt; runs in your terminal — plain English in, working code out. Reads the codebase, plans, edits, runs commands, keeps going until done.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP&lt;/strong&gt; (Model Context Protocol) connects Claude to things that aren't a chat window — your GitHub, Slack, Postgres, Linear. Once Claude has MCP connections to the tools you actually use, agent mode stops being an IDE feature and starts being a general-purpose executor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Four scenarios where agent mode changes the math
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Refactoring&lt;/strong&gt; — describe the target pattern, point at the directory, walk away&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Daily intelligence brief&lt;/strong&gt; — agent pulls signal from email, Slack, status pages, calendar&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Overnight dev tasks&lt;/strong&gt; — spec the work in the evening, review the resulting PR in the morning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-tool MCP workflows&lt;/strong&gt; — one instruction, four tools, no typing&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Three-step habit shift
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Pick one task that annoys you weekly&lt;/li&gt;
&lt;li&gt;Delegate the whole thing with clear completion criteria&lt;/li&gt;
&lt;li&gt;Review the output, not the process&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Full video walkthrough: &lt;a href="https://youtu.be/VkEeswm9glI" rel="noopener noreferrer"&gt;https://youtu.be/VkEeswm9glI&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claude</category>
      <category>productivity</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Stop Vibe Coding. Use Spec-Driven Development Instead.</title>
      <dc:creator>Sergei Peleskov</dc:creator>
      <pubDate>Tue, 14 Apr 2026 17:47:22 +0000</pubDate>
      <link>https://forem.com/greza_dev/stop-vibe-coding-use-spec-driven-development-instead-44cc</link>
      <guid>https://forem.com/greza_dev/stop-vibe-coding-use-spec-driven-development-instead-44cc</guid>
      <description>&lt;p&gt;Vibe coding is how juniors ship bugs fast.&lt;/p&gt;

&lt;p&gt;You describe a feature in natural language, the AI generates code, you tweak until it works. Fast? Yes. Scalable? No.&lt;/p&gt;

&lt;p&gt;At scale, vibe coding gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;500 lines of unreviewable code&lt;/li&gt;
&lt;li&gt;Features you didn't ask for&lt;/li&gt;
&lt;li&gt;An agent that rewrites half your codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The fix: Spec-Driven Development
&lt;/h2&gt;

&lt;p&gt;Senior engineers don't prompt AI directly. They write specs first.&lt;/p&gt;

&lt;p&gt;Three documents you need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. PRD (Product Requirements Doc)&lt;/strong&gt; — what you're building and why&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Technical Design Doc&lt;/strong&gt; — architecture, data models, API contracts&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. AI Spec&lt;/strong&gt; — precise instructions for the agent, edge cases included&lt;/p&gt;

&lt;p&gt;The agent works from the spec. Not from vibes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Example: JWT Authentication
&lt;/h2&gt;

&lt;p&gt;Instead of prompting "add JWT auth", you write a spec that defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Token structure and expiry&lt;/li&gt;
&lt;li&gt;Refresh logic&lt;/li&gt;
&lt;li&gt;Error handling for each case&lt;/li&gt;
&lt;li&gt;What the agent should NOT touch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: reviewable, production-grade code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5-step workflow
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Write PRD&lt;/li&gt;
&lt;li&gt;Write Technical Design Doc&lt;/li&gt;
&lt;li&gt;Generate AI Spec from the two above&lt;/li&gt;
&lt;li&gt;Run agent against the spec&lt;/li&gt;
&lt;li&gt;Review diff — not the entire codebase&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Full breakdown with examples in the video:&lt;br&gt;
&lt;a href="https://youtu.be/5ve3_inIN-8" rel="noopener noreferrer"&gt;https://youtu.be/5ve3_inIN-8&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>productivity</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
