Forem: Eugene Zharkov

The agent over-applies everything: why “don’t” is my most-used word

Eugene Zharkov — Sun, 12 Apr 2026 20:17:41 +0000

Models: Opus 4.6, Sonnet 4.6, Composer 1.5, Composer 2

In 60% of my agent sessions on the web project, I had to correct the output. The single most common word in those corrections was "don't".

Not "fix." Not "try again." I typed "don't" 1,703 times.

Every one of those signals is a constraint that existed in my head but wasn't encoded anywhere the agent could find it. That's the actual problem: not agent quality, but the gap between implicit knowledge and what gets written down.

The data

I exported every agent transcript from a React Native-to-Web migration: 767 sessions across two repos. Then I wrote scripts to classify correction signals in user messages.

Signal              Web      RN
Corrections:        60%      35%
Don't/should not:   1,703    128
Keep/preserve:      308      35
Over-applied:       182      19
Same as before:     179      13

Web's correction rate is about 1.7 times RN's. The reason is mechanical: more files, more interconnected components, more established conventions. More surface area means more implicit constraints, and more implicit constraints mean more correction loops.

I read this as a scaling law, not a quality issue.
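The classification scripts aren't published. As a rough illustration only, a regex-based counter in Python could produce numbers shaped like the table above. The category names, the patterns, and the rule that a session counts as "corrected" when any user message after the first contains a signal are all assumptions, not the author's method.

```python
import re
from collections import Counter

# Illustrative patterns only: the author's real classification scripts are
# not published, so these categories and regexes are assumptions.
SIGNALS = {
    "dont": re.compile(r"\b(?:don['’]t|do not|should not|shouldn['’]t)\b", re.IGNORECASE),
    "keep": re.compile(r"\b(?:keep|preserve)\b", re.IGNORECASE),
    "same_as_before": re.compile(r"\bsame as (?:before|in)\b", re.IGNORECASE),
}


def count_signals(messages: list[str]) -> Counter:
    """Count every signal occurrence across a list of user messages."""
    counts: Counter = Counter()
    for text in messages:
        for name, pattern in SIGNALS.items():
            counts[name] += len(pattern.findall(text))
    return counts


def correction_rate(sessions: list[list[str]]) -> float:
    """Share of sessions where any follow-up user message (turn 2+) has a signal.

    The first message is skipped because it is the original request,
    not a correction of the agent's output.
    """
    if not sessions:
        return 0.0
    corrected = sum(
        1
        for msgs in sessions
        if any(p.search(m) for m in msgs[1:] for p in SIGNALS.values())
    )
    return corrected / len(sessions)
```

Plain keyword matching over-counts: "I don't know why this fails" is not a correction. Counts like these are best treated as an upper bound unless a sample is checked by hand.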

Three failure modes

The agent doesn't fail randomly. Almost every correction falls into one of three patterns.

The eager editor. I ask to update a component's layout and the agent also "improves" imports, reformats styled-components, tweaks prop types. I requested one thing and the agent touched twelve files. Once, a styling adjustment on one screen turned into 52 updated SVG icons. That's the diff I had to review.

This is the bulk of the 1,703 "don't" signals. The agent treats every session as an opportunity to leave things better than it found them. In a production codebase, that means noisy diffs and broken trust in code review.

The pattern breaker. Thirty components follow the same structure, and when I ask for a new one, the agent invents its own. I had explicitly written "make the fields look the same as in SignUpForm." The agent read the adjacent files; I can see it in the transcript. It read them, understood them, and then decided to be creative instead of consistent.

The maximalist. I say "add a loading state." I get a loading state, an error state, a retry mechanism, and a skeleton screen. My correction: "Any point to handle errors individually? It's too much." The agent interprets "add X" as "add X and everything X might eventually need."

The cost model

These sessions averaged 4.1 user turns. Most corrections land on turn 2 or 3: the first response overshoots, I constrain, the agent narrows. That's one extra round-trip per session, roughly 15% overhead on median session length.

Multiply that by 392 sessions. That's 392 moments where I'm teaching the agent something my codebase already knows: how we name things, which files are off-limits, what "done" looks like here.

The agent's first pass still gets you 70-80% of the way there. The correction turn is tuning, not rebuilding. But at scale, correction loops are a tax on implicit knowledge. The question isn't whether the agent is good enough; it's how much of your codebase's constraints live only in your head.

What moved the needle

Explicit constraints in the prompt and specific file references as templates both help, but the gains are incremental. The real shift was structural.

I wrote a 321-line plan document for one complex migration: explicit phases, file-by-file scope, clear "do not" boundaries. The agent ran for 74 messages. Zero corrections.

The sessions where I spent the most time before the agent ran were the ones where I spent the least time during. That's the counterintuitive trade: planning isn't overhead, it's constraint encoding. And constraint encoding is the entire game.
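One way to make a plan's file-by-file scope and "do not" boundaries checkable is to compare them against the files a session actually changed. This sketch assumes a hypothetical plan format (allowed and forbidden glob patterns) and hypothetical paths; it is not the structure of the author's 321-line plan.

```python
from fnmatch import fnmatchcase

# Hypothetical scope in the spirit of a plan's file-by-file scope and
# "do not" boundaries. The author's actual plan format is not published.
PLAN_SCOPE = {
    "allowed": ["src/screens/ForgotPassword/*", "src/navigation/authStack.ts"],
    "forbidden": ["src/assets/icons/*"],
}


def out_of_scope(changed_files: list[str], scope: dict[str, list[str]]) -> list[str]:
    """Return changed paths that hit a "do not" pattern or match no allowed pattern.

    fnmatchcase's "*" also matches "/", so "dir/*" covers nested files too.
    """
    flagged = []
    for path in changed_files:
        forbidden = any(fnmatchcase(path, p) for p in scope["forbidden"])
        allowed = any(fnmatchcase(path, p) for p in scope["allowed"])
        if forbidden or not allowed:
            flagged.append(path)
    return flagged


# In practice the list could come from `git diff --name-only`.
changed = [
    "src/screens/ForgotPassword/index.tsx",
    "src/assets/icons/back.svg",
    "src/utils/formatDate.ts",
]
print(out_of_scope(changed, PLAN_SCOPE))
```

If the plan forbids the icon directory, a check like this flags an unexpected icon diff before it reaches code review, instead of during it.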

The prompts that got zero corrections

These prompts produced zero corrections, not because they were long, but because the agent had nothing to interpret.

"Update the dependency of react to the latest 18. available and check that code is compatible with react 18": 39 messages, 4 turns, zero corrections

"ForgotPassword back button should not cancel the flow but make it possible to go back at the previous step": 50 messages, 4 turns, zero corrections

Compare that to a prompt that generated 38 corrections across 11 turns:

"Can you check app resources and optimize them if needed and also add best optimizations techniques to load faster in the browsers"

The difference isn't length; it's the specificity of the scope. The first group pins the agent to a specific target and a specific behavior. The second prompt says "improve things" and hopes for the best.

Vague prompts are invitations to touch everything, and an eager agent always says yes.

The agent doesn't have taste. It doesn't know your codebase has opinions. It doesn't feel the pain of a noisy diff.

After 1,703 "don't" signals, my lesson is simple: every implicit constraint you don't encode up front, you'll encode as a correction after the fact. Break changes into smaller scopes, or write a plan before starting bigger ones. Either way, you save time and tokens (money).

The $1.8B Telehealth Company Built With AI: What the Playbook Actually Looks Like

Eugene Zharkov — Sat, 04 Apr 2026 17:41:37 +0000

The New York Times just profiled Medvi, a telehealth startup selling compounded GLP-1 weight-loss drugs that reportedly hit $401 million in revenue in its first year with two full-time employees. The founder, Matthew Gallagher, says he used ChatGPT, Claude, Grok, Midjourney, Runway, and ElevenLabs to build the website, generate ad creative, write the platform code, and run customer service. The Times says it reviewed the company's financials. The framing: AI made a billion-dollar company possible with $20,000 and a laptop.

That framing is incomplete in ways that matter if you are trying to build something real in a regulated industry. The AI stack is the visible layer. Underneath it sits a specific business architecture, infrastructure partners, and a regulatory surface area that determines whether the company survives the next twelve months.

What AI Reportedly Did

According to Gallagher, he used LLMs to write production code, build custom agents that connect internal systems, and generate the entire front-end experience. He says he deployed AI image and video generators for ad creative at scale, an AI chatbot for customer service, and ElevenLabs voice tools for communication. He told the Times he even cloned his own voice to handle personal scheduling.

We have no independent verification of the AI stack beyond Gallagher's account and the Times profile. What we can observe is the output: Medvi reportedly scaled to 250,000 customers by year end. If the founder's account is even partially accurate, the operational leverage is significant.

But AI did not prescribe a single medication, sign a single BAA, process a single pharmacy order, or handle a single compliance filing, and that gap is where the real story lives.

The MSO Architecture Underneath

Medvi is structured as a Management Services Organization. It owns the customer relationship: branding, website, checkout, paid media, support. Two infrastructure platforms, CareValidate and OpenLoop Health, provide the licensed physicians, prescription processing, pharmacy fulfillment, shipping logistics, and regulatory compliance.

This is the layer that makes the $20,000 bootstrap story possible. Someone else spent years and millions building the regulated infrastructure. State-by-state physician licensing is one of the hardest operational problems in telehealth; a doctor licensed in Florida cannot prescribe to a patient in Texas. OpenLoop solved that before Gallagher wrote his first prompt.

The payment processing layer is another hidden dependency. Telehealth combined with prescription drugs puts you in a high-risk merchant category. You cannot walk into a bank, open a merchant account, and start accepting payments. Processors treat pharmaceutical telehealth the same way they treat gambling or adult content: elevated chargeback risk, regulatory exposure, reputational liability. Most standard processors will not touch it. The ones that will charge 6 to 9 percent per transaction, compared to the 2.9 percent a normal SaaS company pays through Stripe. On $400 million in revenue, 6 to 9 percent comes to $24 to $36 million in processing fees, roughly $12 to $24 million more than the $11.6 million a 2.9 percent rate would cost. OpenLoop's existing Stripe relationship is a structural advantage that takes years to build independently.
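The fee arithmetic is easy to check. This sketch uses the reported $400 million in revenue and the 2.9 percent and 6 to 9 percent rates from the paragraph above, and assumes a purely percentage-based fee: it ignores fixed per-transaction charges, chargebacks, and reserves. Rates are expressed in integer basis points to avoid floating-point rounding.

```python
REVENUE_USD = 400_000_000  # reported first-year revenue


def fees_usd(revenue: int, rate_bps: int) -> int:
    """Processing fees for a percentage-only rate in basis points (1 bp = 0.01%)."""
    return revenue * rate_bps // 10_000


standard = fees_usd(REVENUE_USD, 290)  # 2.9% standard card rate
for bps in (600, 900):                 # 6% and 9% high-risk rates
    high = fees_usd(REVENUE_USD, bps)
    print(f"{bps / 100:.0f}%: ${high:,} total, ${high - standard:,} more than 2.9%")
```

This prints $24,000,000 and $36,000,000 in total fees, which is $12,400,000 and $24,400,000 more than the $11,600,000 a 2.9 percent rate would cost.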

LegitScript certification, required to run pharmaceutical ads on Google and Meta, typically takes 6 to 12 months. That timeline alone makes the "two months to launch" narrative feel like it is missing context about what was already in place before AI entered the picture.

Where AI Created Real Problems

The speed that AI enables also outpaces the compliance systems designed for slower-moving companies.

According to Gallagher's own account, the AI customer service chatbot fabricated drug prices that he then had to honor. It hallucinated product lines, telling customers Medvi sold hair-loss drugs before any such offering existed. These are familiar failure modes for anyone deploying LLMs in customer-facing roles, but in a regulated health context, a chatbot inventing drug prices is not a UX bug. It is a compliance event.

On the advertising side, Medvi's Meta ad library showed over 5,000 active ads, many running under fabricated physician personas. Names like "Professor Albust Dongledore" and "Dr. Tuckr Carlzyn MD" appeared on Facebook pages categorized as entertainment websites, running AI-generated video testimonials for prescription medications.

Multiple sources indicate these were not Medvi's own ads but ads run by affiliates in its commission program without adequate vetting. Venture investor Sheel Mohnot flagged this on X, and others pointed out the distinction. It matters legally, but the pattern is the same: AI makes it trivially easy to generate convincing doctor personas at scale. Hims runs a comparable affiliate program but appears to vet more rigorously. When your tools can generate 800 fake doctor profiles in an afternoon, your compliance process needs to match that speed.

A separate and more serious issue surfaced in early 2026. A fintech founder who signed up for Medvi published a detailed writeup on Medium documenting that patient intake records were accessible via sequential URLs with no authentication. Changing a single digit in the URL exposed another patient's full record: name, email, phone, weight, medication order. This is a textbook Insecure Direct Object Reference vulnerability, one of the most documented security flaw classes, sitting in production across 250,000 patient records containing Protected Health Information.
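A minimal Python sketch of this flaw class and its standard fix, using a hypothetical in-memory store (Medvi's actual stack is not public). The point it illustrates: the ownership check, not the shape of the ID, is what closes an IDOR. Unguessable IDs only make enumeration harder.

```python
import secrets
from dataclasses import dataclass


@dataclass
class IntakeRecord:
    owner_id: str
    name: str
    medication: str


class RecordNotFound(Exception):
    """Raised both for missing records and for records the caller does not own."""


# Hypothetical in-memory store standing in for a database table.
RECORDS: dict[str, IntakeRecord] = {}


def create_record(owner_id: str, name: str, medication: str) -> str:
    # Defense in depth: a random 128-bit token instead of a sequential integer,
    # so changing one digit in a URL no longer lands on another record.
    record_id = secrets.token_urlsafe(16)
    RECORDS[record_id] = IntakeRecord(owner_id, name, medication)
    return record_id


def get_record_vulnerable(record_id: str) -> IntakeRecord:
    # IDOR: knowing (or guessing) the ID is treated as permission to read it.
    return RECORDS[record_id]


def get_record(record_id: str, session_user_id: str) -> IntakeRecord:
    # The actual fix: an ownership check on every request, keyed to the
    # authenticated session rather than to anything in the URL.
    record = RECORDS.get(record_id)
    if record is None or record.owner_id != session_user_id:
        # Same error for "missing" and "not yours" so IDs can't be probed.
        raise RecordNotFound(record_id)
    return record
```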

The vulnerability was fixed within 90 minutes of being reported, which is a fast response. But under HIPAA's Breach Notification Rule, affected patients must be notified no later than 60 days after a breach is discovered, and a breach affecting 500 or more individuals must also be reported to HHS within that window and, where more than 500 residents of a state are affected, to prominent media outlets. As of the writeup's publication, no such notification had been filed.
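The notification thresholds can be expressed as a small function. This is a deliberately simplified reading of the rule's outer-limit deadlines; the function name, parameters, and example dates are hypothetical, and the writeup does not give a discovery date for the Medvi issue.

```python
from datetime import date, timedelta

NOTICE_WINDOW = timedelta(days=60)


def breach_notice_deadlines(discovered: date, affected_total: int,
                            max_residents_in_one_state: int) -> dict[str, date]:
    """Outer-limit deadlines under a simplified reading of the HIPAA Breach
    Notification Rule (45 CFR 164.404, 164.406, 164.408).

    Ignores the "without unreasonable delay" standard, law-enforcement delays,
    business-associate timelines, and state breach laws.
    """
    deadline = discovered + NOTICE_WINDOW
    notices = {"individuals": deadline}  # every affected patient, any breach size
    if affected_total >= 500:
        notices["hhs"] = deadline  # contemporaneous with individual notice
    else:
        # Smaller breaches go on a log reported within 60 days of year end.
        notices["hhs"] = date(discovered.year, 12, 31) + NOTICE_WINDOW
    if max_residents_in_one_state > 500:
        notices["media"] = deadline  # prominent media outlets in that state
    return notices
```

For a hypothetical discovery on 15 January 2026 affecting 250,000 people, every notice would be due by 16 March 2026.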

The FDA also issued a warning letter to Medvi in February 2026 for misbranding, specifically for implying FDA approval that compounded products do not have. The Times profile, published six weeks later, did not mention it. Drug Discovery & Development and Forbes both covered the regulatory context the Times omitted.

A separate class action complaint filed in federal court names OpenLoop and compounding pharmacy Triad Rx, with Medvi cited as one of at least a dozen telehealth storefronts on the same backend. The complaint alleges the compounded oral tirzepatide sold through that network has no demonstrated mechanism of absorption, bringing RICO and consumer protection claims.

Where AI Can Actually Help in This Space

Strip away the headline and there are specific areas where AI tools deliver real value in telehealth.

Patient-facing portals for tracking medications, schedules, and results are a clear fit. AI tools can build these faster than traditional development. But you need HIPAA-compliant infrastructure with signed BAAs, and that means paying a premium. The leading options are AWS (signs BAAs across its stack), Microsoft Azure (HITRUST-certified), and Atlantic.Net (purpose-built for HIPAA). All of them require proper configuration, not just a signup, to achieve compliance.

AI-powered intake and triage can reduce the load on licensed providers, but the output still needs a human with a license to sign off. AI customer service works when constrained to a narrow domain with hard guardrails. Gallagher told the Times he learned this when his chatbot started offering lasagna recipes to patients.
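What "hard guardrails" might look like in code: checks that run on every model-drafted reply, outside the model, so a fabricated price or an off-topic answer never reaches a patient. The topic list, catalog prices, and handoff message are hypothetical, and the topic label is assumed to come from an upstream classifier that isn't shown.

```python
import re

HANDOFF = "I can't help with that here. Someone from our care team will follow up."

# Illustrative values only. In a real system, prices come from the billing
# system of record, never from the model's own text.
PRICE_CATALOG_USD = {"semaglutide": 199, "tirzepatide": 299}
ALLOWED_TOPICS = {"shipping", "billing", "refills", "pricing"}
DOLLAR_AMOUNT = re.compile(r"\$\s?(\d+(?:\.\d{2})?)")


def guard_reply(topic: str, draft: str) -> str:
    """Apply hard checks to a model-drafted reply before a patient sees it.

    `topic` is assumed to come from an upstream classifier (not shown).
    """
    if topic not in ALLOWED_TOPICS:
        return HANDOFF  # out of domain: recipes, medical advice, etc.
    allowed = {float(p) for p in PRICE_CATALOG_USD.values()}
    quoted = {float(amount) for amount in DOLLAR_AMOUNT.findall(draft)}
    if not quoted <= allowed:
        return HANDOFF  # the draft quotes a price that is not in the catalog
    return draft
```

The dollar-amount pattern fails closed: an amount it doesn't fully parse, such as "$1,299", reads as "$1" and triggers a handoff rather than slipping through.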

Ad creative generation is where the speed advantage is most obvious and the compliance risk is most acute. Generating video testimonials and doctor personas takes minutes; verifying that the ad complies with FDA, FTC, and state pharmacy board regulations takes longer. That gap between creation speed and compliance review is where most of the problems in this story originated.

The Replicable Playbook

The real lesson from Medvi is not that AI replaced a company. It is that AI collapsed the non-regulated layers of a regulated business to near-zero headcount. The playbook: find a domain where infrastructure partners handle the licensed and capital-intensive operations; use AI to build everything on top.

That playbook is powerful, and it is exposed. Brian Blum, a founder who ran a competing GLP-1 business, described the market as "surprisingly easy to setup but ruthlessly competitive because there is virtually no differentiation." His company hit $4 million per month in revenue. He confirmed the revenue numbers are real and that competitors knew Medvi was doing $300M+. He also confirmed the structural weakness: every brand sells the same product through the same infrastructure, making it a pure marketing arms race. Medvi won that race through AI-generated ad creative, partnership ads, whitelisting, and affiliate advertorials.

Based on everything publicly known, Medvi holds no proprietary technology, no exclusive supplier relationships, and no physician network. The moat is execution speed and brand equity built during the window when compounded GLP-1s remain legally available.

The FDA declared the semaglutide shortage resolved in early 2025, narrowing the legal basis for compounded alternatives. Novo Nordisk's direct-to-consumer Wegovy subscription further compresses margins.

For builders looking at this space: AI is the accelerant, not the foundation. The foundation is the infrastructure partner, the compliance stack, and the regulatory window. Get those wrong and no amount of AI tooling saves you.