<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Keith MacKay</title>
    <description>The latest articles on Forem by Keith MacKay (@keithjmackay).</description>
    <link>https://forem.com/keithjmackay</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1499463%2Fdc7d3636-8482-4d6e-b619-fb4367cf4dfd.jpg</url>
      <title>Forem: Keith MacKay</title>
      <link>https://forem.com/keithjmackay</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/keithjmackay"/>
    <language>en</language>
    <item>
      <title>Situational Leadership for AI: More Like a Capable Colleague than a Fancy Formula</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Mon, 11 May 2026 05:55:43 +0000</pubDate>
      <link>https://forem.com/keithjmackay/situational-leadership-for-ai-more-like-a-capable-colleague-than-a-fancy-formula-1hje</link>
      <guid>https://forem.com/keithjmackay/situational-leadership-for-ai-more-like-a-capable-colleague-than-a-fancy-formula-1hje</guid>
      <description>&lt;h1&gt;
  
  
  Situational Leadership for AI: More Like a Capable Colleague than a Fancy Formula
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Treating LLMs like vending machines guarantees mediocre output—try situational leadership instead&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;You wouldn’t hand a new hire a laptop, point vaguely at the codebase, and say “make it better.” You wouldn’t expect a brilliant data scientist to magically produce great marketing copy. You wouldn’t give the same instructions to a seasoned architect and a junior developer and expect identical results.&lt;/p&gt;

&lt;p&gt;And yet—this is exactly how many (most?) organizations interact with large language models.&lt;/p&gt;

&lt;p&gt;The executives who’ve spent decades learning to match tasks to talent, provide appropriate context, and adapt their leadership style to individual strengths suddenly forget everything when the “employee” runs on GPUs instead of caffeine. They pump generic prompts into their LLMs like quarters into a vending machine, then complain when the results disappoint.&lt;/p&gt;

&lt;p&gt;Here’s the uncomfortable truth: &lt;strong&gt;the skills that make you effective at managing capable knowledge workers are precisely the skills that make you effective at getting value from LLMs.&lt;/strong&gt; And if you’re bad at one, you’re probably bad at the other.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Situational Leadership Connection
&lt;/h2&gt;

&lt;p&gt;In the 1970s, Paul Hersey and Ken Blanchard developed Situational Leadership Theory—the radical notion that effective leaders (or parents) adapt their style to the person and task at hand. A directive approach works for someone new to a task. A supportive approach works for someone who knows what to do but lacks confidence. Delegation works for someone competent and committed.&lt;/p&gt;

&lt;p&gt;The model seems obvious in retrospect. Yet organizations still promote technical experts into management roles and watch them fail by treating every report identically—usually by treating everyone the way they themselves would want to be treated. We all see the world through our own lens of needs, desires, strengths, and past experiences; the best managers come to understand the lens through which their reports perceive the world.&lt;/p&gt;

&lt;p&gt;Identical treatment for all gets shoddy results with teams. It gets shoddy results with AI, too.&lt;/p&gt;

&lt;p&gt;Different LLMs have different capabilities, different training, different strengths, different weaknesses. Claude excels at nuanced reasoning and following complex instructions. GPT-4 has particular strengths in some business settings. Grok has some of the best information on trending cultural topics. Smaller models trade capability for speed. Specialized models dominate narrow domains. Treating them interchangeably—or worse, treating them like omniscient oracles—is the management equivalent of giving identical assignments to your entire team regardless of skill set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The situational leader asks:&lt;/strong&gt; What does this specific person need from me to succeed at this specific task?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The effective AI user asks:&lt;/strong&gt; What does this specific model need from me to succeed at this specific task?&lt;/p&gt;

&lt;p&gt;The parallel is not metaphorical. It’s operational.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context: the Briefing You Forgot to Give
&lt;/h2&gt;

&lt;p&gt;Picture a new employee’s first day. A good manager doesn’t just assign tasks—they provide context. Here’s what we’re building. Here’s why it matters. Here’s how your work connects to the larger mission. Here’s who does what and who can help with specific things as needed. Here are the constraints we’re operating under. Here are the decisions that have already been made and why. Who. What. When. Where. Why. How.&lt;/p&gt;

&lt;p&gt;Bad managers skip this. They assume context is obvious. They issue instructions and get frustrated when results miss the mark. “I asked for X, why did you give me Y?” Often it’s because they never explained what X meant in the organization, with its constraints, for its customers.&lt;/p&gt;

&lt;p&gt;LLMs suffer the same failure mode, magnified.&lt;/p&gt;

&lt;p&gt;When you prompt a model with “write me a marketing email,” you’ve given it essentially no context. What product? What audience? What tone? What’s the goal—awareness, conversion, retention? What’s worked before? What constraints exist? The model will generate something plausible and generic, because generic was all you gave it to work with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Good managers are explicit about context. Effective AI users are explicit about context.&lt;/strong&gt; The quality of the output is directly proportional to the quality of the input—not the quantity of words, but the relevance of the information provided.&lt;/p&gt;
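
&lt;p&gt;To make this concrete, here’s a minimal sketch in Python (the product and audience details below are invented for illustration; the assembled briefing is simply printed rather than sent to any particular API):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def build_prompt(task, product, audience, goal, tone, constraints):
    """Assemble the same briefing you would give a new hire:
    who, what, why, and the constraints already decided."""
    return (
        f"Task: {task}\n"
        f"Product: {product}\n"
        f"Audience: {audience}\n"
        f"Goal: {goal}\n"
        f"Tone: {tone}\n"
        f"Constraints: {constraints}"
    )

# "Write me a marketing email" vs. the same request with context.
# Every detail below is hypothetical; substitute your own reality.
print(build_prompt(
    task="Write a marketing email",
    product="Acme Analytics, a B2B data-quality SaaS",
    audience="data engineering leads who trialed us, then went quiet",
    goal="book a 30-minute onboarding call (conversion, not awareness)",
    tone="direct and technical, no hype",
    constraints="under 150 words, one call to action",
))
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The function is trivial on purpose: the value is in forcing yourself to fill in every field before you ask for anything.&lt;/p&gt;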

&lt;p&gt;This extends beyond single prompts. LLMs have session context—what’s been discussed, what’s been decided, what’s been tried. Just as a good manager maintains continuity across conversations with their team (“remember, we decided last week that...”), effective AI users maintain continuity across interactions.&lt;/p&gt;

&lt;p&gt;The model that helped you architect a system yesterday can help you implement it today—if you’ve preserved the context of those decisions. Wipe the slate clean and start over, and you’re back to onboarding a new hire who knows nothing about your project.&lt;/p&gt;

&lt;p&gt;Note that granularity—how much detail and specificity you provide—also matters. Provide too much, and you get exactly what you asked for, with less opportunity for the AI to surprise you with helpful additions. Provide too little, and the AI may drift away from your needs, producing something tangential but not quite right.&lt;/p&gt;

&lt;h2&gt;
  
  
  Task-Model Fit Is Just Role Clarity by Another Name
&lt;/h2&gt;

&lt;p&gt;Smart managers don’t assign accounting work to engineers or graphical work to people who think in spreadsheets. They match tasks to talents, projects to strengths, challenges to capabilities.&lt;/p&gt;

&lt;p&gt;This may strike you as more-basic-than-needed management advice. Yet it’s applied more rarely than you might think in AI usage.&lt;/p&gt;

&lt;p&gt;Organizations often standardize on a single model for all use cases—the enterprise equivalent of hiring one type of person for every role. They use the same massive model to answer simple questions that a smaller model could handle faster and cheaper. They ask generalist models to perform specialist tasks. They wonder why results vary wildly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Different models for different jobs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick, factual queries&lt;/strong&gt; don’t need your most powerful (and expensive) model. A faster, smaller model often performs equivalently for routine tasks—just as you don’t need your most senior engineer to reset a password.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex reasoning&lt;/strong&gt; benefits from models trained to think step-by-step. Asking a model optimized for speed to navigate nuanced trade-offs is like asking your fastest coder to lead a design review. You’ll see this implemented in many models as a “thinking mode” of one sort or another.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-specific work&lt;/strong&gt; often benefits from fine-tuned or specialized models. General practitioners are valuable; specialists exist for a reason. If you don’t have specialized models, ask your generalist model to adopt a persona adept in the given domain, and you’ll find your tokens better spent: the predictive model will weight its answers toward the desired domain, producing inherently better results. Tune both the persona and the questions to the specific domain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creative tasks&lt;/strong&gt; may warrant different models than analytical tasks. The person who writes brilliant copy isn’t necessarily the person who debugs distributed systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The effective AI user develops model literacy the way effective managers develop people literacy. They learn what each model does well, where each struggles, and how to route work accordingly.&lt;/p&gt;
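
&lt;p&gt;As a sketch of what that literacy can look like operationally (the model names and task categories below are placeholders, not recommendations), even a simple routing table captures the idea:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# A minimal task-to-model router. The table encodes "model literacy":
# what each model does well, learned from your own evaluations.
# Model names are placeholders; substitute whatever you actually run.
ROUTING_TABLE = {
    "quick_factual": "small-fast-model",    # the password-reset class of work
    "complex_reasoning": "thinking-model",  # step-by-step trade-off analysis
    "domain_specific": "fine-tuned-model",  # or a generalist given a persona
    "creative": "creative-model",           # copy, naming, brainstorming
}

def route(task_type):
    """Match the task to the model, the way a manager matches work to people."""
    # Fall back to a capable generalist when the task fits no category.
    return ROUTING_TABLE.get(task_type, "general-model")

print(route("quick_factual"))      # small-fast-model
print(route("unclassified_task"))  # general-model
&lt;/code&gt;&lt;/pre&gt;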

&lt;h2&gt;
  
  
  Strengths and Weaknesses Aren’t Bugs—They’re Features
&lt;/h2&gt;

&lt;p&gt;Every employee has strengths to leverage and weaknesses to manage around. The brilliant architect who can’t stand presenting. The charismatic salesperson who crumbles under detail work. The reliable executor who struggles with ambiguity.&lt;/p&gt;

&lt;p&gt;Good managers realize that people are not uniformly excellent. They design teams and workflows that amplify strengths and compensate for weaknesses.&lt;/p&gt;

&lt;p&gt;LLMs are similar, but many organizations keep pretending they aren’t.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current-generation models tend to struggle with:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precise numerical reasoning (your accountant should check the math)&lt;/li&gt;
&lt;li&gt;Real-time information (they’re trained on historical data)&lt;/li&gt;
&lt;li&gt;Guaranteed factual accuracy (they can hallucinate convincingly)&lt;/li&gt;
&lt;li&gt;Following extremely long context with equal attention throughout&lt;/li&gt;
&lt;li&gt;Knowing what they don’t know&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Current-generation models tend to excel at:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pattern recognition across large bodies of text&lt;/li&gt;
&lt;li&gt;Generating plausible first drafts quickly&lt;/li&gt;
&lt;li&gt;Explaining complex concepts at various levels&lt;/li&gt;
&lt;li&gt;Synthesizing information from multiple sources&lt;/li&gt;
&lt;li&gt;Brainstorming and ideation&lt;/li&gt;
&lt;li&gt;Reformatting and restructuring content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The effective approach isn’t to wish away weaknesses—it’s to design workflows that play to strengths. Use the model for the first draft; have a human verify the facts. Use the model for synthesis; have a domain expert validate the conclusions. Use the model for ideation; have the team evaluate feasibility.&lt;/p&gt;

&lt;p&gt;This is management, not magic.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Feedback Loop That Everyone Ignores
&lt;/h2&gt;

&lt;p&gt;Managing knowledge workers isn’t a one-way broadcast. It’s a dialogue. You give direction. They execute. You provide feedback. They adjust. You refine requirements and improve processes. They deliver again, and may provide feedback of their own. Good managers create tight feedback loops that quickly converge on the right outcome.&lt;/p&gt;

&lt;p&gt;Most AI users treat prompts as one-shot commands, then complain when the first response isn’t perfect.&lt;/p&gt;

&lt;p&gt;Would you judge an employee’s capability based on their first draft of their first project on their first day? Then why judge a model based on a single cold-start response?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effective AI interaction is iterative.&lt;/strong&gt; The first response is diagnostic—what did the model understand, what did it miss, where did it go wrong? The second prompt corrects course. The third refines. By the fourth or fifth exchange, you’ve co-created something neither you nor the model would have produced alone.&lt;/p&gt;

&lt;p&gt;This is how you work with smart humans. It’s also how you work with smart machines. This is one reason some recommend less detail up front for certain tasks — given more wiggle-room, the AI will generate more concepts that can expand your idea. Model strengths affect this too — ChatGPT has gotten steadily better at providing exactly what you ask for, while Claude has gotten steadily better at expanding on your inputs with additional ideas. Knowing this might lead you to choose one model or the other for a specific task, or shape how you write your prompt.&lt;/p&gt;

&lt;p&gt;The executive who says “I tried ChatGPT once and it gave me garbage” is the executive who says “I told my team what I wanted once and they didn’t deliver exactly what I envisioned.” Both statements reveal more about the executive than about the team or tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Implication
&lt;/h2&gt;

&lt;p&gt;Here’s where this gets interesting: if AI effectiveness requires the same skills as people management, then organizations’ AI results will correlate with their management culture.&lt;/p&gt;

&lt;p&gt;Companies expect young non-managers to become expert AI users without helping them learn how to manage. I believe this is the primary reason statistics show much more productive AI use among older users than younger ones...the inverse of historical technology adoption patterns. Companies that trim entry-level hiring so that AI can do more of the work may have a hard time if they don’t create a path to management, with management training, for their smaller entry-level classes. They will also need retention mechanisms to keep a higher percentage of those entry-level folks longer...or they’ll wind up with hollowed-out organizations.&lt;/p&gt;

&lt;p&gt;Companies with clear communication norms and management training will get clearer AI outputs. Organizations that provide context will get contextually appropriate responses. Teams practiced in iterative refinement will iterate effectively with AI. Leaders who match tasks to capabilities will match tasks to models.&lt;/p&gt;

&lt;p&gt;And organizations with terrible management practices? They’ll get terrible AI results and blame the technology.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI is a mirror.&lt;/strong&gt; It reflects the quality of the instructions it receives, the clarity of the context provided, the thoughtfulness of the task assignment. If your AI outputs are mediocre, the first place to look is the inputs.&lt;/p&gt;

&lt;p&gt;This isn’t to say AI tools are perfect—they aren’t. Models have real limitations, and those limitations matter. But the gap between mediocre and excellent AI usage often has more to do with the human side than the machine side.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Leaders
&lt;/h2&gt;

&lt;p&gt;If you’re a new manager struggling to get value from AI tools, you might actually be struggling with management fundamentals that AI is simply making visible. The fix isn’t better prompting tips—it’s developing the skills you’d need to manage capable knowledge workers effectively. If you’re using the AI to write code, a great management model to think about is that of the product owner. It’s a very special set of skills, as some might say. And NOT AT ALL the set of skills you trained in as a traditional software engineer.&lt;/p&gt;

&lt;p&gt;If you’re a senior leader wondering why AI investments aren’t paying off, look at your organization’s management culture. Do your managers provide context effectively? Operate at the right level of granularity? Match tasks to capabilities? Create feedback loops? Adapt their approach to different situations and individuals?&lt;/p&gt;

&lt;p&gt;If not, AI tools will underperform—because they require the same inputs that humans require, just in different formats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The organizations that will excel with AI are the ones that already excel at knowledge work management.&lt;/strong&gt; They understand that capable resources—human or artificial—need direction, not just commands. Context, not just tasks. Iteration, not just deadlines.&lt;/p&gt;

&lt;p&gt;The rest will keep pumping formulaic prompts into their systems and wondering why the results disappoint.&lt;/p&gt;

</description>
      <category>management</category>
      <category>leadership</category>
      <category>ai</category>
    </item>
    <item>
      <title>Software Moats in the Age of AI: What's Actually Defensible?</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Mon, 11 May 2026 05:25:37 +0000</pubDate>
      <link>https://forem.com/keithjmackay/software-moats-in-the-age-of-ai-whats-actually-defensible-nnn</link>
      <guid>https://forem.com/keithjmackay/software-moats-in-the-age-of-ai-whats-actually-defensible-nnn</guid>
      <description>&lt;h1&gt;
  
  
  Software Moats in the Age of AI: What's Actually Defensible?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Why the "AI writes code now" narrative misses the point—and where competitive advantages stubbornly persist&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Another pitch deck lands on your desk. Another startup promises to "displace custom software development." Another analyst proclaims coding is now a commodity. Another LinkedIn thought leader announces developers are obsolete.&lt;/p&gt;

&lt;p&gt;You've heard variations for eighteen months. And yet—enterprise software spending hasn't collapsed. Professional services firms keep hiring. COBOL programmers are getting paid (handsomely!).&lt;/p&gt;

&lt;p&gt;Where's the disconnect?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Moat Everyone Forgot
&lt;/h2&gt;

&lt;p&gt;Traditional software moats centered on code: proprietary algorithms, accumulated functionality, engineering discipline, the sheer grind required to replicate something. Two years and $20 million to build something equivalent? That was your moat.&lt;/p&gt;

&lt;p&gt;AI compresses that timeline—for &lt;em&gt;greenfield&lt;/em&gt; development, building from scratch. But most enterprise software is brownfield: decades of accumulated code, undocumented business logic, integration points understood only by the one developer who was deeply involved...or black boxes written by retired engineers that nobody fully understands. AI hits some hard limits in brownfield. That's our focus here.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Context Problem (Temporary)
&lt;/h2&gt;

&lt;p&gt;AI assistants operate within fixed "context windows"—working memory of roughly 200,000 to 1 million tokens (a token is generally equivalent to about 3/4 of a word). This sounds generous until you realize enterprise systems span millions of lines across thousands of files.&lt;/p&gt;

&lt;p&gt;Context management is a huge issue at the moment: EVERYTHING your agent needs for the current task (files, prompts, tool instructions, etc.) must be in its context window AT THE SAME TIME...otherwise, the agent doesn't know it exists. Worse, elements at the beginning and end of the window receive greater attention (like the primacy and recency effects in human cognition), so the muddled middle can get de-emphasized. And the longer a session runs, the worse performance can become, because things get evicted to make room or otherwise lost...a condition sometimes called "context rot". This can REALLY hurt results.&lt;/p&gt;
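
&lt;p&gt;The back-of-the-envelope math makes the mismatch obvious. A minimal sketch, assuming roughly 10 tokens per line of code (an assumption, not a measured constant):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Will a mid-sized enterprise codebase fit in one context window?
# Assumption: ~10 tokens per line of code. Adjust for your own estate.
TOKENS_PER_LINE = 10

lines_of_code = 5_000_000    # millions of lines across thousands of files
window_tokens = 1_000_000    # a generous current-generation window

needed = lines_of_code * TOKENS_PER_LINE
print(f"needed {needed:,} tokens vs. window of {window_tokens:,}")
print(f"over budget by {needed / window_tokens:.0f}x")  # 50x over
&lt;/code&gt;&lt;/pre&gt;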

&lt;p&gt;Now, ask an AI to modify a billing calculation touching seven services, three databases, and two external APIs. You'll get confident suggestions that cheerfully ignore most of the complexity that isn't explicitly called out. The AI doesn't know what it doesn't know—a trait it shares with us all, but it can execute a bad solution faster.&lt;/p&gt;

&lt;p&gt;This context challenge is a problem today, but solutions are coming. Recursive Language Models (RLMs) can now decompose arbitrarily large codebases, analyze pieces, and synthesize understanding across the whole. Within 18-36 months, "codebase too large" stops being a moat. (Note that "RLM" may sound like an object, but in reality it's a strategy, one that can be used with any LLM.)&lt;/p&gt;
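
&lt;p&gt;Since RLM names a strategy rather than a product, a hedged sketch of the idea is easy to write down. Everything below is illustrative: "summarize" stands in for whatever model call you use, and the chunk size is an assumption:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative sketch of the recursive decompose/analyze/synthesize strategy.
# Split the corpus into window-sized chunks, analyze each, then recursively
# analyze the analyses until a single synthesis remains.

def split_to_windows(text, size=24_000):
    # Roughly 8k tokens of content at ~3 characters per token (assumed).
    return [text[i : i + size] for i in range(0, len(text), size)]

def understand(chunks, summarize):
    if len(chunks) == 1:
        return summarize(chunks[0])  # base case: one window-sized chunk
    notes = [summarize(chunk) for chunk in chunks]  # per-chunk analysis
    # The merged notes may themselves exceed one window; recurse on them.
    return understand(split_to_windows("\n".join(notes)), summarize)

# Usage: understand(split_to_windows(whole_codebase), summarize=your_model_call)
&lt;/code&gt;&lt;/pre&gt;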

&lt;p&gt;Now, even if the size constraint goes away, understanding &lt;em&gt;what&lt;/em&gt; code does differs from understanding &lt;em&gt;why&lt;/em&gt; it exists. The political archaeology of enterprise systems—why this exception, why that workaround—remains opaque to any model trained on code alone. Nobody documented why the foundation was intentionally built with one crooked wall in 1987.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Codebase scale buys you 2-3 years. Institutional knowledge may buy you longer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Languages AI Hasn't Learned (Yet)
&lt;/h2&gt;

&lt;p&gt;Large language models learned from public code. They excel at solving problems in Python and JavaScript. They may just hallucinate confidently in COBOL.&lt;/p&gt;

&lt;p&gt;COBOL still processes 95% of ATM transactions and 80% of in-person financial transactions globally. RPG hums along on IBM midrange systems. ABAP powers SAP implementations. These languages lack the public training data AI needs—and ironically, they run the systems with the largest budgets.&lt;/p&gt;

&lt;p&gt;Ask an AI to modify your COBOL billing module. It generates something plausible, but it may introduce subtle bugs you'll discover in production six months later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Legacy languages create unexpected insulation. Constrained talent pools cut both ways.&lt;/p&gt;

&lt;h2&gt;
  
  
  Domain Knowledge Doesn't Scale
&lt;/h2&gt;

&lt;p&gt;Software that works isn't software that sells. Software that works &lt;em&gt;correctly for a specific domain&lt;/em&gt; sells.&lt;/p&gt;

&lt;p&gt;Healthcare billing involves thousands of payer-specific rules changing quarterly. Energy trading handles physical delivery constraints across jurisdictions. Insurance policy administration encodes actuarial logic accumulated over decades. AI generates code that processes data—not code embodying expertise it never learned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Vertical software companies with genuine domain expertise have &lt;em&gt;stronger&lt;/em&gt; moats post-AI. The easy parts got easier. Encoding specialized knowledge stayed exactly as hard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Battle Scars Have Value (and Institutional Knowledge Isn't In the Docs)
&lt;/h2&gt;

&lt;p&gt;Enterprise systems connect to ERPs, CRMs, data warehouses, payment processors, and countless internal tools. Each integration point represents hard-won understanding of how systems &lt;em&gt;actually&lt;/em&gt; behave—as opposed to how documentation claims they behave.&lt;/p&gt;

&lt;p&gt;AI writes integration code. It cannot anticipate that upstream systems send malformed JSON on Tuesdays, that authentication tokens expire differently in production, or that the "deprecated" field is actually required for three specific customer configurations. (Every developer reading this just nodded ruefully.)&lt;/p&gt;

&lt;p&gt;This knowledge lives in tribal memory and incident reports. Nobody writes it down because nobody realizes it's unusual until something breaks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Deep integrations create sticky moats. Competitors must relearn every lesson the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Relationships Beat Tech Chops
&lt;/h2&gt;

&lt;p&gt;CIOs choose vendors they trust to still exist in five years and to navigate regulatory inquiries alongside them. A fintech startup with AI-generated code competes against thirty years of relationship and proven reliability. Good luck with that pitch.&lt;/p&gt;

&lt;p&gt;AI-generated code also creates compliance questions nobody has answered. Who's liable when it processes protected health information incorrectly? When AI-built credit decisioning introduces unintended discrimination? Until liability frameworks mature, AI faces adoption friction in precisely the industries with the largest software budgets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Relationship-intensive businesses in financial services, healthcare, and government hold moats invisible to technical assessments.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Maintenance Cliff
&lt;/h2&gt;

&lt;p&gt;Every line of code creates maintenance obligations. AI-generated code optimizes for working now, not being understood later—producing solutions humans wouldn't choose, using patterns inconsistently, generating technical debt at accelerated rates.&lt;/p&gt;

&lt;p&gt;Organizations rapidly building AI-assisted systems are simultaneously building maintenance liabilities. Developers five years from now will struggle to understand logic no human designed. Black boxes that work but that nobody understands may come to dominate codebases. How good must AI get before you trust it to fix its own mysteries? Sometimes it's already there. In other cases, we'll always need humans.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implication:&lt;/strong&gt; Discount AI productivity gains by future maintenance costs. Ignoring accelerated technical debt means overestimating returns.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pragmatic Bottom Line
&lt;/h2&gt;

&lt;p&gt;AI transforms software development. The competitive landscape for simple ground-up greenfield applications has shifted fundamentally. But enterprise software operates differently. Complexity, integration, domain expertise, relationships, and regulatory dynamics create moats AI hasn't breached.&lt;/p&gt;

&lt;p&gt;These moats aren't permanent. RLMs will erode context advantages. Training data will expand. Regulatory frameworks will mature and become more machine-compatible. But defensible positions today require understanding something deeply, integrating thoroughly, and maintaining relationships that transcend technical capability.&lt;/p&gt;

&lt;p&gt;Those advantages outlast the headlines—and organizations that distinguish temporary moats from durable ones will allocate capital more effectively than those chasing the narrative.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The question isn't "can AI build this?" It's "can AI build it correctly, integrate it properly, and support it reliably?" The answer determines where you put your money.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>software</category>
      <category>ai</category>
      <category>technicaldebt</category>
      <category>convex</category>
    </item>
    <item>
      <title>The Irony of AI Development: How Context Engineering Is Taking Us Back to Waterfall</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 10 May 2026 22:26:50 +0000</pubDate>
      <link>https://forem.com/keithjmackay/the-irony-of-ai-development-how-context-engineering-is-taking-us-back-to-waterfall-31j3</link>
      <guid>https://forem.com/keithjmackay/the-irony-of-ai-development-how-context-engineering-is-taking-us-back-to-waterfall-31j3</guid>
      <description>&lt;h1&gt;
  
  
  The Irony of AI Development: How Context Engineering Is Taking Us Back to Waterfall
&lt;/h1&gt;

&lt;h2&gt;
  
  
  And Why That's Not Necessarily a Bad Thing
&lt;/h2&gt;

&lt;p&gt;For three decades, the software industry has been on a journey away from waterfall development toward agile methodologies. Now, in an unexpected twist, the rise of AI-powered development tools and "context engineering" is quietly pushing us back toward sequential, specification-heavy workflows.&lt;/p&gt;

&lt;p&gt;But this time, we're walking into a trap we've seen before—the waterbed problem. You must tackle it strategically and head-on in order to realize AI's efficiencies—otherwise AI acceleration will create more chaos than efficiency.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Brief History: From Waterfall to Agile
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Waterfall Era (1970s-1990s)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Waterfall development emerged from manufacturing and engineering disciplines. The model was simple: define requirements completely, design the system, build it, test it, deploy it. Each phase flowed into the next like water over a cascade.&lt;/p&gt;

&lt;p&gt;The approach made sense for its time. Computing was expensive. Mistakes were costly. The assumption was that thorough upfront planning would prevent downstream problems.&lt;/p&gt;

&lt;p&gt;It didn't work out that way. Projects routinely ran over budget and behind schedule. By the time software shipped, requirements had changed. The market had moved. A running joke about large enterprise systems was that they were a perfect fit for the company...as of 18 months ago!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Agile Revolution (2001-2020s)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Agile Manifesto was a direct response to waterfall's failures. Its core insight: in complex, uncertain environments, you can't plan your way to success. You must iterate, learn, and adapt.&lt;/p&gt;

&lt;p&gt;Agile shortened feedback loops. Instead of 18-month cycles, teams delivered working software in weeks. Requirements became conversations rather than contracts. Testing happened continuously, not just at the end.&lt;/p&gt;

&lt;p&gt;The results spoke for themselves. Agile teams shipped faster, responded to change better, and produced software that more closely matched what users actually needed.&lt;/p&gt;

&lt;p&gt;Note that there were exceptions where waterfall still made sense, like embedded software that needed to be tested against evolving hardware, or highly regulated industries.&lt;/p&gt;

&lt;p&gt;For the most part, however, the industry consensus for two decades has been clear: agile beats waterfall. Iterate fast. Embrace uncertainty. Deliver incrementally.&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter Context Engineering: The Return of the Specification
&lt;/h2&gt;

&lt;p&gt;Now something interesting is happening.&lt;/p&gt;

&lt;p&gt;The most effective AI-assisted development doesn't look like agile at all. It looks remarkably like waterfall.&lt;/p&gt;

&lt;p&gt;When developers work with large language models like Claude or GPT-4, they quickly discover a pattern: the quality of the output is directly proportional to the quality of the input. Vague prompts produce vague code. Detailed specifications produce useful implementations.&lt;/p&gt;

&lt;p&gt;This has given rise to "context engineering"—the practice of carefully crafting the information, constraints, and examples you provide to AI systems. Context engineering is essentially specification writing for machines.&lt;/p&gt;

&lt;p&gt;The parallels to waterfall are striking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Upfront investment in specification&lt;/strong&gt;: Before touching code, developers spend significant time writing detailed requirements, examples, and constraints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sequential phases&lt;/strong&gt;: Define the context, generate the code, review the output, refine the specification&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heavy documentation&lt;/strong&gt;: The context window has become the new requirements document&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The irony is profound. After decades of moving away from heavy upfront specification, we're returning to it—not because humans need it, but because AI does.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Waterbed Problem Returns
&lt;/h2&gt;

&lt;p&gt;Here's where things get dangerous.&lt;/p&gt;

&lt;p&gt;In engineering, the "waterbed problem" describes a phenomenon where compressing one part of a system creates pressure elsewhere. Push down on a waterbed here, it bulges up over there. You can't eliminate the complexity; you can only move it around.&lt;/p&gt;

&lt;p&gt;AI development tools are creating exactly this dynamic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Math Is Merciless&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider the numbers that are now being thrown around:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI can generate code 10x to 100x faster than manual development&lt;/li&gt;
&lt;li&gt;A single developer can now produce the output of a small team&lt;/li&gt;
&lt;li&gt;Features that took weeks now take hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This sounds like pure upside. It isn't.&lt;/p&gt;

&lt;p&gt;If development speed increases 100x, what happens to testing? Does your QA capacity magically scale by 100x? What about code review? Security audits? Documentation? Integration testing?&lt;/p&gt;

&lt;p&gt;The answer, of course, is that you've simply moved the bottleneck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where the Pressure Goes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When you compress development time through AI, the pressure shows up in predictable places:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Testing&lt;/strong&gt;: AI-generated code requires testing—often more testing than human-written code, because AI systems can produce subtle bugs that humans wouldn't make&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review&lt;/strong&gt;: Someone still needs to verify that the code does what it should, follows security best practices, integrates properly with existing systems, and provides a clear, useful user experience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture&lt;/strong&gt;: Faster code generation means architectural decisions come faster, with less time for deliberation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requirements&lt;/strong&gt;: If you can implement anything quickly, choosing what to implement becomes the constraint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt;: More code shipping faster means more deployments, more incidents, more maintenance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User Absorption&lt;/strong&gt;: Users need to be able to keep up with how to use their software, what features are available, and so forth&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Organizations that accelerate development without accelerating everything else are merely building technical debt at an unprecedented rate. They're pushing on the waterbed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Whole-Lifecycle Imperative
&lt;/h2&gt;

&lt;p&gt;The lesson is clear: AI tools cannot be applied effectively in isolation. They must be applied across the entire development lifecycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This Is Not Optional&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you're using AI to accelerate coding but relying on manual testing, you're setting yourself up for quality disasters. If you're generating code faster but reviewing it at the same pace, defects will slip through. If you're shipping features rapidly but operating infrastructure manually, you'll drown in incidents.&lt;/p&gt;

&lt;p&gt;The math doesn't work any other way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Whole-Lifecycle AI Looks Like&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Organizations that successfully navigate this transition are applying AI comprehensively:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI-assisted specification&lt;/strong&gt;: Using AI to help write, validate, and refine requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-accelerated development&lt;/strong&gt;: Code generation, completion, and transformation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-powered testing&lt;/strong&gt;: Automated test generation, coverage analysis, and regression detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-enhanced review&lt;/strong&gt;: Automated code review, security scanning, and compliance checking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-driven operations&lt;/strong&gt;: Incident detection, root cause analysis, and automated remediation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-supported architecture&lt;/strong&gt;: Design review, pattern matching, and technical debt detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key insight: the acceleration ratio must be roughly consistent across all phases. If development gets 100x faster, testing needs to get close to 100x faster. Otherwise, testing becomes the bottleneck.&lt;/p&gt;

&lt;p&gt;Your overall throughput is gated by your slowest phase.&lt;/p&gt;
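
&lt;p&gt;The arithmetic is blunt enough to sketch in a few lines (the numbers are purely illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# End-to-end throughput is gated by the slowest phase.
# Units are arbitrary "work items per week"; all numbers are illustrative.
phases = {
    "requirements": 10,
    "development": 8,
    "testing": 6,
    "review": 7,
    "operations": 9,
}

print("throughput before:", min(phases.values()))  # 6: testing gates the system

phases["development"] = 800  # accelerate development 100x, and nothing else
print("throughput after:", min(phases.values()))   # still 6: nothing ships faster
&lt;/code&gt;&lt;/pre&gt;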




&lt;h2&gt;
  
  
  Strategic Implications for Leaders
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Don't Chase Point Solutions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The temptation is to start with the most visible opportunity—usually code generation—and optimize later. This is a mistake. Point solutions create imbalances. Imbalances create failures.&lt;/p&gt;

&lt;p&gt;We have seen organizations begin learning how to use AI by implementing it in specific parts of the organization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;documentation (AI greatly reduces the key-person problem)&lt;/li&gt;
&lt;li&gt;test creation (going from 0 automated tests to comprehensive automated testing, including integration and end-to-end tests, is low-risk, fast, and hugely valuable)&lt;/li&gt;
&lt;li&gt;code review guidance (helping senior engineers quickly zero in on the biggest challenges and teaching opportunities in junior engineers' work, making the best use of their valuable time)&lt;/li&gt;
&lt;li&gt;tech debt evaluation (reviewing the code base, looking for future challenges)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These strategies each increase quality and provide longer-term value, but they don't radically affect the speed of the software lifecycle, and, once optimized, they don't provide the same ongoing value. These are great mechanisms to leap to a higher level of maturity, but different solutions are required to maintain this new posture going forward.&lt;/p&gt;

&lt;p&gt;A different long-term approach is to start with a comprehensive view of your development lifecycle. Identify every phase where work happens. Map the current throughput of each. Then invest in AI capabilities for each phase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Measure Throughput, Not Activity&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's easy to celebrate when developers report 10x productivity improvements. But developer productivity is not organizational throughput. If testing becomes the bottleneck, you haven't improved throughput—you've just moved work in progress from one queue to another.&lt;/p&gt;

&lt;p&gt;Measure end-to-end cycle time. Measure defect rates. Measure incidents. These metrics tell you whether you're actually moving faster or just generating more chaos.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Rethink Team Structure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Traditional team structures assumed human-speed development. Ratios of developers to QA engineers, code reviewers to developers, ops engineers to services—all of these were calibrated to pre-AI velocities.&lt;/p&gt;

&lt;p&gt;Those ratios no longer hold. Organizations need to fundamentally reconsider how work is distributed across roles when development velocity changes by an order of magnitude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Embrace the New Waterfall—Thoughtfully&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context engineering and specification-heavy development aren't bad. They represent the right way to work with current AI capabilities. The key is to bring the benefits of agile thinking—fast feedback, iteration, continuous integration—to this new paradigm.&lt;/p&gt;

&lt;p&gt;Write specifications, but test them quickly. Generate code, but review it immediately. Ship features, but instrument them comprehensively. The phases may be more sequential than agile purists would like, but the cycles can still be fast.&lt;/p&gt;

&lt;p&gt;And one of the fundamental pillars of agile development—frequent communication—still adds tremendous value in context engineering. That communication is both agent-to-agent and human-to-agent, flowing through status files, spec files, and prompts. Frequent human-in-the-loop review is still required at every phase to make sure systems behave as expected, but AI can be used to make those reviews as streamlined and efficient as possible. "Trust but verify" is good policy.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Path Forward
&lt;/h2&gt;

&lt;p&gt;We're at an inflection point. AI tools offer genuine productivity improvements, but they also create genuine risks. The organizations that succeed will be those that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Recognize that AI acceleration must be applied holistically&lt;/li&gt;
&lt;li&gt;Invest proportionally across the entire development lifecycle&lt;/li&gt;
&lt;li&gt;Measure system throughput rather than local optimization&lt;/li&gt;
&lt;li&gt;Adapt their organizational structures to new velocity assumptions&lt;/li&gt;
&lt;li&gt;Embrace specification-heavy approaches without abandoning fast feedback&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The waterbed problem isn't new. Neither is the tendency to optimize locally while ignoring systemic effects. But the stakes are higher now. AI acceleration is too powerful to apply carelessly.&lt;/p&gt;

&lt;p&gt;The choice isn't whether to adopt AI development tools. That's already inevitable. The choice is whether to adopt them strategically—across the whole lifecycle, in proper proportion, with clear-eyed understanding of the tradeoffs.&lt;/p&gt;

&lt;p&gt;Push on the waterbed intelligently, or watch it bulge in unexpected and costly places.&lt;/p&gt;

</description>
      <category>coding</category>
      <category>watercooler</category>
      <category>architecture</category>
      <category>leadership</category>
    </item>
    <item>
      <title>The AI Bullwhip: What The Beer Game Teaches Us About Uneven AI Adoption</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 10 May 2026 20:20:19 +0000</pubDate>
      <link>https://forem.com/keithjmackay/the-ai-bullwhip-what-the-beer-game-teaches-us-about-uneven-ai-adoption-2k9i</link>
      <guid>https://forem.com/keithjmackay/the-ai-bullwhip-what-the-beer-game-teaches-us-about-uneven-ai-adoption-2k9i</guid>
      <description>&lt;h1&gt;
  
  
  The AI Bullwhip: What The Beer Game Teaches Us About Uneven AI Adoption
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Why introducing AI to one team might break others—and how to avoid the chaos&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Several decades ago, I was involved in building a digital version of The Beer Game for HBS, and from its first run the lessons became viscerally clear.&lt;/p&gt;

&lt;p&gt;What is the Beer Game? In 1960, MIT professor Jay Forrester created a deceptively simple simulation that has raised the blood pressure of business school students for generations. Four players across a supply chain. Some poker chips representing beer. What could go wrong?&lt;/p&gt;

&lt;p&gt;Everything, it turns out. And sixty-five years later, organizations rushing to adopt AI are relearning the same painful lessons—with considerably higher stakes than simulated beer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Beer Game: A Five-Minute Primer
&lt;/h2&gt;

&lt;p&gt;If you've never played The Beer Game, here's the setup: four players represent different stages of a beer supply chain—a retailer, a wholesaler, a distributor, and a factory. Each week, customers buy beer from the retailer. Each player can only see their own inventory and incoming orders, not what's happening elsewhere in the chain. There's a time delay between placing orders and receiving shipments.&lt;/p&gt;

&lt;p&gt;The goal seems simple: meet customer demand while minimizing costs from excess inventory or stockouts.&lt;/p&gt;

&lt;p&gt;The result is reliably catastrophic.&lt;/p&gt;

&lt;p&gt;Here's what happens: customer demand increases slightly—say, from four cases per week to eight. The retailer notices shelves emptying and orders more from the wholesaler. But shipments take time, so the shelves keep emptying. Panicking, the retailer orders even more. The wholesaler, now seeing a surge in orders, assumes demand is exploding and orders aggressively from the distributor. The distributor does the same to the factory. The factory ramps up production dramatically.&lt;/p&gt;

&lt;p&gt;Then the delayed shipments start arriving. Everywhere. All at once.&lt;/p&gt;

&lt;p&gt;Suddenly everyone is drowning in beer. The retailer stops ordering. The wholesaler, still receiving massive shipments, stops ordering. The distributor is buried. The factory has just finished a production run for demand that evaporated weeks ago. And beer begins to go stale in storage (which, to my collegiate colleagues, was a particularly egregious outcome).&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;bullwhip effect&lt;/strong&gt;: small fluctuations at the customer end create massive, destructive oscillations upstream. A 10% increase in consumer demand can translate to 40% swings at the factory. Careers are ruined. Simulated beer is wasted. Business school students stare at their inventory sheets in disbelief.&lt;/p&gt;

&lt;p&gt;The culprit isn't stupidity. Every player makes locally rational decisions. The problem is &lt;strong&gt;systemic&lt;/strong&gt;: limited visibility, time delays, and independent decision-making combine to amplify rather than dampen disruptions.&lt;/p&gt;
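
&lt;p&gt;You don't need three classmates to watch the whip crack. A toy simulation reproduces the amplification (the ordering rule below is a deliberate simplification: unmet demand is dropped, upstream supply is assumed unlimited, and shipments take two weeks):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Four stages: retailer, wholesaler, distributor, factory. Each week a stage
# receives a delayed shipment, ships what it can, then orders enough to
# replace what it shipped plus close its inventory gap. Locally rational.
TARGET = 12   # inventory each stage tries to hold
WEEKS = 12

inventory = [TARGET] * 4
pipeline = [[4, 4] for _ in range(4)]    # two weeks of shipments in transit
demand = 4

for week in range(WEEKS):
    if week == 4:
        demand = 8                       # the only real change: demand doubles
    incoming, orders = demand, []
    for stage in range(4):
        inventory[stage] += pipeline[stage].pop(0)  # delayed shipment arrives
        shipped = min(inventory[stage], incoming)
        inventory[stage] -= shipped
        gap = max(0, TARGET - inventory[stage])
        order = incoming + gap           # replace sales, close the gap
        orders.append(order)
        pipeline[stage].append(order)    # simplification: upstream always ships
        incoming = order                 # the next stage up sees this as demand
    print(week, orders)   # watch a small retail bump become factory-sized swings
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Run it, and the week the demand doubles, the factory's orders spike to several times the size of the retail change; a few weeks later, every stage is sitting on excess stock. The pattern above, in about twenty lines.&lt;/p&gt;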

&lt;h2&gt;
  
  
  Now Replace "Beer" with "AI Productivity"
&lt;/h2&gt;

&lt;p&gt;Organizations introducing AI tools are playing their own version of The Beer Game—and most don't realize it.&lt;/p&gt;

&lt;p&gt;Consider a typical scenario: a development team adopts AI coding assistants. Productivity jumps. Code flows faster. Features that took weeks now take days. The team lead reports the wins. Leadership notices.&lt;/p&gt;

&lt;p&gt;But no one downstream adjusted.&lt;/p&gt;

&lt;p&gt;The QA team still has the same headcount. The same testing processes. The same throughput. Suddenly they're facing a tsunami of code. Defect backlogs balloon. Test coverage drops as testers scramble to keep pace. Quality issues slip into production.&lt;/p&gt;

&lt;p&gt;Meanwhile, upstream teams notice something strange: requirements that used to take the dev team three sprints now complete in one. Product managers haven't recalibrated how much work to queue up. The backlog empties unexpectedly. Roadmap meetings get chaotic. "We need more features defined!" becomes the cry—but the product team is still operating at their old cadence.&lt;/p&gt;

&lt;p&gt;The QA/Testing team has more tests to write, more features to evaluate. Often under-sized to begin with, they are swamped. With predictable quality results.&lt;/p&gt;

&lt;p&gt;The DevOps team, accustomed to a predictable deployment rhythm, now sees triple the deployment requests. CI/CD pipelines bottleneck. Infrastructure provisioning can't keep pace. Developers who were flying now sit waiting for environments.&lt;/p&gt;

&lt;p&gt;Each team is making locally rational decisions. Each team is overwhelmed or starved for reasons they can't quite see. The bullwhip cracks. (If this all feels familiar: in software circles this is sometimes also called "the waterbed problem," and I wrote about it in those terms last week when discussing how AI is bringing us back to waterfall development.)&lt;/p&gt;

&lt;h2&gt;
  
  
  How Organizations Are Approaching AI Adoption
&lt;/h2&gt;

&lt;p&gt;Most organizations fall into one of three patterns when introducing AI development tools:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Piecemeal Pioneers
&lt;/h3&gt;

&lt;p&gt;The most common approach: individual teams or developers adopt AI tools organically. Someone tries GitHub Copilot. A team experiments with Claude Code. Results vary. Successes spread through word of mouth. There's no coordinated rollout, no systemic adjustment.&lt;/p&gt;

&lt;p&gt;This is The Beer Game with each player ordering independently, without coordination.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Mandate Push
&lt;/h3&gt;

&lt;p&gt;Leadership declares AI adoption a strategic priority. Tools are procured. Training is scheduled. Metrics are established. The development organization gets AI capabilities—often simultaneously.&lt;/p&gt;

&lt;p&gt;But adjacent functions don't. QA, product, DevOps, security review, documentation—they're still operating traditionally while development adopts new strategies. The mandate created a step function in (only) one part of the value stream.&lt;/p&gt;

&lt;p&gt;This is like one Beer Game player getting instant teleportation while everyone else still waits for truck deliveries.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Thoughtful Rollout
&lt;/h3&gt;

&lt;p&gt;Rare but effective: organizations that map their entire value stream before introducing acceleration. They ask: if development velocity triples, what breaks? Where do bottlenecks emerge? Which handoffs become flood points?&lt;/p&gt;

&lt;p&gt;Then they stage adoption to match capacity across the chain.&lt;/p&gt;

&lt;p&gt;This is the only approach that avoids the bullwhip—and almost nobody does it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bullwhip Effects of Uneven AI Adoption
&lt;/h2&gt;

&lt;p&gt;Let's map the specific oscillations that emerge when AI productivity hits an unprepared organization:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Quality Whiplash
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upstream acceleration:&lt;/strong&gt; Dev team ships code 3x faster with AI assistance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream bottleneck:&lt;/strong&gt; QA capacity unchanged.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oscillation pattern:&lt;/strong&gt; Quality team rushes reviews → defects escape → production incidents spike → emergency slowdowns → dev team idles waiting for fixes → QA catches up → dev accelerates again → cycle repeats.&lt;/p&gt;

&lt;p&gt;Organizations stuck in this loop often conclude "AI is causing quality problems." The AI isn't causing anything—the uneven adoption is.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Requirements Vacuum
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upstream bottleneck:&lt;/strong&gt; Product team defines work at traditional pace.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream acceleration:&lt;/strong&gt; Dev team consumes requirements faster than they're created.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oscillation pattern:&lt;/strong&gt; Backlog empties → devs pull partially-formed work → rework increases → devs slow down → backlog fills again → devs accelerate on clear requirements → backlog empties → cycle repeats.&lt;/p&gt;

&lt;p&gt;Teams trapped here often see erratic velocity charts and blame "unclear requirements." The requirements aren't less clear—they're just not flowing fast enough.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Deployment Gridlock
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upstream acceleration:&lt;/strong&gt; More code, more features, more changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream bottleneck:&lt;/strong&gt; Same CI/CD capacity, same deployment windows, same ops team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oscillation pattern:&lt;/strong&gt; Deployment queue grows → batching increases → batch sizes create risk → releases get delayed → pressure builds → risky big-bang release → incidents → release freezes → queue grows again.&lt;/p&gt;

&lt;p&gt;This pattern often ends with someone suggesting "maybe we should slow down development"—treating the symptom rather than the system.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Security Squeeze
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upstream acceleration:&lt;/strong&gt; More code surface area, faster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Downstream bottleneck:&lt;/strong&gt; Security review capacity fixed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Oscillation pattern:&lt;/strong&gt; Security backlog grows → reviews become perfunctory → vulnerabilities ship → incident occurs → security becomes blocker → development halts for remediation → security catches up → development accelerates → security backlog grows.&lt;/p&gt;

&lt;p&gt;The security team isn't being obstructionist. They're being bullwhipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compounding Problem
&lt;/h2&gt;

&lt;p&gt;What makes AI adoption particularly treacherous is that these oscillations compound.&lt;/p&gt;

&lt;p&gt;In The Beer Game, there's one supply chain with one bullwhip. In software development, there are multiple parallel flows—and they interact. A quality slowdown affects deployment timing. A deployment bottleneck affects security review scheduling. A security delay affects requirements prioritization.&lt;/p&gt;

&lt;p&gt;Introduce AI acceleration unevenly, and you don't get one bullwhip—you get several, out of phase, amplifying each other in unpredictable ways.&lt;/p&gt;

&lt;p&gt;The organization experiences this as chaos, politics, and blame. "The dev team is cowboying." "QA is a bottleneck." "Product can't get their act together." "DevOps is always blocking us."&lt;/p&gt;

&lt;p&gt;Nobody sees the system. Everyone sees their adjacent node failing them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Planning to Avoid the Whip
&lt;/h2&gt;

&lt;p&gt;The good news: The Beer Game has a solution. It's called &lt;strong&gt;information sharing and coordinated decision-making&lt;/strong&gt;. When all players can see the entire supply chain and coordinate their orders, the bullwhip disappears.&lt;/p&gt;

&lt;p&gt;The same principle applies to AI adoption:&lt;/p&gt;

&lt;h3&gt;
  
  
  Map Before You Accelerate
&lt;/h3&gt;

&lt;p&gt;Before introducing AI to any team, map your value stream end-to-end. Identify every handoff. Measure current throughput at each stage. Find existing bottlenecks (you probably have some already).&lt;/p&gt;

&lt;p&gt;Then ask: if we 2x this stage, what happens to the stage immediately downstream? What about two stages down?&lt;/p&gt;

&lt;h3&gt;
  
  
  Accelerate Bottlenecks First
&lt;/h3&gt;

&lt;p&gt;Counterintuitively, the best place to introduce AI might not be where you'll see the biggest individual productivity gain—it's where you'll relieve the biggest systemic constraint.&lt;/p&gt;

&lt;p&gt;If QA is already struggling to keep pace, accelerating development is pouring water into a backed-up drain. Consider AI-assisted testing tools first. Or semi-automated code review (so senior engineers can focus on the right quality elements and teaching opportunities with less review time). Or AI-enhanced security scanning.&lt;/p&gt;

&lt;p&gt;Match AI adoption to system topology, not team enthusiasm.&lt;/p&gt;

&lt;h3&gt;
  
  
  Build Slack Intentionally
&lt;/h3&gt;

&lt;p&gt;The Beer Game punishes systems with no buffer capacity. When everyone operates at maximum efficiency, there's no room to absorb variation.&lt;/p&gt;

&lt;p&gt;As you introduce AI acceleration, deliberately create slack in adjacent functions. That might mean additional headcount. It might mean reduced WIP limits. It might mean explicit buffers between stages.&lt;/p&gt;

&lt;p&gt;Yes, slack feels inefficient. It's also what prevents oscillation from becoming catastrophe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make the System Visible
&lt;/h3&gt;

&lt;p&gt;The Beer Game's dysfunction persists because players can't see beyond their immediate neighbors. Create visibility across your development value stream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-to-end cycle time dashboards&lt;/li&gt;
&lt;li&gt;WIP at each stage, visible to all&lt;/li&gt;
&lt;li&gt;Bottleneck indicators that surface automatically&lt;/li&gt;
&lt;li&gt;Regular cross-functional reviews of flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When everyone can see the whole chain, locally rational decisions become globally rational decisions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stage Your Rollout
&lt;/h3&gt;

&lt;p&gt;If you must introduce AI capability unevenly (and you probably will—budgets and readiness vary), stage it deliberately:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start with the current bottleneck&lt;/li&gt;
&lt;li&gt;Wait for throughput to stabilize&lt;/li&gt;
&lt;li&gt;Identify the new bottleneck&lt;/li&gt;
&lt;li&gt;Introduce AI there&lt;/li&gt;
&lt;li&gt;Repeat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is slower than a simultaneous rollout. It's also far less likely to create destructive oscillation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Meta-Lesson
&lt;/h2&gt;

&lt;p&gt;The Beer Game has taught a consistent lesson for sixty-five years: &lt;strong&gt;optimizing parts degrades wholes&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;AI tools offer genuine, dramatic acceleration. They also offer the ability to create genuine, dramatic dysfunction if deployed without systemic thinking.&lt;/p&gt;

&lt;p&gt;The organizations that will succeed with AI aren't the ones that adopt fastest. They're the ones that adopt most coherently—matching capability to capacity across their entire value stream.&lt;/p&gt;

&lt;p&gt;Every team is connected to every other team. Accelerate one without adjusting the others, and you're not improving the system—you're just moving the bottleneck, amplifying the oscillation, and cracking the bullwhip.&lt;/p&gt;

&lt;p&gt;The beer, it turns out, was a metaphor all along.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>systemdesign</category>
      <category>management</category>
      <category>leadership</category>
    </item>
    <item>
      <title>We're Linear Thinkers in an Exponentially-Changing World</title>
      <dc:creator>Keith MacKay</dc:creator>
      <pubDate>Sun, 10 May 2026 18:00:54 +0000</pubDate>
      <link>https://forem.com/keithjmackay/were-linear-thinkers-in-an-exponentially-changing-world-53jc</link>
      <guid>https://forem.com/keithjmackay/were-linear-thinkers-in-an-exponentially-changing-world-53jc</guid>
      <description>&lt;h1&gt;
  
  
  We're Linear Thinkers in an Exponentially-Changing World
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;The more time I spend in the AI ecosystem, the more convinced I become that the pace of change isn’t just fast; it’s &lt;em&gt;explosive&lt;/em&gt;…and increasingly so. Most people still think in terms of linear change while the world is accelerating exponentially.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That mismatch is where disruption happens.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;We’re linear thinkers living in an exponential world&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We weren’t built to intuit compounding curves. It’s why exponential progress feels like it comes out of nowhere.&lt;/p&gt;

&lt;p&gt;Most AI charts use logarithmic scales for a reason: they’re trying to compress the 10X-after-10X reality into something our brains can process, turning accelerations we don't intuit into nice straight lines.&lt;/p&gt;
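&lt;p&gt;The trick is just a change of axis: take the log of an exponential series and it becomes a straight line. A two-line illustration:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import math

# 10X per year is a hockey stick on a raw axis and a straight line in log10.
raw = [10 ** year for year in range(6)]   # 1, 10, 100, ..., 100000
print([math.log10(v) for v in raw])       # 0.0, 1.0, 2.0, 3.0, 4.0, 5.0
&lt;/code&gt;&lt;/pre&gt;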

&lt;p&gt;And this is why the fast eat the slow. By the time a large company finishes planning, the curve has already bent again.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;I first learned this over 20 years ago—Kurzweil rewired my thinking&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At a Ray Kurzweil talk at MIT in the mid-2000s, he described the “Law of Accelerating Returns.” It permanently rewired how I think about the pace of technology change.&lt;/p&gt;

&lt;p&gt;He wasn’t the first to notice this pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Henry Adams&lt;/strong&gt; – law of acceleration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buckminster Fuller&lt;/strong&gt; – ephemeralization (doing more and more with less and less)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moore’s Law&lt;/strong&gt; – exponential chip complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hans Moravec&lt;/strong&gt; – robotics advancing at Moore’s-law speed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kurzweil unified and expanded these ideas to encompass all technology and evolution, and even applied them to his business strategy. He claimed that he designed products years ahead of time, calculating &lt;em&gt;when&lt;/em&gt; the enabling technologies would become available. He developed a photocopier-sized text scanner for the blind in the 70s and began designing a handheld version soon afterwards; he was in production within months of the enabling tech finally becoming small and performant enough for the product to work. Is it true? I don’t know. But the very fact that it would be a striking example of true exponential thinking demonstrates how rarely such thinking occurs.&lt;/p&gt;




&lt;p&gt;Critics point out that Kurzweil’s dates were off, in some cases by a decade. But I would argue that in most cases the limiting factor wasn’t technological capability; it was political will, regulation, or distribution. Exponential growth of technical capability is the rule rather than the exception. Rapid adoption of what's possible is the exception rather than the rule.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As William Gibson said: &lt;strong&gt;“The future is already here—it’s just not evenly distributed.”&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's never been more true. Take Kurzweil's predictions for 2009 (made in 1999!), which included “self-driving cars”. The cars weren't available to consumers in 2009, but Google had, in fact, successfully logged over 200,000 miles with its self-driving technology by then, and Nevada was putting self-driving vehicle laws on the books. That future wasn't yet evenly distributed.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Today’s AI acceleration makes those early curves look quaint&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider just a few signals as outlined in the Stanford AI Index Report 2025[1]:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware costs have been dropping &lt;strong&gt;30% per year&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Energy efficiency has been improving &lt;strong&gt;40% per year&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then on the software and training sides of the equation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google’s Gemini 3.1 showed that models can still gain intelligence through smarter training, not just the added parameters that had driven the recent trend (combine smarter training with more parameters, and there's no reason for the capability curves to flatten out yet)&lt;/li&gt;
&lt;li&gt;Breakthroughs (like Mixture-of-Experts, 1-bit quantization, the tabular foundation model, etc.) emerge constantly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And finally, practitioners are learning new ways to get the most from the tools, and increasing efficiency while reducing costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I code with AI &lt;em&gt;completely differently&lt;/em&gt; than I did a few months ago.&lt;/li&gt;
&lt;li&gt;And I expect the same a few months from now.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every layer of the ecosystem is improving. And doing so faster every year.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;10X improvement per year changes everything&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most of my clients invest with a 3–7 year horizon. If AI capability continues at the ~10X per year that we’ve seen since at least GPT-2:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 years → &lt;strong&gt;1,000X more capable&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;7 years → &lt;strong&gt;10,000,000X more capable&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even “only” 5 more years of this curve gives you &lt;strong&gt;100,000X&lt;/strong&gt; improvement.&lt;/p&gt;
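&lt;p&gt;The arithmetic behind those figures is plain compounding of an assumed 10X annual multiplier:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Compounding an assumed 10X-per-year capability multiplier over a horizon.
rate = 10
for years in (3, 5, 7):
    print(f"{years} years: {rate ** years:,}X")
# 3 years: 1,000X / 5 years: 100,000X / 7 years: 10,000,000X
&lt;/code&gt;&lt;/pre&gt;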

&lt;p&gt;Put simply: &lt;strong&gt;software moats have largely evaporated over the past year.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In a recent diligence project, I replicated ~80% of the target’s product in a single weekend (in spare time) for something like $60 of Claude Code time. When the target wrote the product two years ago, it was ground-breaking. Now it was a weekend’s part-time work.&lt;/li&gt;
&lt;li&gt;Another colleague did something similar on a recent project.&lt;/li&gt;
&lt;li&gt;We’re both experienced software developers, deep in the context-engineering rabbit holes. But a motivated junior engineer could very likely do this in under a week.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;It's never technology, always psychology&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My colleagues and I have successfully used context engineering principles and the latest generation of AI coding tools and LLMs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;rebuild significant legacy systems into modern stacks with full test infrastructure and mature coding practices&lt;/li&gt;
&lt;li&gt;build greenfield apps at unbelievable speed with legit UI frontend work&lt;/li&gt;
&lt;li&gt;create agents, skills, commands, and developer workflow tools to further accelerate our own work with these tools&lt;/li&gt;
&lt;li&gt;analyze legacy codebases and plan monolith decomposition and modularization&lt;/li&gt;
&lt;li&gt;read, analyze, and fix bugs in open source projects&lt;/li&gt;
&lt;li&gt;develop documentation and visualization for codebases, with no prior exposure to the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hardest part?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Change management.&lt;/strong&gt; Humans can’t mentally accelerate at the same rate as the tools.&lt;/p&gt;

&lt;p&gt;Rote tasks? AI is already there. That future just isn’t yet evenly distributed. Creative tasks? They’re next.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;So what should you do?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Three things matter more than ever:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Master the tools&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stay flexible and experiment constantly&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Build moats around relationships, distribution, and trust—not code&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Because the curve is still bending upward. And it’s bending faster than most people realize.&lt;/p&gt;




&lt;p&gt;[1] Nestor Maslej, Loredana Fattorini, Raymond Perrault, Yolanda Gil, Vanessa Parli, Njenga Kariuki, Emily Capstick, Anka Reuel, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Juan Carlos Niebles, Yoav Shoham, Russell Wald, Toby Walsh, Armin Hamrah, Lapo Santarlasci, Julia Betts Lotufo, Alexandra Rome, Andrew Shi, Sukrut Oak. “The AI Index 2025 Annual Report,” AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2025. &lt;a href="https://doi.org/10.48550/arXiv.2504.07139" rel="noopener noreferrer"&gt;https://doi.org/10.48550/arXiv.2504.07139&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aistrategy</category>
      <category>agents</category>
      <category>productivity</category>
      <category>cognitiveload</category>
    </item>
  </channel>
</rss>
