<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Michael Tuszynski</title>
    <description>The latest articles on Forem by Michael Tuszynski (@michaeltuszynski).</description>
    <link>https://forem.com/michaeltuszynski</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1447774%2Fa99eea93-7845-4764-9fce-b1755bcfa456.png</url>
      <title>Forem: Michael Tuszynski</title>
      <link>https://forem.com/michaeltuszynski</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/michaeltuszynski"/>
    <language>en</language>
    <item>
      <title>38% of AI Answers Are Wrong — And It's Your Prompt's Fault</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Tue, 14 Apr 2026 15:16:26 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/your-prompt-has-too-many-jobs-25gi</link>
      <guid>https://forem.com/michaeltuszynski/your-prompt-has-too-many-jobs-25gi</guid>
      <description>&lt;p&gt;Every week someone posts about AI hallucination like it's a mystery. It's not. A &lt;a href="https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2025.1622292/full" rel="noopener noreferrer"&gt;2025 Frontiers in AI study&lt;/a&gt; measured it: vague, multi-objective prompts hallucinate &lt;strong&gt;38.3% of the time&lt;/strong&gt;. Structured, single-focus prompts? &lt;strong&gt;18.1%&lt;/strong&gt;. That's a 20-point accuracy gap from how you write the prompt — not which model you pick.&lt;/p&gt;

&lt;p&gt;Everyone's debating GPT vs. Claude vs. Gemini. Nobody's talking about the fact that prompt structure matters more than model selection for most use cases.&lt;/p&gt;

&lt;h2&gt;The $0 Fix Nobody Uses&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;Research from SQ Magazine&lt;/a&gt; breaks it down further: zero-shot prompts (no examples, no structure) hallucinate at &lt;strong&gt;34.5%&lt;/strong&gt;. Add a few examples and that drops to &lt;strong&gt;27.2%&lt;/strong&gt;. Add explicit instructions: &lt;strong&gt;24.6%&lt;/strong&gt;. Simply adding "If you're not sure, say so" cuts hallucination by another &lt;strong&gt;15%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That last one is worth repeating. One sentence — "If you're not confident, say you don't know" — is worth more than upgrading your model tier. And it costs nothing.&lt;/p&gt;

&lt;h2&gt;Why Multi-Task Prompts Are the Worst Offender&lt;/h2&gt;

&lt;p&gt;"Summarize this doc, extract the key risks, and draft a response email" feels like one task. It's three. And each additional objective gives the model more room to fabricate connections between things that don't connect.&lt;/p&gt;

&lt;p&gt;Language models are next-token predictors. Single task = narrow probability distribution = the model knows where it's headed. Three tasks stacked together = triple the surface area for error. A small fabrication in the summary becomes a stated fact in the risk analysis becomes a confident assertion in the draft email.&lt;/p&gt;

&lt;p&gt;Longer, multi-part prompts increase error rates by &lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;roughly 10%&lt;/a&gt;. In legal contexts, hallucination rates run between &lt;strong&gt;58% and 88%&lt;/strong&gt;. That's not an AI problem. That's a prompting problem.&lt;/p&gt;

&lt;h2&gt;What Actually Works (With Numbers)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;One prompt, one job.&lt;/strong&gt; Summarize the doc. Stop. Review it. Then extract risks from the verified summary. Then draft the email from verified risks. Three prompts, each building on confirmed output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Constrain the output.&lt;/strong&gt; JSON, numbered lists, specific templates. &lt;a href="https://sqmagazine.co.uk/llm-hallucination-statistics/" rel="noopener noreferrer"&gt;Structured prompts cut medical AI hallucinations by 33%&lt;/a&gt;. The less room to improvise, the less it fabricates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Give examples.&lt;/strong&gt; Zero-shot to few-shot: 34.5% → 27.2%. Two examples cost you 30 seconds and buy a 7-point accuracy gain.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set refusal conditions.&lt;/strong&gt; "If confidence is below 70% or no evidence supports the claim, say 'insufficient data.'" You're not weakening the model. You're giving it a pressure valve so it doesn't fill gaps with fiction.&lt;/p&gt;
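&lt;p&gt;Put together, the decomposed pipeline looks like this. A minimal Python sketch; &lt;code&gt;call_llm&lt;/code&gt; is a stand-in for whichever client you use, not a real API:&lt;/p&gt;

```python
# Sketch of the decomposition pattern: one job per prompt, each step
# grounded in the verified output of the previous one, plus an explicit
# refusal clause. call_llm() is a placeholder, not a real library call.

REFUSAL_CLAUSE = (
    "If you are not confident in an answer, or no evidence in the input "
    "supports it, say 'insufficient data' instead of guessing."
)

def build_prompt(task: str, source: str) -> str:
    """One prompt, one job, with a refusal condition attached."""
    return f"{task}\n\n{REFUSAL_CLAUSE}\n\n---\n{source}"

def pipeline(document: str, call_llm) -> str:
    # Step 1: summarize only. Review this output before continuing.
    summary = call_llm(build_prompt("Summarize this document.", document))
    # Step 2: extract risks from the verified summary, not the raw doc.
    risks = call_llm(build_prompt("List the key risks as a numbered list.", summary))
    # Step 3: draft the email from the verified risks.
    return call_llm(build_prompt("Draft a response email covering these risks.", risks))
```

&lt;p&gt;The structure is the point: each step can be inspected before its output feeds the next one.&lt;/p&gt;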

&lt;h2&gt;The Model Isn't the Bottleneck&lt;/h2&gt;

&lt;p&gt;The best models went from &lt;a href="https://www.aboutchromebooks.com/ai-hallucination-rates-across-different-models/" rel="noopener noreferrer"&gt;21.8% hallucination in 2021 to 0.7% in 2025&lt;/a&gt; on benchmarks. But benchmarks test clean, single-objective tasks. Real-world, multi-step workflows — the kind actual professionals run — depend more on how you ask than what you ask.&lt;/p&gt;

&lt;p&gt;You wouldn't hand a contractor one work order that says "remodel the kitchen, fix the plumbing, and repaint the exterior." You'd scope each job, inspect the work, then move on.&lt;/p&gt;

&lt;p&gt;The people getting the best results from AI already know this. Everyone else is blaming the model.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building a Plugin Marketplace for AI-Native Workflows</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Fri, 10 Apr 2026 22:55:16 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/building-a-plugin-marketplace-for-ai-native-workflows-45lb</link>
      <guid>https://forem.com/michaeltuszynski/building-a-plugin-marketplace-for-ai-native-workflows-45lb</guid>
      <description>&lt;p&gt;Most AI coding tools ship as monoliths. One big system prompt, one set of capabilities, one-size-fits-all. That works fine for general software engineering. It falls apart the moment you need domain-specific workflows that vary by role, by team, and by engagement.&lt;/p&gt;

&lt;p&gt;I build presales systems at &lt;a href="https://www.presidio.com" rel="noopener noreferrer"&gt;Presidio&lt;/a&gt; — client research, SOW generation, meeting capture, deal operations. The kind of work where a solutions architect needs different tools than a deal desk analyst, and where loading everything into every session wastes tokens and degrades output quality.&lt;/p&gt;

&lt;p&gt;So I built a plugin marketplace for &lt;a href="https://docs.anthropic.com/en/docs/claude-code/overview" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. Nine modular plugins, independently installable, composable by role. Here's what I learned shipping it to a team.&lt;/p&gt;

&lt;h2&gt;Why Plugins Instead of One Big Workspace&lt;/h2&gt;

&lt;p&gt;The original system was a monolith — 56 commands, 14 skills, 36 tools, all loaded into every session. It worked for me as the sole operator. The moment I tried to share it with other consultants, three problems surfaced immediately:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context pollution.&lt;/strong&gt; A consultant doing SOW work doesn't need the meeting transcription pipeline, the competitive intel framework, or the deal management commands cluttering their context window. Every irrelevant token degrades the model's attention on the task at hand.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Onboarding friction.&lt;/strong&gt; "Clone this repo, read 200 lines of docs, configure 18 environment variables, and learn 56 commands" is not an adoption strategy. People need to install what they need and ignore what they don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintenance coupling.&lt;/strong&gt; A bug fix in the meeting recorder shouldn't require every user to pull an update that also touches their SOW pipeline. Independent versioning matters when your users are busy consultants, not developers.&lt;/p&gt;

&lt;p&gt;The fix was decomposition. Break the monolith into plugins that can be installed, updated, and removed independently.&lt;/p&gt;

&lt;h2&gt;The Architecture That Emerged&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fulcrum-plugins/
├── plugins/
│   ├── core/          # Foundation: persona, auth, OneDrive, shared rules
│   ├── intel/         # Client research pipeline
│   ├── meeting/       # Silent recording + transcription
│   ├── discovery/     # Call prep, qualification, opportunity analysis
│   ├── sow/           # SOW drafting, review, QA, redlines
│   ├── proposal/      # Solution decks and pricing workbooks
│   ├── ops/           # Daily briefing, weekly review, context switching
│   ├── engage/        # Deal management, delivery handoff
│   └── util/          # Freshness audits, triage, workspace maintenance
├── shared/            # Frameworks and templates used across plugins
└── docs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each plugin is a self-contained directory with commands, scripts, rules, and hooks. One required dependency — &lt;code&gt;core&lt;/code&gt; — provides identity, authentication, and the shared file system. Everything else is optional.&lt;/p&gt;

&lt;p&gt;Installation is one command: &lt;code&gt;/plugin install sow@fulcrum-plugins&lt;/code&gt;. Uninstall is equally clean. No cross-plugin imports, no shared state beyond what &lt;code&gt;core&lt;/code&gt; provides.&lt;/p&gt;

&lt;h3&gt;Recommended Sets Over Required Bundles&lt;/h3&gt;

&lt;p&gt;Rather than prescribing one configuration, the marketplace offers recommended sets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Minimal (SOW-focused):&lt;/strong&gt; core + sow — for consultants who only write statements of work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Meeting-heavy:&lt;/strong&gt; core + meeting + discovery — for consultants running client calls all day&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full presales:&lt;/strong&gt; all 9 plugins — for people like me who touch every phase&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This respects how people actually work. Nobody uses every tool every day. The consultant who installs just &lt;code&gt;core + sow&lt;/code&gt; gets a focused, fast experience. The one who installs everything gets the full operating system. Both are first-class citizens.&lt;/p&gt;

&lt;h2&gt;Namespace Isolation Matters More Than You Think&lt;/h2&gt;

&lt;p&gt;When I first decomposed the monolith, every plugin had commands like &lt;code&gt;/draft&lt;/code&gt;, &lt;code&gt;/review&lt;/code&gt;, &lt;code&gt;/status&lt;/code&gt;. Collisions everywhere. The fix was &lt;a href="https://docs.anthropic.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;namespace syntax&lt;/a&gt;: &lt;code&gt;/sow:draft&lt;/code&gt;, &lt;code&gt;/intel:company-intel&lt;/code&gt;, &lt;code&gt;/ops:weekly-review&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This felt verbose at first. It turned out to be the single most important design decision. Namespaces do three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Eliminate ambiguity.&lt;/strong&gt; &lt;code&gt;/draft&lt;/code&gt; could mean a SOW draft, a proposal draft, or an email draft. &lt;code&gt;/sow:draft&lt;/code&gt; is unambiguous.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable discovery.&lt;/strong&gt; A new user can type &lt;code&gt;/sow:&lt;/code&gt; and see every SOW command without memorizing a list. The namespace IS the documentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserve plugin independence.&lt;/strong&gt; Two plugin authors can independently create a &lt;code&gt;status&lt;/code&gt; command without coordinating. &lt;code&gt;/engage:status&lt;/code&gt; and &lt;code&gt;/ops:status&lt;/code&gt; coexist without conflict.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I renamed all plugins in v1.1 specifically to get shorter prefixes — &lt;code&gt;sow-pipeline&lt;/code&gt; became &lt;code&gt;sow&lt;/code&gt;, &lt;code&gt;research-intel&lt;/code&gt; became &lt;code&gt;intel&lt;/code&gt;, &lt;code&gt;engagement-lifecycle&lt;/code&gt; became &lt;code&gt;engage&lt;/code&gt;. Every keystroke matters when you're typing these dozens of times a day.&lt;/p&gt;

&lt;h2&gt;Shared Lessons: Institutional Memory Without a Database&lt;/h2&gt;

&lt;p&gt;The feature I'm most proud of has zero lines of application code. It's a folder on OneDrive.&lt;/p&gt;

&lt;p&gt;When a consultant learns something the hard way — a Salesforce field that's locked to AMs, a client's preferred meeting format, a compliance requirement that isn't documented anywhere — they drop a markdown file into a shared folder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;shared-lessons/jane-doe/2026-04-09-sfdc-stage-lock.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At session start, a hook script scans all lesson files from the team, deduplicates by title, and renders them into a cached rule that loads into every session. The team's collective knowledge grows without anyone maintaining a wiki, attending a knowledge-sharing meeting, or filing a ticket.&lt;/p&gt;
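&lt;p&gt;The hook itself can be tiny. A sketch of the scan, dedupe, and render step, assuming each lesson file leads with a &lt;code&gt;# Title&lt;/code&gt; heading (the function name and layout here are illustrative, not the production script):&lt;/p&gt;

```python
from pathlib import Path

def render_lessons(lessons_dir: str, cache_file: str) -> int:
    """Scan per-person lesson files, dedupe by title, render one cached rule.

    Returns the number of unique lessons written. Assumes each lesson's
    first markdown heading is its title; duplicates keep the first copy.
    """
    seen = {}
    for path in sorted(Path(lessons_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        # Treat the first markdown heading as the lesson title.
        title = next(
            (line.lstrip("# ").strip() for line in text.splitlines()
             if line.startswith("# ")),
            path.stem,
        )
        seen.setdefault(title, text)  # first occurrence wins
    rendered = "\n\n---\n\n".join(seen.values())
    Path(cache_file).write_text(rendered, encoding="utf-8")
    return len(seen)
```

&lt;p&gt;Writing the cache outside the lessons folder keeps the next scan from re-ingesting its own output.&lt;/p&gt;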

&lt;p&gt;The constraints are deliberate: one lesson per file, no client-confidential data, attribution required, dates required (stale lessons get pruned). The format is simple enough that non-developers can contribute. The mechanism — a OneDrive folder sync'd through &lt;a href="https://www.microsoft.com/en-us/microsoft-teams/group-chat-software" rel="noopener noreferrer"&gt;Microsoft Teams&lt;/a&gt; — requires no new tools or logins.&lt;/p&gt;

&lt;p&gt;This is the pattern I keep returning to: &lt;strong&gt;use infrastructure people already have.&lt;/strong&gt; OneDrive, git, markdown files. Not a custom database, not a new SaaS tool, not an API integration. The best systems are the ones that disappear into workflows people already follow.&lt;/p&gt;

&lt;h2&gt;What Didn't Work&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Plugin dependencies.&lt;/strong&gt; The original design had plugins declaring dependencies on each other — &lt;code&gt;sow&lt;/code&gt; depends on &lt;code&gt;intel&lt;/code&gt;, &lt;code&gt;engage&lt;/code&gt; depends on &lt;code&gt;discovery&lt;/code&gt;. In practice, this created install-order headaches and made it harder to reason about what was loaded. The v1.3 architecture dropped all inter-plugin dependencies. Each plugin is fully self-contained. If &lt;code&gt;sow&lt;/code&gt; needs client context, it reads the client's context file directly — it doesn't import a function from &lt;code&gt;intel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automatic plugin updates.&lt;/strong&gt; The original design auto-pulled updates on session start. In practice, this broke people mid-workflow when a command signature changed. The fix was making updates explicit — &lt;code&gt;/core:update&lt;/code&gt; when you're ready, with a statusline indicator showing when updates are available. Users update on their own schedule, not yours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Granular versioning.&lt;/strong&gt; I initially tried to version each plugin independently. After three days of version coordination across nine plugins, I switched to a single marketplace version. All plugins share the version in &lt;code&gt;VERSION&lt;/code&gt;. SemVer at the marketplace level, not the plugin level. Simpler to reason about, simpler to communicate ("update to 1.3.1"), simpler to tag.&lt;/p&gt;

&lt;h2&gt;The Adoption Signal&lt;/h2&gt;

&lt;p&gt;The most telling metric isn't installs — it's which recommended set people choose. When most of your users install the minimal set and gradually add plugins over weeks, you've built something that earns trust incrementally. When they install everything on day one and complain it's overwhelming, you've just shipped a monolith with extra steps.&lt;/p&gt;

&lt;p&gt;So far, the pattern is healthy. New consultants start with &lt;code&gt;core + sow&lt;/code&gt; (the job requirement), then add &lt;code&gt;discovery&lt;/code&gt; after their first client call, then &lt;code&gt;meeting&lt;/code&gt; after they see someone else's AI-generated meeting notes. Pull, not push.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Plugin architectures aren't new. What's new is applying them to AI agent context — treating the model's knowledge, rules, and capabilities as composable modules rather than a static system prompt. The tools exist today in &lt;a href="https://docs.anthropic.com/en/docs/claude-code/plugins" rel="noopener noreferrer"&gt;Claude Code's plugin system&lt;/a&gt;. The hard part isn't the architecture. It's the discipline to decompose.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>plugins</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Context Engineering Is the New Prompt Engineering</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 08 Apr 2026 22:06:58 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/context-engineering-is-the-new-prompt-engineering-2231</link>
      <guid>https://forem.com/michaeltuszynski/context-engineering-is-the-new-prompt-engineering-2231</guid>
      <description>&lt;p&gt;Everyone's writing better prompts. Few are building better context.&lt;/p&gt;

&lt;p&gt;That's the gap. Prompt engineering treats AI like a search box — craft the perfect query, get the perfect answer. Context engineering treats AI like a new team member — give them the right docs, the right access, and a clear understanding of how work actually gets done. As Andrej Karpathy &lt;a href="https://x.com/karpathy/status/1937902191498797514" rel="noopener noreferrer"&gt;put it&lt;/a&gt;, the hottest new programming language is English — but the program isn't the prompt. It's the context surrounding it.&lt;/p&gt;

&lt;p&gt;I've spent the last six months building AI-native workflows at &lt;a href="https://www.presidio.com" rel="noopener noreferrer"&gt;Presidio&lt;/a&gt;, where I'm a Principal Solutions Architect. Not chatbots. Not demos. Production systems where Claude Code agents run real presales operations — client research, proposal generation, meeting analysis, deal tracking. The kind of work that used to live in someone's head and a dozen browser tabs.&lt;/p&gt;

&lt;p&gt;Here's what I learned about making AI actually useful.&lt;/p&gt;

&lt;h2&gt;Prompts Are Requests. Skills Are Frameworks.&lt;/h2&gt;

&lt;p&gt;The first mistake everyone makes: stuffing domain knowledge into prompts. "You are an expert in enterprise sales. When analyzing a deal, consider these 47 factors..."&lt;/p&gt;

&lt;p&gt;That breaks immediately. Prompts are ephemeral — they disappear when the conversation ends. Domain knowledge needs to persist across sessions, get version-controlled, and evolve as you learn what works.&lt;/p&gt;

&lt;p&gt;The pattern that works: &lt;strong&gt;skills files&lt;/strong&gt; — what &lt;a href="https://docs.anthropic.com/en/docs/claude-code/skills" rel="noopener noreferrer"&gt;Claude Code's plugin architecture&lt;/a&gt; calls reusable domain knowledge. Markdown documents that encode decision frameworks, not instructions. A skill isn't "analyze this deal." A skill is the 5-gate qualification framework your team actually uses, written as structured markdown with decision criteria, red flags, and exit conditions. The AI reads it and applies it. You update the framework once, every future session uses the new version.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.claude/skills/
├── qualification-framework.md    # Decision gates with criteria
├── pricing-strategy.md           # Margin rules, discount authority
├── sow-review-rubric.md          # Evaluation checklist
└── competitive-positioning.md    # Differentiators by competitor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills are reusable. Prompts are disposable. That distinction matters more than any prompting technique.&lt;/p&gt;

&lt;h2&gt;Context-as-Code: Version Control Your AI's Brain&lt;/h2&gt;

&lt;p&gt;Every client engagement in my system has a single markdown file that serves as the AI's working memory for that account. Contacts, scope decisions, meeting history, action items, competitive intel — one file, version-controlled in git.&lt;/p&gt;

&lt;p&gt;Why markdown? Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;It's grep-searchable.&lt;/strong&gt; When an agent needs to find every mention of a specific technology across all accounts, &lt;code&gt;grep -r "Kubernetes" clients/&lt;/code&gt; works instantly. Try that with a vector database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It diffs cleanly.&lt;/strong&gt; Git shows you exactly what changed in the AI's understanding of an account. Who updated it, when, and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's token-efficient.&lt;/strong&gt; Structured markdown compresses well in context windows. A 200-line context file gives an agent everything it needs to operate on an account without RAG retrieval latency.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The anti-pattern is treating AI memory as a black box — embeddings you can't inspect, vector stores you can't diff, context you can't version. If you can't &lt;code&gt;git blame&lt;/code&gt; your AI's knowledge, you don't control it.&lt;/p&gt;

&lt;h2&gt;One God-Agent Is a Trap&lt;/h2&gt;

&lt;p&gt;I started with one agent that did everything. It was mediocre at all of it.&lt;/p&gt;

&lt;p&gt;The fix was domain specialization. Four agents, each with a clear role: one handles discovery and qualification, one handles technical design and proposals, one handles deal operations and pricing, one handles workspace maintenance. Each agent has its own tools, its own context, and a defined handoff protocol for passing work to another agent.&lt;/p&gt;

&lt;p&gt;This mirrors how real teams work. Your sales engineer doesn't do contract redlines. Your deal desk doesn't design architecture. Specialization isn't just about accuracy — it's about &lt;strong&gt;cost&lt;/strong&gt;. A maintenance agent running on Haiku costs 95% less than an Opus agent doing the same file cleanup.&lt;/p&gt;

&lt;p&gt;Model routing by task complexity is the easiest money you'll save in AI:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task Type&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;File organization, validation&lt;/td&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;Structured, predictable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Research, summarization&lt;/td&gt;
&lt;td&gt;Sonnet&lt;/td&gt;
&lt;td&gt;Good reasoning, fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Strategy, complex writing&lt;/td&gt;
&lt;td&gt;Opus&lt;/td&gt;
&lt;td&gt;Needs deep reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
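&lt;p&gt;The routing logic doesn't need to be clever. A sketch, with illustrative task categories and an assumed tier mapping:&lt;/p&gt;

```python
# Cost-aware model routing: map task categories to the cheapest model
# tier that handles them. The categories and mapping here are assumptions
# for the sketch; adapt them to your own workload.

ROUTES = {
    "file_organization": "haiku",
    "validation": "haiku",
    "research": "sonnet",
    "summarization": "sonnet",
    "strategy": "opus",
    "complex_writing": "opus",
}

def route_model(task_type: str, default: str = "sonnet") -> str:
    """Pick the model tier for a task; unknown tasks fall back to mid-tier."""
    return ROUTES.get(task_type, default)
```

&lt;p&gt;Defaulting unknown tasks to the mid-tier is a judgment call: cheap enough to not hurt, capable enough to not fail silently.&lt;/p&gt;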

&lt;h2&gt;Mistakes Must Become Infrastructure&lt;/h2&gt;

&lt;p&gt;Every production AI system has failure modes you won't predict. The question is whether failures teach the system or just annoy you.&lt;/p&gt;

&lt;p&gt;My approach: every time an agent makes a mistake that I have to correct, it becomes a numbered rule in the system's configuration file. Not a mental note. Not a prompt tweak. A permanent, version-controlled rule that every future session reads on startup.&lt;/p&gt;

&lt;p&gt;After six months, I have 39 of these. Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"State files in .gitignore can vanish silently during merges — default to safe fallback values"&lt;/li&gt;
&lt;li&gt;"Never infer what a client said in a meeting — only quote from the transcript or flag it as an assumption"&lt;/li&gt;
&lt;li&gt;"Contract reverts produce the same error on every RPC — don't retry, they're non-recoverable"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't prompt engineering. They're institutional memory encoded as code. The system gets smarter every time it fails, without retraining, fine-tuning, or hoping the model "remembers."&lt;/p&gt;
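&lt;p&gt;The mechanics can be as simple as appending to a markdown file. A sketch of the error-to-rule step (the file layout and function name are assumptions, not the production setup):&lt;/p&gt;

```python
from pathlib import Path
import re

def add_rule(rules_file: str, lesson: str) -> int:
    """Append a corrected mistake as the next numbered rule in the config file.

    Rules are lines like "12. Never infer what a client said". Returns the
    number assigned to the new rule.
    """
    path = Path(rules_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Find existing rule numbers at the start of lines.
    numbers = [int(m) for m in re.findall(r"^(\d+)\.", text, flags=re.M)]
    next_n = max(numbers, default=0) + 1
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{next_n}. {lesson}\n")
    return next_n
```

&lt;p&gt;Because the rules file is version-controlled, every addition comes with a commit explaining which failure produced it.&lt;/p&gt;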

&lt;h2&gt;Don't Dump Everything Into Context&lt;/h2&gt;

&lt;p&gt;The biggest performance killer in AI systems isn't the model — it's context pollution. Loading every piece of knowledge into every session degrades output quality and burns tokens.&lt;/p&gt;

&lt;p&gt;The pattern that works: &lt;strong&gt;modular context loading&lt;/strong&gt;. My system has 14 skills, 56 commands, and context files for dozens of accounts. But any given session loads only what's relevant — the specific client context, the specific workflow skills, the specific agent role. Everything else stays on disk until needed.&lt;/p&gt;

&lt;p&gt;Think of it like imports in code. You wouldn't &lt;code&gt;import *&lt;/code&gt; from every module in your codebase. Don't do it with AI context either.&lt;/p&gt;

&lt;p&gt;This also means your context files need to be &lt;strong&gt;current state, not changelogs&lt;/strong&gt;. A context file that accumulates three months of historical notes becomes noise. Describe what the system &lt;em&gt;is right now&lt;/em&gt; in 150 scannable lines. Put the changelog somewhere else.&lt;/p&gt;
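&lt;p&gt;In code terms, a session loader might look like this (the &lt;code&gt;clients/&lt;/code&gt; and &lt;code&gt;skills/&lt;/code&gt; layout is a hypothetical example, not the exact production structure):&lt;/p&gt;

```python
from pathlib import Path

def load_context(workspace: str, client: str, skills: list) -> str:
    """Load only the context relevant to this session: one client file
    plus the named skill files. Everything else stays on disk."""
    root = Path(workspace)
    parts = []
    client_file = root / "clients" / f"{client}.md"
    if client_file.exists():
        parts.append(client_file.read_text(encoding="utf-8"))
    for skill in skills:
        skill_file = root / "skills" / f"{skill}.md"
        if skill_file.exists():
            parts.append(skill_file.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

&lt;p&gt;The explicit allowlist is the whole trick: a session asks for what it needs by name, and nothing else ever enters the window.&lt;/p&gt;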

&lt;h2&gt;The Stack That Actually Works&lt;/h2&gt;

&lt;p&gt;After building this across multiple enterprise engagements, here's the architecture I'd recommend for anyone building AI-native workflows:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Structured context files&lt;/strong&gt; (markdown, git-tracked) over vector databases for domain knowledge&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skills&lt;/strong&gt; (persistent frameworks) over prompts (ephemeral instructions) for domain expertise
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specialized agents&lt;/strong&gt; with handoff protocols over one general-purpose agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-aware model routing&lt;/strong&gt; — match model capability to task complexity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error-to-rule pipelines&lt;/strong&gt; — every failure becomes a permanent system improvement&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular loading&lt;/strong&gt; — only load context relevant to the current task&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;None of this requires a framework. No &lt;a href="https://github.com/langchain-ai/langchain" rel="noopener noreferrer"&gt;LangChain&lt;/a&gt;, no LlamaIndex, no orchestration layer. It's markdown files, a CLI, and good engineering discipline. The AI does the reasoning. You do the architecture.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The tools for building AI-native workflows exist today. The bottleneck isn't model capability — it's context architecture. Start treating your AI's knowledge like code: structured, versioned, reviewed, and intentionally loaded. That's the difference between a chatbot and a system.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>contextengineering</category>
      <category>claudecode</category>
      <category>devtools</category>
    </item>
    <item>
      <title>AWS Frontier Agents: What $50/Hour Pen Testing and $30/Hour SRE Means for Platform Teams</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 06 Apr 2026 00:57:56 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/aws-frontier-agents-what-50hour-pen-testing-and-30hour-sre-means-for-platform-teams-5jk</link>
      <guid>https://forem.com/michaeltuszynski/aws-frontier-agents-what-50hour-pen-testing-and-30hour-sre-means-for-platform-teams-5jk</guid>
      <description>&lt;p&gt;AWS just launched two autonomous AI agents — Security Agent and DevOps Agent — and they're both generally available now. These aren't chatbots with polished wrappers. They're persistent, autonomous systems that run for hours or days without human oversight, doing work that previously required dedicated teams.&lt;/p&gt;

&lt;p&gt;Here's what caught my attention, and why platform engineers should be paying close attention.&lt;/p&gt;

&lt;h2&gt;Two Agents, Two Big Problems&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/security-agent/" rel="noopener noreferrer"&gt;AWS Security Agent&lt;/a&gt;&lt;/strong&gt; handles penetration testing. Not the "run a scanner and hand you a PDF" kind — it ingests your source code, architecture diagrams, and documentation, then operates like a human pen tester. It identifies vulnerabilities, builds attack chains, and validates that findings are real exploitable risks. &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;Bamboo Health reported&lt;/a&gt; it "surfaced findings that no other tool has uncovered." HENNGE K.K. cut their testing duration by over 90%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/devops-agent/" rel="noopener noreferrer"&gt;AWS DevOps Agent&lt;/a&gt;&lt;/strong&gt; handles incident response and operational tasks. It correlates telemetry, code, and deployment data across your stack — AWS, Azure, hybrid, on-prem — and integrates with the observability tools you already use: CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana. Western Governors University cut mean time to resolution from two hours to 28 minutes during a production incident by pinpointing a Lambda configuration issue that had been buried in undiscovered internal docs.&lt;/p&gt;

&lt;p&gt;The preview numbers are worth noting: up to 75% lower MTTR, 80% faster investigations, and 94% root cause accuracy.&lt;/p&gt;

&lt;h2&gt;The Pricing Makes the Strategy Obvious&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting.&lt;/p&gt;

&lt;p&gt;DevOps Agent costs &lt;a href="https://aws.amazon.com/devops-agent/pricing/" rel="noopener noreferrer"&gt;$0.0083 per agent-second&lt;/a&gt; — roughly $29.88 per hour. AWS's own pricing examples show a small team running 10 investigations per month pays about $40. An enterprise running 500 incidents per month pays around $2,300. For context, a single on-call SRE costs you $150k-$200k/year fully loaded.&lt;/p&gt;

&lt;p&gt;Security Agent runs at &lt;a href="https://aws.amazon.com/security-agent/pricing/" rel="noopener noreferrer"&gt;$50 per task-hour&lt;/a&gt;. A small API test costs about $173. A full application pen test runs around $1,200. Compare that to external pen testing firms charging $15k-$50k per engagement, with weeks of lead time and limited scope.&lt;/p&gt;
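&lt;p&gt;The back-of-envelope math, recomputed from AWS's published examples (the per-investigation figure is a derived estimate, not an AWS number):&lt;/p&gt;

```python
# Recompute the quoted DevOps Agent rates from the per-second price.
DEVOPS_PER_SECOND = 0.0083              # USD per agent-second (AWS pricing page)
devops_per_hour = DEVOPS_PER_SECOND * 3600

# AWS's small-team example: 10 investigations for about $40/month.
# That implies roughly $4 of agent time per investigation.
minutes_per_investigation = 40 / 10 / devops_per_hour * 60

print(f"{devops_per_hour:.2f} USD/hr, ~{minutes_per_investigation:.0f} min per investigation")
```

&lt;p&gt;About $29.88 an hour, which pencils out to around eight minutes of agent time per investigation in the small-team scenario.&lt;/p&gt;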

&lt;p&gt;Both agents include a 2-month free trial. AWS is clearly betting on adoption velocity — get teams hooked on the speed and economics, then make it sticky through integration depth.&lt;/p&gt;

&lt;p&gt;The DevOps Agent pricing also ties into existing AWS Support plans. Enterprise Support customers get 75% of their support charges back as DevOps Agent credits. Unified Operations customers get 100%. AWS is effectively saying: your support spend now buys you autonomous operations capacity.&lt;/p&gt;

&lt;h2&gt;What This Actually Means for Platform Teams&lt;/h2&gt;

&lt;p&gt;AWS calls these "frontier agents" — autonomous systems that work independently, scale massively across concurrent tasks, and run persistently. The framing matters because it signals a product category, not a one-off feature.&lt;/p&gt;

&lt;p&gt;Three implications stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security becomes continuous, not periodic.&lt;/strong&gt; Most organizations pen test their top 5-10 applications once or twice a year because of cost and staffing constraints. At $50/task-hour, you can afford to test everything, continuously. The security posture shift from "we tested our critical apps last quarter" to "every app gets tested every sprint" is significant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Incident response gets a tireless first responder.&lt;/strong&gt; The DevOps Agent doesn't replace your SRE team — it augments the 3am on-call rotation. It can start investigating before a human even picks up the page, correlating signals across your entire stack. By the time your engineer opens their laptop, the agent has already identified the probable root cause with 94% accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The multicloud angle is deliberate.&lt;/strong&gt; AWS built the DevOps Agent to work with Azure DevOps, GitHub, GitLab, and non-AWS observability tools. This isn't altruism — it's a land-and-expand play. Once your operational intelligence lives in AWS, migrating workloads away gets harder. But for teams running hybrid environments today, having a single agent that understands your entire topology is genuinely useful.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Google and Microsoft have their own agentic plays, but AWS shipping two production-ready autonomous agents with per-second billing and free trials is a concrete move. The fact that both agents work with tools like &lt;a href="https://aws.amazon.com/blogs/machine-learning/aws-launches-frontier-agents-for-security-testing-and-cloud-operations/" rel="noopener noreferrer"&gt;Claude Code and Kiro&lt;/a&gt; for generating validated fixes signals that AWS sees these agents as part of a broader autonomous development loop — not isolated point solutions.&lt;/p&gt;

&lt;p&gt;For platform engineering teams, the takeaway is practical: evaluate these agents against your current pen testing costs and incident response metrics. The economics alone justify a proof of concept. The free trial removes any excuse not to try.&lt;/p&gt;

&lt;p&gt;The real question isn't whether AI agents will handle DevOps and security tasks. It's how fast your organization adapts its processes, roles, and trust models to work alongside them.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>aiagents</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>The systems behind enterprise AI adoption success - IBM</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 28 Mar 2026 17:10:20 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/the-systems-behind-enterprise-ai-adoption-success-ibm-53n0</link>
      <guid>https://forem.com/michaeltuszynski/the-systems-behind-enterprise-ai-adoption-success-ibm-53n0</guid>
      <description>&lt;h1&gt;
  
  
  Everyone's Buying GPUs. Almost Nobody's Ready to Feed Them.
&lt;/h1&gt;

&lt;p&gt;The enterprise AI conversation has a blind spot the size of a data center. Every budget meeting I've sat in over the past 18 months has the same shape: GPU allocation gets 70% of the discussion time, model selection gets 20%, and the data infrastructure that actually feeds those models gets whatever's left over. Usually about ten minutes and a vague reference to "we'll figure out storage later."&lt;/p&gt;

&lt;p&gt;This is why most enterprise AI deployments stall after the proof of concept.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottleneck Nobody Budgets For
&lt;/h2&gt;

&lt;p&gt;Here's what happens in practice. A team spins up a promising AI workload — retrieval-augmented generation, a fine-tuning pipeline, an inference service. It works great on a curated dataset in a dev environment. Then they try to run it against production data at scale and everything falls apart. Not because the model is wrong, but because the storage layer can't deliver data fast enough, the pipeline can't unify sources across hybrid environments, and nobody planned for the I/O characteristics of AI workloads.&lt;/p&gt;

&lt;p&gt;AI training and inference workloads have fundamentally different storage profiles than traditional enterprise applications. Training jobs need sustained sequential throughput across massive datasets. Inference needs low-latency random reads. Fine-tuning needs both, sometimes simultaneously. Your SAN that runs ERP just fine will choke on a distributed training job that's trying to saturate eight GPUs.&lt;/p&gt;

&lt;p&gt;IBM's recent framing of &lt;a href="https://www.ibm.com/think/insights/systems-behind-enterprise-ai-adoption-success" rel="noopener noreferrer"&gt;AI-ready infrastructure&lt;/a&gt; gets this right: the systems layer — storage, compute fabric, automation — is where enterprise AI succeeds or dies. Not in the model layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Data Gravity Problem
&lt;/h2&gt;

&lt;p&gt;The reason storage matters so much for AI isn't just throughput. It's data gravity.&lt;/p&gt;

&lt;p&gt;Enterprise data doesn't live in one place. It's spread across on-prem databases, cloud object stores, SaaS platforms, edge devices, and that one team's PostgreSQL instance that nobody wants to touch. &lt;a href="https://www.ibm.com/think/topics/enterprise-ai" rel="noopener noreferrer"&gt;IBM defines enterprise AI&lt;/a&gt; as the integration of AI across large organizations — but integration implies the data is accessible. In most companies, it isn't. Not in any unified, performant way.&lt;/p&gt;

&lt;p&gt;This creates a cascading failure. Your RAG pipeline needs product data from SAP, customer interactions from Salesforce, and technical documentation from Confluence. Each source has different access patterns, different latency profiles, different security boundaries. Stitching them together with API calls and batch ETL jobs introduces hours of lag and creates brittle pipelines that break every time someone changes a schema.&lt;/p&gt;

&lt;p&gt;The companies I've seen succeed at enterprise AI solve this problem first. They build a unified storage layer that can serve multiple AI workloads without requiring six different integration patterns. IBM's approach with Storage Fusion and FlashSystem targets exactly this — &lt;a href="https://www.ibm.com/think/insights/systems-behind-enterprise-ai-adoption-success" rel="noopener noreferrer"&gt;high-performance, unified storage&lt;/a&gt; that can handle the mixed I/O profiles of AI workloads across hybrid environments. Whether you're on their stack or not, the architectural principle holds: if your AI workloads can't access unified data at the speed they need it, no amount of GPU spend will fix your pipeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Hybrid Cloud Is the Reality, Not the Exception
&lt;/h2&gt;

&lt;p&gt;There's still a persistent fantasy in some planning meetings that AI workloads will live entirely in one public cloud. Maybe someday. Right now, for regulated industries, for companies with significant on-prem investments, and for anyone who's done the math on data egress costs, hybrid is the reality.&lt;/p&gt;

&lt;p&gt;And hybrid AI infrastructure is hard. You need consistent orchestration across environments. You need storage tiering that can move hot data close to compute without manual intervention. You need security and governance that doesn't collapse the moment data crosses a network boundary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.ibm.com/think/insights/ai-adoption-challenges" rel="noopener noreferrer"&gt;IBM identifies inadequate infrastructure as one of the top five AI adoption challenges&lt;/a&gt; — and in my experience, "inadequate" usually means "designed for a different era." The infrastructure that runs your web applications, your CI/CD pipelines, your traditional analytics workloads — it wasn't built for the throughput patterns, the data volumes, or the operational demands of production AI.&lt;/p&gt;

&lt;p&gt;This isn't a rip-and-replace argument. Nobody's going to throw out their storage infrastructure overnight. But you need a plan for how your existing infrastructure evolves to support AI workloads, and that plan needs to happen before you commit to production deployments.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works: Three Patterns From the Field
&lt;/h2&gt;

&lt;p&gt;After spending time with organizations that have moved past the POC phase into production AI, I see three common patterns:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Storage-first capacity planning.&lt;/strong&gt; Successful teams model their data pipeline throughput requirements before they size GPU clusters. They ask: "How fast can we feed data to training jobs?" and "What's our p99 latency for inference-time retrieval?" If the answers don't match the model's appetite, they fix storage before buying more compute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Unified data access across environments.&lt;/strong&gt; Whether it's IBM Storage Fusion, a well-architected MinIO deployment, or a managed cloud storage layer with on-prem caching, the pattern is the same: AI workloads get a single namespace to read from, regardless of where the source data physically lives. This eliminates the integration tax that kills most pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Automation of the data lifecycle.&lt;/strong&gt; Production AI generates enormous amounts of intermediate data — checkpoints, embeddings, feature stores, evaluation datasets. Teams that automate tiering, retention, and cleanup avoid the "we ran out of storage on a Friday night" incident that's practically a rite of passage.&lt;/p&gt;
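&lt;p&gt;The throughput question in the first pattern reduces to a quick feasibility check. Every workload number in the sketch below is a hypothetical placeholder; the point is the shape of the calculation, not the values.&lt;/p&gt;

```python
# Storage-first feasibility check: can the storage layer keep a training
# job fed? All workload numbers here are hypothetical placeholders.

gpus = 8
samples_per_sec_per_gpu = 250        # assumed data demand per GPU
bytes_per_sample = 2 * 1024**2       # assumed 2 MiB per preprocessed sample

required_gibps = gpus * samples_per_sec_per_gpu * bytes_per_sample / 1024**3
print(f"Required sustained read throughput: {required_gibps:.1f} GiB/s")

measured_gibps = 2.5                 # assumed benchmark of current storage
headroom = measured_gibps / required_gibps
print(f"Current storage delivers {headroom:.0%} of what the job demands")
print("Anything under 100% means storage, not GPU count, is the bottleneck")
```

If the measured number comes up short, the storage fix goes on the roadmap before the next GPU purchase order does.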

&lt;h2&gt;
  
  
  The Uncomfortable Math
&lt;/h2&gt;

&lt;p&gt;Here's a rough calculation that sobers up most planning conversations. A mid-size enterprise running a fine-tuning pipeline on proprietary data with a 70B parameter model needs approximately 500TB of accessible, high-performance storage just for the training data, checkpoints, and model artifacts. That's before you add your RAG corpus, your vector store, and your evaluation datasets.&lt;/p&gt;
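&lt;p&gt;Here's one way a figure in that neighborhood can materialize. Every line item below is an illustrative assumption; only the checkpoint math (roughly 14 bytes per parameter for mixed-precision training with Adam-style optimizer state) reflects standard practice.&lt;/p&gt;

```python
# Illustrative storage budget for a 70B-parameter fine-tuning pipeline.
# Every line item is an assumption, not a measurement.

params = 70e9

# A full training checkpoint with an Adam-style optimizer in mixed precision
# stores roughly 14 bytes/param: bf16 weights (2) + fp32 master weights (4)
# + two fp32 optimizer moments (4 + 4).
checkpoint_tb = params * 14 / 1e12            # just under 1 TB each

storage_tb = {
    "raw + curated training data": 200,            # assumed corpus size
    "retained checkpoints": 100 * checkpoint_tb,   # assumed 100 kept across runs
    "exported model artifacts": 50,                # assumed serving + quantized variants
    "intermediate / eval data": 150,               # assumed features, eval sets, logs
}

total = sum(storage_tb.values())
print(f"Checkpoint size: {checkpoint_tb:.2f} TB")
print(f"Total: {total:.0f} TB")   # lands in the same ballpark as the estimate above
```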

&lt;p&gt;Now multiply that by the number of AI initiatives in your roadmap. Most enterprises I talk to have between five and fifteen active AI projects. The storage footprint adds up fast, and it needs to perform — not just exist.&lt;/p&gt;

&lt;p&gt;The GPU shortage got all the headlines in 2024. The storage and data infrastructure gap is the quieter crisis that will define which companies actually ship production AI in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  What CTOs Should Do Next Week
&lt;/h2&gt;

&lt;p&gt;Stop treating infrastructure as a downstream consequence of model selection. Flip it around.&lt;/p&gt;

&lt;p&gt;Audit your current storage throughput against the I/O demands of your planned AI workloads. Map where your training data lives and how many network hops separate it from your compute. Calculate the real cost of your data integration layer — not just the cloud bill, but the engineering hours spent maintaining brittle pipelines.&lt;/p&gt;

&lt;p&gt;Then have an honest conversation about whether your infrastructure roadmap matches your AI ambitions. If there's a gap — and there almost certainly is — close it before you scale your GPU footprint. The fastest accelerator in the world is useless if it's starving for data.&lt;/p&gt;

&lt;p&gt;The companies that figure this out won't just run AI. They'll run AI that actually works in production, at scale, without the 2 AM pages. That's a meaningful competitive advantage — and it starts with the infrastructure layer that nobody wants to talk about at the budget meeting.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>dataengineering</category>
      <category>systems</category>
    </item>
    <item>
      <title>Why Enterprise AI Infrastructure is Going Hybrid – and Geographic</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Thu, 26 Mar 2026 04:39:27 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic-1dba</link>
      <guid>https://forem.com/michaeltuszynski/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic-1dba</guid>
      <description>&lt;h1&gt;
  
  
  The Cloud Repatriation Nobody Expected: Why Enterprise AI Is Pulling Compute Back from the Cloud
&lt;/h1&gt;

&lt;p&gt;The original pitch for cloud computing was simple: stop buying servers, rent someone else's. For most workloads over the past fifteen years, that trade worked. But AI infrastructure has rewritten the economics, and enterprises are responding by doing something few predicted — they're moving compute &lt;em&gt;closer&lt;/em&gt; to the data, not further away.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.databank.com/resources/blogs/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic/" rel="noopener noreferrer"&gt;A recent DataBank survey&lt;/a&gt; found that 76% of enterprises plan geographic expansion of their AI infrastructure, while 53% are actively adding colocation to their deployment strategies. This isn't a minor adjustment. It's a structural shift in how organizations think about where AI workloads should run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Economics Changed Before the Strategy Did
&lt;/h2&gt;

&lt;p&gt;Running inference on a large language model in a hyperscaler region costs real money. Not "line item you can bury in OpEx" money — more like "the CFO is asking questions in the quarterly review" money. GPU instance pricing on AWS, Azure, and GCP has remained stubbornly high because demand outstrips supply, and the cloud providers know it.&lt;/p&gt;

&lt;p&gt;The math gets worse when you factor in data gravity. Most enterprises generate data in dozens of locations — retail stores, manufacturing plants, regional offices, edge devices. Shipping all that data to us-east-1 for processing, then shipping results back, creates latency and egress costs that compound as AI adoption scales.&lt;/p&gt;
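&lt;p&gt;The egress leg of that round trip is easy to estimate. The sketch below uses an assumed per-GB rate in line with commonly published internet egress tiers; substitute your provider's actual pricing and your real volumes.&lt;/p&gt;

```python
# Rough math on the egress leg (data leaving the cloud region, e.g. results
# and syncs flowing back to sites; ingress is typically free). The $/GB rate
# is an assumption based on typical published egress tiers, not a quote.

sites = 30                    # assumed locations receiving results
tb_out_per_site = 2           # assumed monthly egress per site
egress_rate_per_gb = 0.09     # assumed $/GB for internet egress

monthly_tb = sites * tb_out_per_site
monthly_cost = monthly_tb * 1000 * egress_rate_per_gb
print(f"{monthly_tb} TB/month out, ${monthly_cost:,.0f}/month in egress")
print(f"${monthly_cost * 12:,.0f}/year, and it scales with adoption")
```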

&lt;p&gt;Colocation flips this equation. You place GPU-dense compute in facilities close to where data originates, connect to cloud services where they make sense (object storage, managed databases, identity), and keep the expensive part — inference and fine-tuning — on hardware you control or lease at predictable rates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why "Cloud-Smart" Beats "Cloud-First"
&lt;/h2&gt;

&lt;p&gt;The industry is moving toward what &lt;a href="https://seekingalpha.com/article/4843221-world-of-enterprise-ai-turning-hybrid" rel="noopener noreferrer"&gt;Seeking Alpha describes as a "cloud-smart" strategy&lt;/a&gt; — using public cloud, private cloud, and edge computing based on the workload profile rather than defaulting to one deployment model for everything.&lt;/p&gt;

&lt;p&gt;This makes sense when you break down what AI workloads actually need:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Training&lt;/strong&gt; still belongs in the cloud for most organizations. You need massive, bursty GPU capacity for weeks or months, then nothing. Buying that hardware outright is a terrible investment unless you're running training continuously. Hyperscaler reserved instances or on-demand capacity work fine here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inference&lt;/strong&gt; is the opposite profile. It's steady-state, latency-sensitive, and runs 24/7. The cost-per-token adds up fast at scale. Running inference on colocated or on-premises hardware — especially with purpose-built accelerators — can cut costs 40-60% compared to cloud GPU instances, depending on utilization rates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; sits in the middle. You need GPU capacity for days, not months, and the data involved is often sensitive enough that you don't want it leaving your network. A colocated setup with good connectivity to your data sources handles this well.&lt;/p&gt;
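&lt;p&gt;The inference savings above hinge almost entirely on utilization. Here's a toy comparison in which every price is a hypothetical placeholder, not a quote from any provider:&lt;/p&gt;

```python
# Why utilization drives the colo-vs-cloud inference decision.
# All prices are hypothetical placeholders.

cloud_gpu_per_hour = 5.00       # assumed on-demand cloud GPU rate
colo_monthly_fixed = 1800.00    # assumed amortized hardware + colo fees per GPU
hours_in_month = 730

def monthly_costs(utilization):
    # Cloud bills for busy hours; colo is a fixed cost regardless of load.
    cloud = cloud_gpu_per_hour * hours_in_month * utilization
    return cloud, colo_monthly_fixed

for util in (0.2, 0.6, 0.9):
    cloud, colo = monthly_costs(util)
    savings = 1 - colo / cloud
    print(f"{util:.0%} utilized: cloud ${cloud:,.0f} vs colo ${colo:,.0f} "
          f"({savings:+.0%} for colo)")
```

At low utilization, colo loses badly; near steady-state saturation, the savings land in the range the article cites. The breakeven point is the number worth computing for your own fleet.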

&lt;h2&gt;
  
  
  The Geography Problem Nobody Planned For
&lt;/h2&gt;

&lt;p&gt;Data sovereignty and residency requirements are accelerating the geographic distribution of AI infrastructure in ways that pure cloud strategies can't easily accommodate.&lt;/p&gt;

&lt;p&gt;The EU's AI Act imposes requirements on where and how AI systems process data. Healthcare organizations in the US navigate HIPAA obligations that constrain how patient data is handled and shared. Financial services firms face data residency rules that vary by jurisdiction. When your AI model needs to process customer data from Germany, running inference in a Virginia data center creates compliance headaches that no amount of architectural cleverness fully solves.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.databank.com/resources/blogs/why-enterprise-ai-infrastructure-is-going-hybrid-and-geographic/" rel="noopener noreferrer"&gt;Enterprises are responding by deploying AI infrastructure across multiple geographies&lt;/a&gt; — not because they want the operational complexity, but because regulators and customers demand it. The 76% planning geographic expansion aren't chasing some multicloud vision. They're meeting regulatory reality.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Edge Dimension
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.techaheadcorp.com/blog/why-modern-enterprises-need-hybrid-edge-cloud-ai/" rel="noopener noreferrer"&gt;Hybrid edge-cloud architectures&lt;/a&gt; add another layer. Manufacturing plants running quality inspection models can't tolerate 200ms round-trip latency to a cloud region. Autonomous systems need inference at the point of action. Retail environments process customer interactions in real time.&lt;/p&gt;

&lt;p&gt;These use cases demand on-site or near-site compute with cloud connectivity for model updates, monitoring, and periodic retraining. The architecture looks less like "cloud with edge caching" and more like "distributed compute with cloud coordination." The control plane lives in the cloud. The data plane runs where the data lives.&lt;/p&gt;

&lt;p&gt;This is a harder architecture to build and operate than a cloud-native deployment. It requires teams who understand networking, hardware lifecycle management, and distributed systems — skills that many organizations let atrophy during the cloud migration years.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Infrastructure Teams
&lt;/h2&gt;

&lt;p&gt;If you're an infrastructure leader planning AI capacity for the next 2-3 years, here's the framework I'd use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit your inference costs first.&lt;/strong&gt; Most organizations are surprised by how much they're spending on cloud GPU instances for inference once they aggregate across teams and projects. This number is your baseline for a hybrid business case.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Map data gravity.&lt;/strong&gt; Where does your training data originate? Where do inference requests come from? Where do results need to arrive? If the answer to all three is "the same cloud region," stay in the cloud. If it's "twelve different locations across three countries," you need a distributed strategy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't build a GPU data center.&lt;/strong&gt; Colocation with GPU leasing gives you the economics of owned hardware without the capital expenditure and refresh cycles. Companies like DataBank, Equinix, and CoreWeave are building exactly this model — dense GPU compute in colocation facilities with direct cloud interconnects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Plan for heterogeneous accelerators.&lt;/strong&gt; NVIDIA's dominance in training is real, but inference has viable alternatives — AMD Instinct, Intel Gaudi, AWS Inferentia, Google TPUs. A hybrid strategy lets you match accelerators to workload profiles instead of paying the NVIDIA tax on everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in platform engineering.&lt;/strong&gt; Hybrid AI infrastructure without a solid platform layer becomes an operational nightmare. You need consistent deployment pipelines, observability, and model lifecycle management that works across cloud regions, colo facilities, and edge locations. Kubernetes helps here, but it's the starting point, not the whole answer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Reality
&lt;/h2&gt;

&lt;p&gt;Going hybrid is operationally harder than going all-in on a single cloud provider. Anyone who tells you otherwise is selling colocation space. You'll manage more vendor relationships, more network paths, more failure modes.&lt;/p&gt;

&lt;p&gt;But the economics and the regulatory environment have shifted enough that "just put it all in AWS" is no longer a defensible strategy for AI-heavy workloads. The organizations figuring out hybrid now — while GPU supply is still constrained and cloud pricing remains elevated — will have a meaningful cost advantage over those who wait.&lt;/p&gt;

&lt;p&gt;The cloud isn't going away. It's just no longer the default answer for every AI workload. And the sooner infrastructure teams internalize that distinction, the better positioned they'll be when AI spending goes from "experimental budget" to "largest line item on the infrastructure bill."&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>cloud</category>
      <category>news</category>
    </item>
    <item>
      <title>Enterprise AI has an 80% failure rate. The models aren't the problem. What is?</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sat, 21 Mar 2026 18:27:58 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/enterprise-ai-has-an-80-failure-rate-the-models-arent-the-problem-what-is-16k0</link>
      <guid>https://forem.com/michaeltuszynski/enterprise-ai-has-an-80-failure-rate-the-models-arent-the-problem-what-is-16k0</guid>
      <description>&lt;h1&gt;
  
  
  Enterprise AI Fails at 80% — And the Models Have Nothing to Do With It
&lt;/h1&gt;

&lt;p&gt;Most enterprise AI projects die quietly. No dramatic failure, no post-mortem email chain. They just... stop. The prototype gets a demo, leadership nods approvingly, and then six months later the Slack channel goes silent and the budget gets reallocated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Roughly 80% of enterprise AI projects fail&lt;/a&gt; — double the failure rate of traditional software projects. That number has held steady for years now, even as the models themselves have gotten dramatically better. GPT-4, Claude, Gemini — pick your favorite. They all work. They work remarkably well, actually.&lt;/p&gt;

&lt;p&gt;So why does the enterprise keep fumbling the ball?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Model Isn't Your Problem. Your Plumbing Is.
&lt;/h2&gt;

&lt;p&gt;Here's what I keep seeing from the architecture side: teams treat AI like a feature problem when it's actually an infrastructure problem. They spin up a proof of concept using an API, get promising results in a notebook, and then hit a wall when someone asks "OK, how do we run this in production?"&lt;/p&gt;

&lt;p&gt;That wall has a name. It's called &lt;a href="https://medium.com/@archie.kandala/the-production-ai-reality-check-why-80-of-ai-projects-fail-to-reach-production-849daa80b0f3" rel="noopener noreferrer"&gt;the deployment gap&lt;/a&gt; — the distance between a working model and a production system that real users depend on. And it's enormous.&lt;/p&gt;

&lt;p&gt;A platform engineer on Reddit &lt;a href="https://www.reddit.com/r/platformengineering/comments/1ryqpn3/enterprise_ai_has_an_80_failure_rate_the_models/" rel="noopener noreferrer"&gt;put it bluntly&lt;/a&gt;: the failure pattern repeats across organizations regardless of size or industry. The models work fine. The org doesn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Five Ways Smart Companies Still Blow It
&lt;/h2&gt;

&lt;p&gt;I've watched this play out at dozens of enterprise accounts. The failure modes are predictable enough to catalog.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. They Solve the Wrong Problem
&lt;/h3&gt;

&lt;p&gt;This is the most common and most expensive mistake. A team picks a use case because it sounds impressive in a board deck, not because it maps to an actual operational bottleneck. "We'll use AI to predict customer churn!" Great. Do you have clean customer data? Do you have a process to act on those predictions? Is churn actually your biggest revenue leak?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Companies that fail at AI overwhelmingly choose the wrong problems first&lt;/a&gt;. They optimize for what's exciting instead of what's painful. The successful projects I've seen start with someone saying "this manual process costs us 200 hours a month and we keep getting it wrong."&lt;/p&gt;

&lt;h3&gt;
  
  
  2. They Live in Data Fantasy Land
&lt;/h3&gt;

&lt;p&gt;Every AI project starts with an assumption about data quality that turns out to be wildly optimistic. The data exists, sure. It's in four different systems, three different formats, with no consistent identifiers, maintained by teams who don't talk to each other.&lt;/p&gt;

&lt;p&gt;I worked with an enterprise that wanted to build an AI-powered inventory optimization system. The model was straightforward. The data pipeline took eleven months — not because the engineering was hard, but because getting three business units to agree on what "inventory" meant took that long.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. They Skip the Platform Layer
&lt;/h3&gt;

&lt;p&gt;This one hits close to home. Teams build AI applications without investing in the platform that supports them. No model registry. No feature store. No monitoring for drift. No rollback mechanism. No cost controls.&lt;/p&gt;

&lt;p&gt;Then the model goes sideways in production — and it will — and there's no way to detect it, debug it, or revert it. You're flying blind with a system that's making real decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@archie.kandala/the-production-ai-reality-check-why-80-of-ai-projects-fail-to-reach-production-849daa80b0f3" rel="noopener noreferrer"&gt;The production gap isn't a model problem — it's a platform engineering problem&lt;/a&gt;. The organizations that ship AI successfully treat ML infrastructure with the same rigor they'd apply to any other production system: observability, CI/CD, access controls, cost management.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. They Start With Tech Instead of Humans
&lt;/h3&gt;

&lt;p&gt;I've seen teams spend months evaluating which LLM to use, which vector database to pick, whether to fine-tune or RAG, which embedding model performs best on their benchmark — and zero time figuring out who will actually use this thing and how it fits into their workflow.&lt;/p&gt;

&lt;p&gt;The best AI system in the world is worthless if the end user Alt-F4s out of it because it adds three clicks to their existing process. &lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;Starting with tech instead of humans&lt;/a&gt; is the classic engineering trap: we build what's interesting to build, not what's useful to use.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. They Treat AI as a Project, Not a Product
&lt;/h3&gt;

&lt;p&gt;AI models degrade. The world changes, user behavior shifts, data distributions drift. A model that was 94% accurate in January might be 71% accurate by June. Traditional software doesn't do this. You deploy a calculator app and it keeps calculating correctly forever.&lt;/p&gt;

&lt;p&gt;AI requires ongoing investment: retraining, monitoring, evaluation, data quality maintenance. When leadership treats AI as a one-time project with a ship date and a done state, they're guaranteeing that the system will rot.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Works
&lt;/h2&gt;

&lt;p&gt;After 25 years in tech — the last several spent watching enterprise AI projects succeed and fail — here's what separates the 20% that ship from the 80% that don't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with the workflow, not the model.&lt;/strong&gt; Find a process where humans are doing repetitive cognitive work, making inconsistent decisions, or drowning in volume. Build AI into that workflow. Not as a standalone app — as an augmentation of what people already do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Invest in platform before product.&lt;/strong&gt; You need model serving infrastructure, monitoring, cost tracking, and rollback capabilities before you need a sophisticated model. A simple model on a solid platform beats a sophisticated model on nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set a 90-day production deadline.&lt;/strong&gt; If your AI project hasn't touched a real user in 90 days, it probably never will. Scope ruthlessly. Ship something small. Learn from real usage. The organizations that perpetually prototype never ship.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Budget for operations, not just development.&lt;/strong&gt; AI is more like a garden than a bridge. You don't build it and walk away. Plan for ongoing model evaluation, data quality work, and retraining cycles. If your budget only covers development, you're planning to fail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Make the ROI case boring and specific.&lt;/strong&gt; Not "AI will transform our customer experience." Instead: "This model will reduce manual review time from 6 hours to 45 minutes per day for the claims processing team, saving $280K annually." When the value is that concrete, the project survives leadership changes and budget cuts.&lt;/p&gt;
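&lt;p&gt;Claims like that are worth sanity-checking with your own numbers. In the sketch below, the workday count and loaded hourly rate are assumptions standing in for real figures from your finance team:&lt;/p&gt;

```python
# Sanity-checking a concrete ROI claim like the one above. Workday count
# and loaded hourly rate are assumptions; replace them with your own.

hours_saved_per_day = 6 - 0.75        # 6 hours down to 45 minutes
workdays_per_year = 250               # assumed
loaded_rate_per_hour = 215            # assumed fully loaded cost of review staff

annual_hours = hours_saved_per_day * workdays_per_year
annual_savings = annual_hours * loaded_rate_per_hour
print(f"{annual_hours:,.0f} hours/year recovered")
print(f"${annual_savings:,.0f}/year")   # in the neighborhood of the claim above
```

If the arithmetic doesn't survive contact with your actual rates and volumes, that's worth knowing before the project gets funded, not after.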

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://matthopkins.com/business/the-154-billion-mistake-why-80-percent-of-companies-get-nothing-from-ai/" rel="noopener noreferrer"&gt;The enterprise AI failure rate represents roughly $154 billion in wasted spend&lt;/a&gt;. That money didn't evaporate because GPT wasn't smart enough. It evaporated because organizations treated AI adoption as a technology challenge when it's actually an organizational design challenge.&lt;/p&gt;

&lt;p&gt;The models are good enough. They've been good enough for a while now. The question was never "can AI do this?" It has always been "can your organization support AI doing this?"&lt;/p&gt;

&lt;p&gt;If you can't answer yes to that second question, no amount of model capability will save you. Fix the plumbing. Define the problem. Invest in the platform. Then — and only then — worry about which model to use.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Beyond Comprehension Debt: Why Context Architecture Is the Real AI Moat</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Thu, 19 Mar 2026 03:00:36 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/beyond-comprehension-debt-why-context-architecture-is-the-real-ai-moat-kfk</link>
      <guid>https://forem.com/michaeltuszynski/beyond-comprehension-debt-why-context-architecture-is-the-real-ai-moat-kfk</guid>
      <description>&lt;p&gt;Addy Osmani dropped a piece last week that's been making the rounds: "&lt;a href="https://addyosmani.com/blog/comprehension-debt/" rel="noopener noreferrer"&gt;Comprehension Debt — The Hidden Cost of AI-Generated Code&lt;/a&gt;." His thesis is sharp. Teams are shipping AI-generated code faster than anyone can understand it. Tests pass, PRs look clean, and nobody notices the growing gap between what's been deployed and what any human actually comprehends. When something breaks at 3am, that gap becomes the bill.&lt;/p&gt;

&lt;p&gt;He's right. And he's not seeing the whole picture.&lt;/p&gt;

&lt;p&gt;Osmani diagnosed one symptom of a larger condition. Comprehension debt — the gap between shipped code and understood code — is real, and it matters. But it's one line item on a ledger that most engineering organizations haven't even opened yet. If you're a CTO or VP of Engineering adopting AI-assisted development, comprehension debt is the problem you can &lt;em&gt;see&lt;/em&gt;. The ones that will actually sink you are the ones you can't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Debt Paradox
&lt;/h2&gt;

&lt;p&gt;There's a seductive argument circulating in engineering circles right now: AI makes rewriting cheap, so tech debt doesn't matter anymore. Why maintain a crumbling monolith when you can regenerate services in an afternoon?&lt;/p&gt;

&lt;p&gt;It's about 40% right, which makes it dangerous.&lt;/p&gt;

&lt;p&gt;Yes, the cost curve of rewriting code has collapsed. For bounded, well-specified modules, AI absolutely turns "rewrite" from a quarter-long initiative into a day's work. The classic excuse — "we can't touch that, it'll take months" — is dying. That's real progress.&lt;/p&gt;

&lt;p&gt;But here's the paradox: every time you "start from scratch," you throw away embedded knowledge about &lt;em&gt;why&lt;/em&gt; decisions were made. The 47 edge cases handled one by one over 18 months. The compliance requirement someone baked in after an audit. The OAuth flow three partners hardcoded against. AI can regenerate the code layer fast. It cannot regenerate the institutional context that made that code correct.&lt;/p&gt;

&lt;p&gt;Tech debt didn't disappear. It shape-shifted. And the new forms are harder to detect, harder to measure, and harder to pay down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Debts, Not One
&lt;/h2&gt;

&lt;p&gt;Osmani gave us a name for the first one. Here are the other two.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Comprehension Debt (Osmani's Contribution)
&lt;/h3&gt;

&lt;p&gt;This is the gap between the code your team ships and the code your team understands. Osmani nailed the mechanics: AI-generated code passes review because it &lt;em&gt;looks&lt;/em&gt; right, engineers approve PRs they haven't fully internalized, and the organizational assumption that "reviewed = understood" quietly breaks down.&lt;/p&gt;

&lt;p&gt;The insight is correct. But the prescription — slow down, review more carefully, quiz your engineers — is a &lt;em&gt;cultural&lt;/em&gt; intervention. It treats comprehension debt as a discipline problem. For Google-scale teams with senior engineers and strong review culture, that might work. For the other 95% of engineering organizations? You need more than discipline. You need infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Context Debt
&lt;/h3&gt;

&lt;p&gt;This is the one nobody's naming clearly, and it's the most dangerous.&lt;/p&gt;

&lt;p&gt;Context debt is the accumulated loss of institutional knowledge about &lt;em&gt;why&lt;/em&gt; systems are built the way they are. It's not about whether engineers understand the code in front of them — it's about whether anyone understands the decisions, constraints, trade-offs, and edge cases that shaped it.&lt;/p&gt;

&lt;p&gt;Consider what lives in a mature codebase beyond the code itself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Architectural rationale.&lt;/strong&gt; Why this service exists as its own deployment rather than a module in the monolith. Why the database schema looks the way it does. Why that particular API contract was chosen over three alternatives that were debated for weeks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Boundary knowledge.&lt;/strong&gt; Which downstream consumers depend on specific response shapes. Which partner integrations are fragile. Which compliance requirements are baked into the data flow and why.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Failure memory.&lt;/strong&gt; The incident that revealed a race condition nobody anticipated. The scaling problem that drove the caching strategy. The security audit finding that explains the seemingly redundant validation layer.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this lives in the code. Very little of it lives in documentation. Most of it lives in the heads of engineers who were there when the decisions were made. When AI regenerates a service from scratch, it produces code that compiles, passes tests, and handles the happy path. What it cannot produce is the scar tissue — the hard-won understanding of what goes wrong and why.&lt;/p&gt;

&lt;p&gt;Context debt accumulates every time a team rewrites without capturing context first. Every time an AI-generated solution replaces a human-authored one without preserving the &lt;em&gt;reasoning&lt;/em&gt; behind the original. Every time an engineer leaves and their knowledge of why things are the way they are walks out with them.&lt;/p&gt;

&lt;p&gt;This isn't new — context loss has always been a risk in software organizations. What's new is the &lt;em&gt;velocity&lt;/em&gt; at which AI-assisted development can destroy context. When rewriting is cheap, the incentive to understand before replacing drops to zero.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Integration Debt
&lt;/h3&gt;

&lt;p&gt;The third form is architectural. Integration debt is the growing inconsistency between AI-generated components that were each built in isolation without awareness of the broader system.&lt;/p&gt;

&lt;p&gt;AI coding assistants operate within a context window. They see the file you're working on, maybe some adjacent files, maybe a system prompt describing your stack. What they don't see is the full topology of your system — every service, every contract, every shared assumption that holds your architecture together.&lt;/p&gt;

&lt;p&gt;When three different engineers use AI to independently build three services that interact, each service might be internally excellent. Clean code, good patterns, thorough tests. But the interfaces between them — data formats, error handling conventions, retry semantics, authentication flows — will diverge unless someone is deliberately enforcing coherence.&lt;/p&gt;

&lt;p&gt;The numbers back this up. &lt;a href="https://www.coderabbit.ai/blog/2025-was-the-year-of-ai-speed-2026-will-be-the-year-of-ai-quality" rel="noopener noreferrer"&gt;CodeRabbit's 2026 analysis&lt;/a&gt; found teams merged 98% more PRs that were 154% larger year-over-year, while 61% of developers reported that AI produces code that "looks correct but is unreliable." The generation pressure is real. The verification pressure is downstream — and growing.&lt;/p&gt;

&lt;p&gt;Integration debt compounds quietly. It shows up as unexpected failures during deployment. As subtle data inconsistencies between services. As "it works on my machine" problems that are actually contract mismatches between components that were never designed to work together, despite technically needing to.&lt;/p&gt;

&lt;p&gt;The faster you generate components, the faster integration debt accumulates. And unlike code quality — which AI can actually help improve — integration coherence requires exactly the kind of big-picture architectural thinking that AI tools are worst at.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Composition Shift
&lt;/h2&gt;

&lt;p&gt;Here's why this matters strategically, not just technically.&lt;/p&gt;

&lt;p&gt;For two decades, "tech debt" mostly meant code-level debt: poor abstractions, missing tests, duplicated logic, outdated dependencies. That's the debt AI is genuinely good at paying down. Refactoring, test generation, dependency updates, code modernization — these are tasks where AI excels. If your tech debt balance sheet was entirely code-level debt, the "AI makes debt irrelevant" crowd would be right.&lt;/p&gt;

&lt;p&gt;But in any system of real complexity, code-level debt was always just the visible portion. The deeper liabilities — context, comprehension, integration — were always there. They were just overshadowed by the sheer volume of code-level problems.&lt;/p&gt;

&lt;p&gt;AI didn't eliminate tech debt. It paid down the most visible kind, revealing the structural kinds that were hiding underneath. The balance sheet didn't shrink. The composition changed. And most organizations are still using the old chart of accounts.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://dora.dev/research/2024/" rel="noopener noreferrer"&gt;2024 DORA report&lt;/a&gt; hints at this shift: despite widespread AI adoption, throughput dipped 1.5% and stability dropped 7.2% across 39,000 respondents. Teams are generating more code and shipping it less reliably. The metrics that matter — lead time, change failure rate, recovery time — aren't improving with velocity alone.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context as Infrastructure, Not Culture
&lt;/h2&gt;

&lt;p&gt;This is where I part ways with the prevailing conversation.&lt;/p&gt;

&lt;p&gt;Osmani and others are framing these new debts as cultural and organizational challenges. Slow down. Review more carefully. Keep humans in the loop. These aren't wrong, but they're incomplete — and for many teams, impractical. You can't tell a startup burning runway to slow down their AI-assisted velocity for the sake of comprehension hygiene. You can't tell a team of five to institute Google-style code review rituals.&lt;/p&gt;

&lt;p&gt;What you &lt;em&gt;can&lt;/em&gt; do is treat context as infrastructure.&lt;/p&gt;

&lt;p&gt;I've been arguing for a while now that context management is the real skill gap in AI-assisted development — that getting value from AI tools is less about prompt engineering and more about maintaining rich, current, accessible context that those tools can leverage. (I wrote about this in "&lt;a href="https://mpt.solutions/context-management-generative-ai/" rel="noopener noreferrer"&gt;This Above All: To Thine Own Context Be True&lt;/a&gt;" earlier this year.)&lt;/p&gt;

&lt;p&gt;The same principle applies to the debt problem, but at organizational scale. Context debt isn't an inevitable consequence of AI adoption. It's an infrastructure failure — a failure to build systems that capture, maintain, and surface the institutional knowledge that makes code comprehensible and architecturally coherent.&lt;/p&gt;

&lt;p&gt;What does context infrastructure look like in practice?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Living architectural decision records.&lt;/strong&gt; Not dusty wiki pages nobody updates, but actively maintained documents that live alongside the code and get updated as part of the development workflow. &lt;a href="https://www.cognitect.com/blog/2011/11/15/documenting-architecture-decisions" rel="noopener noreferrer"&gt;Michael Nygard formalized the ADR pattern&lt;/a&gt; back in 2011 — Title, Status, Context, Decision, Consequences. The format is fifteen years old. What's changed is that AI makes the cost of &lt;em&gt;not&lt;/em&gt; having ADRs catastrophically higher. When AI generates a new implementation, the context for &lt;em&gt;why&lt;/em&gt; the old implementation existed this way needs to be right there — available to the engineer reviewing the change and to the AI generating it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured project memory.&lt;/strong&gt; Tools and conventions so the reasoning behind decisions persists beyond the individual who made them. This means treating context documents — system descriptions, constraint inventories, edge case catalogs — as first-class artifacts that get versioned, reviewed, and maintained with the same rigor as code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration contracts as explicit artifacts.&lt;/strong&gt; Rather than letting service interfaces emerge organically from individually-generated components, defining and maintaining explicit contracts that AI tools can reference during generation. The contract becomes the source of truth for integration coherence, not the individual developer's mental model of how everything fits together.&lt;/p&gt;
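&lt;p&gt;As a minimal sketch (the event shape and field names are hypothetical, not from any real system), a contract can literally be a shared module that both producer and consumer import, with coercion at the boundary so drift fails loudly at parse time:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaymentEvent:
    """Shared, versioned contract imported by producer and consumer alike.
    The contract file, not any one service's code, is the source of truth."""
    order_id: str
    amount_cents: int  # integer cents, never floats, for money
    currency: str      # ISO 4217 code, e.g. "USD"

def parse_payment_event(payload: dict) -> PaymentEvent:
    # Coercing at the boundary catches drift (a producer that starts
    # sending floats or renames a key) immediately, as a KeyError or
    # ValueError, instead of as a subtle data inconsistency downstream.
    return PaymentEvent(
        order_id=str(payload["order_id"]),
        amount_cents=int(payload["amount_cents"]),
        currency=str(payload["currency"]),
    )
```

&lt;p&gt;An AI tool generating either service can then be pointed at this file, which is exactly the "reference during generation" workflow described here.&lt;/p&gt;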

&lt;p&gt;&lt;strong&gt;Context-aware generation workflows.&lt;/strong&gt; Configuring AI tools to ingest project context before generating code, rather than generating in a vacuum and hoping for coherence. This means investing in the scaffolding — the context files, the system prompts, the reference documents — that turn AI from a talented but amnesiac intern into a contributor who understands the system they're working within.&lt;/p&gt;

&lt;p&gt;None of this is revolutionary. It's the kind of engineering discipline that good teams have always practiced. What's different is the urgency. When humans wrote all the code, context accumulated naturally — slowly, and with gaps, but it accumulated. When AI generates code at 10x the velocity, context dissipates at 10x the rate unless you deliberately counteract it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Assessment Gap
&lt;/h2&gt;

&lt;p&gt;The biggest opportunity right now isn't in tooling — it's in assessment.&lt;/p&gt;

&lt;p&gt;Most engineering organizations have no way to measure their exposure to these new forms of debt. They can tell you their test coverage percentage, their deployment frequency, their mean time to recovery. They cannot tell you how much institutional context has been lost in the last six months of AI-assisted development. They cannot quantify how many of their AI-generated components have integration assumptions that conflict with each other. They cannot assess whether their team's comprehension of the codebase has kept pace with its growth.&lt;/p&gt;

&lt;p&gt;This is the gap that needs to be closed first. Before you can manage context debt, comprehension debt, and integration debt, you need to be able to see them. And right now, almost nobody can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means
&lt;/h2&gt;

&lt;p&gt;If you're leading an engineering organization that's adopting AI-assisted development — and at this point, that's nearly everyone — the question isn't whether these new forms of debt are accumulating. They are. The question is whether you're managing them deliberately or discovering them during incidents.&lt;/p&gt;

&lt;p&gt;The organizations that will thrive in the AI era aren't the ones that generate code fastest. They're the ones that maintain the richest context while moving fast. That's a different capability than what most teams are building right now, and it's one that compounds over time. The team that invests in context architecture today will be moving faster &lt;em&gt;and&lt;/em&gt; more safely a year from now. The team that optimizes only for generation velocity will be drowning in debt they can't see and can't name.&lt;/p&gt;

&lt;p&gt;The old tech debt conversation was about code quality. The new one is about knowledge architecture. And it's just getting started.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Michael Tuszynski is the founder of &lt;a href="https://mpt.solutions" rel="noopener noreferrer"&gt;MPT Solutions&lt;/a&gt;, where he writes about AI strategy, cloud architecture, and engineering leadership. With 25 years in software — including six years as a Senior Solutions Architect at AWS and a stint as CTO at Fandor — he focuses on helping teams adopt AI-assisted development without sacrificing the institutional context that makes their systems work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Previously: &lt;a href="https://mpt.solutions/context-management-generative-ai/" rel="noopener noreferrer"&gt;This Above All: To Thine Own Context Be True&lt;/a&gt; — on why context management matters more than prompt engineering.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>engineeringleadership</category>
      <category>technicaldebt</category>
      <category>contextarchitecture</category>
    </item>
    <item>
      <title>Why does nobody teach the infrastructure problems that destroy developer productivity before production breaks</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Wed, 18 Mar 2026 02:36:58 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/why-does-nobody-teach-the-infrastructure-problems-that-destroy-developer-productivity-before-32mb</link>
      <guid>https://forem.com/michaeltuszynski/why-does-nobody-teach-the-infrastructure-problems-that-destroy-developer-productivity-before-32mb</guid>
      <description>&lt;h1&gt;
  
  
  The Production Gap: Why Nobody Teaches the Infrastructure That Actually Matters
&lt;/h1&gt;

&lt;p&gt;Every bootcamp, CS program, and YouTube tutorial series teaches you how to build features. Almost none of them teach you what happens when those features meet real traffic, real failure modes, and real users who do things you never anticipated.&lt;/p&gt;

&lt;p&gt;The result is predictable: developers ship code that works on their laptop, passes CI, and then falls apart the moment it hits production at scale. Not because the logic is wrong — because nobody taught them about connection pooling, graceful degradation, or what happens when your database runs out of connections at 2 AM on a Saturday.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Curriculum Blind Spot
&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://www.reddit.com/r/ExperiencedDevs/comments/1rvyprt/why_does_nobody_teach_the_infrastructure_problems/" rel="noopener noreferrer"&gt;thread on r/ExperiencedDevs&lt;/a&gt; captured this frustration perfectly: educational content focuses almost entirely on writing code and building features, while operational concerns — monitoring, error handling, memory management, rate limiting — only become relevant when applications break in production. By then, you're learning under fire.&lt;/p&gt;

&lt;p&gt;This isn't a minor gap. It's the gap between "I can build software" and "I can build software that stays running." And it's enormous.&lt;/p&gt;

&lt;p&gt;Think about what a typical full-stack course covers: React components, REST APIs, database queries, authentication flows. Maybe some Docker basics. Now think about what actually causes production incidents: thread pool exhaustion, cascading failures from a single downstream dependency, memory leaks that only manifest after 72 hours of uptime, DNS resolution failures, certificate expiration, connection storms after a deploy.&lt;/p&gt;

&lt;p&gt;These aren't edge cases. They're Tuesday.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Gap Exists
&lt;/h2&gt;

&lt;p&gt;Three forces keep operational knowledge out of the curriculum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, it's hard to teach without real systems.&lt;/strong&gt; You can't simulate connection pool exhaustion on a laptop running SQLite. You can't demonstrate cascading failures with a single-service tutorial app. The infrastructure problems that destroy productivity only emerge at a certain scale of complexity, traffic, and time — none of which exist in a classroom.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second, it's not glamorous.&lt;/strong&gt; "Build a full-stack app in 30 minutes" gets clicks. "Understanding TCP keepalive settings and why they matter for your connection pool" does not. Content creators optimize for engagement, and operational topics feel boring until the moment they're the only thing that matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Third, the people who know this stuff learned it the hard way and are too busy to teach it.&lt;/strong&gt; The senior SRE who understands why your Kubernetes pods are getting OOMKilled at 3x expected memory usage is probably dealing with an incident right now, not writing blog posts. Operational knowledge lives in war stories, incident retrospectives, and tribal knowledge passed between teammates — not in structured curricula.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Cost
&lt;/h2&gt;

&lt;p&gt;This isn't just an education problem. It's a &lt;a href="https://medium.com/@chain.love/why-developer-productivity-is-really-an-infra-problem-af89528aef7a" rel="noopener noreferrer"&gt;productivity problem that masquerades as a people problem&lt;/a&gt;. When teams complain about slow velocity, the instinct is to look at process, hiring, or morale. But often the real bottleneck is that developers spend hours debugging infrastructure issues they were never trained to anticipate.&lt;/p&gt;

&lt;p&gt;A developer who doesn't understand connection pooling will open a new database connection per request, wonder why the app works in dev but times out under load, and then spend two days tracking down the issue. A developer who doesn't understand backpressure will build a message consumer that looks correct but silently drops events when the queue backs up. A developer who doesn't understand DNS caching will deploy a service that works perfectly until the load balancer rotates IPs.&lt;/p&gt;

&lt;p&gt;Each of these costs days — sometimes weeks — of debugging time. Multiply that across a team, and &lt;a href="https://coder.com/blog/the-uncomfortable-truth-about-developer-productivity-in-apac-tools-arent-the-prob" rel="noopener noreferrer"&gt;the infrastructure gap becomes the single largest drag on developer productivity&lt;/a&gt;. Not the tools, not the process, not the sprint ceremonies. The fact that half the team has never been taught how production systems actually behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Specific Knowledge That's Missing
&lt;/h2&gt;

&lt;p&gt;Here's my list of operational topics that every developer should understand before they're responsible for a production system. None of these show up in a typical CS degree or bootcamp:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection management.&lt;/strong&gt; How connection pools work, why they have limits, what happens when you exhaust them, and how to size them for your workload. This single topic prevents more production incidents than any framework feature.&lt;/p&gt;
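&lt;p&gt;To make the failure mode concrete, here is a deliberately minimal pool in Python (a sketch, not any particular driver's API): a fixed number of connections, and a loud, fast failure when they run out rather than a silent pile-up of new connections.&lt;/p&gt;

```python
import queue

class ConnectionPool:
    """Fixed-size pool: callers block briefly, then fail fast when exhausted."""
    def __init__(self, factory, size=5, timeout=2.0):
        self._available = queue.Queue(maxsize=size)
        self._timeout = timeout
        for _ in range(size):
            self._available.put(factory())

    def acquire(self):
        # Failing fast here surfaces undersizing in metrics and logs,
        # instead of opening connection number size+1 and melting the DB.
        try:
            return self._available.get(timeout=self._timeout)
        except queue.Empty:
            raise RuntimeError("pool exhausted: all connections in use")

    def release(self, conn):
        self._available.put(conn)
```

&lt;p&gt;Production pools (HikariCP, SQLAlchemy's QueuePool, PgBouncer) add validation, recycling, and overflow policies on top, but sizing and exhaustion behavior are the knobs that decide whether you find out in a dashboard or at 2 AM.&lt;/p&gt;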

&lt;p&gt;&lt;strong&gt;Graceful degradation.&lt;/strong&gt; What your application should do when a dependency is slow or unavailable. The answer is never "throw a 500 and hope for the best," but that's what most tutorial code does.&lt;/p&gt;
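&lt;p&gt;The core pattern fits in a dozen lines (names here are illustrative): catch the dependency failure and return something degraded but useful, such as cached or default data, instead of propagating a 500.&lt;/p&gt;

```python
def with_fallback(primary, fallback, caught=(TimeoutError, ConnectionError)):
    """Wrap a dependency call so failures degrade instead of erroring out."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except caught:
            # Serve stale/default data and keep the request alive;
            # log and count this so degradation is visible in metrics.
            return fallback(*args, **kwargs)
    return call
```

&lt;p&gt;A circuit breaker extends this idea by tracking the failure rate and skipping the primary entirely while the dependency is unhealthy, which stops you from hammering a service that is already struggling.&lt;/p&gt;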

&lt;p&gt;&lt;strong&gt;Observability fundamentals.&lt;/strong&gt; Not "install Datadog" — actual understanding of what metrics matter, how to correlate logs across services, what a useful alert looks like vs. one that wakes you up for nothing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Memory and resource management.&lt;/strong&gt; How garbage collection actually works in your runtime. What causes memory leaks in languages that claim to manage memory for you. Why your Node.js service uses 2GB of RAM after running for a week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rate limiting and backpressure.&lt;/strong&gt; How to protect your service from being overwhelmed, and how to be a good citizen when calling someone else's service. This is the difference between a service that handles traffic spikes and one that cascades failures across your entire platform.&lt;/p&gt;
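&lt;p&gt;The standard building block here is a token bucket, small enough to sketch in full (parameters are illustrative): requests spend tokens, tokens refill at a steady rate, and bursts are absorbed up to the bucket's capacity.&lt;/p&gt;

```python
import time

class TokenBucket:
    """Allow `rate` requests/second on average, with bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should shed load or apply backpressure
```

&lt;p&gt;The same shape works on both sides of the boundary: in front of your own service to shed excess load, and around your outbound clients so you stay inside someone else's limits.&lt;/p&gt;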

&lt;p&gt;&lt;strong&gt;Failure modes of distributed systems.&lt;/strong&gt; Partial failures, network partitions, split-brain scenarios, exactly-once delivery myths. You don't need a PhD in distributed systems theory, but you need to understand that the network is not reliable, clocks are not synchronized, and retries without backoff are a denial-of-service attack on your own infrastructure.&lt;/p&gt;
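&lt;p&gt;That last point about retries deserves a sketch, because the fix is tiny: exponential backoff with full jitter spreads retries out in time, so a blip doesn't synchronize every client into a thundering herd against the recovering service.&lt;/p&gt;

```python
import random
import time

def retry_with_backoff(op, attempts=5, base=0.1, cap=5.0):
    """Retry `op`, sleeping a random 0..min(cap, base * 2**attempt) seconds
    between tries ("full jitter"), re-raising after the final attempt."""
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

&lt;p&gt;In real code you would catch only the exceptions you know are transient; blindly retrying a permanent failure just multiplies the damage.&lt;/p&gt;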

&lt;h2&gt;
  
  
  What Should Change
&lt;/h2&gt;

&lt;p&gt;I'm not expecting universities to overhaul their CS curricula overnight. But a few things would help.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bootcamps should include a "production readiness" module.&lt;/strong&gt; Before graduation, every student should deploy an app, load test it until it breaks, diagnose the failure, and fix it. That single exercise teaches more about real-world engineering than a semester of algorithm problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Senior engineers need to write down what they know.&lt;/strong&gt; The gap persists partly because operational knowledge stays locked in people's heads. Incident retrospectives should be shared broadly. Internal tech talks on "how we debugged X" are worth 10x more than another talk on the latest framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Companies should invest in structured onboarding for production systems.&lt;/strong&gt; Don't throw a new hire at the codebase and hope they figure out the monitoring stack. Walk them through the architecture, show them where things break, explain the failure modes you've already seen. This is not hand-holding — it's preventing the next incident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Platform teams should build paved roads.&lt;/strong&gt; If connection pooling is tricky, provide a standard library that does it correctly. If observability requires too much configuration, bake it into the deployment pipeline. Don't rely on every developer independently learning every operational concern — make the right thing the easy thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Truth
&lt;/h2&gt;

&lt;p&gt;The industry has a weird relationship with operational knowledge. We celebrate feature velocity and treat infrastructure work as unglamorous plumbing. We promote the developer who shipped the flashy new feature and overlook the one who quietly prevented 47 production incidents through better error handling and circuit breakers.&lt;/p&gt;

&lt;p&gt;Until we value the skills that keep systems running as much as the skills that build new ones, the production gap will persist. New developers will keep learning the hard way — at 2 AM, on a Saturday, with a Slack channel full of escalations and no idea why the connection pool is exhausted.&lt;/p&gt;

&lt;p&gt;The fix starts with acknowledging that knowing how to write code and knowing how to run code are two different skills. We teach the first one extensively. The second one, we mostly leave to chance. That's a choice, and it's the wrong one.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>learning</category>
      <category>productivity</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>The AI CapEx Arms Race Is Coming for Your Cloud Bill</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 16 Mar 2026 13:34:54 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/the-ai-capex-arms-race-is-coming-for-your-cloud-bill-mn6</link>
      <guid>https://forem.com/michaeltuszynski/the-ai-capex-arms-race-is-coming-for-your-cloud-bill-mn6</guid>
      <description>&lt;p&gt;The three major cloud providers are spending money like it's going out of style. Oracle is &lt;a href="https://www.techtarget.com/searchcloudcomputing/news/366638851/Cloud-infrastructure-suffers-AI-growing-pains" rel="noopener noreferrer"&gt;financing a $300 billion deal with OpenAI through $50 billion in stock sales and debt&lt;/a&gt;. Google just pledged to double its capital spending. And while the press releases talk about "meeting demand," some providers have &lt;a href="https://www.techtarget.com/searchcloudcomputing/news/366638851/Cloud-infrastructure-suffers-AI-growing-pains" rel="noopener noreferrer"&gt;quietly raised prices on existing services&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Someone has to pay for all that GPU infrastructure. If you're running workloads on any of the major clouds, that someone is you.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers Don't Add Up for Customers
&lt;/h2&gt;

&lt;p&gt;Let's put Oracle's deal in perspective. $300 billion is roughly the annual GDP of Finland. Financing $50 billion of that through stock dilution and debt means Oracle needs massive returns from AI infrastructure to justify the capital structure. The same logic applies to Google doubling its CapEx — these aren't charity projects, and the ROI has to come from somewhere.&lt;/p&gt;

&lt;p&gt;That somewhere is cloud pricing.&lt;/p&gt;

&lt;p&gt;Here's what makes this cycle different from previous infrastructure buildouts. When AWS, Azure, and GCP built out their initial cloud regions, they were competing for greenfield workloads. Prices trended down because the providers were buying market share. The AI infrastructure buildout flips that dynamic. Providers are spending enormous sums on specialized hardware — GPUs, custom AI chips, liquid cooling systems — that serves a narrow set of workloads. And they're doing it while &lt;a href="https://www.techtarget.com/searchcloudcomputing/news/366638851/Cloud-infrastructure-suffers-AI-growing-pains" rel="noopener noreferrer"&gt;demand appears insatiable&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When demand outstrips supply and the capital costs are this high, prices go up. Not just for AI services — for everything running on the same infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quiet Price Creep Nobody's Tracking
&lt;/h2&gt;

&lt;p&gt;The stealth price increases are the part that should worry enterprise IT leaders most. GPU instance pricing gets the headlines, but the real cost pressure shows up in the boring stuff: egress fees, storage tiers, network transit, and support plans.&lt;/p&gt;

&lt;p&gt;Cloud providers have a well-documented playbook here. They lure you in with competitive initial pricing, build switching costs through proprietary services, then adjust pricing once you're locked in. The AI spending spree accelerates this pattern because the providers need to recoup capital faster.&lt;/p&gt;

&lt;p&gt;I've watched this movie before. In 2022-2023, all three major clouds quietly adjusted reserved instance pricing, modified savings plan terms, and restructured support tiers. Most enterprises didn't notice until their next true-up. The AI CapEx cycle will produce the same pattern, just bigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hybrid Tax Gets More Expensive
&lt;/h2&gt;

&lt;p&gt;AWS recently entered the hybrid AI infrastructure market with &lt;a href="https://www.techtarget.com/searchitoperations/news/366636033/AWS-AI-Factories-target-hybrid-cloud-AI-infrastructure" rel="noopener noreferrer"&gt;AI Factories&lt;/a&gt;, joining an already crowded field. Every major provider now offers some flavor of "run AI on your hardware, managed by our control plane." The pitch sounds good: keep sensitive data on-prem, use cloud for burst capacity, get the best of both worlds.&lt;/p&gt;

&lt;p&gt;The reality is more complicated. These hybrid offerings create a new dependency layer. You're not just buying compute — you're buying into an orchestration framework, a model serving stack, and a monitoring stack that ties back to the provider's cloud. The more AI infrastructure you deploy through these hybrid products, the harder it becomes to move workloads between providers or back to fully self-managed infrastructure.&lt;/p&gt;

&lt;p&gt;This matters because when the provider raises prices — and they will — your negotiating position is weaker than it was before you adopted their hybrid AI stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Smart Teams Are Doing Right Now
&lt;/h2&gt;

&lt;p&gt;The enterprises handling this well share three characteristics:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They're tracking AI infrastructure costs separately from general cloud spend.&lt;/strong&gt; Most FinOps practices lump GPU instances, AI API calls, and model training costs into their general cloud bill. That makes it impossible to see the AI cost trajectory independently. Break it out. You need a clear trendline on AI-specific spending to make informed build-vs-buy decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;They're building abstraction layers before they need them.&lt;/strong&gt; The teams that survived the last round of cloud price adjustments had already abstracted their workloads away from provider-specific services. The same principle applies to AI infrastructure. If your inference pipeline is hard-coded to SageMaker or Vertex AI, you have zero negotiating power when pricing changes. Tools like KServe, Ray Serve, or even a simple API gateway in front of your model endpoints give you options.&lt;/p&gt;
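&lt;p&gt;The abstraction does not need to be elaborate to buy you leverage. A sketch of the idea in Python (EchoBackend is a stand-in; real implementations would wrap the SageMaker, Vertex, or self-hosted serving SDKs):&lt;/p&gt;

```python
class InferenceBackend:
    """Provider-agnostic seam: application code depends on this interface,
    never on a specific provider's SDK."""
    def generate(self, prompt: str) -> str:
        raise NotImplementedError

class EchoBackend(InferenceBackend):
    # Illustrative stand-in; a real backend wraps a provider client here.
    def generate(self, prompt: str) -> str:
        return "echo: " + prompt

_REGISTRY = {"echo": EchoBackend}

def make_backend(name: str) -> InferenceBackend:
    # Switching providers becomes a config change, not a rewrite,
    # which is where the negotiating power comes from.
    return _REGISTRY[name]()
```

&lt;p&gt;Whether the seam is a class like this, a KServe deployment, or an API gateway route matters less than having it in place before the pricing conversation starts.&lt;/p&gt;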

&lt;p&gt;&lt;strong&gt;They're doing the math on owned infrastructure.&lt;/strong&gt; For sustained AI workloads — inference serving at steady-state volume, fine-tuning jobs on a regular cadence — the economics of owned GPU clusters have shifted significantly. An NVIDIA H100 that costs $30,000 to buy will cost you $40,000+ per year to rent from a cloud provider at current rates. If your utilization stays above 60%, owned hardware wins on a 2-year horizon. That calculation gets even more favorable as cloud prices creep up.&lt;/p&gt;
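&lt;p&gt;The break-even arithmetic fits in a few lines. The purchase and rental figures below come from the paragraph above; the $5,000/year for power and hosting is my own illustrative assumption. The key asymmetry: the cloud bill scales with the hours you actually use, while the owned cluster is mostly a fixed cost.&lt;/p&gt;

```python
def breakeven_utilization(purchase, rent_per_year_full, opex_per_year, years):
    """Utilization above which owning beats renting over the horizon.
    Renting costs utilization * full-time rate; owning costs purchase + opex."""
    owned_total = purchase + opex_per_year * years
    return owned_total / (rent_per_year_full * years)

# H100 figures from the article, hypothetical $5k/year opex, 2-year horizon:
# breakeven_utilization(30_000, 40_000, 5_000, 2) evaluates to 0.50
```

&lt;p&gt;Add realistic staffing, networking, and hardware-refresh costs and the break-even climbs toward the 60% figure above, which is why sustained, predictable workloads are the ones worth moving.&lt;/p&gt;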

&lt;h2&gt;
  
  
  The Consolidation Nobody's Talking About
&lt;/h2&gt;

&lt;p&gt;There's a second-order effect of this spending race that deserves attention. Not every cloud provider can sustain this level of capital investment. Oracle's $50 billion financing structure is aggressive. Smaller cloud providers and regional players simply can't compete on AI infrastructure spending.&lt;/p&gt;

&lt;p&gt;This means the market is consolidating around fewer providers with the capital to build AI-scale infrastructure. Fewer providers means less competition. Less competition means higher prices. The AI CapEx arms race is, paradoxically, reducing the competitive pressure that kept cloud pricing in check for the last decade.&lt;/p&gt;

&lt;p&gt;Enterprise architects need to plan for a world where cloud infrastructure costs 15-25% more than it does today, with the increases concentrated in compute and networking. That's not a pessimistic estimate — it's what happens when three companies collectively spend hundreds of billions on infrastructure that needs to generate returns.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;The cloud providers aren't wrong to invest in AI infrastructure. The demand is real, and the companies that build capacity now will capture enormous markets. But don't confuse their strategic interests with yours.&lt;/p&gt;

&lt;p&gt;Your job is to use AI infrastructure cost-effectively, not to subsidize someone else's capital buildout. That means tracking costs obsessively, maintaining architectural flexibility, and being willing to own hardware when the math supports it.&lt;/p&gt;

&lt;p&gt;The AI infrastructure spending spree will produce better, faster, more capable cloud services. It will also produce higher bills. Plan for both.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloudcomputing</category>
      <category>googlecloud</category>
      <category>news</category>
    </item>
    <item>
      <title>Your Platform Team Needs an Agent Policy — Yesterday</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Sun, 15 Mar 2026 02:28:26 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/your-platform-team-needs-an-agent-policy-yesterday-eab</link>
      <guid>https://forem.com/michaeltuszynski/your-platform-team-needs-an-agent-policy-yesterday-eab</guid>
      <description>&lt;p&gt;On March 3rd, an attacker &lt;a href="https://www.stepsecurity.io/blog/xygeni-action-compromised-c2-reverse-shell-backdoor-injected-via-tag-poisoning" rel="noopener noreferrer"&gt;compromised the Xygeni GitHub Action&lt;/a&gt; by poisoning a mutable tag. Every CI runner referencing &lt;code&gt;xygeni/xygeni-action@v5&lt;/code&gt; quietly started executing a reverse shell to a C2 server. The exposure window lasted a week. &lt;a href="https://github.com/xygeni/xygeni-action/security/advisories/GHSA-f8q5-h5qh-33mh" rel="noopener noreferrer"&gt;137+ repositories were affected&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The root cause wasn't exotic. A GitHub App private key with overly broad permissions got compromised. Combined with a maintainer's personal access token, the attacker could create a PR and move the tag — no human review required.&lt;/p&gt;

&lt;p&gt;This is what happens when automated actors run without governance. And it's about to get much worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents Are a New User Persona
&lt;/h2&gt;

&lt;p&gt;Your platform team already manages identities for developers, service accounts, and CI bots. But AI agents are a fundamentally different category.&lt;/p&gt;

&lt;p&gt;A developer reads docs, thinks, and opens a PR. A service account runs a fixed script. An AI agent does something in between — it reasons about what to do, then acts. It might create infrastructure, modify configurations, call APIs, or chain together a dozen tools. The blast radius of a compromised or misconfigured agent is closer to a rogue admin than a broken cron job.&lt;/p&gt;

&lt;p&gt;Yet most organizations treat agents like any other service account. Same IAM roles. Same broad permissions. Same lack of runtime monitoring.&lt;/p&gt;

&lt;p&gt;The numbers back this up. A &lt;a href="https://www.gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control" rel="noopener noreferrer"&gt;2026 Gravitee report&lt;/a&gt; found that 80.9% of technical teams have pushed agents into active testing or production, but only 14.4% went live with full security and IT approval. And here's the kicker: &lt;a href="https://www.gravitee.io/state-of-ai-agent-security" rel="noopener noreferrer"&gt;82% of executives&lt;/a&gt; feel confident their existing policies cover unauthorized agent actions, while &lt;a href="https://securityboulevard.com/2026/02/the-invisible-risk-1-5-million-unmonitored-ai-agents-threaten-corporate-security/" rel="noopener noreferrer"&gt;only 21% have actual visibility&lt;/a&gt; into what their agents access, which tools they call, or what data they touch.&lt;/p&gt;

&lt;p&gt;That's not a gap. That's a canyon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an Agent Policy Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;An agent policy isn't a PDF that legal signs off on. It's a set of enforced constraints that your platform team builds into the golden path. Here's what that means in practice:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Identity and RBAC&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every agent gets a dedicated identity — not a shared service account, not a developer's credentials. Each identity maps to a role with explicitly scoped permissions. If an agent writes Terraform, it gets write access to the specific modules it manages and nothing else.&lt;/p&gt;

&lt;p&gt;This sounds obvious. In practice, most teams hand agents the same broad IAM role they use for local development because it's faster to ship.&lt;/p&gt;
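&lt;p&gt;A minimal sketch of what "explicitly scoped permissions" can look like as data (the agent names and resource paths here are hypothetical):&lt;/p&gt;

```typescript
// Hypothetical sketch: per-agent permission scopes as data, checked
// before any action executes. Names and paths are illustrative.
type Scope = { resource: string; actions: string[] };

const agentRoles: Record<string, Scope[]> = {
  // A Terraform-writing agent: write access to its own modules, nothing else.
  'tf-module-agent': [
    { resource: 'terraform/modules/networking', actions: ['read', 'write'] },
    { resource: 'terraform/state', actions: ['read'] },
  ],
};

function isAllowed(agentId: string, resource: string, action: string): boolean {
  const scopes = agentRoles[agentId] ?? [];
  return scopes.some(
    s => resource.startsWith(s.resource) && s.actions.includes(action)
  );
}
```

&lt;p&gt;The point is that the scope map is reviewable and diffable; an agent's permissions change through a PR, not through an IAM console click.&lt;/p&gt;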

&lt;p&gt;&lt;strong&gt;Runtime Boundaries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Static permissions aren't enough. Agents make decisions at runtime, and those decisions need guardrails:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limits on API calls and resource creation&lt;/li&gt;
&lt;li&gt;Allowlists for which tools and endpoints an agent can invoke&lt;/li&gt;
&lt;li&gt;Cost ceilings per execution (an agent that spins up 50 GPU instances because the prompt was ambiguous is an expensive mistake)&lt;/li&gt;
&lt;li&gt;Mandatory human-in-the-loop for destructive operations — deleting resources, modifying security groups, pushing to main&lt;/li&gt;
&lt;/ul&gt;
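
&lt;p&gt;Those guardrails can be sketched as a single runtime check in front of every tool call. This is illustrative only; in practice enforcement belongs in a gateway or policy engine outside the agent process, where the agent can't disable it:&lt;/p&gt;

```typescript
// Hedged sketch of a runtime guardrail wrapper around agent tool calls.
// Every name here is illustrative, not a specific product's API.
type AgentAction = { tool: string; cost: number; destructive: boolean };

class AgentGuardrail {
  private spent = 0;
  private calls = 0;

  constructor(
    private readonly allowedTools: Set<string>, // tool allowlist
    private readonly costCeiling: number,       // max spend per execution
    private readonly rateLimit: number          // max tool calls per execution
  ) {}

  // Returns 'allow', 'deny', or 'escalate' (human-in-the-loop required).
  check(action: AgentAction): 'allow' | 'deny' | 'escalate' {
    if (!this.allowedTools.has(action.tool)) return 'deny';
    if (++this.calls > this.rateLimit) return 'deny';
    if ((this.spent += action.cost) > this.costCeiling) return 'deny';
    if (action.destructive) return 'escalate'; // never auto-run destructive ops
    return 'allow';
  }
}
```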

&lt;p&gt;&lt;strong&gt;Audit and Observability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every agent action should produce a trace. Not just logs — structured traces that capture the reasoning chain, the tools invoked, the data accessed, and the outcome. When something goes wrong (and it will), you need to reconstruct exactly what the agent did and why.&lt;/p&gt;
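
&lt;p&gt;A rough sketch of such a trace record (the fields are assumptions about what's useful to capture, not a standard schema):&lt;/p&gt;

```typescript
// Illustrative shape for a structured agent trace record.
interface AgentTrace {
  traceId: string;
  agentId: string;
  step: number;
  reasoning: string;      // the model's stated rationale for this step
  toolInvoked: string;
  dataAccessed: string[];
  outcome: 'success' | 'failure' | 'blocked';
  timestamp: string;
}

function emitTrace(t: AgentTrace): string {
  // One JSON line per action keeps traces greppable and easy to ship
  // to whatever log pipeline you already run.
  return JSON.stringify(t);
}
```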

&lt;p&gt;&lt;a href="https://www.cncf.io/blog/2026/01/23/the-autonomous-enterprise-and-the-four-pillars-of-platform-control-2026-forecast/" rel="noopener noreferrer"&gt;The CNCF's 2026 forecast&lt;/a&gt; frames this well: the enterprise shift to autonomy will be defined by four control mechanisms — golden paths, guardrails, safety nets, and manual review workflows. All four apply to agents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supply Chain Verification&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Xygeni attack was a supply chain attack on an automated actor. Your agent policy needs to cover the agents' own dependencies: pinned versions (not mutable tags), signature verification, and provenance checks for any action or tool an agent consumes. If your CI agent references &lt;code&gt;some-action@v3&lt;/code&gt;, you're trusting that the tag hasn't been moved. Pin to a commit SHA instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Start With the Blast Radius
&lt;/h2&gt;

&lt;p&gt;You don't need to boil the ocean. Start by answering one question for every agent in production: &lt;em&gt;what's the worst thing this agent could do with its current permissions?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the answer makes you uncomfortable, you've found your first policy item.&lt;/p&gt;

&lt;p&gt;From there:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inventory your agents.&lt;/strong&gt; You can't govern what you can't see. &lt;a href="https://www.globenewswire.com/news-release/2026/03/09/3251861/0/en/OneTrust-Expands-AI-Governance-to-Meet-the-Demands-of-Scalable-Real-Time-AI.html" rel="noopener noreferrer"&gt;OneTrust&lt;/a&gt;, &lt;a href="https://www.cloudeagle.ai/blogs/10-best-ai-governance-platforms-in-2026" rel="noopener noreferrer"&gt;CloudEagle&lt;/a&gt;, and similar platforms now offer agent discovery — continuously scanning for AI agents, their ownership, integrations, and data access.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scope permissions to the task.&lt;/strong&gt; Apply least-privilege like you would for any identity. An agent that summarizes Jira tickets doesn't need write access to your infrastructure.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Add runtime guardrails before production.&lt;/strong&gt; &lt;a href="https://galileo.ai/blog/announcing-agent-control" rel="noopener noreferrer"&gt;Galileo's open-source Agent Control&lt;/a&gt; and &lt;a href="https://www.paloaltonetworks.com/cyberpedia/what-is-agentic-ai-governance" rel="noopener noreferrer"&gt;Palo Alto's agentic governance tools&lt;/a&gt; are both worth evaluating. The pattern is the same: intercept agent actions at runtime, check them against policy, and block or escalate violations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pin your dependencies.&lt;/strong&gt; Mutable tags are a liability. Every action, plugin, or tool your agents consume should be pinned to an immutable reference.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build the audit trail now.&lt;/strong&gt; Retroactively reconstructing what an agent did is painful. Instrument from day one.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  This Is a Platform Problem
&lt;/h2&gt;

&lt;p&gt;Some teams try to solve agent governance at the application layer — each team building their own guardrails. That doesn't scale, and it doesn't produce consistent policy enforcement.&lt;/p&gt;

&lt;p&gt;This is a platform engineering problem. The same team that builds your internal developer platform, manages your golden paths, and enforces your deployment policies should own agent governance. They have the infrastructure context. They have the policy enforcement mechanisms. And they're already thinking about developer experience, which matters because overly restrictive agent policies that slow teams down will just get bypassed.&lt;/p&gt;

&lt;p&gt;The Xygeni attack was a preview. The attack surface for AI agents in CI/CD, infrastructure management, and code generation is growing fast. Your platform team needs an agent policy — not next quarter, not after the first incident. Yesterday.&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>ai</category>
      <category>security</category>
      <category>devops</category>
    </item>
    <item>
      <title>Building Resilient Microservices: Lessons from Production</title>
      <dc:creator>Michael Tuszynski</dc:creator>
      <pubDate>Mon, 09 Dec 2024 21:10:45 +0000</pubDate>
      <link>https://forem.com/michaeltuszynski/building-resilient-microservices-lessons-from-production-1n24</link>
      <guid>https://forem.com/michaeltuszynski/building-resilient-microservices-lessons-from-production-1n24</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.mpt.solutions%2Fcontent%2Fimages%2F2024%2F12%2FBuilding-Resilient-Microservices--Lessons-from-Production.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.mpt.solutions%2Fcontent%2Fimages%2F2024%2F12%2FBuilding-Resilient-Microservices--Lessons-from-Production.webp" alt="Building Resilient Microservices: Lessons from Production" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In today's distributed systems landscape, building resilient microservices isn't just about writing code—it's about preparing for failure at every level. After years of managing production microservices at scale, I've learned that resilience is more about architecture and patterns than individual lines of code. Let me share some battle-tested insights that have proven invaluable in real-world scenarios.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Resilience Patterns
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Circuit Breakers: Your First Line of Defense
&lt;/h3&gt;

&lt;p&gt;Circuit breakers are essential in preventing cascade failures across your microservices architecture. Think of them as electrical circuit breakers for your code—they automatically "trip" when they detect potential problems, preventing system overload.&lt;/p&gt;

&lt;p&gt;In my experience, implementing circuit breakers has saved our systems countless times, especially during unexpected downstream service failures. The key is to configure them with sensible thresholds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A failure count threshold (typically 5-10 failures)&lt;/li&gt;
&lt;li&gt;A reset timeout (usually 30-60 seconds)&lt;/li&gt;
&lt;li&gt;A half-open state to test recovery
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class CircuitBreaker {
  private failures = 0;
  private lastFailureTime?: Date;
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';

  constructor(
    private readonly failureThreshold = 5,
    private readonly resetTimeout = 60000, // 60 seconds
  ) {}

  async execute&amp;lt;T&amp;gt;(operation: () =&amp;gt; Promise&amp;lt;T&amp;gt;): Promise&amp;lt;T&amp;gt; {
    if (this.state === 'OPEN') {
      if (this.shouldAttemptReset()) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  private onFailure(): void {
    this.failures++;
    this.lastFailureTime = new Date();
    if (this.failures &amp;gt;= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }

  private shouldAttemptReset(): boolean {
    return this.lastFailureTime !== undefined &amp;amp;&amp;amp;
           Date.now() - this.lastFailureTime.getTime() &amp;gt; this.resetTimeout;
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Retry Strategies: Smart Persistence
&lt;/h3&gt;

&lt;p&gt;While retry logic seems straightforward, implementing it correctly requires careful consideration. Exponential backoff with jitter has proven to be the most effective approach in production environments. Here's why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It prevents thundering herd problems during recovery&lt;/li&gt;
&lt;li&gt;It accounts for transient failures that resolve quickly&lt;/li&gt;
&lt;li&gt;It gracefully handles longer-term outages
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class RetryWithExponentialBackoff {
  constructor(
    private readonly maxAttempts = 3,
    private readonly baseDelay = 1000,
    private readonly maxDelay = 10000
  ) {}

  async execute&amp;lt;T&amp;gt;(operation: () =&amp;gt; Promise&amp;lt;T&amp;gt;): Promise&amp;lt;T&amp;gt; {
    let lastError: Error | undefined;

    for (let attempt = 0; attempt &amp;lt; this.maxAttempts; attempt++) {
      try {
        return await operation();
      } catch (error) {
        lastError = error as Error;
        if (attempt &amp;lt; this.maxAttempts - 1) {
          await this.delay(attempt);
        }
      }
    }

    throw lastError;
  }

  private async delay(attempt: number): Promise&amp;lt;void&amp;gt; {
    const jitter = Math.random() * 100;
    const delay = Math.min(
      this.maxDelay,
      (Math.pow(2, attempt) * this.baseDelay) + jitter
    );

    await new Promise(resolve =&amp;gt; setTimeout(resolve, delay));
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Service Discovery and Health Checks
&lt;/h3&gt;

&lt;p&gt;Robust health checking is fundamental to maintaining system reliability. A comprehensive health check should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify connectivity to all critical dependencies&lt;/li&gt;
&lt;li&gt;Monitor system resources (memory, CPU, disk)&lt;/li&gt;
&lt;li&gt;Check application-specific metrics&lt;/li&gt;
&lt;li&gt;Report detailed status information&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've found that implementing different health check levels (liveness vs readiness) provides better control over container orchestration and load balancing decisions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;interface HealthStatus {
  status: 'healthy' | 'unhealthy';
  checks: Record&amp;lt;string, boolean&amp;gt;;
  metrics: {
    memory: number;
    cpu: number;
    disk: number;
  };
}

class HealthChecker {
  async check(): Promise&amp;lt;HealthStatus&amp;gt; {
    const [dbStatus, cacheStatus, metrics] = await Promise.all([
      this.checkDatabase(),
      this.checkCache(),
      this.getMetrics()
    ]);

    return {
      status: this.isHealthy(dbStatus, cacheStatus, metrics) ? 'healthy' : 'unhealthy',
      checks: {
        database: dbStatus,
        cache: cacheStatus
      },
      metrics
    };
  }

  private async checkDatabase(): Promise&amp;lt;boolean&amp;gt; {
    try {
      // Implement actual DB check
      return true;
    } catch {
      return false;
    }
  }

  private async checkCache(): Promise&amp;lt;boolean&amp;gt; {
    try {
      // Implement actual cache check
      return true;
    } catch {
      return false;
    }
  }

  private async getMetrics(): Promise&amp;lt;{ memory: number; cpu: number; disk: number }&amp;gt; {
    // Implement actual metrics collection
    return {
      memory: process.memoryUsage().heapUsed,
      cpu: process.cpuUsage().user,
      disk: 0 // Implement actual disk usage check
    };
  }

  private isHealthy(
    dbStatus: boolean,
    cacheStatus: boolean,
    metrics: { memory: number; cpu: number; disk: number }
  ): boolean {
    return dbStatus &amp;amp;&amp;amp; cacheStatus &amp;amp;&amp;amp; metrics.memory &amp;lt; 1024 * 1024 * 1024; // 1GB heap ceiling
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Handling Cascading Failures
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Bulkhead Pattern
&lt;/h3&gt;

&lt;p&gt;Named after ship compartmentalization, the bulkhead pattern is crucial for isolation. In our production systems, we implement bulkheads by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Separating critical and non-critical operations&lt;/li&gt;
&lt;li&gt;Maintaining separate connection pools&lt;/li&gt;
&lt;li&gt;Implementing request quotas per client&lt;/li&gt;
&lt;li&gt;Using dedicated resources for different service categories&lt;/li&gt;
&lt;/ul&gt;
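
&lt;p&gt;The simplest form of a bulkhead in code is a concurrency cap per downstream dependency, so a slow service exhausts only its own compartment. A minimal sketch:&lt;/p&gt;

```typescript
// Minimal bulkhead sketch: a semaphore that caps concurrent calls into
// one dependency. Callers beyond the cap queue instead of piling on.
class Bulkhead {
  private active = 0;
  private readonly queue: Array<() => void> = [];

  constructor(private readonly maxConcurrent: number) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.active >= this.maxConcurrent) {
      // Wait for a slot to free up.
      await new Promise<void>(resolve => this.queue.push(resolve));
    }
    this.active++;
    try {
      return await operation();
    } finally {
      this.active--;
      this.queue.shift()?.(); // wake the next waiter, if any
    }
  }
}
```

&lt;p&gt;In production you'd typically add a queue-depth limit and a timeout so waiting callers fail fast instead of backing up indefinitely.&lt;/p&gt;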

&lt;h3&gt;
  
  
  Rate Limiting and Load Shedding
&lt;/h3&gt;

&lt;p&gt;One often-overlooked aspect of resilience is knowing when to say "no." Implementing rate limiting at service boundaries helps maintain system stability under load. Consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-client rate limits&lt;/li&gt;
&lt;li&gt;Global rate limits&lt;/li&gt;
&lt;li&gt;Adaptive rate limiting based on system health&lt;/li&gt;
&lt;li&gt;Graceful degradation strategies
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class RateLimiter {
  private readonly requests: Map&amp;lt;string, number[]&amp;gt; = new Map();

  constructor(
    private readonly limit: number = 100,
    private readonly windowMs: number = 60000 // 1 minute
  ) {}

  async isAllowed(clientId: string): Promise&amp;lt;boolean&amp;gt; {
    this.clearStaleRequests(clientId);

    const requests = this.requests.get(clientId) || [];
    if (requests.length &amp;lt; this.limit) {
      requests.push(Date.now());
      this.requests.set(clientId, requests);
      return true;
    }

    return false;
  }

  private clearStaleRequests(clientId: string): void {
    const now = Date.now();
    const requests = this.requests.get(clientId) || [];
    const validRequests = requests.filter(
      timestamp =&amp;gt; now - timestamp &amp;lt; this.windowMs
    );

    if (validRequests.length &amp;gt; 0) {
      this.requests.set(clientId, validRequests);
    } else {
      this.requests.delete(clientId);
    }
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Monitoring and Observability
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Distributed Tracing
&lt;/h3&gt;

&lt;p&gt;In a microservices architecture, distributed tracing isn't optional—it's essential. Key aspects to monitor include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request paths across services&lt;/li&gt;
&lt;li&gt;Latency at each hop&lt;/li&gt;
&lt;li&gt;Error propagation patterns&lt;/li&gt;
&lt;li&gt;Service dependencies and bottlenecks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is an example using OpenTelemetry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { trace, context } from '@opentelemetry/api';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

export function setupTracing() {
  const provider = new NodeTracerProvider({
    resource: new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
    }),
  });

  const exporter = new JaegerExporter();
  provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
  provider.register();

  return trace.getTracer('my-service-tracer');
}

// Usage example: initialize the tracer once at module load;
// calling setupTracing() inside every operation would register
// a fresh provider on each call
const tracer = setupTracing();

async function tracedOperation() {
  const span = tracer.startSpan('operation-name');

  try {
    // Your operation logic here
    span.setAttributes({ 'custom.attribute': 'value' });
  } catch (error) {
    span.recordException(error as Error);
    throw error;
  } finally {
    span.end();
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Metrics That Matter
&lt;/h3&gt;

&lt;p&gt;Focus on these key metrics for each service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request rate&lt;/li&gt;
&lt;li&gt;Error rate&lt;/li&gt;
&lt;li&gt;Latency percentiles (p95, p99)&lt;/li&gt;
&lt;li&gt;Resource utilization&lt;/li&gt;
&lt;li&gt;Circuit breaker status&lt;/li&gt;
&lt;li&gt;Retry counts&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start Simple&lt;/strong&gt;: Begin with basic resilience patterns and evolve based on actual failure modes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Failure&lt;/strong&gt;: Regularly practice chaos engineering to verify resilience&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor Everything&lt;/strong&gt;: You can't improve what you can't measure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Decisions&lt;/strong&gt;: Keep records of why certain resilience patterns were chosen&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review Incidents&lt;/strong&gt;: Learn from every failure and adjust patterns accordingly&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Building truly resilient microservices is an iterative process that requires constant attention and refinement. The patterns described above have proven their worth in production environments, but they must be adapted to your specific context.&lt;/p&gt;

&lt;p&gt;Remember: resilience is not a feature you add—it's a property you build into your system from the ground up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;In my next post, we'll explore performance comparisons between Rust and Node.js implementations of these resilience patterns, with a focus on real-world benchmarks and trade-offs.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
