<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Spicy</title>
    <description>The latest articles on Forem by Spicy (@spicykim).</description>
    <link>https://forem.com/spicykim</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3930300%2Ff80e2e97-ebe7-4ee3-b2cb-d70ff6eac7bc.png</url>
      <title>Forem: Spicy</title>
      <link>https://forem.com/spicykim</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/spicykim"/>
    <language>en</language>
    <item>
      <title>MCP Explained: The Protocol That's Becoming the USB Standard for AI Agents</title>
      <dc:creator>Spicy</dc:creator>
      <pubDate>Thu, 14 May 2026 06:09:00 +0000</pubDate>
      <link>https://forem.com/spicykim/mcp-explained-the-protocol-thats-becoming-the-usb-standard-for-ai-agents-27cc</link>
      <guid>https://forem.com/spicykim/mcp-explained-the-protocol-thats-becoming-the-usb-standard-for-ai-agents-27cc</guid>
      <description>&lt;p&gt;Every AI agent needs tools. A web search here, a database query there, a calendar update somewhere else.&lt;/p&gt;

&lt;p&gt;The problem: every team was building their own connectors, in their own format, from scratch. Until MCP.&lt;/p&gt;

&lt;h2&gt;What Is MCP?&lt;/h2&gt;

&lt;p&gt;Model Context Protocol (MCP) is an open standard introduced by Anthropic that defines how AI models connect to external tools and data sources. Think of it like USB-C — one standard port, infinite compatible devices.&lt;/p&gt;

&lt;p&gt;Before MCP, integrating an AI agent with your internal tools meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom API wrappers per tool&lt;/li&gt;
&lt;li&gt;Different auth schemes per integration&lt;/li&gt;
&lt;li&gt;No reusability across agents or teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With MCP, you build a server once. Any MCP-compatible AI client can connect to it.&lt;/p&gt;

&lt;h2&gt;How It Works&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP Servers&lt;/strong&gt; expose tools, resources, and prompts in a standardized format.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;MCP Clients&lt;/strong&gt; (Claude, Cursor, VS Code, etc.) connect to any server without custom code.&lt;/p&gt;
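
&lt;p&gt;To make that concrete, here is a minimal sketch of an MCP server in Python. It assumes the official &lt;code&gt;mcp&lt;/code&gt; Python SDK; the &lt;code&gt;get_order_status&lt;/code&gt; tool and its lookup logic are hypothetical stand-ins, not part of the protocol itself.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# pip install "mcp[cli]"   (assumption: the official MCP Python SDK is available)
from mcp.server.fastmcp import FastMCP

# One server; any MCP-compatible client can discover and call its tools.
mcp = FastMCP("orders")

@mcp.tool()
def get_order_status(order_id: str) -&gt; str:
    """Look up the status of an order (hypothetical example tool)."""
    # Stand-in logic; a real server would query your database or API here.
    return f"Order {order_id}: shipped"

if __name__ == "__main__":
    # Defaults to the stdio transport, which local clients use.
    mcp.run()
&lt;/code&gt;&lt;/pre&gt;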

&lt;h2&gt;Why Developers Are Adopting It Fast&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reusability&lt;/strong&gt; — build one MCP server for your database; every agent in your org can use it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ecosystem&lt;/strong&gt; — hundreds of pre-built MCP servers already exist (GitHub, Notion, Slack, Google Drive)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local + remote&lt;/strong&gt; — runs over stdio for local tools or HTTP/SSE for remote services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open standard&lt;/strong&gt; — not locked to any single AI provider&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Real Use Cases&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Connect Claude Desktop to your local filesystem, databases, or APIs&lt;/li&gt;
&lt;li&gt;Give Cursor AI access to your internal docs without copy-pasting&lt;/li&gt;
&lt;li&gt;Build a company-wide tool registry that any AI agent can tap into&lt;/li&gt;
&lt;li&gt;Replace fragmented LangChain tool wrappers with a single MCP layer&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Who's Already Using It&lt;/h2&gt;

&lt;p&gt;Major IDE and AI tool providers have adopted MCP: Cursor, VS Code Copilot, Windsurf, Zed, and dozens more. The ecosystem is growing fast enough that "MCP support" is becoming a checkbox in enterprise AI tool evaluations.&lt;/p&gt;

&lt;p&gt;Full breakdown — architecture, server types, and enterprise implementation guide:&lt;br&gt;
&lt;a href="https://lucas8.com/mcp-model-context-protocol-ai-agent-tool-connector-guide/" rel="noopener noreferrer"&gt;MCP: The Universal USB for AI Agents&lt;/a&gt;&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Why Your Enterprise AI Keeps Failing in Production (And How Multiagent Systems Fix It)</title>
      <dc:creator>Spicy</dc:creator>
      <pubDate>Thu, 14 May 2026 04:59:31 +0000</pubDate>
      <link>https://forem.com/spicykim/why-your-enterprise-ai-keeps-failing-in-production-and-how-multiagent-systems-fix-it-1bo7</link>
      <guid>https://forem.com/spicykim/why-your-enterprise-ai-keeps-failing-in-production-and-how-multiagent-systems-fix-it-1bo7</guid>
      <description>&lt;p&gt;Your AI demo worked perfectly. Production is a different story.&lt;/p&gt;

&lt;p&gt;The root cause is almost always structural: real business workflows aren't single tasks — they're sequences of decisions, handoffs, and system calls that no single model can handle at scale.&lt;/p&gt;

&lt;p&gt;That's exactly the problem &lt;strong&gt;Multiagent Systems (MAS)&lt;/strong&gt; solve.&lt;/p&gt;

&lt;h2&gt;What a Multiagent System Actually Is&lt;/h2&gt;

&lt;p&gt;Instead of one AI doing everything, MAS deploys a network of specialized agents — each with a defined role, memory, and toolset — coordinated by an orchestrator.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator agent&lt;/td&gt;
&lt;td&gt;Breaks down goals, manages handoffs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specialist agents&lt;/td&gt;
&lt;td&gt;Execute defined tasks (research, classify, draft, call APIs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory layer&lt;/td&gt;
&lt;td&gt;Shared context agents read/write to&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool integrations&lt;/td&gt;
&lt;td&gt;CRMs, ERPs, databases each agent can access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Guardrail layer&lt;/td&gt;
&lt;td&gt;Monitoring and controls to keep agents in scope&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
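
&lt;p&gt;As a rough illustration of how the pieces in the table fit together, here is a deliberately tiny, framework-free Python sketch. The agent names, the shared-memory dict, and the guardrail check are all hypothetical placeholders.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Framework-free sketch of the components above (all names are hypothetical).

shared_memory = {}  # memory layer: context every agent can read and write

def research_agent(goal):
    # Specialist agent: gathers inputs and records them in shared context.
    shared_memory["findings"] = f"findings for {goal}"
    return shared_memory["findings"]

def drafting_agent(goal):
    # Specialist agent: turns shared context into a deliverable.
    findings = shared_memory.get("findings", "")
    return f"draft for {goal}, based on: {findings}"

def guardrail(output):
    # Guardrail layer: keep results in scope, escalate anything unexpected.
    if "draft" not in output:
        raise ValueError("out-of-scope output, escalate to a human reviewer")
    return output

def orchestrator(goal):
    # Orchestrator: breaks the goal down and manages the handoffs.
    research_agent(goal)
    return guardrail(drafting_agent(goal))

print(orchestrator("summarize Q3 churn drivers"))
&lt;/code&gt;&lt;/pre&gt;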

&lt;p&gt;Gartner named MAS one of its Top 10 Strategic Technology Trends for 2026. Here's why.&lt;/p&gt;

&lt;h2&gt;Where Single Agents Break Down&lt;/h2&gt;

&lt;p&gt;A single LLM agent works fine for contained tasks. When a workflow requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple steps across different systems&lt;/li&gt;
&lt;li&gt;Parallel execution&lt;/li&gt;
&lt;li&gt;Domain specialization at each stage&lt;/li&gt;
&lt;li&gt;An audit trail regulators can follow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...a single agent hits hard limits: context overflow, degraded accuracy, no true parallelism.&lt;/p&gt;

&lt;h2&gt;Real Enterprise Use Cases&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Financial Services&lt;/strong&gt; — Loan processing compressed from days to hours. One agent pulls credit data, another runs risk scoring, a third handles compliance checks, all coordinated in real time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HR&lt;/strong&gt; — Recruiting pipelines with dedicated agents for screening, scheduling, communication, and compliance — running concurrently instead of sequentially.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supply Chain&lt;/strong&gt; — Monitoring agents per data source feed a forecasting agent, which triggers an action agent to reroute shipments or escalate to human planners when thresholds are crossed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Customer Service&lt;/strong&gt; — Intake → knowledge retrieval → response generation → quality check, all automated. Edge cases escalated to humans with full context attached.&lt;/p&gt;

&lt;h2&gt;The Deployment Framework That Actually Works&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Map the workflow first&lt;/strong&gt; — before building a single agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define agent boundaries explicitly&lt;/strong&gt; — scope creep = unpredictable production behavior&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build governance before you scale&lt;/strong&gt; — log every action, add human checkpoints for high-risk decisions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrate via MCP or well-defined APIs&lt;/strong&gt; — ad-hoc integrations fail silently and create hard-to-diagnose errors&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Start with one bottleneck, measure, then expand&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Platforms Worth Evaluating in 2026&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft AutoGen&lt;/strong&gt; — best for Microsoft enterprise stack&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; — most flexible for custom workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CrewAI&lt;/strong&gt; — fastest to prototype&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock Agents&lt;/strong&gt; — best if you're already on AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Full breakdown with deployment framework and evaluation criteria:&lt;br&gt;
&lt;a href="https://lucas8.com/multiagent-systems-enterprise-guide/" rel="noopener noreferrer"&gt;Multiagent Systems: Enterprise Use Cases Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agenticai</category>
      <category>enterprise</category>
    </item>
    <item>
      <title>SLM vs LLM: How to Pick the Right Model for Your Enterprise Workload</title>
      <dc:creator>Spicy</dc:creator>
      <pubDate>Thu, 14 May 2026 02:29:31 +0000</pubDate>
      <link>https://forem.com/spicykim/slm-vs-llm-how-to-pick-the-right-model-for-your-enterprise-workload-3c4o</link>
      <guid>https://forem.com/spicykim/slm-vs-llm-how-to-pick-the-right-model-for-your-enterprise-workload-3c4o</guid>
      <description>&lt;p&gt;Every time a new frontier model drops, the benchmarks go wild.&lt;br&gt;
But somewhere between the hype and the monthly bill, enterprise teams are asking a quieter question: &lt;strong&gt;do we actually need the biggest model?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In 2026, Small Language Models (SLMs) have become a genuine enterprise option — not a compromise.&lt;/p&gt;

&lt;h2&gt;SLM vs LLM: 6 Dimensions That Matter&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;SLM&lt;/th&gt;
&lt;th&gt;LLM&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$500–$2,000/mo (self-hosted)&lt;/td&gt;
&lt;td&gt;$5,000–$50,000/mo at scale&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;Sub-second inference&lt;/td&gt;
&lt;td&gt;Higher latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Runs on-prem, data never leaves&lt;/td&gt;
&lt;td&gt;External API by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Excellent for narrow tasks&lt;/td&gt;
&lt;td&gt;Better for complex reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Edge, mobile, single GPU&lt;/td&gt;
&lt;td&gt;Multi-GPU cloud required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning&lt;/td&gt;
&lt;td&gt;Fast + cheap (LoRA)&lt;/td&gt;
&lt;td&gt;Expensive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;When to choose SLM&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Task is narrow and well-defined (classification, FAQ, routing)&lt;/li&gt;
&lt;li&gt;Data must stay on-prem (healthcare, legal, finance)&lt;/li&gt;
&lt;li&gt;Needs to run on edge/mobile devices&lt;/li&gt;
&lt;li&gt;Latency is critical (real-time apps)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;When to stick with LLM&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Open-ended, unpredictable inputs&lt;/li&gt;
&lt;li&gt;Complex multi-step reasoning&lt;/li&gt;
&lt;li&gt;Creative synthesis across domains&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The pattern most teams use in 2026&lt;/h2&gt;

&lt;p&gt;Route high-volume, narrow tasks → SLM&lt;br&gt;&lt;br&gt;
Route complex, unpredictable queries → LLM&lt;/p&gt;
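
&lt;p&gt;In code, that routing pattern is just a thin dispatch layer. A minimal sketch, assuming a hypothetical &lt;code&gt;call_model&lt;/code&gt; helper and intent labels produced upstream; the model names are placeholders, not recommendations.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Hypothetical router: narrow, high-volume intents go to a small on-prem
# model; everything else falls through to a hosted frontier model.
NARROW_INTENTS = {"faq", "classification", "routing", "ticket_triage"}

def call_model(name, query):
    # Stand-in for your real inference client (local server or hosted API).
    return f"[{name}] response to: {query}"

def route(query, intent):
    if intent in NARROW_INTENTS:
        return call_model("local-slm", query)   # e.g. a fine-tuned 3B model
    return call_model("hosted-llm", query)      # frontier model via API

print(route("Where is my invoice?", "faq"))
print(route("Draft a migration plan for our data platform", "open_ended"))
&lt;/code&gt;&lt;/pre&gt;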

&lt;p&gt;Popular SLMs right now: &lt;strong&gt;Phi-4&lt;/strong&gt;, &lt;strong&gt;Gemma 3&lt;/strong&gt;, &lt;strong&gt;Ministral 3B&lt;/strong&gt;, &lt;strong&gt;Llama 3.2&lt;/strong&gt;, &lt;strong&gt;Qwen3&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Full breakdown with decision framework and enterprise adoption guide here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://lucas8.com/small-language-models-vs-llms/" rel="noopener noreferrer"&gt;Small Language Models vs LLMs: Business Guide 2026&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Neocloud vs Hyperscaler: What Engineers Need to Know in 2026</title>
      <dc:creator>Spicy</dc:creator>
      <pubDate>Thu, 14 May 2026 02:12:48 +0000</pubDate>
      <link>https://forem.com/spicykim/neocloud-vs-hyperscaler-what-engineers-need-to-know-in-2026-2a82</link>
      <guid>https://forem.com/spicykim/neocloud-vs-hyperscaler-what-engineers-need-to-know-in-2026-2a82</guid>
      <description>&lt;p&gt;Your AI training job is queued on AWS. You're waiting. The bill is climbing. Meanwhile, a team at CoreWeave just provisioned 512 H100s in under 15 minutes — paying 40% less per GPU-hour.&lt;/p&gt;

&lt;p&gt;That gap is real, and it's why more engineering teams are rethinking their AI infrastructure stack.&lt;/p&gt;

&lt;h2&gt;What's a Neocloud?&lt;/h2&gt;

&lt;p&gt;Neoclouds are GPU-first cloud providers — CoreWeave, Nebius, Lambda Labs, Voltage Park — built exclusively for AI workloads. No managed databases, no serverless functions, no CDN. Just bare-metal GPU compute at scale, fast.&lt;/p&gt;

&lt;h2&gt;The Core Tradeoffs&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Hyperscaler&lt;/th&gt;
&lt;th&gt;Neocloud&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU availability&lt;/td&gt;
&lt;td&gt;Waitlisted&lt;/td&gt;
&lt;td&gt;Fast provisioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pricing&lt;/td&gt;
&lt;td&gt;Complex, bundled&lt;/td&gt;
&lt;td&gt;Transparent per-GPU-hour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost vs baseline&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;30–60% cheaper&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service breadth&lt;/td&gt;
&lt;td&gt;Thousands of services&lt;/td&gt;
&lt;td&gt;Compute-focused&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Compliance&lt;/td&gt;
&lt;td&gt;Extensive&lt;/td&gt;
&lt;td&gt;Growing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
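
&lt;p&gt;A quick back-of-envelope calculation shows what the pricing rows mean in practice. The hourly rates below are illustrative placeholders (not quotes from any provider), using the 40% per-GPU-hour discount from the intro.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Illustrative cost comparison for one multi-day training run.
gpus = 512
hours = 72
hyperscaler_rate = 5.00                    # USD per GPU-hour, placeholder
neocloud_rate = hyperscaler_rate * 0.60    # "40% less per GPU-hour"

hyperscaler_cost = gpus * hours * hyperscaler_rate   # 184,320 USD
neocloud_cost = gpus * hours * neocloud_rate         # 110,592 USD
savings = hyperscaler_cost - neocloud_cost           #  73,728 USD

print(f"hyperscaler: {hyperscaler_cost:,.0f} USD")
print(f"neocloud:    {neocloud_cost:,.0f} USD")
print(f"savings:     {savings:,.0f} USD")
&lt;/code&gt;&lt;/pre&gt;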

&lt;h2&gt;When to use a Neocloud&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Pure AI training / fine-tuning / inference workloads&lt;/li&gt;
&lt;li&gt;When GPU availability is blocking your team&lt;/li&gt;
&lt;li&gt;When you want bare-metal performance without virtualization overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;When to stick with a Hyperscaler&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI workload is tightly coupled with managed services (RDS, Lambda, etc.)&lt;/li&gt;
&lt;li&gt;Multi-region compliance requirements today&lt;/li&gt;
&lt;li&gt;Team bandwidth is limited&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most mature teams in 2026 are running &lt;strong&gt;hybrid&lt;/strong&gt; — neocloud for training, hyperscaler for the application stack.&lt;/p&gt;

&lt;p&gt;I wrote a full breakdown with provider comparisons (CoreWeave vs Nebius vs Lambda vs Nscale) here:&lt;br&gt;
&lt;a href="https://lucas8.com/neocloud-vs-hyperscaler-enterprise-guide/" rel="noopener noreferrer"&gt;Neocloud vs Hyperscaler: 2026 Enterprise Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>neocloud</category>
      <category>cloud</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
