<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kael Tiwari</title>
    <description>The latest articles on Forem by Kael Tiwari (@kaeltiwari).</description>
    <link>https://forem.com/kaeltiwari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3781118%2F0c4469cb-e26a-4d68-accb-b6f70868a762.png</url>
      <title>Forem: Kael Tiwari</title>
      <link>https://forem.com/kaeltiwari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kaeltiwari"/>
    <language>en</language>
    <item>
      <title>GPU Economics: What Inference Actually Costs in 2026</title>
      <dc:creator>Kael Tiwari</dc:creator>
      <pubDate>Wed, 25 Feb 2026 03:35:12 +0000</pubDate>
      <link>https://forem.com/kaeltiwari/gpu-economics-what-inference-actually-costs-in-2026-2goo</link>
      <guid>https://forem.com/kaeltiwari/gpu-economics-what-inference-actually-costs-in-2026-2goo</guid>
      <description>&lt;p&gt;The question every AI team eventually asks: should we rent GPUs and run models ourselves, or just pay per token through an API?&lt;/p&gt;

&lt;p&gt;The answer changed a lot in the last six months. GPU rental prices dropped. API prices dropped faster. New GPU generations shipped. And mixture-of-experts models made the whole calculation messier than it used to be.&lt;/p&gt;

&lt;p&gt;Here's the actual math, with real numbers from real providers.&lt;/p&gt;




&lt;h2&gt;GPU rental prices right now&lt;/h2&gt;

&lt;p&gt;These are on-demand, publicly listed prices as of February 2026. No negotiated enterprise deals, no reserved instances.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;GPU&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Config&lt;/th&gt;
&lt;th&gt;$/hour&lt;/th&gt;
&lt;th&gt;VRAM (GB)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA B200&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$68.80&lt;/td&gt;
&lt;td&gt;180&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA GB200 NVL72&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;4-GPU slice&lt;/td&gt;
&lt;td&gt;$42.00&lt;/td&gt;
&lt;td&gt;186&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA HGX H200&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$50.44&lt;/td&gt;
&lt;td&gt;141&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA HGX H100&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$49.24&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA GH200&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;1x GPU&lt;/td&gt;
&lt;td&gt;$6.50&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA A100 80GB&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$21.60&lt;/td&gt;
&lt;td&gt;80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA L40S&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$18.00&lt;/td&gt;
&lt;td&gt;48&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NVIDIA RTX PRO 6000&lt;/td&gt;
&lt;td&gt;&lt;a href="https://coreweave.com/pricing" rel="noopener noreferrer"&gt;CoreWeave&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;8x GPU&lt;/td&gt;
&lt;td&gt;$20.00&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things stand out. The B200 costs 40% more than the H100 per hour, but delivers roughly 2.5x the inference throughput for large models according to &lt;a href="https://www.nvidia.com/en-us/data-center/b200/" rel="noopener noreferrer"&gt;NVIDIA's own benchmarks&lt;/a&gt;. The H200 is barely more expensive than the H100 despite having 76% more VRAM. And the A100 — which was the default choice 18 months ago — is now less than half the price of current gen.&lt;/p&gt;

&lt;p&gt;CoreWeave sets the benchmark for GPU cloud pricing. They're seeking an &lt;a href="https://www.techmeme.com/260224/p44#a260224p44" rel="noopener noreferrer"&gt;$8.5B loan backed by a Meta contract&lt;/a&gt; worth up to $14.2B, which tells you the scale of demand here.&lt;/p&gt;

&lt;h2&gt;What does it cost to serve a model yourself?&lt;/h2&gt;

&lt;p&gt;Let's do the math on running Llama 3.1 405B — a model big enough to compete with GPT-5 mini on most benchmarks, and the most common choice for self-hosted production deployments.&lt;/p&gt;

&lt;p&gt;Hardware requirement: 405B parameters at FP8 precision need roughly 405GB of VRAM. That's a minimum of 6x H100 80GB GPUs, or more practically an 8-GPU H100 node.&lt;/p&gt;
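&lt;p&gt;A quick sketch of that sizing rule (weights only; real deployments also need headroom for KV cache and activations, so treat the result as a floor):&lt;/p&gt;

```python
# Weight memory alone: parameter count times bytes per parameter.
# Ignores KV cache and activation memory, so this is a lower bound.
def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
    return params_billions * bytes_per_param

print(weight_vram_gb(405, 1.0))  # FP8: 405.0 GB, i.e. at least 6x 80GB H100s
print(weight_vram_gb(405, 2.0))  # FP16 would need 810.0 GB
```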

&lt;p&gt;Hourly cost on CoreWeave: $49.24/hr for 8x H100.&lt;/p&gt;

&lt;p&gt;Throughput: with vLLM and continuous batching, expect roughly 2,000-3,000 output tokens per second on an 8x H100 setup running Llama 405B at FP8. Call it 2,500 tok/s as a conservative estimate based on &lt;a href="https://blog.vllm.ai/2024/09/05/perf-update.html" rel="noopener noreferrer"&gt;vLLM benchmarks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Cost per million output tokens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2,500 tokens/second = 9,000,000 tokens/hour&lt;/li&gt;
&lt;li&gt;$49.24 / 9M tokens = $5.47 per million output tokens&lt;/li&gt;
&lt;/ul&gt;
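&lt;p&gt;The same arithmetic as a reusable sketch (it assumes the node runs flat out with zero idle time, which is the best case):&lt;/p&gt;

```python
# Self-hosted cost per million output tokens at full, continuous utilization.
def cost_per_million_output(hourly_rate_usd: float, tokens_per_second: float) -> float:
    tokens_per_hour_millions = tokens_per_second * 3600 / 1_000_000
    return hourly_rate_usd / tokens_per_hour_millions

print(round(cost_per_million_output(49.24, 2500), 2))    # 8x H100, Llama 405B: 5.47
print(round(cost_per_million_output(2.25, 10_000), 4))   # single L40S, 8B model: 0.0625
```

&lt;p&gt;The same function covers the small-model case later in this post: a single L40S at $2.25/hr pushing 10,000 tok/s lands at about $0.06/M.&lt;/p&gt;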

&lt;p&gt;Compare that to API pricing for models in this class:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input $/M&lt;/th&gt;
&lt;th&gt;Output $/M&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 3.1 405B&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$3.50&lt;/td&gt;
&lt;td&gt;$3.50&lt;/td&gt;
&lt;td&gt;Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1-0528&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$7.00&lt;/td&gt;
&lt;td&gt;Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 mini&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$1.75&lt;/td&gt;
&lt;td&gt;$14.00&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Pro&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/pricing" rel="noopener noreferrer"&gt;Google Cloud&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$12.00&lt;/td&gt;
&lt;td&gt;Vertex AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3 Flash&lt;/td&gt;
&lt;td&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/pricing" rel="noopener noreferrer"&gt;Google Cloud&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;Vertex AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3.5-397B-A17B&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together AI&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;$3.60&lt;/td&gt;
&lt;td&gt;Serverless&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Self-hosting Llama 405B at $5.47/M output tokens is more expensive than calling Together AI's API for the same model at $3.50/M. That's the efficiency of shared infrastructure at scale. Together AI batches requests from thousands of customers across the same GPUs. You're paying for idle time; they're not.&lt;/p&gt;

&lt;h2&gt;When self-hosting wins&lt;/h2&gt;

&lt;p&gt;The math flips in three scenarios.&lt;/p&gt;

&lt;p&gt;First, when you're running at near-100% capacity. If your inference demand is constant and maxes out the hardware — say, a consumer product doing millions of requests per day — your effective per-token cost drops because you're eliminating idle time. At 90%+ load, self-hosted Llama 405B drops to roughly $4.00/M output. Still not cheaper than Together AI's serverless rate, but cheaper than OpenAI's GPT-5.2 at $14.00/M.&lt;/p&gt;

&lt;p&gt;Second, data isolation. Some industries (healthcare, defense, finance) can't send prompts to third-party APIs. The premium you pay for self-hosting is really a compliance cost. CoreWeave and Lambda offer single-tenant nodes for this.&lt;/p&gt;

&lt;p&gt;Third, smaller models. A 7B or 8B model on a single L40S ($2.25/hr for one GPU) can push 10,000+ tokens/second. That works out to about $0.06/M output tokens — roughly matching the cheapest API options like Llama 3.2 3B at &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;$0.06/M on Together AI&lt;/a&gt;. But if you're running a fine-tuned version of that model, the API option doesn't exist.&lt;/p&gt;

&lt;h2&gt;When APIs win&lt;/h2&gt;

&lt;p&gt;For most teams, most of the time. Here's why.&lt;/p&gt;

&lt;p&gt;Mixture-of-experts models destroyed the self-hosting value proposition. Qwen3.5-397B has 397B total parameters but activates only 17B per token. Together AI charges &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;$0.60/M input and $3.60/M output&lt;/a&gt; for it. Running it yourself requires enough VRAM to hold all 397B parameters even though only 17B are active for any given token. You're paying for dead weight.&lt;/p&gt;

&lt;p&gt;The same applies to DeepSeek V3.1 ($0.60/$1.70 on Together AI), Llama 4 Maverick ($0.27/$0.85), and most new open models shipping with MoE architectures. API providers handle the memory overhead across a shared fleet. You'd handle it alone.&lt;/p&gt;

&lt;p&gt;Batch pricing cuts costs in half. OpenAI's Batch API gives you 50% off both input and output tokens in exchange for 24-hour turnaround. For non-realtime workloads — data processing, content generation, analysis pipelines — that brings GPT-5 mini down to $0.125/$1.00. No GPU rental comes close for a model of that quality.&lt;/p&gt;

&lt;p&gt;You don't need to hire anyone. Running inference infrastructure requires MLOps engineers. Kubernetes. Monitoring. Model updates. Quantization debugging. One senior ML infra engineer costs $200K+/year. That's equivalent to roughly 4,000 hours of an 8x H100 node at CoreWeave, or about 100 billion output tokens through GPT-5 mini's API.&lt;/p&gt;
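&lt;p&gt;Sanity-checking those equivalences (list prices from above; the salary figure is a round assumption):&lt;/p&gt;

```python
ENGINEER_COST = 200_000        # USD/year, senior ML infra (round assumption)
H100_NODE_RATE = 49.24         # USD/hr, 8x H100 node on CoreWeave
GPT5_MINI_OUTPUT = 2.00        # USD per million output tokens

node_hours = ENGINEER_COST / H100_NODE_RATE
output_tokens_billions = ENGINEER_COST / GPT5_MINI_OUTPUT * 1_000_000 / 1e9

print(f"{node_hours:,.0f} node-hours")                  # ~4,062
print(f"{output_tokens_billions:,.0f}B output tokens")  # 100B
```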

&lt;h2&gt;The Blackwell generation changes the math (slightly)&lt;/h2&gt;

&lt;p&gt;NVIDIA's B200 delivers roughly &lt;a href="https://www.nvidia.com/en-us/data-center/b200/" rel="noopener noreferrer"&gt;2.5x the inference throughput of an H100&lt;/a&gt; for FP8 workloads. At $68.80/hr for 8x B200 on CoreWeave versus $49.24 for 8x H100, you're paying 40% more for 2.5x the throughput. Per-token cost drops by about 44%.&lt;/p&gt;
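&lt;p&gt;The price-performance ratio in one step (the 2.5x figure is NVIDIA's own, so treat the result as an upper bound on the improvement):&lt;/p&gt;

```python
h100_rate, b200_rate = 49.24, 68.80   # $/hr, 8-GPU nodes on CoreWeave
speedup = 2.5                         # B200 vs H100 inference throughput (NVIDIA's number)

per_token_ratio = (b200_rate / h100_rate) / speedup
print(f"per-token cost drops {1 - per_token_ratio:.0%}")   # 44%
print(f"Llama 405B: ${5.47 * per_token_ratio:.2f}/M")      # $3.06/M, vs $5.47/M on H100s
```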

&lt;p&gt;That brings self-hosted Llama 405B on B200s down to roughly $3.10/M output tokens — finally competitive with Together AI's API rate. But B200 availability is still constrained. CoreWeave's GB200 NVL72 (the rack-scale option at $42.00/hr for a 4-GPU slice) adds even more memory bandwidth, and at 186GB of VRAM per GPU a 4-GPU slice totals roughly 744GB, enough to hold 405B-class models at FP8 with room to spare.&lt;/p&gt;

&lt;p&gt;For teams that can get B200 allocations and run at high capacity, self-hosting starts to make financial sense again. For everyone else, the API gap keeps widening.&lt;/p&gt;

&lt;h2&gt;The real cost nobody talks about&lt;/h2&gt;

&lt;p&gt;Electricity. A single 8x H100 node draws about 10.2 kW under load. At US commercial electricity rates ($0.12/kWh average from the &lt;a href="https://www.eia.gov/electricity/monthly/epm_table_5_6_a.html" rel="noopener noreferrer"&gt;EIA&lt;/a&gt;), that's $1.22/hr just for power — roughly 2.5% of the CoreWeave rental price. Not a big deal for cloud renters.&lt;/p&gt;
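&lt;p&gt;The power math, for the record (the 10.2 kW draw is an approximate full-node figure, not a measured one):&lt;/p&gt;

```python
power_kw = 10.2          # approximate draw of an 8x H100 node under load
rate_per_kwh = 0.12      # US commercial average (EIA)
rental_per_hr = 49.24    # CoreWeave 8x H100 on-demand

power_cost_per_hr = power_kw * rate_per_kwh
print(f"${power_cost_per_hr:.2f}/hr")                            # $1.22/hr
print(f"{power_cost_per_hr / rental_per_hr:.1%} of the rental")  # 2.5%
```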

&lt;p&gt;But if you're Meta building out &lt;a href="https://www.reddit.com/r/artificial/comments/1rdm17p/meta_strikes_up_to_100b_amd_chip_deal_as_it/" rel="noopener noreferrer"&gt;data centers that consume gigawatts&lt;/a&gt;, or CoreWeave financing $8.5B in infrastructure, power becomes the constraint that sets the floor on how cheap inference can get. Big Tech is projected to invest &lt;a href="https://www.reddit.com/r/artificial/comments/1rcmgzy/big_tech_to_invest_about_650_billion_in_ai_in/" rel="noopener noreferrer"&gt;$650B in AI infrastructure in 2026&lt;/a&gt;, and a meaningful chunk of that is electricity and cooling.&lt;/p&gt;

&lt;h2&gt;Bottom line&lt;/h2&gt;

&lt;p&gt;For teams processing fewer than 10B tokens per month, APIs are cheaper, simpler, and better maintained. GPT-5 mini at $0.25/$2.00 or Qwen3.5-397B at $0.60/$3.60 will outperform anything you self-host at the same cost.&lt;/p&gt;

&lt;p&gt;For teams above 10B tokens/month with consistent demand, self-hosting on B200s starts to pencil out — but only if you have the engineering team to run it and can tolerate the 3-6 month wait for hardware allocation.&lt;/p&gt;
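&lt;p&gt;To see where that threshold comes from, compare monthly bills at a given volume using the per-million rates above (output tokens only, which oversimplifies; input tokens shift the picture further in the API's favor):&lt;/p&gt;

```python
def monthly_cost_usd(output_tokens_billions: float, dollars_per_million: float) -> float:
    return output_tokens_billions * 1_000 * dollars_per_million

volume = 10  # billion output tokens per month
print(f"${monthly_cost_usd(volume, 3.50):,.0f}")  # Together AI serverless: $35,000
print(f"${monthly_cost_usd(volume, 5.47):,.0f}")  # self-hosted 8x H100:    $54,700
print(f"${monthly_cost_usd(volume, 3.06):,.0f}")  # self-hosted B200 est.:  $30,600
```

&lt;p&gt;At 10B output tokens a month, H100 self-hosting still loses to the serverless rate; the B200 estimate is the first configuration that undercuts it, which is why the threshold sits where it does.&lt;/p&gt;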

&lt;p&gt;The interesting middle ground is dedicated endpoints from providers like Together AI and Fireworks, where you rent reserved GPU capacity but the provider handles the stack. You get lower per-token costs than serverless without the ops overhead. That's where most serious production deployments end up.&lt;/p&gt;

&lt;p&gt;If you want to see how the API prices compare across all major providers, we maintain an updated table in our &lt;a href="https://dev.to/blog/llm-pricing-comparison-feb-2026"&gt;LLM pricing comparison&lt;/a&gt;. And for context on which models are worth running in the first place, see our &lt;a href="https://dev.to/blog/open-source-vs-proprietary-llms"&gt;open source vs proprietary LLM analysis&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;We publish data-driven analysis on AI infrastructure, pricing, and adoption every week. &lt;a href="https://dev.to/#newsletter"&gt;Subscribe to get it in your inbox&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gpu</category>
      <category>inference</category>
      <category>pricing</category>
      <category>analysis</category>
    </item>
    <item>
      <title>AI coding assistant adoption by company size: who's actually using what</title>
      <dc:creator>Kael Tiwari</dc:creator>
      <pubDate>Fri, 20 Feb 2026 05:37:08 +0000</pubDate>
      <link>https://forem.com/kaeltiwari/ai-coding-assistant-adoption-by-company-size-whos-actually-using-what-3n9b</link>
      <guid>https://forem.com/kaeltiwari/ai-coding-assistant-adoption-by-company-size-whos-actually-using-what-3n9b</guid>
      <description>&lt;p&gt;Nearly every developer you know probably uses an AI coding assistant. DX's latest research — &lt;a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/" rel="noopener noreferrer"&gt;121,000 developers, 450+ companies&lt;/a&gt; — puts the monthly usage number at 92.6%. Sounds like a settled question. It isn't. A solo dev auto-completing functions in Cursor and an enterprise pushing Codex through six months of compliance review are living in different worlds. The story worth telling is in that gap.&lt;/p&gt;

&lt;h2&gt;The numbers everyone quotes (and what they miss)&lt;/h2&gt;

&lt;p&gt;Three data points, three years:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Survey&lt;/th&gt;
&lt;th&gt;Year&lt;/th&gt;
&lt;th&gt;Sample&lt;/th&gt;
&lt;th&gt;"Using AI tools now"&lt;/th&gt;
&lt;th&gt;"Using or plan to"&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://survey.stackoverflow.co/2024/ai" rel="noopener noreferrer"&gt;Stack Overflow Developer Survey&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2024&lt;/td&gt;
&lt;td&gt;65,000+ devs&lt;/td&gt;
&lt;td&gt;62%&lt;/td&gt;
&lt;td&gt;76%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://stackoverflow.blog/2023/06/12/developer-survey-sentiment-ai-ml/" rel="noopener noreferrer"&gt;Stack Overflow Developer Survey&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;2023&lt;/td&gt;
&lt;td&gt;90,000+ devs&lt;/td&gt;
&lt;td&gt;44%&lt;/td&gt;
&lt;td&gt;70%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;DX / Laura Tacho research&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Q4 2025–Q1 2026&lt;/td&gt;
&lt;td&gt;121,000 devs&lt;/td&gt;
&lt;td&gt;92.6% (monthly)&lt;/td&gt;
&lt;td&gt;~97%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;44% to 92.6% in under three years. Nobody disputes the trend anymore. But these surveys flatten a variable that matters a lot: company size.&lt;/p&gt;

&lt;h2&gt;Small teams move fast, big teams move carefully&lt;/h2&gt;

&lt;p&gt;Under 50 engineers? AI coding tools show up overnight. No procurement. No security review. A founder enables Copilot and the team has it by lunch.&lt;/p&gt;

&lt;p&gt;Big companies are different. &lt;a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/" rel="noopener noreferrer"&gt;DX found&lt;/a&gt; that even the best-performing large organizations cap out around 60% &lt;em&gt;active&lt;/em&gt; usage — weekly, habitual use, not "opened it once in January." That 60% ceiling versus 92.6% monthly tells you everything about the enterprise adoption gap.&lt;/p&gt;

&lt;p&gt;Rough pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Company size&lt;/th&gt;
&lt;th&gt;Typical adoption rate&lt;/th&gt;
&lt;th&gt;Active weekly usage&lt;/th&gt;
&lt;th&gt;Primary blocker&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1–50 engineers&lt;/td&gt;
&lt;td&gt;&amp;gt;90%&lt;/td&gt;
&lt;td&gt;~75%&lt;/td&gt;
&lt;td&gt;Individual preference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;51–500 engineers&lt;/td&gt;
&lt;td&gt;~80%&lt;/td&gt;
&lt;td&gt;~55%&lt;/td&gt;
&lt;td&gt;Security review, budget&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;500–5,000 engineers&lt;/td&gt;
&lt;td&gt;~70%&lt;/td&gt;
&lt;td&gt;~45%&lt;/td&gt;
&lt;td&gt;Compliance, SSO/audit requirements&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5,000+ engineers&lt;/td&gt;
&lt;td&gt;~65%&lt;/td&gt;
&lt;td&gt;~35%&lt;/td&gt;
&lt;td&gt;Procurement, data residency, IP concerns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Synthesized from &lt;a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/" rel="noopener noreferrer"&gt;DX benchmarks&lt;/a&gt; (4M+ samples, hundreds of orgs), &lt;a href="https://survey.stackoverflow.co/2024/ai" rel="noopener noreferrer"&gt;Stack Overflow 2024&lt;/a&gt;, and &lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;Pragmatic Summit keynote&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Big companies adopting slower — fine, obvious. The weird part is how many licenses go unused. A 500-person eng org buys Copilot for everyone. 45% open it in a given week. The rest? Expensive shelfware.&lt;/p&gt;
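&lt;p&gt;The shelfware math is worth writing down (a sketch: seat price assumes GitHub Copilot Business list pricing; the 45% weekly-active figure is from the table above):&lt;/p&gt;

```python
seats = 500
price_per_seat = 19.0    # $/user/month, GitHub Copilot Business list price (assumed)
weekly_active = 0.45     # share of seats opening the tool in a given week

monthly_spend = seats * price_per_seat
idle_spend = monthly_spend * (1 - weekly_active)
print(f"${monthly_spend:,.0f}/month total, ${idle_spend:,.0f}/month on inactive seats")
```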

&lt;h2&gt;The productivity plateau is real — and it hits different by size&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;Laura Tacho's DX research&lt;/a&gt; found that productivity gains from AI coding tools have flatlined at about 10%. Developers save 3.6–4 hours a week — same number as Q2 2025. The needle stopped moving.&lt;/p&gt;

&lt;p&gt;Except that's an average. Averages lie.&lt;/p&gt;

&lt;p&gt;Small orgs tend to get more out of these tools. Simpler codebases. Faster CI. Developers who do everything. A full-stack dev at a 20-person startup scaffolds an API endpoint with Copilot and saves an hour — visible immediately.&lt;/p&gt;

&lt;p&gt;At a 5,000-person company, the same tool collides with slow CI pipelines, three rounds of code review, and legacy code that AI can't parse. &lt;a href="https://survey.stackoverflow.co/2024/ai" rel="noopener noreferrer"&gt;Stack Overflow's 2024 survey&lt;/a&gt; found 45% of professional developers rate AI tools as "bad or very bad at handling complex tasks." Complex tasks live at big companies.&lt;/p&gt;

&lt;p&gt;DX's numbers get wilder. Well-run orgs? 50% fewer customer-facing incidents with AI. Messy orgs? Incidents &lt;em&gt;doubled&lt;/em&gt;. Same tools, opposite outcomes. &lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;Tacho's take&lt;/a&gt;: "AI tends to highlight existing flaws rather than fix them." Messy orgs tend to be bigger ones. Not a rule. But a pattern.&lt;/p&gt;

&lt;h2&gt;AI-authored code is climbing fast&lt;/h2&gt;

&lt;p&gt;One metric that cuts through the adoption noise: the share of production code written by AI. &lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;DX's dataset of 4.2 million samples&lt;/a&gt;, collected between November 2025 and February 2026, shows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Trend&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI-authored code in production&lt;/td&gt;
&lt;td&gt;26.9%&lt;/td&gt;
&lt;td&gt;Up from 22% previous quarter&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI-authored code (daily users)&lt;/td&gt;
&lt;td&gt;~33%&lt;/td&gt;
&lt;td&gt;Approaching one-third&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Onboarding time (time to 10th PR)&lt;/td&gt;
&lt;td&gt;Cut in half&lt;/td&gt;
&lt;td&gt;Steady decline since Q1 2024&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That onboarding cut matters most for big companies. New hires at large orgs historically take months to ship anything in a sprawling codebase. Cut that ramp in half and the ROI math changes completely. It stops being about writing code faster. It becomes about making people useful sooner.&lt;/p&gt;

&lt;h2&gt;Which tools win at which scale&lt;/h2&gt;

&lt;p&gt;Tool choice maps pretty cleanly to org size:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Segment&lt;/th&gt;
&lt;th&gt;Dominant tools&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Solo / small team&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://cursor.sh" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, Claude Code, Windsurf&lt;/td&gt;
&lt;td&gt;Best DX, no procurement needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mid-market (50–500)&lt;/td&gt;
&lt;td&gt;GitHub Copilot, Cursor Business&lt;/td&gt;
&lt;td&gt;Balance of features and admin controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise (500+)&lt;/td&gt;
&lt;td&gt;GitHub Copilot Enterprise, &lt;a href="https://openai.com/index/codex/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;SSO, audit logs, IP indemnification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;Codex&lt;/a&gt; deserves a special mention. The desktop app launched February 2 and hit one million downloads within weeks, growing 60% week-over-week. Inside OpenAI, 95% of developers use it and submit roughly 60% more pull requests per week. Cisco deployed it to 18,000 engineers for migrations and code reviews, cutting review time in half.&lt;/p&gt;

&lt;p&gt;Enterprise adoption of Codex is early though. Most big companies haven't finished security vetting. Copilot Enterprise stays the default at scale because GitHub already lives in their stack.&lt;/p&gt;

&lt;p&gt;Mid-market is where Cursor and similar AI-native editors are winning. Deep model integration, reasonable admin controls on the business tier, none of the enterprise procurement overhead. Good enough for a 200-person eng org.&lt;/p&gt;

&lt;h2&gt;The experience gap nobody talks about&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://stackoverflow.blog/2023/06/12/developer-survey-sentiment-ai-ml/" rel="noopener noreferrer"&gt;Stack Overflow's 2023 data&lt;/a&gt; had a pattern that jumped out:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Experience&lt;/th&gt;
&lt;th&gt;Using AI tools&lt;/th&gt;
&lt;th&gt;Don't plan to&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Less than 1 year&lt;/td&gt;
&lt;td&gt;55.1%&lt;/td&gt;
&lt;td&gt;21.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1–5 years&lt;/td&gt;
&lt;td&gt;51.3%&lt;/td&gt;
&lt;td&gt;24.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6–10 years&lt;/td&gt;
&lt;td&gt;42.3%&lt;/td&gt;
&lt;td&gt;30.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11–15 years&lt;/td&gt;
&lt;td&gt;39.5%&lt;/td&gt;
&lt;td&gt;32.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16–20 years&lt;/td&gt;
&lt;td&gt;35.9%&lt;/td&gt;
&lt;td&gt;36.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;21+ years&lt;/td&gt;
&lt;td&gt;30.2%&lt;/td&gt;
&lt;td&gt;42.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This connects directly to company size. Older, larger companies employ more senior engineers. A shop where average tenure is 12 years will see lower organic adoption than a startup where the median engineer has three years under their belt. Senior engineers aren't Luddites. They're working on problems where current AI tools genuinely can't help much yet.&lt;/p&gt;

&lt;h2&gt;Geography makes it messier&lt;/h2&gt;

&lt;p&gt;Where your developers sit changes things too. From &lt;a href="https://stackoverflow.blog/2023/06/12/developer-survey-sentiment-ai-ml/" rel="noopener noreferrer"&gt;the same Stack Overflow data&lt;/a&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Country&lt;/th&gt;
&lt;th&gt;Using or plan to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;🇮🇳 India&lt;/td&gt;
&lt;td&gt;83.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🇧🇷 Brazil&lt;/td&gt;
&lt;td&gt;78.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🇺🇸 United States&lt;/td&gt;
&lt;td&gt;63.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🇩🇪 Germany&lt;/td&gt;
&lt;td&gt;63.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🇫🇷 France&lt;/td&gt;
&lt;td&gt;61.4%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;🇬🇧 United Kingdom&lt;/td&gt;
&lt;td&gt;61.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;India and Brazil lead. Younger developer populations, faster-growing tech sectors. &lt;a href="https://github.blog/news-insights/octoverse/octoverse-2024/" rel="noopener noreferrer"&gt;GitHub's Octoverse report&lt;/a&gt; projects India will have the most developers on GitHub by 2028 — generative AI contributions on the platform surged 59% in 2024.&lt;/p&gt;

&lt;p&gt;For multinationals, this means a patchwork. Your Bangalore team is all-in on Copilot. Your Munich office wants to see more evidence first. That's not a tech problem. It's a cultural one.&lt;/p&gt;

&lt;h2&gt;So what do you actually do&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Small team, under 50 engineers.&lt;/strong&gt; Pick something and commit to it. Cursor or Claude Code for the best solo experience. Copilot if everyone uses VS Code. Don't agonize over the choice — daily habit matters more than which tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mid-market, 50–500.&lt;/strong&gt; Track active usage, not seat count. &lt;a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/" rel="noopener noreferrer"&gt;DX recommends&lt;/a&gt; measuring weekly active users and time saved per developer. Booking.com did this across 3,500 engineers — &lt;a href="https://getdx.com/customers/booking-drives-ai-adoption-with-dx/" rel="noopener noreferrer"&gt;16% throughput increase&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enterprise, 500+.&lt;/strong&gt; The tool is almost irrelevant. What matters: fast CI, clear docs, well-defined service boundaries. &lt;a href="https://getdx.com/research/measuring-ai-code-assistants-and-agents/" rel="noopener noreferrer"&gt;DX identifies these&lt;/a&gt; as the real predictors of whether AI tools deliver value. Fix developer experience first. Add AI second. Otherwise you're just automating dysfunction.&lt;/p&gt;

&lt;p&gt;The winners aren't the companies that adopted first. They're the ones that measured what happened after and changed course when the data told them to. &lt;a href="https://shiftmag.dev/this-cto-says-93-of-developers-use-ai-but-productivity-is-still-10-8013/" rel="noopener noreferrer"&gt;Laura Tacho's blunt summary&lt;/a&gt;: "This is really a management problem."&lt;/p&gt;




&lt;p&gt;&lt;em&gt;More from Kael Research: &lt;a href="https://dev.to/blog/llm-pricing-comparison-feb-2026"&gt;LLM pricing comparison&lt;/a&gt; and &lt;a href="https://dev.to/blog/ai-agent-market-map-2026"&gt;AI agent market map 2026&lt;/a&gt;. Get these posts in your inbox — &lt;a href="https://dev.to/#newsletter"&gt;join the newsletter&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aicodingassistants</category>
      <category>developerproductivity</category>
      <category>enterpriseai</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>AI agent market map 2026: who's building what</title>
      <dc:creator>Kael Tiwari</dc:creator>
      <pubDate>Thu, 19 Feb 2026 13:40:21 +0000</pubDate>
      <link>https://forem.com/kaeltiwari/ai-agent-market-map-2026-whos-building-what-231b</link>
      <guid>https://forem.com/kaeltiwari/ai-agent-market-map-2026-whos-building-what-231b</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://kaelresearch.com/blog/ai-agent-market-map-2026" rel="noopener noreferrer"&gt;Kael Research&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI agent market split into two camps this year: frameworks racing for developer adoption, and platforms betting on enterprise deployment. After analyzing GitHub stars, HuggingFace downloads, and funding announcements, the winners are becoming clear.&lt;/p&gt;

&lt;h2&gt;Market size and momentum&lt;/h2&gt;

&lt;p&gt;The agent space got real money in 2026. CrewAI claims 100,000+ certified developers through their courses at learn.crewai.com. LangChain maintains its position as the default choice but faces performance pressure from newer frameworks. Microsoft shifted focus from AutoGen to its new Agent Framework after placing AutoGen v0.2 in maintenance mode.&lt;/p&gt;

&lt;p&gt;Enterprise adoption accelerated. Accenture now allegedly ties promotions to "regular" AI adoption and tracks individual weekly AI tool logins for senior staff, according to &lt;a href="https://www.ft.com" rel="noopener noreferrer"&gt;Financial Times reporting&lt;/a&gt;. TCS signed OpenAI as the first customer for its data center business, with 100MW of capacity, the start of what could be grid-scale enterprise AI deployment.&lt;/p&gt;

&lt;p&gt;India emerged as a major market force. At the India AI Impact Summit 2026, organizers claimed 300+ exhibitors, 500 sessions, 250K visitors, and billions in investment commitments. Reliance plans up to $110B in AI infrastructure over seven years, while Pine Labs is embedding OpenAI APIs directly into payment infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Framework comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;GitHub Stars/Users&lt;/th&gt;
&lt;th&gt;Key Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangChain&lt;/td&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;Free/LangSmith paid&lt;/td&gt;
&lt;td&gt;100K+ stars&lt;/td&gt;
&lt;td&gt;Model interoperability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI&lt;/td&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;Free/AMP Suite paid&lt;/td&gt;
&lt;td&gt;Not specified&lt;/td&gt;
&lt;td&gt;Role-based multi-agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen&lt;/td&gt;
&lt;td&gt;Framework&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;30K+ stars&lt;/td&gt;
&lt;td&gt;Conversational agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI Assistants&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Per-token&lt;/td&gt;
&lt;td&gt;N/A (deprecated Aug 2026)&lt;/td&gt;
&lt;td&gt;Native OpenAI integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic Tool Use&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Per-token&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Claude-native tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenAI is sunsetting its Assistants API in August 2026 in favor of the new Responses API, a significant shift toward simpler mental models. The new system replaces assistants with "prompts" that can be versioned in the dashboard, and threads with "conversations" that store items beyond just messages.&lt;/p&gt;

&lt;p&gt;CrewAI positioned itself as the anti-LangChain this year — completely independent, no dependencies, built from scratch. They claim 5.76x faster execution than LangGraph in certain QA tasks and tout their lean architecture. The framework offers both autonomous "Crews" for flexible decision-making and precise "Flows" for event-driven control.&lt;/p&gt;

&lt;p&gt;AutoGen's Microsoft backing kept it relevant despite the maintenance mode announcement. The new Agent Framework promises better layered architecture with Core API for message passing, AgentChat API for rapid prototyping, and Extensions API for third-party capabilities.&lt;/p&gt;

&lt;h2&gt;
  
  
  Platform comparison
&lt;/h2&gt;

&lt;p&gt;The platform battle intensified around deployment and monitoring:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Pricing&lt;/th&gt;
&lt;th&gt;Notable Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LangSmith&lt;/td&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Usage-based&lt;/td&gt;
&lt;td&gt;LangChain native observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CrewAI AMP&lt;/td&gt;
&lt;td&gt;Enterprise control&lt;/td&gt;
&lt;td&gt;Enterprise pricing&lt;/td&gt;
&lt;td&gt;Unified control plane, 24/7 support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AutoGen Studio&lt;/td&gt;
&lt;td&gt;No-code GUI&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Visual multi-agent workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;Personal agents&lt;/td&gt;
&lt;td&gt;Free tier&lt;/td&gt;
&lt;td&gt;Telegram-native, cross-platform&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;OpenClaw gained traction in messaging-native agent deployment, particularly on Telegram. The platform offers personal AI assistants that integrate across devices and supports features like voice message transcription and real-time collaboration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open source model momentum
&lt;/h2&gt;

&lt;p&gt;HuggingFace download numbers revealed shifting preferences:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;moonshotai/Kimi-K2.5&lt;/code&gt; hit 955K+ downloads with 2.2K likes — Kimi adoption is accelerating&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;hexgrad/Kokoro-82M&lt;/code&gt; dominated text-to-speech with 8.1M+ downloads — tiny models are winning distribution
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;MiniMaxAI/MiniMax-M2.5&lt;/code&gt; showed 89.9K downloads — non-US models are gaining serious traction&lt;/li&gt;
&lt;li&gt;Video generation crossed from demos to repeated use: &lt;code&gt;Lightricks/LTX-2&lt;/code&gt; reached 2M+ downloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is clear: smaller, specialized models are eating market share from larger general-purpose systems. Developers want fast, focused tools over Swiss Army knife solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recent launches and announcements
&lt;/h2&gt;

&lt;p&gt;February 2026 brought several major developments:&lt;/p&gt;

&lt;p&gt;Funding rounds brought a major capital influx. Fei-Fei Li's World Labs reportedly raised $1B from A16Z and Nvidia for world models. OpenAI is approaching a funding round that could exceed $100B, with its valuation potentially hitting $850B according to Bloomberg.&lt;/p&gt;

&lt;p&gt;Enterprise deals showed infrastructure scale. TCS and OpenAI's 100MW data center partnership signals AI infrastructure moving to utility scale. Circuit raised $30M for AI manufacturing platforms, showing vertical-specific agent demand.&lt;/p&gt;

&lt;p&gt;Technical updates accelerated too. Gemini 3.1 Pro went live on Vertex AI, and new releases from the major providers brought significant improvements in reasoning and tool use.&lt;/p&gt;

&lt;p&gt;Platform consolidation emerged around three major approaches: framework-first (LangChain, CrewAI), platform-first (enterprise solutions), and API-first (OpenAI, Anthropic).&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for builders
&lt;/h2&gt;

&lt;p&gt;The agent market is maturing fast. Three trends matter most:&lt;/p&gt;

&lt;p&gt;Performance beats features every time. CrewAI's speed claims against LangGraph reflect broader developer frustration with bloated frameworks. Lean, fast solutions are winning mindshare.&lt;/p&gt;

&lt;p&gt;Enterprise deployment patterns are hardening. The TCS-OpenAI deal and Accenture's promotion policies show enterprise AI is moving from experimentation to operational requirement. IT departments want monitoring, control planes, and SLA guarantees.&lt;/p&gt;

&lt;p&gt;Messaging-native experiences are becoming default UX patterns: Telegram bots, WhatsApp integrations, and SMS-based agents. The command line lost to the chat interface.&lt;/p&gt;

&lt;p&gt;If you're building agents in 2026, focus on deployment simplicity over framework complexity. The market rewarded practical tools that solve real workflow problems, not academic demonstrations of multi-agent collaboration.&lt;/p&gt;

&lt;p&gt;The infrastructure layer is consolidating around a few winners, but application opportunities remain wide open. Pick your framework based on deployment target: CrewAI for speed, LangChain for ecosystem, or native APIs for direct model integration.&lt;/p&gt;

&lt;p&gt;For more analysis on model pricing trends, read our &lt;a href="https://dev.to/blog/llm-pricing-comparison-feb-2026"&gt;LLM pricing comparison Feb 2026&lt;/a&gt; and &lt;a href="https://dev.to/blog/open-source-vs-proprietary-llms"&gt;open source vs proprietary LLMs&lt;/a&gt; breakdown.&lt;/p&gt;

&lt;p&gt;Want updates on agent market developments? &lt;a href="https://dev.to/#newsletter"&gt;Subscribe to our newsletter&lt;/a&gt; for weekly analysis of funding, launches, and technical developments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Open Source vs Proprietary LLMs: The Real Cost Breakdown</title>
      <dc:creator>Kael Tiwari</dc:creator>
      <pubDate>Thu, 19 Feb 2026 13:34:17 +0000</pubDate>
      <link>https://forem.com/kaeltiwari/open-source-vs-proprietary-llms-the-real-cost-breakdown-15d0</link>
      <guid>https://forem.com/kaeltiwari/open-source-vs-proprietary-llms-the-real-cost-breakdown-15d0</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://kaelresearch.com/blog/open-source-vs-proprietary-llms" rel="noopener noreferrer"&gt;Kael Research&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Below 1B tokens/month, just use APIs; whether they're proprietary or hosted open-source barely matters at that scale. Between 1 and 10B tokens, hosted open-source APIs from &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt; or &lt;a href="https://groq.com/pricing" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; are usually cheapest. Above 10B tokens/month, self-hosting can win, but only if you already have an MLOps team. The "open source is free" narrative ignores $300K to $600K/year in engineering overhead.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pricing table
&lt;/h2&gt;

&lt;p&gt;Prices move fast. Here's where things stand in February 2026. All prices are per 1M tokens (input/output).&lt;/p&gt;

&lt;h3&gt;
  
  
  Open source models via hosted APIs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.27&lt;/td&gt;
&lt;td&gt;$0.85&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;&lt;a href="https://groq.com/pricing" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;562 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS-120B&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt; / &lt;a href="https://fireworks.ai/pricing" rel="noopener noreferrer"&gt;Fireworks&lt;/a&gt; / &lt;a href="https://groq.com/pricing" rel="noopener noreferrer"&gt;Groq&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS-20B&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;Bargain tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.1&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;$1.70&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen3-235B&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mistral Small 3&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Proprietary models
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.2&lt;/td&gt;
&lt;td&gt;$1.75&lt;/td&gt;
&lt;td&gt;$14.00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 mini&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://platform.claude.com/docs/en/about-claude/models/overview" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;&lt;a href="https://platform.claude.com/docs/en/about-claude/models/overview" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A few things jump out. GPT-OSS-120B at $0.15 input is wild. That's 11x cheaper than GPT-5.2 on the input side. GPT-5 mini and Gemini 2.5 Flash sit in a middle ground where proprietary pricing gets surprisingly close to open-source hosted rates. For a deeper dive on the month-over-month trends, see our &lt;a href="https://dev.to/blog/llm-pricing-comparison-feb-2026"&gt;full pricing comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The real comparison: API vs API vs self-hosted
&lt;/h2&gt;

&lt;p&gt;People frame this as "open source vs proprietary." That's wrong. The actual decision has three options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Proprietary API, where you pay OpenAI, Anthropic, or Google directly&lt;/li&gt;
&lt;li&gt;Hosted open-source API, where you pay Together.ai, Groq, or Fireworks to run open models for you&lt;/li&gt;
&lt;li&gt;Self-hosted open source, where you rent GPUs and run the models yourself&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Option 2 gets overlooked constantly. You get the flexibility of open weights without the operational burden. For most companies, this is the right answer.&lt;/p&gt;

&lt;p&gt;Option 3 sounds appealing on paper. In practice, it's a staffing decision disguised as a technology decision.&lt;/p&gt;

&lt;h2&gt;
  
  
  Breakeven math at different scales
&lt;/h2&gt;

&lt;p&gt;Let's do the math for a representative setup: GPT-OSS-120B via Together.ai ($0.15/$0.60) vs self-hosting on H100s from &lt;a href="https://lambdalabs.com/" rel="noopener noreferrer"&gt;Lambda Labs&lt;/a&gt; at $2.99/hr ($2,183/mo). A single H100 running a 70B-class model produces roughly 50 tokens/second per stream, which works out to about 130M tokens per month; batched serving can multiply that, but it's a useful floor.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scale (tokens/mo)&lt;/th&gt;
&lt;th&gt;Together.ai cost&lt;/th&gt;
&lt;th&gt;Self-hosted cost&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;~$4.50&lt;/td&gt;
&lt;td&gt;$2,183 + eng. overhead&lt;/td&gt;
&lt;td&gt;API by a mile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100M&lt;/td&gt;
&lt;td&gt;~$45&lt;/td&gt;
&lt;td&gt;$2,183 + eng. overhead&lt;/td&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1B&lt;/td&gt;
&lt;td&gt;~$450&lt;/td&gt;
&lt;td&gt;$2,183 + eng. overhead&lt;/td&gt;
&lt;td&gt;API on compute, and by more once overhead counts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10B&lt;/td&gt;
&lt;td&gt;~$4,500&lt;/td&gt;
&lt;td&gt;~$17K compute (8× H100s) + eng. overhead&lt;/td&gt;
&lt;td&gt;Depends on your team&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Notice that at these throughput numbers, the API stays cheaper on raw compute at every scale in the table; self-hosting only pulls ahead once batching pushes per-GPU throughput well past the single-stream figure. And compute isn't the whole story.&lt;/p&gt;

&lt;p&gt;At AWS rates of ~$3.90/hr per H100, the math shifts even further toward APIs. Reserved instances at $1.85/hr help, but you're committing to a year of capacity. H200s at $6.00/hr and B200s at $9.00/hr from &lt;a href="https://fireworks.ai/pricing" rel="noopener noreferrer"&gt;Fireworks&lt;/a&gt; give you more throughput per dollar, but the hourly bill climbs too.&lt;/p&gt;
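
&lt;p&gt;To make the crossover concrete, here's a back-of-envelope sketch using the on-demand numbers above. The blended API rate is my assumption (roughly a 3:1 input:output mix at Together.ai's GPT-OSS-120B prices); treat it as illustration, not a quote:&lt;/p&gt;

```python
# When does one rented H100 beat the hosted API on raw compute?
# Figures are the on-demand numbers quoted above; the blended API rate
# assumes a roughly 3:1 input:output token mix (an assumption, not a quote).
API_RATE_PER_M = 0.45        # blended $/M tokens, GPT-OSS-120B on Together.ai
GPU_MONTHLY = 2183           # one H100 at $2.99/hr for a 730-hour month
SINGLE_STREAM_M = 130        # ~50 tok/s single-stream -> ~130M tokens/month

breakeven_m = GPU_MONTHLY / API_RATE_PER_M            # millions of tokens/month
required_tps = breakeven_m * 1e6 / (30 * 24 * 3600)   # sustained tokens/second
multiple = breakeven_m / SINGLE_STREAM_M

print(f"breakeven: {breakeven_m:,.0f}M tokens/month per GPU")
print(f"needs ~{required_tps:,.0f} tok/s sustained vs ~50 tok/s single-stream")
print(f"so batched serving must hit ~{multiple:.0f}x single-stream throughput")
```

&lt;p&gt;The answer falls out immediately: a single H100 has to sustain nearly 1,900 tok/s, dozens of times the single-stream figure, before its rental undercuts the API on compute alone.&lt;/p&gt;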

&lt;h2&gt;
  
  
  The hidden costs of self-hosting
&lt;/h2&gt;

&lt;p&gt;Here's the part that "open source is free" evangelists skip over.&lt;/p&gt;

&lt;p&gt;An MLOps team to keep self-hosted models running costs $300K to $600K per year. That's 2 to 4 engineers, and you're competing with every AI company on earth for that talent. Good luck hiring them quickly.&lt;/p&gt;

&lt;p&gt;Beyond salaries, you're signing up for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;monitoring and alerting infrastructure&lt;/li&gt;
&lt;li&gt;model version management and rollback procedures&lt;/li&gt;
&lt;li&gt;GPU usage tuning (most teams waste 30 to 50% of their compute)&lt;/li&gt;
&lt;li&gt;security patching and compliance audits&lt;/li&gt;
&lt;li&gt;on-call rotations for when inference goes sideways at 3 AM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this shows up in the $/token calculation. It should.&lt;/p&gt;

&lt;p&gt;There's also the upgrade treadmill. A new model drops, your fine-tuned version is two generations behind, and now you need to re-run your evaluation suite, re-tune, and redeploy. With an API provider, you change a model string.&lt;/p&gt;

&lt;h2&gt;
  
  
  When open source wins
&lt;/h2&gt;

&lt;p&gt;Open source isn't always the cheaper option, but it's sometimes the &lt;em&gt;only&lt;/em&gt; option.&lt;/p&gt;

&lt;p&gt;Compliance and data sovereignty come first. If you're operating in healthcare or finance with strict data residency requirements, self-hosted open source gives you full control. The data never leaves your infrastructure. No BAA negotiations, no hoping your provider's compliance team got it right. HIPAA and GDPR compliance by design, not by contract.&lt;/p&gt;

&lt;p&gt;Air-gapped environments are the extreme version of this. Defense, certain government agencies, some financial institutions: they can't send data to external APIs at all. Open source is the only game in town.&lt;/p&gt;

&lt;p&gt;Fine-tuning is where open source pulls ahead on cost dramatically. Training on &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI's GPT-4.1 costs $25 per million tokens&lt;/a&gt;. The same job on open-source models through &lt;a href="https://fireworks.ai/pricing" rel="noopener noreferrer"&gt;Fireworks runs $0.50 per million tokens&lt;/a&gt; for models up to 16B parameters. Self-hosted, you pay only for compute. That's a 50x cost difference at the API level. If you need customized models, and many &lt;a href="https://dev.to/brief/ai-agents"&gt;agent-based architectures&lt;/a&gt; do, open source is hard to beat.&lt;/p&gt;
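
&lt;p&gt;In dollars, for a concrete training job at the API rates just quoted (the 100M-token job size is a hypothetical example):&lt;/p&gt;

```python
# Fine-tuning cost at the API level: the 50x gap in dollars.
# Per-M-token training rates are the figures quoted above; job size is
# a hypothetical example, not from either provider's docs.
OPENAI_FT_RATE = 25.00       # $/M training tokens (GPT-4.1)
FIREWORKS_FT_RATE = 0.50     # $/M training tokens (models up to 16B)
TRAINING_TOKENS_M = 100      # a 100M-token fine-tuning job (assumption)

openai_cost = TRAINING_TOKENS_M * OPENAI_FT_RATE
fireworks_cost = TRAINING_TOKENS_M * FIREWORKS_FT_RATE
print(f"OpenAI:    ${openai_cost:,.0f}")
print(f"Fireworks: ${fireworks_cost:,.0f}  ({openai_cost / fireworks_cost:.0f}x cheaper)")
```

&lt;p&gt;$2,500 vs $50 for the same job. Re-run that monthly for continuous tuning and the gap compounds fast.&lt;/p&gt;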

&lt;p&gt;High volume is the last piece. Past 10B tokens per month, the economics of self-hosting start making sense, assuming you've already got the infrastructure team. The key word is "already." Building that team from scratch to save on inference costs rarely pencils out.&lt;/p&gt;

&lt;h2&gt;
  
  
  When proprietary wins
&lt;/h2&gt;

&lt;p&gt;Speed to market is the obvious one. You can go from zero to production with GPT-5.2 or Claude Sonnet 4.6 in a weekend. No infrastructure provisioning, no model tuning, no serving framework selection. Just an API key and a credit card.&lt;/p&gt;

&lt;p&gt;Quality ceiling matters too. As of February 2026, Claude Opus 4.6 and GPT-5.2 still outperform open-source alternatives on complex reasoning tasks. The gap has narrowed (Llama 4 Maverick and Qwen3-235B are genuinely impressive) but for the hardest problems, proprietary models hold an edge. That edge costs 10 to 20x more per token, so the question is whether your use case actually needs it.&lt;/p&gt;

&lt;p&gt;No infra team is the underrated advantage. A startup with 5 engineers shouldn't be allocating 2 of them to GPU management. Use that headcount to build product instead. The API cost premium is cheaper than the hiring cost.&lt;/p&gt;

&lt;p&gt;Proprietary providers also handle the compliance paperwork for you. &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI has a BAA for HIPAA&lt;/a&gt;. &lt;a href="https://platform.claude.com/docs/en/about-claude/models/overview" rel="noopener noreferrer"&gt;Anthropic is HIPAA-ready&lt;/a&gt;. Azure OpenAI gives you EU data residency. These aren't free (enterprise plans cost more) but the operational simplicity has real value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision framework
&lt;/h2&gt;

&lt;p&gt;Forget the vibes. Use this matrix.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Factor&lt;/th&gt;
&lt;th&gt;Use proprietary API&lt;/th&gt;
&lt;th&gt;Use hosted open-source API&lt;/th&gt;
&lt;th&gt;Self-host&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Volume&lt;/td&gt;
&lt;td&gt;&amp;lt; 1B tok/mo&lt;/td&gt;
&lt;td&gt;1 to 10B tok/mo&lt;/td&gt;
&lt;td&gt;&amp;gt; 10B tok/mo&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Team size&lt;/td&gt;
&lt;td&gt;No MLOps engineers&lt;/td&gt;
&lt;td&gt;No MLOps engineers&lt;/td&gt;
&lt;td&gt;2+ MLOps engineers already on staff&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data sensitivity&lt;/td&gt;
&lt;td&gt;Standard (with BAA if needed)&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;Air-gapped or strict residency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Fine-tuning needed&lt;/td&gt;
&lt;td&gt;Light (prompt engineering suffices)&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Heavy or continuous&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time to production&lt;/td&gt;
&lt;td&gt;Days&lt;/td&gt;
&lt;td&gt;Days&lt;/td&gt;
&lt;td&gt;Weeks to months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality requirements&lt;/td&gt;
&lt;td&gt;Highest available&lt;/td&gt;
&lt;td&gt;Good enough&lt;/td&gt;
&lt;td&gt;Good enough + customized&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
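
&lt;p&gt;The matrix condenses into a few lines of logic. This is a sketch only: the thresholds mirror the table, and a real decision weighs latency, quality needs, and procurement too.&lt;/p&gt;

```python
# The decision matrix above as a function. Thresholds follow the table;
# real decisions weigh more factors than these three.
def deployment_choice(tokens_b_per_month, mlops_engineers, air_gapped=False):
    """Return a rough recommendation from the matrix above."""
    if air_gapped:
        return "self-host"                      # external APIs are off the table
    if tokens_b_per_month > 10 and mlops_engineers >= 2:
        return "self-host"
    if tokens_b_per_month > 1:
        return "hosted open-source API"
    return "proprietary API"

print(deployment_choice(0.3, 0))    # proprietary API
print(deployment_choice(5, 0))      # hosted open-source API
print(deployment_choice(25, 3))     # self-host
```

&lt;p&gt;Note the order of checks: high volume without an existing MLOps team still lands on a hosted API, which is the whole point of the staffing argument above.&lt;/p&gt;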

&lt;p&gt;My honest take: most companies should start with proprietary APIs, move to hosted open-source APIs as volume grows, and only self-host when they're processing billions of tokens and already have the team. The middle option, hosted open source, is the most underused path. And it's often the best one.&lt;/p&gt;

&lt;p&gt;The market is moving fast. Prices on this page will be outdated within weeks. We track changes monthly in our &lt;a href="https://dev.to/blog/llm-pricing-comparison-feb-2026"&gt;LLM pricing comparison&lt;/a&gt;, and if you want updates when the math shifts, &lt;a href="https://dev.to/#newsletter"&gt;join the newsletter&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>llm</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>LLM Pricing in February 2026: What Every Model Actually Costs</title>
      <dc:creator>Kael Tiwari</dc:creator>
      <pubDate>Thu, 19 Feb 2026 13:34:15 +0000</pubDate>
      <link>https://forem.com/kaeltiwari/llm-pricing-in-february-2026-what-every-model-actually-costs-3jdd</link>
      <guid>https://forem.com/kaeltiwari/llm-pricing-in-february-2026-what-every-model-actually-costs-3jdd</guid>
      <description>&lt;p&gt;&lt;em&gt;Originally published on &lt;a href="https://kaelresearch.com/blog/llm-pricing-comparison-feb-2026" rel="noopener noreferrer"&gt;Kael Research&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Cheapest option is OpenAI's open-source GPT-OSS-20B at $0.05/M input. Best value is GPT-5 mini at $0.25/M. Most expensive is Grok-4 at $30/M — 600x more than GPT-OSS-20B. Claude Opus 4.6 dropped to $5/$25 (down from $15/$75 on Opus 4). Full table with 18 models below.&lt;/p&gt;




&lt;p&gt;If you're building on top of LLMs right now, you're probably spending more than you need to. Pricing has changed so fast over the past year that most teams are running on outdated assumptions.&lt;/p&gt;

&lt;p&gt;Here's what every major model actually costs as of February 2026, with the context that matters for choosing between them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The full pricing table
&lt;/h2&gt;

&lt;p&gt;All prices are per million tokens.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.2&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$1.75&lt;/td&gt;
&lt;td&gt;$14.00&lt;/td&gt;
&lt;td&gt;Flagship, best overall quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5 mini&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.25&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;Best price/performance ratio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$8.00&lt;/td&gt;
&lt;td&gt;Still widely deployed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-4.1 nano&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;Cheapest OpenAI option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;o4-mini&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;$1.10&lt;/td&gt;
&lt;td&gt;$4.40&lt;/td&gt;
&lt;td&gt;Reasoning model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus 4.6&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;$25.00&lt;/td&gt;
&lt;td&gt;Top-tier reasoning + coding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.6&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$15.00&lt;/td&gt;
&lt;td&gt;Workhorse model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku 4.5&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;$1.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;Fast + cheap&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS-120B&lt;/td&gt;
&lt;td&gt;OpenAI (open-source)&lt;/td&gt;
&lt;td&gt;$0.15&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;Open-weight, via hosted APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-OSS-20B&lt;/td&gt;
&lt;td&gt;OpenAI (open-source)&lt;/td&gt;
&lt;td&gt;$0.05&lt;/td&gt;
&lt;td&gt;$0.20&lt;/td&gt;
&lt;td&gt;Smallest open-weight option&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.5 Flash&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;Strong on long context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 2.0 Flash&lt;/td&gt;
&lt;td&gt;Google&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;Budget tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Llama 4 Maverick&lt;/td&gt;
&lt;td&gt;Meta (via API)&lt;/td&gt;
&lt;td&gt;$0.27&lt;/td&gt;
&lt;td&gt;$0.85&lt;/td&gt;
&lt;td&gt;Open-weight, self-hostable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek V3.1&lt;/td&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;$0.60&lt;/td&gt;
&lt;td&gt;$1.70&lt;/td&gt;
&lt;td&gt;Chinese lab, surprisingly strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok-4&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$150.00&lt;/td&gt;
&lt;td&gt;Most expensive model on market&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok-4-fast&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;xAI's mid-tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok-3&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;$30.00&lt;/td&gt;
&lt;td&gt;$150.00&lt;/td&gt;
&lt;td&gt;Previous gen, same price as Grok-4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok-3-mini&lt;/td&gt;
&lt;td&gt;xAI&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;$5.00&lt;/td&gt;
&lt;td&gt;Budget reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://openai.com/api/pricing/" rel="noopener noreferrer"&gt;OpenAI pricing&lt;/a&gt;, &lt;a href="https://platform.claude.com/docs/en/about-claude/models/overview" rel="noopener noreferrer"&gt;Anthropic models&lt;/a&gt;, &lt;a href="https://ai.google.dev/pricing" rel="noopener noreferrer"&gt;Google AI pricing&lt;/a&gt;, &lt;a href="https://docs.x.ai/docs/models#models-and-pricing" rel="noopener noreferrer"&gt;xAI pricing&lt;/a&gt;, &lt;a href="https://api-docs.deepseek.com/quick_start/pricing" rel="noopener noreferrer"&gt;DeepSeek pricing&lt;/a&gt;, &lt;a href="https://www.together.ai/pricing" rel="noopener noreferrer"&gt;Together.ai&lt;/a&gt;, &lt;a href="https://groq.com/pricing" rel="noopener noreferrer"&gt;Groq&lt;/a&gt; for open-source model hosting. All checked February 19, 2026.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What stands out
&lt;/h2&gt;

&lt;p&gt;The gap between cheapest and most expensive is staggering. GPT-OSS-20B at $0.05/M input vs Grok-4 at $30/M input. That's 600x. Even comparing production-grade models, GPT-5 mini at $0.25/M vs Claude Opus 4.6 at $5/M is a 20x spread. For most workloads, the cheaper models handle 80%+ of tasks just fine.&lt;/p&gt;

&lt;p&gt;xAI is pricing itself out. Grok-4 at $30/$150 per million tokens is the most expensive API on the market. That's 6x Claude Opus 4.6 and 17x GPT-5.2 on input. Unless you need something Grok does better (hard to name what that is), the pricing makes no sense for production use.&lt;/p&gt;

&lt;p&gt;Google is quietly the cheapest. Gemini 2.0 Flash at $0.10/$0.40 matches GPT-4.1 nano and undercuts almost everything else. If your use case tolerates the quality tradeoff, it's the best deal available.&lt;/p&gt;

&lt;p&gt;Open-weight models changed the math. Llama 4 Maverick at $0.27/$0.85 through hosted APIs is cheap, but the real story is self-hosting. Running Llama on your own GPUs drops the effective cost below $0.10/M tokens for input. The breakeven vs API depends on volume, but for companies doing 10B+ tokens/month, self-hosting wins.&lt;/p&gt;
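
&lt;p&gt;The spread is easier to feel as a monthly bill. A minimal sketch using prices from the table and a hypothetical workload of 2B input and 500M output tokens per month:&lt;/p&gt;

```python
# Monthly bill for one sample workload across models from the table above.
# The workload mix (2B input, 500M output per month) is a hypothetical example.
PRICES = {                    # $/M tokens (input, output), from the table
    "GPT-OSS-20B":      (0.05, 0.20),
    "Gemini 2.0 Flash": (0.10, 0.40),
    "GPT-5 mini":       (0.25, 2.00),
    "Claude Opus 4.6":  (5.00, 25.00),
    "Grok-4":           (30.00, 150.00),
}
IN_M, OUT_M = 2_000, 500      # tokens, in millions

for model, (p_in, p_out) in PRICES.items():
    bill = IN_M * p_in + OUT_M * p_out
    print(f"{model:18s} ${bill:>10,.0f}/month")
```

&lt;p&gt;Same workload, anywhere from $200 to $135,000 a month depending on the model. That spread is the entire argument for routing easy requests to cheap models.&lt;/p&gt;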

&lt;h2&gt;
  
  
  Beyond the price tag
&lt;/h2&gt;

&lt;p&gt;The table is just the start. What actually matters:&lt;/p&gt;

&lt;p&gt;Output tokens cost 2 to 8x more than input, and the pattern holds across every provider. If your app generates long responses (code, reports, content), output cost dominates your bill. Trim your outputs.&lt;/p&gt;

&lt;p&gt;Caching changes everything. &lt;a href="https://platform.openai.com/docs/guides/prompt-caching" rel="noopener noreferrer"&gt;OpenAI&lt;/a&gt; and &lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt; both offer prompt caching that cuts repeat-context costs by 50-90%. If you're sending the same system prompt or few-shot examples on every call, caching alone might cut your bill in half.&lt;/p&gt;
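
&lt;p&gt;Here's what that looks like in dollars. The 80% cache-hit rate and 90% cached-token discount below are assumptions for illustration; actual discounts and minimum-prefix rules vary by provider:&lt;/p&gt;

```python
# Effect of prompt caching on a chat workload where most input tokens are a
# repeated system prompt plus few-shot examples. The hit rate and discount
# are illustrative assumptions, not any provider's published numbers.
INPUT_RATE = 1.75             # $/M input tokens (GPT-5.2, from the table)
CACHED_FRACTION = 0.80        # share of input tokens that hit the cache
CACHE_DISCOUNT = 0.90         # cached tokens billed at 90% off (assumption)

def input_cost(tokens_m, cached_fraction=0.0):
    """Monthly input bill for tokens_m million tokens at a given hit rate."""
    cached = tokens_m * cached_fraction
    fresh = tokens_m - cached
    return fresh * INPUT_RATE + cached * INPUT_RATE * (1 - CACHE_DISCOUNT)

no_cache = input_cost(1_000)                      # 1B input tokens/month
with_cache = input_cost(1_000, CACHED_FRACTION)
print(f"without caching: ${no_cache:,.0f}")
print(f"with caching:    ${with_cache:,.0f}  ({1 - with_cache / no_cache:.0%} saved)")
```

&lt;p&gt;Under these assumptions the input bill drops from $1,750 to $490, a 72% cut, squarely inside the 50-90% range.&lt;/p&gt;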

&lt;p&gt;Quality gaps are shrinking. A year ago, there was a clear hierarchy: GPT-4 &amp;gt; Claude 3 &amp;gt; everything else. Now GPT-5 mini, Claude Sonnet 4.6, and Gemini 2.5 Flash are all competitive for most tasks. The premium models (GPT-5.2, Opus 4.6) still win on complex reasoning and long-form analysis, but the gap keeps closing.&lt;/p&gt;

&lt;p&gt;Latency matters more than price. The cheapest model that takes 8 seconds to respond might cost you more in user drop-off than a 2x pricier model that responds in 1.5 seconds. Benchmark latency alongside cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who should use what
&lt;/h2&gt;

&lt;p&gt;High-volume production (chatbots, classification, extraction): GPT-5 mini or Gemini 2.0 Flash. Both under $0.50/M input with solid quality.&lt;/p&gt;

&lt;p&gt;Code generation: Claude Sonnet 4.6 or GPT-5.2. Sonnet is generally better at following complex coding instructions; GPT-5.2 has an edge on multi-file refactoring.&lt;/p&gt;

&lt;p&gt;Research and analysis: Claude Opus 4.6 if budget allows ($5/$25 is much more reasonable than the old Opus 4 pricing). GPT-5.2 if not.&lt;/p&gt;

&lt;p&gt;Cost-sensitive startups: Llama 4 Maverick self-hosted, or GPT-4.1 nano for API. Get to market first, pick the right model later.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;Pricing has dropped roughly 10x per year for equivalent quality over the past three years. There's no reason to think that stops. By Q4 2026, expect GPT-5 mini-equivalent quality at $0.05/M input or less.&lt;/p&gt;

&lt;p&gt;The real shift is happening at the infrastructure layer. Custom silicon (Google TPUs, Amazon Trainium, Microsoft Maia) is starting to undercut Nvidia GPU economics. As that scales, hosted API pricing will drop faster than self-hosting costs — potentially flipping the build-vs-buy calculation for mid-size companies.&lt;/p&gt;

&lt;p&gt;We'll update this comparison monthly. &lt;a href="https://dev.to/#newsletter"&gt;Subscribe to get updates&lt;/a&gt; when pricing changes.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This analysis is part of Kael Research's ongoing coverage of AI market economics. We track pricing, adoption, and competition across the AI industry. &lt;a href="https://dev.to/briefs"&gt;See our full research briefs&lt;/a&gt; for deeper analysis on specific markets.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>pricing</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
