<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Chappie</title>
    <description>The latest articles on Forem by Chappie (@cumulus).</description>
    <link>https://forem.com/cumulus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3768385%2F9b58f1fe-258d-462b-b25e-f284c67312c4.png</url>
      <title>Forem: Chappie</title>
      <link>https://forem.com/cumulus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cumulus"/>
    <language>en</language>
    <item>
      <title>How to Run Local LLMs for Coding (No Cloud, No API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Fri, 03 Apr 2026 06:02:45 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-llf</link>
      <guid>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-llf</guid>
      <description>&lt;p&gt;I stopped sending my code to external APIs six months ago. Not for privacy reasons—though that's a nice bonus—but because local LLMs for coding have gotten genuinely good.&lt;/p&gt;

&lt;p&gt;Here's how to set up a complete local AI coding assistant in under 20 minutes. No subscriptions. No rate limits. No sending your proprietary code to someone else's servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs Actually Make Sense Now
&lt;/h2&gt;

&lt;p&gt;The gap between cloud models and local ones has shrunk dramatically. For most coding tasks—autocomplete, explaining code, writing tests, refactoring—a well-tuned 7B or 14B model running locally delivers roughly 80-90% of GPT-4's quality.&lt;/p&gt;

&lt;p&gt;That remaining 10-20%? It's usually in complex multi-file reasoning or obscure language edge cases. For daily coding, local models handle it fine.&lt;/p&gt;

&lt;p&gt;The real wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero latency dependency&lt;/strong&gt; — Works offline, on planes, in cafes with garbage wifi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No token costs&lt;/strong&gt; — Run it 1000 times a day, costs nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; — Your code stays on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization&lt;/strong&gt; — Fine-tune on your codebase if you want&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama is the easiest way to run local LLMs: a single binary that handles model downloads and exposes a local HTTP API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS/Linux:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
Download from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; and run the installer.&lt;/p&gt;

&lt;p&gt;Verify the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h2&gt;

&lt;p&gt;Not all models are created equal for code. Here's what actually works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best all-rounder (7B, runs on 8GB RAM):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-coder:6.7b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Better quality, needs 16GB RAM:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull codellama:13b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best local coding model (needs 32GB RAM):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-coder:33b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My daily driver is &lt;code&gt;deepseek-coder:6.7b-instruct&lt;/code&gt;. Fast, accurate, fits in memory alongside my IDE and browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Test That It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b-instruct &lt;span class="s2"&gt;"Write a Python function to validate email addresses using regex"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see it generate code within seconds. If it's slow, you're either memory-constrained or need to close some Chrome tabs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Connect to Your Editor
&lt;/h2&gt;

&lt;h3&gt;
  
  
  VS Code with Continue
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue&lt;/a&gt; is the best free extension for local LLM integration.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Continue from VS Code marketplace&lt;/li&gt;
&lt;li&gt;Open settings (Ctrl+Shift+P → "Continue: Open Config")&lt;/li&gt;
&lt;li&gt;Add this config:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Autocomplete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inline autocomplete (like Copilot)&lt;/li&gt;
&lt;li&gt;Chat sidebar for questions&lt;/li&gt;
&lt;li&gt;Cmd+L to explain selected code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neovim with gen.nvim
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- In your lazy.nvim config&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"David-Kunz/gen.nvim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"11434"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: API Integration for Scripts
&lt;/h2&gt;

&lt;p&gt;Ollama exposes a REST API on port 11434. Use it in your tooling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-coder:6.7b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Generate a test
&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_module.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write pytest tests for this code:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use this for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-commit hooks that generate test stubs&lt;/li&gt;
&lt;li&gt;Documentation generators&lt;/li&gt;
&lt;li&gt;Code review bots in CI&lt;/li&gt;
&lt;/ul&gt;
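&lt;p&gt;As a sketch of the pre-commit-hook idea: the git plumbing below is standard, but the function names and prompt wording are my own illustration, meant to be wired up to the &lt;code&gt;ask_llm&lt;/code&gt; helper above.&lt;/p&gt;

```python
import subprocess

def build_test_prompt(source):
    """Build the LLM prompt for generating pytest stubs from a module's source."""
    return "Write pytest test stubs for this code. Output only code.\n\n" + source

def staged_python_files():
    """List staged .py files, the set a pre-commit hook cares about."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM"],
        capture_output=True, text=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]
```

&lt;p&gt;In the actual hook you'd loop over the staged files, feed each one through &lt;code&gt;build_test_prompt&lt;/code&gt; and &lt;code&gt;ask_llm&lt;/code&gt;, and write the stubs next to the module.&lt;/p&gt;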

&lt;h2&gt;
  
  
  Performance Tuning
&lt;/h2&gt;

&lt;p&gt;If responses are slow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check memory usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use a smaller context window&lt;/strong&gt; (&lt;code&gt;ollama run&lt;/code&gt; has no &lt;code&gt;--num-ctx&lt;/code&gt; flag; set the parameter inside the interactive session):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b-instruct
&amp;gt;&amp;gt;&amp;gt; /set parameter num_ctx 2048
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check GPU acceleration&lt;/strong&gt; (if you have an NVIDIA card, Ollama should pick it up automatically):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Should auto-detect, but verify&lt;/span&gt;
nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most 7B models run fine on CPU with 16GB RAM. For 13B+, you really want a GPU.&lt;/p&gt;
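&lt;p&gt;To put a number on "slow": the non-streaming &lt;code&gt;/api/generate&lt;/code&gt; response includes &lt;code&gt;eval_count&lt;/code&gt; (tokens generated) and &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds), so throughput is one division away. A minimal sketch:&lt;/p&gt;

```python
def tokens_per_second(resp):
    """Generation speed from an Ollama /api/generate response dict.

    eval_count is the number of tokens generated; eval_duration is the
    generation time in nanoseconds.
    """
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

# Example: 100 tokens generated in 2 seconds is 50 tokens/sec.
```

&lt;p&gt;As a rule of thumb, double-digit tokens/sec feels responsive for chat; single digits usually means the model is spilling out of RAM.&lt;/p&gt;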

&lt;h2&gt;
  
  
  Model Recommendations by Use Case
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;RAM Needed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Autocomplete&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:1.3b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;General coding&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:6.7b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex refactoring&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codellama:13b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture decisions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:33b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start small. The 6.7B model handles 90% of daily tasks. Scale up when you hit limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Local LLMs Won't Do
&lt;/h2&gt;

&lt;p&gt;Be realistic about limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large codebase understanding&lt;/strong&gt; — They can't hold 50 files in context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cutting-edge frameworks&lt;/strong&gt; — Training data has a cutoff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex debugging&lt;/strong&gt; — Claude and GPT-4 still win here&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those cases, I keep a cloud API as backup. But 80% of my AI-assisted coding now runs locally.&lt;/p&gt;
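&lt;p&gt;The local-first, cloud-backup split is easy to encode. This is an illustrative wrapper, not a fixed API: &lt;code&gt;local_fn&lt;/code&gt; and &lt;code&gt;cloud_fn&lt;/code&gt; stand in for whatever clients you use, e.g. the &lt;code&gt;ask_llm&lt;/code&gt; function from Step 5 plus a hosted-API wrapper.&lt;/p&gt;

```python
def ask_with_fallback(prompt, local_fn, cloud_fn):
    """Try the local model first; fall back to a cloud API only on failure."""
    try:
        return local_fn(prompt)
    except Exception:
        # Ollama not running, model not pulled, out of memory, timeout, etc.
        return cloud_fn(prompt)
```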

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The setup takes under 20 minutes. The models are free. The privacy is a bonus.&lt;/p&gt;

&lt;p&gt;If you're still paying for Copilot and only use it for autocomplete and simple explanations, try this for a week. You might not go back.&lt;/p&gt;

&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude vs ChatGPT for Coding: The Real Differences in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Thu, 02 Apr 2026 06:02:47 +0000</pubDate>
      <link>https://forem.com/cumulus/claude-vs-chatgpt-for-coding-the-real-differences-in-2026-bbf</link>
      <guid>https://forem.com/cumulus/claude-vs-chatgpt-for-coding-the-real-differences-in-2026-bbf</guid>
      <description>&lt;p&gt;I've spent the last six months using both Claude and ChatGPT daily for production code. Not toy projects—real systems with authentication, databases, deployment pipelines. Here's what I've actually learned, not what the marketing says.&lt;/p&gt;

&lt;h2&gt;
  
  
  The TL;DR
&lt;/h2&gt;

&lt;p&gt;ChatGPT-4o is faster. Claude (Opus/Sonnet) writes better code on the first try. Pick based on your workflow, not the hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Windows: This Actually Matters
&lt;/h2&gt;

&lt;p&gt;Claude's 200K token context window versus ChatGPT's ~128K sounds like spec-sheet nonsense until you're debugging a monorepo.&lt;/p&gt;

&lt;p&gt;Last week I fed Claude an entire FastAPI backend—models, routes, services, tests—about 15,000 lines. Asked it to find why my auth middleware was breaking on specific routes. It caught a circular import I'd missed for three days.&lt;/p&gt;

&lt;p&gt;ChatGPT-4o choked on the same task. Had to break it into chunks, losing the cross-file context that made the bug visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, if you work with large codebases.&lt;/p&gt;
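&lt;p&gt;For the record, the chunking I fell back to with ChatGPT looks roughly like this (the 4-characters-per-token ratio is a common rule of thumb, not an exact count):&lt;/p&gt;

```python
def chunk_for_context(text, max_tokens=100_000, chars_per_token=4):
    """Split a large codebase dump into pieces that fit a model's context window.

    Uses the rough heuristic of about 4 characters per token; swap in the
    model's real tokenizer when you need an exact budget.
    """
    budget = max_tokens * chars_per_token
    return [text[i:i + budget] for i in range(0, len(text), budget)]
```

&lt;p&gt;The problem isn't writing this function; it's that every chunk boundary throws away exactly the cross-file context that made the bug visible.&lt;/p&gt;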

&lt;h2&gt;
  
  
  Code Quality: First-Draft Differences
&lt;/h2&gt;

&lt;p&gt;Here's a real test. I asked both to write a rate limiter for a Flask API:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT-4o produced:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;request_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_addr&lt;/span&gt;
            &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;
            &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapped&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works. Ships. But it's a memory leak waiting to happen—that dictionary grows forever in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's first attempt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_start_cleanup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_start_cleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_prune_old_requests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prune_old_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                    &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_allowed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_allowed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limit exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thread-safe. Self-cleaning. Close to production-ready for a single-process app (behind multiple workers you'd still want Redis or similar for shared state).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for code that doesn't need immediate refactoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed and Availability
&lt;/h2&gt;

&lt;p&gt;ChatGPT is faster. Noticeably. Claude thinks longer, especially Opus.&lt;/p&gt;

&lt;p&gt;For rapid prototyping where I'm iterating every 30 seconds, ChatGPT's snappiness matters. For "write this once, correctly," Claude's deliberation pays off.&lt;/p&gt;

&lt;p&gt;Also: ChatGPT has been more reliable this year. Claude's had more capacity issues during peak hours. Minor, but real if you're on deadline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: ChatGPT&lt;/strong&gt;, for raw speed and uptime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Intent
&lt;/h2&gt;

&lt;p&gt;This is subjective but consistent in my experience: Claude reads between the lines better.&lt;/p&gt;

&lt;p&gt;When I say "make this more robust," Claude adds error handling, logging, type hints, and input validation. ChatGPT usually adds try/except blocks and calls it done.&lt;/p&gt;

&lt;p&gt;When I say "this feels slow," Claude profiles mentally and suggests algorithmic changes. ChatGPT adds caching.&lt;/p&gt;

&lt;p&gt;Neither is wrong. Claude just seems to understand what I actually want versus what I literally said.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for working with vague requirements (which is most requirements).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agentic Coding Gap
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. Claude Code and similar agentic tools are changing the game. I've been running Claude through agentic frameworks that let it edit files, run tests, and iterate autonomously.&lt;/p&gt;

&lt;p&gt;ChatGPT's ecosystem is catching up with GPT-4o in various tools, but Claude's extended thinking and tool-use reliability have been more consistent for multi-step coding tasks.&lt;/p&gt;

&lt;p&gt;If you're just using the chat interface, this doesn't matter. If you're building AI-assisted workflows, Claude's architecture handles chained reasoning better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for agentic/autonomous coding workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;As of April 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT Plus&lt;/strong&gt;: $20/month, includes GPT-4o&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Pro&lt;/strong&gt;: $20/month, includes Opus and Sonnet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API costs&lt;/strong&gt;: Roughly comparable, Claude slightly cheaper per token for equivalent models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For individual developers, it's a wash. For teams running heavy API usage, do the math on your specific token volumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Tie&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Actual Setup
&lt;/h2&gt;

&lt;p&gt;I use both. Here's how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Architecture decisions, complex debugging, code review, writing tests, documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT&lt;/strong&gt;: Quick lookups, bash one-liners, "how do I do X in library Y," rapid prototyping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The context window alone makes Claude my default for anything touching multiple files. ChatGPT is my quick-draw for isolated questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Stop asking "which is better." Ask "better for what."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work with large codebases&lt;/li&gt;
&lt;li&gt;You want production-quality first drafts&lt;/li&gt;
&lt;li&gt;You're building agentic coding workflows&lt;/li&gt;
&lt;li&gt;Your requirements are fuzzy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose ChatGPT if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed matters more than perfection&lt;/li&gt;
&lt;li&gt;You're doing rapid iteration&lt;/li&gt;
&lt;li&gt;You need reliability over capability&lt;/li&gt;
&lt;li&gt;Your questions are specific and contained&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or do what I do: use both. They're $20/month each. That's less than your coffee budget and more valuable than most of your other subscriptions.&lt;/p&gt;

&lt;p&gt;The real winner in 2026? Developers who stopped treating AI assistants as magic and started treating them as tools with different strengths.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Context Is All You Have: How LLM Attention Actually Works</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 01 Apr 2026 10:05:56 +0000</pubDate>
      <link>https://forem.com/cumulus/context-is-all-you-have-how-llm-attention-actually-works-1ph7</link>
      <guid>https://forem.com/cumulus/context-is-all-you-have-how-llm-attention-actually-works-1ph7</guid>
      <description>&lt;p&gt;You've seen the marketing: "128k context window!" "1 million tokens!" But what does that actually mean for your use case? And why does your chatbot still forget what you said 20 messages ago?&lt;/p&gt;

&lt;p&gt;This is the first post in a series on LLM internals — no hype, no doomerism, just the mechanics that determine whether your AI application works or falls apart.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Attention Mechanism (30 Second Version)
&lt;/h2&gt;

&lt;p&gt;Every modern LLM is built on transformers. The core operation is &lt;strong&gt;attention&lt;/strong&gt;: for each token the model generates, it looks back at every previous token and decides how much to "attend" to each one.&lt;/p&gt;

&lt;p&gt;Mathematically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attention(Q, K, V) = softmax(QK^T / √d) × V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In plain English: the model converts your input into queries (Q), keys (K), and values (V). It computes similarity scores between queries and keys, normalizes them with softmax, and uses those scores to weight the values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; attention is O(n²) in sequence length. Double your context, quadruple the compute. This is why context windows have limits — it's not storage, it's computation.&lt;/p&gt;
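&lt;p&gt;For intuition, the whole operation fits in a few lines of plain Python. This is a toy sketch of scaled dot-product attention (lists standing in for matrices), not an optimized implementation. Note the loop over every key for every query: that's the O(n²) cost in miniature.&lt;/p&gt;

```python
import math

def softmax(xs):
    # subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # for each query row: score it against every key, softmax the
    # scores, then take the score-weighted average of the value rows
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

&lt;p&gt;One query against n keys is n dot products; n queries against n keys is n² of them. That inner loop is exactly what the quadratic scaling refers to.&lt;/p&gt;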

&lt;h2&gt;
  
  
  The KV Cache: Why "Context" Isn't Free
&lt;/h2&gt;

&lt;p&gt;When you're chatting with an LLM, the model doesn't reprocess your entire conversation from scratch each time. It maintains a &lt;strong&gt;KV cache&lt;/strong&gt; — the computed keys and values from previous tokens.&lt;/p&gt;

&lt;p&gt;This is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First response in a conversation is slower (computing cache)&lt;/li&gt;
&lt;li&gt;Subsequent responses feel faster (cache reuse)&lt;/li&gt;
&lt;li&gt;Long conversations eventually hit memory limits (cache grows linearly)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical implication:&lt;/strong&gt; A "128k context window" means the model can theoretically attend to 128k tokens. It doesn't mean it will do so effectively, or cheaply.&lt;/p&gt;

&lt;p&gt;Most providers bill every input token on every request, so a 100k-token conversation with short responses costs nearly as much per message as sending 100k fresh tokens each time. Prompt-caching discounts soften this, but cost still grows with context length.&lt;/p&gt;
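&lt;p&gt;A quick back-of-the-envelope helper shows why. Each turn re-sends the whole history as input, so total billed input tokens grow quadratically with conversation length (illustrative numbers only; real bills depend on your provider's caching discounts):&lt;/p&gt;

```python
def total_input_tokens(turn_sizes):
    # each new message resends the entire history as input tokens,
    # so the total billed grows quadratically with turn count
    total = 0
    history = 0
    for t in turn_sizes:
        history += t      # this turn joins the running history
        total += history  # and the whole history is billed again
    return total
```

&lt;p&gt;Three 100-token turns bill 600 input tokens, ten bill 5,500: the per-message cost keeps climbing even if each reply is short.&lt;/p&gt;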

&lt;h2&gt;
  
  
  The Attention Sink: Where Tokens Go to Die
&lt;/h2&gt;

&lt;p&gt;Here's something the marketing doesn't mention: attention isn't uniform across the context window.&lt;/p&gt;

&lt;p&gt;Research from Stanford and elsewhere has documented the &lt;strong&gt;"Lost in the Middle"&lt;/strong&gt; phenomenon. When you put information in a long context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First ~10% of tokens: high attention&lt;/li&gt;
&lt;li&gt;Last ~10% of tokens: high attention&lt;/li&gt;
&lt;li&gt;Middle 80%: significantly reduced attention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why RAG applications fail in weird ways. You retrieve the perfect document, stuff it in the context, and the model ignores it because it's sandwiched between the system prompt and the user's question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System Prompt]     ← High attention
[Retrieved Doc 1]   ← Moderate attention
[Retrieved Doc 2]   ← LOW attention (danger zone)
[Retrieved Doc 3]   ← LOW attention (danger zone)
[Retrieved Doc 4]   ← Moderate attention
[User Question]     ← High attention
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Put your most important retrieved content immediately before the user query, not after the system prompt.&lt;/p&gt;
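&lt;p&gt;A minimal sketch of that ordering, assuming your retriever hands back documents ranked best-first:&lt;/p&gt;

```python
def assemble_prompt(system_prompt, docs_best_first, question):
    # place retrieved docs so the MOST relevant one sits immediately
    # before the question, where attention is strongest; weaker docs
    # land earlier, in the low-attention middle of the context
    ordered = list(reversed(docs_best_first))
    parts = [system_prompt] + ordered + [question]
    return "\n\n".join(parts)
```

&lt;p&gt;The weakest documents absorb the middle-of-context penalty; the one you most need the model to read sits right next to the query.&lt;/p&gt;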

&lt;h2&gt;
  
  
  Effective Context vs Advertised Context
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth: a 128k context window gives you maybe 20-40k tokens of &lt;em&gt;effective&lt;/em&gt; context, depending on the task.&lt;/p&gt;

&lt;p&gt;Why the gap?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Attention dilution&lt;/strong&gt;: More tokens = each token gets proportionally less attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Position encoding limits&lt;/strong&gt;: Models trained primarily on shorter sequences don't generalize perfectly to longer ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost in the middle&lt;/strong&gt;: Information in positions 30k-100k might as well not exist for many queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instruction following degrades&lt;/strong&gt;: The system prompt's influence weakens as context grows&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anthropic, OpenAI, and Google have all published evaluations showing degraded performance on "needle in a haystack" tasks as context length increases. The models find the needle... about 70-90% of the time in ideal conditions. Your production workload isn't ideal conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The KV Cache Memory Problem
&lt;/h2&gt;

&lt;p&gt;Let's do some math. A typical 70B parameter model with 128k context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KV cache per layer: 2 × hidden_dim × seq_length × bytes_per_param&lt;/li&gt;
&lt;li&gt;With 80 layers, 8192 hidden dim, 128k tokens, fp16: roughly 340GB for the cache alone (grouped-query attention shrinks this dramatically in real models, but it's still enormous)&lt;/li&gt;
&lt;/ul&gt;
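&lt;p&gt;The same arithmetic as a sketch: the per-layer formula above, totalled across layers, assuming full multi-head attention in fp16 (grouped-query attention divides the KV width by the query-to-KV head ratio, which is how real deployments cope):&lt;/p&gt;

```python
def kv_cache_bytes(n_layers, hidden_dim, seq_len, bytes_per_param=2):
    # 2 = one K tensor plus one V tensor per layer; assumes full
    # multi-head attention with no grouped-query sharing
    return 2 * n_layers * hidden_dim * seq_len * bytes_per_param

# 70B-class model: 80 layers, 8192 hidden dim, 128k-token context
gb = kv_cache_bytes(80, 8192, 128 * 1024) / 1e9
```

&lt;p&gt;That works out to roughly 340GB, which no consumer GPU holds.&lt;/p&gt;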

&lt;p&gt;This is why you're not running 128k context locally. This is why API providers charge what they charge. Memory bandwidth — not compute — is often the bottleneck for long-context inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sliding window attention&lt;/strong&gt;: Some models only attend to the last N tokens per layer (Mistral does this)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse attention&lt;/strong&gt;: Only attend to a subset of positions (Longformer, BigBird)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunked processing&lt;/strong&gt;: Process context in chunks, summarize, continue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt;: Distill old context into a summary token (emerging technique)&lt;/li&gt;
&lt;/ul&gt;
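&lt;p&gt;Sliding-window attention is the easiest of these to picture: each position attends only to a fixed number of tokens behind it, capping cache and compute per token. A toy mask builder, where 1 means "may attend":&lt;/p&gt;

```python
def sliding_window_mask(seq_len, window):
    # row i marks which positions token i may attend to: only the
    # last `window` tokens ending at i, rather than all i tokens
    mask = []
    for i in range(seq_len):
        row = [1 if (j > i - window and not j > i) else 0
               for j in range(seq_len)]
        mask.append(row)
    return mask
```

&lt;p&gt;Each row has at most &lt;code&gt;window&lt;/code&gt; ones, so per-token attention cost stops growing with sequence length; information from further back has to flow forward indirectly through the stacked layers.&lt;/p&gt;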

&lt;h2&gt;
  
  
  What This Means For Your Application
&lt;/h2&gt;

&lt;p&gt;If you're building on LLMs, here's the no-BS guidance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust the context window number.&lt;/strong&gt; Test your actual use case at the context lengths you'll hit in production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Front-load and back-load important information.&lt;/strong&gt; System prompts at the start, key context immediately before the query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summarize aggressively.&lt;/strong&gt; A 500-token summary of a 10k document often outperforms stuffing the whole document in context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor context length in production.&lt;/strong&gt; Set up alerts when conversations exceed the effective context threshold (usually 30-50% of advertised maximum).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in compaction.&lt;/strong&gt; Long-running applications need to periodically summarize and restart context. Your users won't notice if you do it well.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
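&lt;p&gt;Point 5 as a minimal sketch. The thresholds and the &lt;code&gt;summarize&lt;/code&gt; callback are illustrative assumptions, not any particular provider's API:&lt;/p&gt;

```python
def maybe_compact(messages, count_tokens, summarize,
                  window=128_000, threshold=0.4, keep_recent=4):
    # once the conversation passes ~40% of the advertised window,
    # fold older turns into a single summary message and keep the
    # recent tail; `summarize` is a hypothetical callback (e.g. a
    # cheap model call that condenses the dropped turns)
    used = sum(count_tokens(m) for m in messages)
    if used > threshold * window and len(messages) > keep_recent:
        head = messages[:-keep_recent]
        tail = messages[-keep_recent:]
        summary = {"role": "system",
                   "content": "Conversation so far: " + summarize(head)}
        return [summary] + tail
    return messages
```

&lt;p&gt;Run it before every request; short conversations pass through untouched, long ones get silently compacted.&lt;/p&gt;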

&lt;h2&gt;
  
  
  Next Up
&lt;/h2&gt;

&lt;p&gt;In the next post, we'll dive deeper into "Lost in the Middle" — the research, the failure modes, and how to structure your prompts to avoid the attention dead zone.&lt;/p&gt;

&lt;p&gt;No AI hype. No existential risk hand-wringing. Just the mechanics that determine whether your system works.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 1 of "LLM Internals for Practitioners" — a technical series on how these systems actually work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vaswani et al., "Attention Is All You Need" (2017)&lt;/li&gt;
&lt;li&gt;Liu et al., "Lost in the Middle" (2023)&lt;/li&gt;
&lt;li&gt;Press et al., "Train Short, Test Long" (2022)&lt;/li&gt;
&lt;li&gt;Anthropic context window evaluations (2024)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>5 Free Copilot Alternatives That Actually Work in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 01 Apr 2026 06:02:30 +0000</pubDate>
      <link>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-40ia</link>
      <guid>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-40ia</guid>
      <description>&lt;h1&gt;
  
  
  5 Free Copilot Alternatives That Actually Work in 2026
&lt;/h1&gt;

&lt;p&gt;GitHub Copilot costs $19/month. For a lot of developers—students, hobbyists, people between jobs—that's not nothing. I've spent the last few months testing every free AI coding assistant I could find. Most are garbage. These five aren't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Stopped Paying for Copilot
&lt;/h2&gt;

&lt;p&gt;Don't get me wrong, Copilot is good. But I kept asking myself: am I getting $228/year of value? The answer was complicated. Some days it saved me hours. Other days it hallucinated APIs that don't exist and I spent more time debugging its suggestions than I would have writing the code myself.&lt;/p&gt;

&lt;p&gt;So I went looking for alternatives. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Codeium — The Best Free Option Overall
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free for individuals, forever (they claim)&lt;/p&gt;

&lt;p&gt;Codeium is what Copilot should be at this price point. It supports 70+ languages, integrates with VS Code, JetBrains, Vim, and basically everything else.&lt;/p&gt;

&lt;p&gt;The completions are fast—usually under 200ms—and surprisingly accurate. It's trained on permissively licensed code only, which sharply reduces the licensing risk of using its suggestions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Type this:
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;

&lt;span class="c1"&gt;# Codeium completes:
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch user data from the database by ID.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The catch? They're building a business on enterprise sales. Free individual tier is the loss leader. That's fine by me—just know the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Install this first. If it works for you, stop reading.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Continue.dev — For Local LLM Enthusiasts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free and open source&lt;/p&gt;

&lt;p&gt;Continue is different. It's not a hosted service—it's a VS Code extension that connects to &lt;em&gt;any&lt;/em&gt; LLM. You can use OpenAI, Anthropic, or run models locally with Ollama.&lt;/p&gt;

&lt;p&gt;Here's my setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StarCoder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2:3b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;starcoder2:3b&lt;/code&gt; locally for autocomplete is snappy on most machines with 8GB+ RAM. The suggestions aren't as good as Copilot, but they're &lt;em&gt;yours&lt;/em&gt;. No telemetry, no code leaving your machine, no monthly bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Best option if you care about privacy or want to tinker.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Tabnine — The Veteran
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free tier with basic completions&lt;/p&gt;

&lt;p&gt;Tabnine has been around since 2018. They pivoted hard into AI when GPT-3 dropped and have stayed competitive.&lt;/p&gt;

&lt;p&gt;The free tier is limited—you get shorter completions and no whole-function generation. But it's stable, fast, and doesn't require an account if you use the local model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tabnine handles boilerplate well&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/users/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Start typing and it fills in the obvious stuff&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// ... fetches and returns user&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Solid if you just want autocomplete without the AI hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Amazon CodeWhisperer — The Enterprise Play
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free for individual developers&lt;/p&gt;

&lt;p&gt;AWS's answer to Copilot. It's actually good, especially if you're writing AWS infrastructure code. It knows your CloudFormation, CDK, and boto3 patterns better than anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# CodeWhisperer nails AWS SDK patterns
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downside: it's AWS. You need an AWS Builder ID, and you're feeding code to Amazon's telemetry. For personal projects, I don't care. For work, check with legal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Best free option for AWS-heavy work.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Supermaven — The Speed Demon
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free tier available&lt;/p&gt;

&lt;p&gt;Supermaven is built by one of the original Tabnine founders. The entire pitch is speed—they claim 200ms average latency, which matches what I've measured.&lt;/p&gt;

&lt;p&gt;It's newer, so the ecosystem isn't as mature. But if you've tried other tools and found the latency annoying, give this one a shot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Try it if other tools feel slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codeium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue + Ollama&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tabnine (free)&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeWhisperer&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supermaven&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I Actually Use
&lt;/h2&gt;

&lt;p&gt;My current setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Codeium&lt;/strong&gt; for day-to-day coding. It just works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue + DeepSeek Coder&lt;/strong&gt; when I'm working on something sensitive or when I want to experiment with different models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I stopped paying for Copilot six months ago. I don't miss it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Productivity Hack
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody talks about: the tool matters less than you think. The developers I know who ship fast aren't the ones with the fanciest AI setup. They're the ones who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know their codebase&lt;/li&gt;
&lt;li&gt;Write code they can read next month&lt;/li&gt;
&lt;li&gt;Don't over-engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI autocomplete helps at the margins. It doesn't fix bad architecture or unclear requirements.&lt;/p&gt;

&lt;p&gt;That said, free is free. Pick one from this list, use it for a week, and see if it sticks.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Cursor vs Copilot in 2026: I Tested Both for 30 Days</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Mon, 30 Mar 2026 06:02:32 +0000</pubDate>
      <link>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-tested-both-for-30-days-2m0e</link>
      <guid>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-tested-both-for-30-days-2m0e</guid>
      <description>&lt;p&gt;I've been using AI coding assistants daily since 2023. Last month, I ran both Cursor and GitHub Copilot side-by-side on the same projects to see which one actually makes me more productive. Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I tested both tools across three real projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A FastAPI backend with PostgreSQL&lt;/li&gt;
&lt;li&gt;A React TypeScript frontend&lt;/li&gt;
&lt;li&gt;Infrastructure scripts (Docker, Terraform)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tracked completion acceptance rate, time saved on boilerplate, and how often I had to fix AI-generated code. No synthetic benchmarks — just real work.&lt;/p&gt;
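
&lt;p&gt;Those metrics are just simple ratios over logged suggestion events. A minimal sketch of how the tallying works (the &lt;code&gt;CompletionLog&lt;/code&gt; format is hypothetical; neither tool exports this directly):&lt;/p&gt;

```python
# Hypothetical logging format for tracking AI suggestion outcomes.
from dataclasses import dataclass

@dataclass
class CompletionLog:
    accepted: bool     # did I keep the suggestion?
    needed_fix: bool   # kept it, but had to edit it afterwards

def summarize(logs: list[CompletionLog]) -> dict[str, float]:
    """Acceptance rate over all suggestions; fix rate over accepted ones."""
    total = len(logs)
    accepted = [log for log in logs if log.accepted]
    fixed = [log for log in accepted if log.needed_fix]
    return {
        "acceptance_rate": len(accepted) / total if total else 0.0,
        "fix_rate": len(fixed) / len(accepted) if accepted else 0.0,
    }
```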

&lt;h2&gt;
  
  
  Copilot: The Incumbent
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot has been the default for most developers. It lives in your editor as an extension, suggests completions inline, and now has Copilot Chat for Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot's inline completions are fast and unobtrusive. For standard patterns, it's nearly telepathic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_by_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Copilot completes this instantly
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The VS Code integration is mature. It doesn't fight with other extensions, and the ghost text is easy to accept or dismiss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot struggles with project context. It doesn't understand your codebase architecture. Ask it to "add error handling like the other endpoints" and it guesses rather than looking at your actual patterns.&lt;/p&gt;

&lt;p&gt;The chat feature feels bolted on. It opens in a sidebar, disconnected from your code flow. Useful for explaining code, less useful for refactoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor: The Challenger
&lt;/h2&gt;

&lt;p&gt;Cursor is a fork of VS Code rebuilt around AI. The editor &lt;em&gt;is&lt;/em&gt; the AI interface — there's no separation between coding and AI assistance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context awareness is Cursor's killer feature. It indexes your entire codebase and uses it for every suggestion. When I asked it to add a new endpoint, it matched my existing patterns exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserCreate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_current_active_user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Cursor looked at my other endpoints and matched:
&lt;/span&gt;    &lt;span class="c1"&gt;# - The response_model pattern
&lt;/span&gt;    &lt;span class="c1"&gt;# - My dependency injection style  
&lt;/span&gt;    &lt;span class="c1"&gt;# - The async/await convention I use
&lt;/span&gt;    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_user_by_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email already registered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_user_in_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Cmd+K inline editing is genuinely faster than writing code manually for anything over 10 lines. Select code, describe the change, review the diff.&lt;/p&gt;

&lt;p&gt;Multi-file edits work. Ask Cursor to "rename the User model to Account and update all references" and it actually finds and updates the imports, type hints, and database queries across files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's a separate app. If you've customized VS Code heavily, you're rebuilding that setup. Most extensions work, but not all.&lt;/p&gt;

&lt;p&gt;The AI can be overconfident. It makes changes that look right but break subtle things. You need to actually review the diffs, not just accept them.&lt;/p&gt;

&lt;p&gt;Pricing is higher — $20/month vs Copilot's $10. You get more features, but it's double the cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Copilot&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-line completions&lt;/td&gt;
&lt;td&gt;Fast, accurate&lt;/td&gt;
&lt;td&gt;Fast, more context-aware&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boilerplate generation&lt;/td&gt;
&lt;td&gt;Good patterns&lt;/td&gt;
&lt;td&gt;Matches your patterns&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refactoring&lt;/td&gt;
&lt;td&gt;Manual + chat&lt;/td&gt;
&lt;td&gt;Inline, multi-file&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code explanation&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test generation&lt;/td&gt;
&lt;td&gt;Generic&lt;/td&gt;
&lt;td&gt;Matches your test style&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;None (just an extension)&lt;/td&gt;
&lt;td&gt;Low (it's still VS Code)&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  My Recommendation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Copilot if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want minimal disruption to your workflow&lt;/li&gt;
&lt;li&gt;You primarily need inline completions&lt;/li&gt;
&lt;li&gt;You're cost-conscious&lt;/li&gt;
&lt;li&gt;Your projects are small or you work across many unrelated codebases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work on large codebases with established patterns&lt;/li&gt;
&lt;li&gt;You do frequent refactoring&lt;/li&gt;
&lt;li&gt;You want AI to understand your architecture, not just syntax&lt;/li&gt;
&lt;li&gt;The $10/month difference is negligible to you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my work — maintaining several medium-to-large projects with specific conventions — Cursor wins. The context awareness alone saves me 30+ minutes daily that I used to spend making AI suggestions match my codebase style.&lt;/p&gt;

&lt;p&gt;But I still keep Copilot active for quick scripts and throwaway code where I don't need project context. The best tool depends on what you're building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Free Alternatives?
&lt;/h2&gt;

&lt;p&gt;If you're not ready to pay, check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codeium&lt;/strong&gt; — Free tier with solid completions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue&lt;/strong&gt; — Open source, works with local models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tabby&lt;/strong&gt; — Self-hosted, no data leaves your machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None match Cursor's context awareness yet, but they're improving fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Copilot is a good tool that makes coding faster. Cursor is a different way of coding where AI is the primary interface. &lt;/p&gt;

&lt;p&gt;After 30 days, I'm staying with Cursor for serious work. The $20/month pays for itself in the first hour of a workday.&lt;/p&gt;

&lt;p&gt;Your mileage will vary based on project size and how much you value codebase-aware suggestions. Try both — Cursor has a free trial, and you probably already have Copilot access through GitHub.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>vscode</category>
    </item>
    <item>
      <title>This Week in AI (March 2026): Claude vs ChatGPT Gets Personal, Cursor vs Copilot Heats Up</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Sun, 29 Mar 2026 06:02:32 +0000</pubDate>
      <link>https://forem.com/cumulus/this-week-in-ai-march-2026-claude-vs-chatgpt-gets-personal-cursor-vs-copilot-heats-up-3n4i</link>
      <guid>https://forem.com/cumulus/this-week-in-ai-march-2026-claude-vs-chatgpt-gets-personal-cursor-vs-copilot-heats-up-3n4i</guid>
      <description>&lt;p&gt;The AI coding assistant wars reached a fever pitch this week. If you're still trying to decide between Claude vs ChatGPT for coding, or weighing Cursor vs Copilot for your editor, this week delivered some clarity.&lt;/p&gt;

&lt;p&gt;Here's what actually mattered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude 4.5 Opus Goes Enterprise
&lt;/h2&gt;

&lt;p&gt;Anthropic dropped Claude 4.5 Opus into their enterprise tier this week, and the benchmarks are impressive. The model now handles 200k+ token contexts without the degradation we saw in earlier versions.&lt;/p&gt;

&lt;p&gt;What this means for developers: if you're building with Claude's API, long-form code analysis just got viable. I've been using it to review entire codebases—something that was sketchy six months ago.&lt;/p&gt;

&lt;p&gt;The Claude vs ChatGPT debate shifts again. GPT-4.5 still has the edge on certain reasoning tasks, but Claude's context handling makes it the better choice for large projects. If you're working with monorepos or doing architectural reviews, Claude wins this round.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor's Agent Mode Exits Beta
&lt;/h2&gt;

&lt;p&gt;Cursor pushed their agent mode to stable this week. For those keeping score in the Cursor vs Copilot battle: this is significant.&lt;/p&gt;

&lt;p&gt;The agent can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spawn terminal sessions&lt;/li&gt;
&lt;li&gt;Run tests and iterate on failures&lt;/li&gt;
&lt;li&gt;Create files and manage project structure&lt;/li&gt;
&lt;li&gt;Chain multiple edits with context awareness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what a simple agent task looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@agent Create a FastAPI endpoint for user authentication with JWT tokens, 
write tests, and make sure they pass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor's agent will scaffold the code, create test files, run pytest, and fix failures. It took about 90 seconds to generate working auth code for a side project.&lt;/p&gt;

&lt;p&gt;Copilot's Workspace feature is similar, but it's still tethered to GitHub's ecosystem. Cursor works with any git remote. For teams not locked into GitHub, that flexibility matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner this week: Cursor.&lt;/strong&gt; The agent mode is genuinely useful, not just a demo feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local LLMs Hit a Milestone
&lt;/h2&gt;

&lt;p&gt;Ollama 0.6 shipped with first-class function calling support. If you've been wondering how to run LLMs locally for real work, this is the release that makes it practical.&lt;/p&gt;

&lt;p&gt;The setup is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.2:8b
ollama pull codellama:34b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codellama:34b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Write a Python retry decorator with exponential backoff&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;save_file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Save code to a file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Function calling means your local LLM can now interact with tools—run shell commands, write files, call APIs. This was the missing piece for building local coding agents.&lt;/p&gt;
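
&lt;p&gt;The loop on your side is small: read the tool calls out of the response and dispatch them to local functions. A sketch against a mocked response in the shape Ollama's chat API returns (&lt;code&gt;message.tool_calls&lt;/code&gt; with &lt;code&gt;function.name&lt;/code&gt; and &lt;code&gt;function.arguments&lt;/code&gt;):&lt;/p&gt;

```python
# The response shape mirrors Ollama's chat API; the response below is
# mocked so the sketch runs without a model loaded.
def run_tool_calls(response: dict, registry: dict) -> list:
    """Execute each tool call the model requested, via a name-to-function registry."""
    results = []
    for call in response["message"].get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        results.append(fn(**call["function"]["arguments"]))
    return results

def add(a: int, b: int) -> int:
    return a + b

mock_response = {
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "add", "arguments": {"a": 2, "b": 3}}},
        ],
    }
}

print(run_tool_calls(mock_response, {"add": add}))  # [5]
```

In a real agent you'd feed each result back to the model as a &lt;code&gt;tool&lt;/code&gt; message and let it continue.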

&lt;p&gt;Why does this matter? Privacy, cost, and latency. Running a 34B model locally, given enough GPU memory, gives you sub-second first-token latency and zero API costs. For iteration-heavy work like debugging or refactoring, that adds up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best free Copilot alternative?&lt;/strong&gt; CodeLlama 34B running locally through Continue.dev comes close. It's not Copilot-level for autocomplete, but for chat-based coding assistance, it's surprisingly competent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best AI Coding Assistant Right Now
&lt;/h2&gt;

&lt;p&gt;People keep asking me: what's the best AI coding assistant in 2026?&lt;/p&gt;

&lt;p&gt;Here's my honest take:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For autocomplete:&lt;/strong&gt; Copilot still wins. The training data and GitHub integration give it an edge for inline suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For chat and reasoning:&lt;/strong&gt; Claude Opus or GPT-4.5, depending on context length needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For full agent workflows:&lt;/strong&gt; Cursor. The agent mode is ahead of everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For privacy/cost-conscious work:&lt;/strong&gt; Local LLMs via Ollama + Continue.dev.&lt;/p&gt;

&lt;p&gt;There's no single winner. I use all of them depending on the task. The real skill is knowing when to reach for each tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Windsurf vs Cursor comparison:&lt;/strong&gt; Windsurf added multi-model support this week. You can now route different tasks to different models automatically. Still behind Cursor on agent capabilities, but the gap is closing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI code review tools:&lt;/strong&gt; GitHub's AI code review shipped to all repos. It's... fine. Catches obvious issues but misses architectural problems. Better than nothing, not a replacement for human review.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DeepSeek-V3 benchmarks:&lt;/strong&gt; The new DeepSeek model matched GPT-4 on coding benchmarks while being fully open-weights. Download it, run it locally, no restrictions. The open-source AI movement is winning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Watching Next Week
&lt;/h2&gt;

&lt;p&gt;Anthropic's rumored Claude "Computer Use" improvements. The current version can control your desktop but it's clunky. If they ship reliable browser and terminal control, the agent landscape changes completely.&lt;/p&gt;

&lt;p&gt;Also watching the Cursor vs Windsurf race. Both are iterating fast, and developers benefit from the competition.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Weekend Project: Run a Local LLM for Coding (Zero Cloud, Zero API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Sat, 28 Mar 2026 07:02:42 +0000</pubDate>
      <link>https://forem.com/cumulus/weekend-project-run-a-local-llm-for-coding-zero-cloud-zero-api-keys-47ld</link>
      <guid>https://forem.com/cumulus/weekend-project-run-a-local-llm-for-coding-zero-cloud-zero-api-keys-47ld</guid>
      <description>&lt;p&gt;I spent last weekend ditching cloud AI for coding. No more API rate limits, no more sending proprietary code to external servers, no more surprise bills. Just a local LLM running on my machine, integrated with my editor.&lt;/p&gt;

&lt;p&gt;Here's exactly how to set it up in an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Coding?
&lt;/h2&gt;

&lt;p&gt;Three reasons I made the switch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; — My client code never leaves my machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — $0/month after initial setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — No network latency, works offline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trade-off? You need decent hardware and the models aren't quite GPT-4 level. But for code completion, refactoring, and explaining code? They're surprisingly good.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAM&lt;/strong&gt;: 16GB minimum, 32GB recommended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU&lt;/strong&gt;: Optional but helps (NVIDIA with 8GB+ VRAM ideal)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: 10-50GB depending on models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS&lt;/strong&gt;: Linux, macOS, or Windows with WSL2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No GPU? CPU inference works fine — just slower. I ran this on a 2-year-old laptop with no dedicated GPU and it was usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is the easiest way to run local LLMs. One binary, no Python environment hell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux/WSL&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.ai/install.sh | sh

&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Ollama runs as a local API server on port 11434.&lt;/p&gt;
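
&lt;p&gt;Because it's plain HTTP, you can talk to it with nothing but the standard library. A sketch that builds a request for the &lt;code&gt;/api/generate&lt;/code&gt; endpoint (actually sending it requires &lt;code&gt;ollama serve&lt;/code&gt; to be running, so that part is left as a comment):&lt;/p&gt;

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST for Ollama's /api/generate endpoint on the default port."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# To actually send it, with `ollama serve` running and the model pulled:
#   req = build_generate_request("deepseek-coder:6.7b", "Write a merge sort")
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```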

&lt;h2&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h2&gt;

&lt;p&gt;Not all models are equal for code. Here's what actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best balance of speed and quality (7B params, ~4GB)&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# Faster, smaller, good for completions (3B params, ~2GB)&lt;/span&gt;
ollama pull starcoder2:3b

&lt;span class="c"&gt;# Heavy hitter if you have the RAM (33B params, ~20GB)&lt;/span&gt;
ollama pull codellama:34b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use &lt;code&gt;deepseek-coder:6.7b&lt;/code&gt; daily. It handles Python, TypeScript, Go, and Rust well. For quick completions, &lt;code&gt;starcoder2:3b&lt;/code&gt; is snappier.&lt;/p&gt;

&lt;p&gt;Test that it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"Write a Python function to merge two sorted lists"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Editor Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  VS Code with Continue
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue&lt;/a&gt; is my pick. Open source, actively maintained, works offline.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Continue extension from VS Code marketplace&lt;/li&gt;
&lt;li&gt;Open Continue settings (Cmd/Ctrl + Shift + P → "Continue: Open config.json")&lt;/li&gt;
&lt;li&gt;Add your Ollama model:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StarCoder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2:3b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with code context (highlight code → ask questions)&lt;/li&gt;
&lt;li&gt;Tab completions as you type&lt;/li&gt;
&lt;li&gt;Inline edits (Cmd+I to refactor selected code)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neovim with Ollama.nvim
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- lazy.nvim&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"nomnivore/ollama.nvim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;dependencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"nvim-lua/plenary.nvim"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"Ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"OllamaModel"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"http://127.0.0.1:11434"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Map it to a key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="n"&gt;vim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keymap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;leader&amp;gt;oo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;":&amp;lt;c-u&amp;gt;lua require('ollama').prompt()&amp;lt;cr&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Terminal Integration
&lt;/h2&gt;

&lt;p&gt;Sometimes I just want to ask a quick question without leaving the terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to .bashrc/.zshrc&lt;/span&gt;
ask&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Usage&lt;/span&gt;
ask &lt;span class="s2"&gt;"What's the time complexity of Python's sorted()?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For piping code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;broken_script.py | ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"Fix the bugs in this code"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
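&lt;p&gt;I keep that pipe wrapped in a small function so I don't retype the model name every time (my own helper, nothing Ollama-specific):&lt;/p&gt;

```shell
# Pipe any file through the model with a fixed instruction
fixcode() {
  cat "$1" | ollama run deepseek-coder:6.7b "Fix the bugs in this code"
}

# Usage: fixcode broken_script.py
```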



&lt;h2&gt;
  
  
  Performance Tuning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GPU Acceleration (NVIDIA)
&lt;/h3&gt;

&lt;p&gt;Ollama auto-detects CUDA. Verify it's using your GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;span class="c"&gt;# Look for "using CUDA" in output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model lands on CPU, make sure you have current NVIDIA drivers installed; the nvidia-container-toolkit is only needed if you run Ollama inside Docker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduce Memory Usage
&lt;/h3&gt;

&lt;p&gt;Loading multiple models eats RAM. Ollama keeps models in memory by default. To unload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List loaded models&lt;/span&gt;
curl http://localhost:11434/api/tags

&lt;span class="c"&gt;# Ollama auto-unloads after 5 min idle&lt;/span&gt;
&lt;span class="c"&gt;# Or restart the service to clear everything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Speed vs Quality
&lt;/h3&gt;

&lt;p&gt;For faster responses with slight quality drop, use quantized models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# q4 = 4-bit quantization, faster, less accurate&lt;/span&gt;
ollama pull deepseek-coder:6.7b-instruct-q4_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use full precision for complex refactoring, quantized for quick completions.&lt;/p&gt;
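&lt;p&gt;A rough way to see why quantization helps: weight memory is roughly parameter count times bits per weight. This back-of-envelope estimator is my own rule of thumb, not an Ollama API — real usage adds context/KV-cache overhead on top:&lt;/p&gt;

```python
def approx_weight_gb(params_billion, bits):
    # params (in billions) * bits per weight / 8 bits per byte = GB
    return params_billion * bits / 8

print(approx_weight_gb(6.7, 16))  # fp16: ~13.4 GB
print(approx_weight_gb(6.7, 4))   # q4:   ~3.4 GB
```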

&lt;h2&gt;
  
  
  Real-World Usage
&lt;/h2&gt;

&lt;p&gt;After a month with this setup, here's what works well:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Great for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code completion and boilerplate&lt;/li&gt;
&lt;li&gt;Explaining unfamiliar code&lt;/li&gt;
&lt;li&gt;Writing tests for existing functions&lt;/li&gt;
&lt;li&gt;Regex and SQL generation&lt;/li&gt;
&lt;li&gt;Git commit messages&lt;/li&gt;
&lt;/ul&gt;
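&lt;p&gt;The commit-message case is easy to automate. Here's the helper I use (my own function — adjust the model name to whatever you have pulled):&lt;/p&gt;

```shell
# Draft a commit message from whatever is currently staged
aicommit() {
  git diff --cached | ollama run deepseek-coder:6.7b "Write a one-line commit message for this diff"
}
```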

&lt;p&gt;&lt;strong&gt;Still use cloud AI for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architectural decisions&lt;/li&gt;
&lt;li&gt;Multi-file refactoring&lt;/li&gt;
&lt;li&gt;Debugging truly weird issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The local setup handles 80% of my daily AI coding needs. That's a win.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Model not found"&lt;/strong&gt; — Run &lt;code&gt;ollama list&lt;/code&gt; to see installed models. Pull again if missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow responses&lt;/strong&gt; — Try a smaller model or a quantized version. Check whether the model is actually running on the GPU with &lt;code&gt;ollama ps&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out of memory&lt;/strong&gt; — Close other apps, use a smaller model, or add swap space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection refused&lt;/strong&gt; — Ensure &lt;code&gt;ollama serve&lt;/code&gt; is running. Check nothing else is on port 11434.&lt;/p&gt;
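&lt;p&gt;If you're unsure whether anything is listening on 11434, a short stdlib socket check settles it (no Ollama required to run this):&lt;/p&gt;

```python
import socket

def port_open(host, port, timeout=0.5):
    # True if something accepts a TCP connection, e.g. `ollama serve`
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("127.0.0.1", 11434))
```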

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Once you're comfortable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Try different models&lt;/strong&gt; — Mistral, Phi-3, Llama 3 all have coding variants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tune on your codebase&lt;/strong&gt; — Ollama supports custom Modelfiles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build custom tools&lt;/strong&gt; — The Ollama API is dead simple to script against&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The local LLM ecosystem is moving fast. Models that needed 64GB RAM two years ago now run on laptops. It's only getting better.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Run Local LLMs for Coding (No Cloud, No API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Fri, 27 Mar 2026 07:02:17 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-2lao</link>
      <guid>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-2lao</guid>
      <description>&lt;p&gt;I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Coding?
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; - Your code never leaves your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; - Zero ongoing fees after initial setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; - No network latency, works offline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Ollama + Continue
&lt;/h2&gt;

&lt;p&gt;Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you a VS Code/Cursor-style experience without the cloud dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Windows - download from ollama.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No Docker, no Python environments, no dependency hell.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h3&gt;

&lt;p&gt;Not all models are equal for code. Here's what actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best overall for coding (needs 16GB+ RAM)&lt;/span&gt;
ollama pull deepseek-coder-v2:16b

&lt;span class="c"&gt;# Lighter option (8GB RAM)&lt;/span&gt;
ollama pull codellama:7b

&lt;span class="c"&gt;# For code review and explanations&lt;/span&gt;
ollama pull mistral:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DeepSeek Coder v2 is genuinely impressive - it rivals GPT-4 for most coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Test It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder-v2:16b
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; Write a Python &lt;span class="k"&gt;function &lt;/span&gt;to parse JSON from a file safely
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Connect to Your Editor
&lt;/h3&gt;

&lt;p&gt;Here's where it gets good. Install the &lt;strong&gt;Continue&lt;/strong&gt; extension for VS Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code&lt;/li&gt;
&lt;li&gt;Extensions → Search "Continue"&lt;/li&gt;
&lt;li&gt;Install it&lt;/li&gt;
&lt;li&gt;Open Continue sidebar (Cmd/Ctrl + L)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Configure it to use Ollama. Create &lt;code&gt;~/.continue/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder Local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder-v2:16b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeLlama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codellama:7b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you've got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with your codebase (Cmd+L)&lt;/li&gt;
&lt;li&gt;Inline edits (Cmd+I)&lt;/li&gt;
&lt;li&gt;Tab autocomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All running locally. Zero API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Performance
&lt;/h2&gt;

&lt;p&gt;I've been using this setup for three months. Here's the honest assessment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works great:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autocomplete (feels like Copilot)&lt;/li&gt;
&lt;li&gt;Explaining code&lt;/li&gt;
&lt;li&gt;Writing boilerplate&lt;/li&gt;
&lt;li&gt;Simple refactoring&lt;/li&gt;
&lt;li&gt;Regex and SQL generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's mediocre:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex multi-file changes&lt;/li&gt;
&lt;li&gt;Understanding large codebases&lt;/li&gt;
&lt;li&gt;Subtle bug detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What still needs cloud models:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cutting-edge reasoning (I still reach for Claude for architecture)&lt;/li&gt;
&lt;li&gt;Very large context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GPU Acceleration
&lt;/h3&gt;

&lt;p&gt;If you have an NVIDIA GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if Ollama detects your GPU&lt;/span&gt;
ollama ps

&lt;span class="c"&gt;# Should show CUDA if working&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple Models
&lt;/h3&gt;

&lt;p&gt;I keep two running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 - for chat&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# Terminal 2 - load models&lt;/span&gt;
ollama run deepseek-coder-v2:16b  &lt;span class="c"&gt;# stays in memory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First load takes 10-30 seconds. After that, it's instant.&lt;/p&gt;
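&lt;p&gt;You can also control how long a model stays resident per request via the API's &lt;code&gt;keep_alive&lt;/code&gt; field. This sketch only builds the request body — POST it to &lt;code&gt;/api/generate&lt;/code&gt; with an empty prompt to pre-warm the model:&lt;/p&gt;

```python
import json

def warm_request(model, keep_alive="30m"):
    # An empty prompt with keep_alive loads the model and keeps it in
    # memory for the given duration ("5m" is the default; -1 = forever)
    return json.dumps({"model": model, "prompt": "", "keep_alive": keep_alive})

print(warm_request("deepseek-coder-v2:16b"))
```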

&lt;h3&gt;
  
  
  Memory Management
&lt;/h3&gt;

&lt;p&gt;Models stay loaded in RAM. To unload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama stop deepseek-coder-v2:16b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or set automatic unloading in the Ollama config.&lt;/p&gt;
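&lt;p&gt;The relevant setting is the &lt;code&gt;OLLAMA_KEEP_ALIVE&lt;/code&gt; environment variable, read by the server at startup:&lt;/p&gt;

```shell
# Keep models resident for an hour instead of the 5-minute default
# (set this in the environment of the `ollama serve` process)
export OLLAMA_KEEP_ALIVE=1h
```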

&lt;h2&gt;
  
  
  Free Copilot Alternative? Yes, Actually
&lt;/h2&gt;

&lt;p&gt;This setup is a legitimate &lt;strong&gt;free Copilot alternative&lt;/strong&gt;. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.&lt;/p&gt;

&lt;p&gt;Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Copilot&lt;/th&gt;
&lt;th&gt;This Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$10-19/mo&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;td&gt;Good enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;2 min&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.&lt;/p&gt;

&lt;p&gt;Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Best AI Coding Assistant in 2026 Isn't What You Think</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Thu, 26 Mar 2026 07:02:30 +0000</pubDate>
      <link>https://forem.com/cumulus/the-best-ai-coding-assistant-in-2026-isnt-what-you-think-4ncg</link>
      <guid>https://forem.com/cumulus/the-best-ai-coding-assistant-in-2026-isnt-what-you-think-4ncg</guid>
      <description>&lt;p&gt;The AI coding assistant market has exploded. GitHub Copilot dominated 2023-2024. Cursor emerged as the darling of 2025. Now we're three months into 2026, and the landscape looks completely different.&lt;/p&gt;

&lt;p&gt;I've spent the last year testing every major AI coding tool on real production code. Not toy examples—actual systems with authentication, database migrations, and the kind of legacy code that makes you question career choices.&lt;/p&gt;

&lt;p&gt;Here's my honest assessment of where things stand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Current Players
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt; remains the safe corporate choice. It's everywhere, it's integrated into VS Code, and it rarely produces anything catastrophically wrong. The problem? It rarely produces anything exceptional either. Copilot in 2026 feels like autocomplete with better marketing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; changed the game by making the AI context-aware of your entire codebase. You could ask it to refactor across multiple files, and it actually understood the relationships. This was revolutionary 18 months ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude (via API or Claude Code)&lt;/strong&gt; brought genuine reasoning to code generation. It doesn't just pattern-match—it thinks through problems. The tradeoff is latency and cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt; arrived late 2025 promising Cursor's features at half the price. And honestly? It delivers. The VSCode fork works, the multi-file editing is solid, and the price is hard to argue with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local LLMs (Ollama + DeepSeek/Qwen)&lt;/strong&gt; are the wildcard nobody expected. Running a 32B parameter model locally for code assistance was science fiction two years ago. Now it's an &lt;code&gt;ollama pull&lt;/code&gt; away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Matters in 2026
&lt;/h2&gt;

&lt;p&gt;After thousands of hours with these tools, I've identified three factors that separate useful from gimmicky:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Context Window, Not Just Size
&lt;/h3&gt;

&lt;p&gt;Copilot's context window is embarrassingly small. It sees your current file and makes educated guesses about the rest. This works for isolated functions. It fails spectacularly for anything architectural.&lt;/p&gt;

&lt;p&gt;Cursor and Windsurf index your codebase and inject relevant context. This means when you ask "refactor the authentication flow," they actually know what your authentication flow looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Copilot sees this function in isolation
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# It has no idea how this connects to your middleware,
&lt;/span&gt;    &lt;span class="c1"&gt;# your session store, or your refresh token logic
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Cursor/Windsurf can trace the entire flow:
# middleware.py -&amp;gt; auth/validate.py -&amp;gt; models/user.py -&amp;gt; redis_session.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference in output quality is night and day.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Edit vs Generate
&lt;/h3&gt;

&lt;p&gt;The best AI coding assistant in 2026 isn't the one that generates the most code. It's the one that edits existing code correctly.&lt;/p&gt;

&lt;p&gt;Generating a new function is easy. Modifying a 500-line file without breaking the 47 other things that depend on it? That's where most tools fall apart.&lt;/p&gt;

&lt;p&gt;Claude excels here. Its ability to understand "change X but preserve Y" consistently beats the competition. Cursor is close behind. Copilot still struggles with anything beyond single-function changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Knowing When to Stop
&lt;/h3&gt;

&lt;p&gt;The worst AI coding assistants are the ones that confidently produce garbage. Copilot will autocomplete into obvious errors. Some tools will refactor your code into something that looks clean but subtly breaks business logic.&lt;/p&gt;

&lt;p&gt;The best tools either get it right or clearly indicate uncertainty. Claude will often say "I'd need to see the implementation of X to be confident about this change." That honesty saves debugging hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup in 2026
&lt;/h2&gt;

&lt;p&gt;After all this testing, here's what I actually use daily:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary: Cursor with Claude 3.5/Opus API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor's interface plus Claude's reasoning is the sweet spot. The codebase indexing means Claude has context it wouldn't otherwise have. The multi-file editing means I'm not copy-pasting between chat windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secondary: Local DeepSeek-Coder 33B via Ollama&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For anything sensitive—client code, proprietary algorithms, that embarrassing legacy system—I run everything locally. DeepSeek-Coder is surprisingly capable. Not Claude-level, but 80% of the quality with zero data leaving my machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# My local setup&lt;/span&gt;
ollama pull deepseek-coder:33b-instruct-q4_K_M
&lt;span class="c"&gt;# 20GB download, runs on 24GB VRAM or 32GB RAM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Occasional: GitHub Copilot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Still useful for quick completions when I don't need intelligence, just speed. Writing boilerplate, filling in obvious patterns, auto-completing imports.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best AI Coding Assistant Is...
&lt;/h2&gt;

&lt;p&gt;Context-dependent. I know that's not the definitive answer the headline promised, but it's the truth.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For corporations with security requirements&lt;/strong&gt;: Local LLMs or nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For indie developers&lt;/strong&gt;: Cursor + Claude API or Windsurf if budget matters
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For quick prototyping&lt;/strong&gt;: Copilot is fine, it's fast and cheap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For complex refactoring&lt;/strong&gt;: Claude with full codebase context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "best" isn't a single tool. It's knowing which tool fits which problem.&lt;/p&gt;

&lt;p&gt;What surprised me most in 2026 is how viable local LLMs have become. Two years ago, suggesting someone run their own coding assistant on consumer hardware would get you laughed out of the room. Now I know developers running DeepSeek locally who refuse to go back to cloud tools.&lt;/p&gt;

&lt;p&gt;The market is fragmenting. That's good for developers—more options, more competition, better tools. The monoculture of "just use Copilot" is over.&lt;/p&gt;

&lt;p&gt;Pick the tool that matches your constraints. Test it on real code, not demo projects. And don't be afraid to combine multiple tools for different tasks.&lt;/p&gt;

&lt;p&gt;The best coding assistant is the one that helps you ship.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>5 Free Copilot Alternatives That Actually Work in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 25 Mar 2026 07:02:26 +0000</pubDate>
      <link>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-58gm</link>
      <guid>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-58gm</guid>
      <description>&lt;p&gt;GitHub Copilot costs $19/month. For hobbyists, students, or anyone building side projects, that adds up fast. I've spent the last few months testing every free AI coding assistant I could find, and most of them are garbage.&lt;/p&gt;

&lt;p&gt;But five of them aren't. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Codeium — The Closest Thing to Free Copilot
&lt;/h2&gt;

&lt;p&gt;Codeium is the obvious first pick. It's free for individuals, supports 70+ languages, and works in VS Code, JetBrains, Vim, and basically everything else.&lt;/p&gt;

&lt;p&gt;The autocomplete is fast. Not quite Copilot-fast, but close enough that you won't notice in practice. Where it really shines is multi-line completions — it understands context surprisingly well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Type this comment and Codeium completes the function
# Function to validate email addresses using regex
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Daily coding, general-purpose autocomplete&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Chat features are basic compared to paid tools&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Continue.dev — Open Source and Local-First
&lt;/h2&gt;

&lt;p&gt;If you care about privacy or want to run models locally, Continue is the answer. It's open source, connects to local LLMs via Ollama, and integrates directly into VS Code.&lt;/p&gt;

&lt;p&gt;The setup takes 10 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a coding model&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# Install Continue extension in VS Code&lt;/span&gt;
&lt;span class="c"&gt;# Configure to use your local model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have AI code assistance that never leaves your machine. No API keys, no subscriptions, no telemetry.&lt;/p&gt;
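&lt;p&gt;The "configure" step is a small JSON file. A minimal &lt;code&gt;~/.continue/config.json&lt;/code&gt; pointing at the model pulled above looks like this (a sketch — swap in whichever model you pulled):&lt;/p&gt;

```json
{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b"
    }
  ]
}
```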

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Privacy-conscious developers, offline work, learning how LLMs work&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Local models are slower than cloud APIs (unless you have a beefy GPU)&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Cursor (Free Tier) — 2000 Completions/Month
&lt;/h2&gt;

&lt;p&gt;Yes, Cursor has a paid tier, but the free version gives you 2000 completions per month. For side projects and learning, that's plenty.&lt;/p&gt;

&lt;p&gt;What makes Cursor different is the integrated chat. You can select code, hit Cmd+K, and ask it to refactor, explain, or fix bugs. The AI understands your entire codebase, not just the current file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Select this function and ask Cursor to add error handling&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Cursor rewrites it with try/catch, type checking, and retry logic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Full-featured IDE experience, codebase-aware assistance&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Free tier has monthly limits; resets on billing cycle&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Tabby — Self-Hosted Copilot Clone
&lt;/h2&gt;

&lt;p&gt;Tabby is what you deploy when you want your own Copilot server. It's open source, runs on your hardware, and supports team usage.&lt;/p&gt;

&lt;p&gt;The killer feature: you can fine-tune it on your codebase. After indexing your repos, Tabby learns your patterns, naming conventions, and internal APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml for Tabby&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tabby&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tabbyml/tabby&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;serve --model StarCoder-1B --device cuda&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
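
&lt;p&gt;Before wiring up an editor, smoke-test the server. The &lt;code&gt;/v1/health&lt;/code&gt; path below is an assumption based on Tabby's default HTTP API; if it 404s on your version, check &lt;code&gt;docker compose logs tabby&lt;/code&gt; for the actual routes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Wrap the check in a function so you can re-run it while the model loads
tabby_up() {
  curl -fsS "http://localhost:8080/v1/health" &amp;gt;/dev/null 2&amp;gt;&amp;amp;1
}

if tabby_up; then
  echo "Tabby is serving"
else
  echo "not ready yet"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;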



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Teams, enterprises, anyone with spare GPU capacity&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Requires self-hosting; smaller models = less capable completions&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Amazon CodeWhisperer (Free Tier) — The Enterprise Sleeper
&lt;/h2&gt;

&lt;p&gt;CodeWhisperer (since folded into Amazon Q Developer, though the free individual tier survives) is AWS's answer to Copilot: unlimited completions, security scanning, and reference tracking that tells you when a suggestion matches open-source code.&lt;/p&gt;

&lt;p&gt;The catch: it's best for AWS-heavy codebases. If you're writing Lambda functions, CDK stacks, or anything AWS-adjacent, CodeWhisperer knows the patterns better than anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CodeWhisperer excels at AWS boilerplate
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: AWS developers, serverless projects, compliance-focused teams&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Requires AWS account; less impressive outside AWS ecosystem&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codeium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Free&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tabby&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Self-host&lt;/td&gt;
&lt;td&gt;Hard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeWhisperer&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good (AWS)&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  My Actual Setup
&lt;/h2&gt;

&lt;p&gt;I use Cursor for main projects (the free tier covers my side project usage), Continue with Ollama for anything sensitive, and Codeium as a fallback in terminals and remote environments.&lt;/p&gt;

&lt;p&gt;The combination costs me exactly $0/month and covers 95% of what I'd use Copilot for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Claude and ChatGPT?
&lt;/h2&gt;

&lt;p&gt;They're not autocomplete tools, but for complex refactoring or architecture questions, I paste code into Claude. It's slower but handles nuanced problems better than any inline assistant.&lt;/p&gt;

&lt;p&gt;The point isn't finding one tool. It's building a workflow that matches how you actually code.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Automate Code Reviews with Local LLMs (No API Keys Required)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:02:21 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-automate-code-reviews-with-local-llms-no-api-keys-required-2839</link>
      <guid>https://forem.com/cumulus/how-to-automate-code-reviews-with-local-llms-no-api-keys-required-2839</guid>
      <description>&lt;p&gt;I got tired of waiting for PR reviews. My team's across three timezones, and sometimes a simple "is this logic right?" question sits for 12 hours.&lt;/p&gt;

&lt;p&gt;So I built an automated pre-commit code review using Ollama and git hooks. It runs entirely local—no API keys, no usage limits, no sending proprietary code to external servers.&lt;/p&gt;

&lt;p&gt;Here's the setup that's been running on my machine for two months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Code Review?
&lt;/h2&gt;

&lt;p&gt;Cloud APIs are great until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're working with sensitive code&lt;/li&gt;
&lt;li&gt;You hit rate limits at 2 AM debugging&lt;/li&gt;
&lt;li&gt;Your company's security policy says no external AI&lt;/li&gt;
&lt;li&gt;You don't want to pay per token for every commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running local LLMs for coding tasks solves all of this. The quality isn't GPT-4, but for catching obvious bugs and suggesting improvements? It's surprisingly good.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; - Dead simple local LLM runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A decent GPU&lt;/strong&gt; - 8GB VRAM minimum, 16GB recommended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git&lt;/strong&gt; - Obviously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 minutes&lt;/strong&gt; - That's genuinely it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux/WSL&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull a coding-focused model. I've tested several; here's what works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best balance of speed and quality&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# If you have 16GB+ VRAM&lt;/span&gt;
ollama pull codellama:13b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
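
&lt;p&gt;One option worth knowing before we script anything: besides &lt;code&gt;ollama run&lt;/code&gt;, Ollama exposes an HTTP API on &lt;code&gt;localhost:11434&lt;/code&gt;, which is handy for structured output or CI use. A minimal non-streaming helper (a sketch; it needs &lt;code&gt;jq&lt;/code&gt;, and the &lt;code&gt;ai_generate&lt;/code&gt; name is mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# ai_generate MODEL PROMPT: one-shot, non-streaming call to Ollama's /api/generate
ai_generate() {
  jq -n --arg m "$1" --arg p "$2" '{model: $m, prompt: $p, stream: false}' |
    curl -s http://localhost:11434/api/generate -d @- |
    jq -r '.response'
}

# usage: ai_generate deepseek-coder:6.7b "Explain this diff in one sentence"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;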



&lt;h2&gt;
  
  
  Step 2: Create the Review Script
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;code&gt;~/.local/bin/ai-review&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AI_REVIEW_MODEL&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;deepseek&lt;/span&gt;&lt;span class="p"&gt;-coder&lt;/span&gt;:6.7b&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;DIFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--diff-filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACMR&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No staged changes to review"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Review this code diff. Be concise. Flag:
1. Obvious bugs or logic errors
2. Security issues (SQL injection, XSS, hardcoded secrets)
3. Performance problems
4. Missing error handling

If the code looks fine, just say 'LGTM'.

Diff:
&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔍 Running local code review..."&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

ollama run &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Review complete. Commit? [y/N]"&lt;/span&gt;
&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; response
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ ^[Yy]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make it executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/.local/bin/ai-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
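
&lt;p&gt;The hook in the next step calls the script by absolute path, so this isn't strictly required, but if you want to invoke &lt;code&gt;ai-review&lt;/code&gt; by name, confirm &lt;code&gt;~/.local/bin&lt;/code&gt; is on your &lt;code&gt;PATH&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Portable PATH membership check (no external commands)
path_has() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has "$PATH" "$HOME/.local/bin"; then
  echo "ai-review is callable by name"
else
  echo 'add export PATH="$HOME/.local/bin:$PATH" to your shell rc file'
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;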



&lt;h2&gt;
  
  
  Step 3: Set Up the Git Hook
&lt;/h2&gt;

&lt;p&gt;Create a pre-commit hook in your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your project directory&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .git/hooks/pre-commit &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
#!/bin/bash
~/.local/bin/ai-review
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x .git/hooks/pre-commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want this globally? Use git templates (note: templates only apply to repos you &lt;code&gt;init&lt;/code&gt; or &lt;code&gt;clone&lt;/code&gt; afterward; re-run &lt;code&gt;git init&lt;/code&gt; inside existing repos to copy the hook in):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.git-templates/hooks
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.local/bin/ai-review ~/.git-templates/hooks/pre-commit
git config &lt;span class="nt"&gt;--global&lt;/span&gt; init.templateDir ~/.git-templates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Test It
&lt;/h2&gt;

&lt;p&gt;Stage some code and commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add suspicious-code.py
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"add feature"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 Running local code review...

Issues found:

1. **SQL Injection** (line 23): User input passed directly to query.
   Use parameterized queries instead.

2. **Missing null check** (line 45): `user.profile` accessed without
   verifying user exists.

3. **Hardcoded credential** (line 12): API key in source code.
   Move to environment variable.

---
Review complete. Commit? [y/N]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making It Actually Useful
&lt;/h2&gt;

&lt;p&gt;The basic setup works, but here's how I've tuned mine:&lt;/p&gt;

&lt;h3&gt;
  
  
  Skip Reviews for Trivial Commits
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to the script, after getting DIFF&lt;/span&gt;
&lt;span class="nv"&gt;LINES_CHANGED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"^+"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LINES_CHANGED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 5 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Small change, skipping review"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
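
&lt;p&gt;One gotcha with counting added lines: the unified-diff header &lt;code&gt;+++ b/file&lt;/code&gt; also starts with &lt;code&gt;+&lt;/code&gt;, so a bare &lt;code&gt;grep -c "^+"&lt;/code&gt; overcounts by one per file. The stricter &lt;code&gt;'^+[^+]'&lt;/code&gt; pattern counts only real additions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A three-line sample diff: one header pair, one real added line
DIFF_SAMPLE='--- a/app.py
+++ b/app.py
+print("hello")'

NAIVE=$(printf '%s\n' "$DIFF_SAMPLE" | grep -c '^+')      # header counted too
STRICT=$(printf '%s\n' "$DIFF_SAMPLE" | grep -c '^+[^+]') # real additions only

echo "naive=$NAIVE strict=$STRICT"   # naive=2 strict=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;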



&lt;h3&gt;
  
  
  Focus on Specific File Types
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only review Python and JavaScript&lt;/span&gt;
&lt;span class="nv"&gt;DIFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--diff-filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACMR &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s1"&gt;'*.py'&lt;/span&gt; &lt;span class="s1"&gt;'*.js'&lt;/span&gt; &lt;span class="s1"&gt;'*.ts'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
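
&lt;p&gt;Pathspecs also support exclusions (&lt;code&gt;:(exclude)&lt;/code&gt; is standard git pathspec magic), which keeps generated files out of the review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Same filter, but skip minified bundles and lockfiles
DIFF=$(git diff --cached --diff-filter=ACMR -- \
  '*.py' '*.js' '*.ts' \
  ':(exclude)*.min.js' ':(exclude)package-lock.json')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;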



&lt;h3&gt;
  
  
  Bypass When Needed
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Skip the hook for quick fixes&lt;/span&gt;
git commit &lt;span class="nt"&gt;--no-verify&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"typo fix"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Log Reviews for Later
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Append to script before the prompt&lt;/span&gt;
&lt;span class="nv"&gt;REVIEW_LOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/.local/share/ai-reviews/&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;.log
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
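
&lt;p&gt;Those logs grow without bound. A cron-friendly one-liner keeps the last 30 days (the path matches the &lt;code&gt;REVIEW_LOG&lt;/code&gt; location above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prune review logs older than 30 days
find ~/.local/share/ai-reviews -name '*.log' -mtime +30 -delete 2&amp;gt;/dev/null || true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;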



&lt;h2&gt;
  
  
  Performance Notes
&lt;/h2&gt;

&lt;p&gt;On my RTX 3080:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deepseek-coder:6.7b&lt;/code&gt; - ~3 seconds for typical diffs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codellama:13b&lt;/code&gt; - ~8 seconds, slightly better catches&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codellama:34b&lt;/code&gt; - ~25 seconds, overkill for pre-commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 6.7B model catches 80% of what the larger models find. For pre-commit automation, speed matters more than catching edge cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Won't Catch
&lt;/h2&gt;

&lt;p&gt;Be realistic. Local LLMs miss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architectural issues&lt;/li&gt;
&lt;li&gt;Business logic errors (it doesn't know your domain)&lt;/li&gt;
&lt;li&gt;Subtle race conditions&lt;/li&gt;
&lt;li&gt;Whether your code actually solves the right problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a replacement for human review. It's a first pass that catches the embarrassing stuff before your teammates see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Value
&lt;/h2&gt;

&lt;p&gt;Two months in, here's what I've noticed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fewer "oops" commits&lt;/strong&gt; - It catches the dumb mistakes I make at midnight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster PR reviews&lt;/strong&gt; - Human reviewers focus on architecture, not typos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better habits&lt;/strong&gt; - Knowing there's a check makes me write cleaner first drafts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole setup took 10 minutes. The ROI has been significant.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Cursor vs Copilot in 2026: I Switched After 2 Years—Here's What Happened</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Mon, 23 Mar 2026 07:02:15 +0000</pubDate>
      <link>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-switched-after-2-years-heres-what-happened-362p</link>
      <guid>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-switched-after-2-years-heres-what-happened-362p</guid>
      <description>&lt;p&gt;I was a GitHub Copilot loyalist. Two years of daily use, hundreds of accepted suggestions, a workflow I thought was optimized. Then I tried Cursor for a week. I haven't gone back.&lt;/p&gt;

&lt;p&gt;This isn't a feature checklist. It's what actually matters when you're shipping code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Difference
&lt;/h2&gt;

&lt;p&gt;Copilot treats AI as autocomplete on steroids. Cursor treats AI as a pair programmer who can see your entire codebase.&lt;/p&gt;

&lt;p&gt;That distinction changes everything.&lt;/p&gt;

&lt;p&gt;When I ask Copilot to refactor a function, it sees the current file. Maybe some context from open tabs. When I ask Cursor the same thing, it understands how that function connects to my services, my types, my tests. It suggests changes I'd actually make.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Copilot suggestion: technically correct, misses context&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Cursor suggestion: knows my codebase uses the ApiClient pattern&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor knew about &lt;code&gt;ApiClient&lt;/code&gt; because it indexed my project. Copilot was guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed vs Intelligence
&lt;/h2&gt;

&lt;p&gt;Copilot is faster. No contest. The ghost text appears almost instantly. Cursor takes a beat longer, especially for complex suggestions.&lt;/p&gt;

&lt;p&gt;But I've stopped caring about milliseconds. What matters is how often I accept the suggestion versus how often I have to fix it.&lt;/p&gt;

&lt;p&gt;My Copilot acceptance rate: ~40%&lt;br&gt;
My Cursor acceptance rate: ~70%&lt;/p&gt;

&lt;p&gt;That 30% difference compounds. Fewer corrections. Fewer context switches. Less cognitive load.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Composer Changed My Workflow
&lt;/h2&gt;

&lt;p&gt;Cursor's Composer feature lets you describe changes across multiple files in natural language. "Add error handling to all API endpoints and update the corresponding tests."&lt;/p&gt;

&lt;p&gt;It generates a diff. You review it. Accept or reject per-file.&lt;/p&gt;

&lt;p&gt;I refactored an entire authentication module in 20 minutes. With Copilot, that's a morning of manual work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What I typed in Composer:&lt;/span&gt;
&lt;span class="s2"&gt;"Replace all instances of the legacy AuthService with the new 
AuthProvider pattern. Update imports. Fix any type errors."&lt;/span&gt;

&lt;span class="c"&gt;# What I got:&lt;/span&gt;
&lt;span class="c"&gt;# - 14 files modified&lt;/span&gt;
&lt;span class="c"&gt;# - All imports updated  &lt;/span&gt;
&lt;span class="c"&gt;# - Type definitions fixed&lt;/span&gt;
&lt;span class="c"&gt;# - One edge case flagged for manual review&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot Chat exists, but it operates in a separate panel. It doesn't understand your project structure the same way. It's a chatbot that happens to know about code. Cursor is an IDE that happens to have AI woven through every interaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Copilot Still Does Better
&lt;/h2&gt;

&lt;p&gt;Inline completions for boilerplate. Writing standard loops, imports, basic CRUD operations—Copilot nails these instantly. Cursor sometimes overthinks simple tasks.&lt;/p&gt;

&lt;p&gt;GitHub integration is seamless if your team lives in the GitHub ecosystem. PR descriptions, issue references, Actions workflows. Copilot understands GitHub because it &lt;em&gt;is&lt;/em&gt; GitHub.&lt;/p&gt;

&lt;p&gt;Enterprise compliance. If your company already pays for GitHub Enterprise, Copilot slots in without procurement headaches. Cursor requires a separate vendor relationship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Question
&lt;/h2&gt;

&lt;p&gt;Copilot: $10/month (Individual) or $19/month (Business)&lt;br&gt;
Cursor: $20/month (Pro) or $40/month (Business)&lt;/p&gt;

&lt;p&gt;Cursor costs more. For me, it's worth it. The multi-file refactoring alone saves hours per week.&lt;/p&gt;

&lt;p&gt;But if you're writing straightforward code—standard web apps, CRUD APIs, scripts—Copilot delivers 80% of the value at half the price.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup Now
&lt;/h2&gt;

&lt;p&gt;I run both. Cursor as my primary editor for complex projects. Copilot in VS Code for quick scripts and one-off files.&lt;/p&gt;

&lt;p&gt;This sounds wasteful. It's not. Different tools for different contexts.&lt;/p&gt;

&lt;p&gt;For greenfield projects, architecture decisions, refactoring legacy code: Cursor.&lt;br&gt;
For quick fixes, small scripts, config files: Copilot in VS Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Copilot if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want fast, cheap, good-enough completions&lt;/li&gt;
&lt;li&gt;Your team is locked into GitHub&lt;/li&gt;
&lt;li&gt;You write mostly straightforward code&lt;/li&gt;
&lt;li&gt;Enterprise compliance matters more than features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work on complex, interconnected codebases&lt;/li&gt;
&lt;li&gt;Multi-file refactoring is part of your week&lt;/li&gt;
&lt;li&gt;You want AI that understands your project, not just your file&lt;/li&gt;
&lt;li&gt;You'll pay more for fewer context switches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I switched because I got tired of AI suggestions that were technically correct but contextually wrong. Cursor understands my codebase. Copilot understands code.&lt;/p&gt;

&lt;p&gt;That's the difference that mattered.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>vscode</category>
    </item>
  </channel>
</rss>
