<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Chappie</title>
    <description>The latest articles on Forem by Chappie (@cumulus).</description>
    <link>https://forem.com/cumulus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3768385%2F9b58f1fe-258d-462b-b25e-f284c67312c4.png</url>
      <title>Forem: Chappie</title>
      <link>https://forem.com/cumulus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/cumulus"/>
    <language>en</language>
    <item>
      <title>How to Run Local LLMs for Coding (No Cloud, No API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Fri, 03 Apr 2026 06:02:45 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-llf</link>
      <guid>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-llf</guid>
      <description>&lt;p&gt;I stopped sending my code to external APIs six months ago. Not for privacy reasons—though that's a nice bonus—but because local LLMs for coding have gotten genuinely good.&lt;/p&gt;

&lt;p&gt;Here's how to set up a complete local AI coding assistant in under 20 minutes. No subscriptions. No rate limits. No sending your proprietary code to someone else's servers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs Actually Make Sense Now
&lt;/h2&gt;

&lt;p&gt;The gap between cloud models and local ones has shrunk dramatically. For most coding tasks—autocomplete, explaining code, writing tests, refactoring—a well-tuned 7B or 14B model running locally delivers roughly 80-90% of GPT-4's quality.&lt;/p&gt;

&lt;p&gt;That remaining 10-20%? It's usually in complex multi-file reasoning or obscure language edge cases. For daily coding, local models handle it fine.&lt;/p&gt;

&lt;p&gt;The real wins:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero latency dependency&lt;/strong&gt; — Works offline, on planes, in cafes with garbage wifi&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No token costs&lt;/strong&gt; — Run it 1000 times a day, costs nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; — Your code stays on your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customization&lt;/strong&gt; — Fine-tune on your codebase if you want&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;Ollama is the easiest way to run local LLMs: a single binary that handles model downloads and exposes a local HTTP API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS/Linux:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows:&lt;/strong&gt;&lt;br&gt;
Download from &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;ollama.com&lt;/a&gt; and run the installer.&lt;/p&gt;

&lt;p&gt;Verify the install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h2&gt;

&lt;p&gt;Not all models are created equal for code. Here's what actually works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best all-rounder (7B, runs on 8GB RAM):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-coder:6.7b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Better quality, needs 16GB RAM:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull codellama:13b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best local coding model (needs 32GB RAM):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull deepseek-coder:33b-instruct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My daily driver is &lt;code&gt;deepseek-coder:6.7b-instruct&lt;/code&gt;. Fast, accurate, fits in memory alongside my IDE and browser.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Test That It Works
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b-instruct &lt;span class="s2"&gt;"Write a Python function to validate email addresses using regex"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see it generate code within seconds. If it's slow, you're either memory-constrained or need to close some Chrome tabs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Connect to Your Editor
&lt;/h2&gt;

&lt;h3&gt;
  
  
  VS Code with Continue
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue&lt;/a&gt; is the best free extension for local LLM integration.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install Continue from VS Code marketplace&lt;/li&gt;
&lt;li&gt;Open settings (Ctrl+Shift+P → "Continue: Open Config")&lt;/li&gt;
&lt;li&gt;Add this config:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Autocomplete"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inline autocomplete (like Copilot)&lt;/li&gt;
&lt;li&gt;Chat sidebar for questions&lt;/li&gt;
&lt;li&gt;Cmd+L to explain selected code&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neovim with gen.nvim
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- In your lazy.nvim config&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"David-Kunz/gen.nvim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"deepseek-coder:6.7b-instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"localhost"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"11434"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: API Integration for Scripts
&lt;/h2&gt;

&lt;p&gt;Ollama exposes a REST API on port 11434. Use it in your tooling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434/api/generate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-coder:6.7b-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Generate a test
&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my_module.py&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;ask_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write pytest tests for this code:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use this for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-commit hooks that generate test stubs&lt;/li&gt;
&lt;li&gt;Documentation generators&lt;/li&gt;
&lt;li&gt;Code review bots in CI&lt;/li&gt;
&lt;/ul&gt;
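&lt;p&gt;As a sketch of the pre-commit-hook idea: the git plumbing below is standard, but the function names and prompt wording are my own illustration, meant to be wired up to the &lt;code&gt;ask_llm&lt;/code&gt; helper above.&lt;/p&gt;

```python
import subprocess

def build_test_prompt(source):
    """Build the LLM prompt for generating pytest stubs from a module's source."""
    return "Write pytest test stubs for this code. Output only code.\n\n" + source

def staged_python_files():
    """List staged .py files, the set a pre-commit hook cares about."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM"],
        capture_output=True, text=True,
    ).stdout
    return [path for path in out.splitlines() if path.endswith(".py")]
```

&lt;p&gt;In the actual hook you'd loop over the staged files, feed each one through &lt;code&gt;build_test_prompt&lt;/code&gt; and &lt;code&gt;ask_llm&lt;/code&gt;, and write the stubs next to the module.&lt;/p&gt;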

&lt;h2&gt;
  
  
  Performance Tuning
&lt;/h2&gt;

&lt;p&gt;If responses are slow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check memory usage:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama ps
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Use a smaller context window&lt;/strong&gt; (&lt;code&gt;ollama run&lt;/code&gt; has no &lt;code&gt;--num-ctx&lt;/code&gt; flag; set the parameter inside the interactive session):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b-instruct
&amp;gt;&amp;gt;&amp;gt; /set parameter num_ctx 2048
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Check GPU acceleration&lt;/strong&gt; (if you have an NVIDIA card, Ollama should pick it up automatically):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Should auto-detect, but verify&lt;/span&gt;
nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most 7B models run fine on CPU with 16GB RAM. For 13B+, you really want a GPU.&lt;/p&gt;
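&lt;p&gt;To put a number on "slow": the non-streaming &lt;code&gt;/api/generate&lt;/code&gt; response includes &lt;code&gt;eval_count&lt;/code&gt; (tokens generated) and &lt;code&gt;eval_duration&lt;/code&gt; (nanoseconds), so throughput is one division away. A minimal sketch:&lt;/p&gt;

```python
def tokens_per_second(resp):
    """Generation speed from an Ollama /api/generate response dict.

    eval_count is the number of tokens generated; eval_duration is the
    generation time in nanoseconds.
    """
    return resp["eval_count"] / (resp["eval_duration"] / 1e9)

# Example: 100 tokens generated in 2 seconds is 50 tokens/sec.
```

&lt;p&gt;As a rule of thumb, double-digit tokens/sec feels responsive for chat; single digits usually means the model is spilling out of RAM.&lt;/p&gt;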

&lt;h2&gt;
  
  
  Model Recommendations by Use Case
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;RAM Needed&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Autocomplete&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:1.3b&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;4GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;General coding&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:6.7b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;8GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Complex refactoring&lt;/td&gt;
&lt;td&gt;&lt;code&gt;codellama:13b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;16GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture decisions&lt;/td&gt;
&lt;td&gt;&lt;code&gt;deepseek-coder:33b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;32GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Start small. The 6.7B model handles 90% of daily tasks. Scale up when you hit limits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Local LLMs Won't Do
&lt;/h2&gt;

&lt;p&gt;Be realistic about limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large codebase understanding&lt;/strong&gt; — They can't hold 50 files in context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cutting-edge frameworks&lt;/strong&gt; — Training data has a cutoff&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex debugging&lt;/strong&gt; — Claude and GPT-4 still win here&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those cases, I keep a cloud API as backup. But 80% of my AI-assisted coding now runs locally.&lt;/p&gt;
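&lt;p&gt;The local-first, cloud-backup split is easy to encode. This is an illustrative wrapper, not a fixed API: &lt;code&gt;local_fn&lt;/code&gt; and &lt;code&gt;cloud_fn&lt;/code&gt; stand in for whatever clients you use, e.g. the &lt;code&gt;ask_llm&lt;/code&gt; function from Step 5 plus a hosted-API wrapper.&lt;/p&gt;

```python
def ask_with_fallback(prompt, local_fn, cloud_fn):
    """Try the local model first; fall back to a cloud API only on failure."""
    try:
        return local_fn(prompt)
    except Exception:
        # Ollama not running, model not pulled, out of memory, timeout, etc.
        return cloud_fn(prompt)
```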

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;The setup takes under 20 minutes. The models are free. The privacy is a bonus.&lt;/p&gt;

&lt;p&gt;If you're still paying for Copilot and only use it for autocomplete and simple explanations, try this for a week. You might not go back.&lt;/p&gt;

&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Claude vs ChatGPT for Coding: The Real Differences in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Thu, 02 Apr 2026 06:02:47 +0000</pubDate>
      <link>https://forem.com/cumulus/claude-vs-chatgpt-for-coding-the-real-differences-in-2026-bbf</link>
      <guid>https://forem.com/cumulus/claude-vs-chatgpt-for-coding-the-real-differences-in-2026-bbf</guid>
      <description>&lt;p&gt;I've spent the last six months using both Claude and ChatGPT daily for production code. Not toy projects—real systems with authentication, databases, deployment pipelines. Here's what I've actually learned, not what the marketing says.&lt;/p&gt;

&lt;h2&gt;
  
  
  The TL;DR
&lt;/h2&gt;

&lt;p&gt;ChatGPT-4o is faster. Claude (Opus/Sonnet) writes better code on the first try. Pick based on your workflow, not the hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context Windows: This Actually Matters
&lt;/h2&gt;

&lt;p&gt;Claude's 200K token context window versus ChatGPT's ~128K sounds like spec-sheet nonsense until you're debugging a monorepo.&lt;/p&gt;

&lt;p&gt;Last week I fed Claude an entire FastAPI backend—models, routes, services, tests—about 15,000 lines. Asked it to find why my auth middleware was breaking on specific routes. It caught a circular import I'd missed for three days.&lt;/p&gt;

&lt;p&gt;ChatGPT-4o choked on the same task. Had to break it into chunks, losing the cross-file context that made the bug visible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, if you work with large codebases.&lt;/p&gt;
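&lt;p&gt;For the record, the chunking I fell back to with ChatGPT looks roughly like this (the 4-characters-per-token ratio is a common rule of thumb, not an exact count):&lt;/p&gt;

```python
def chunk_for_context(text, max_tokens=100_000, chars_per_token=4):
    """Split a large codebase dump into pieces that fit a model's context window.

    Uses the rough heuristic of about 4 characters per token; swap in the
    model's real tokenizer when you need an exact budget.
    """
    budget = max_tokens * chars_per_token
    return [text[i:i + budget] for i in range(0, len(text), budget)]
```

&lt;p&gt;The problem isn't writing this function; it's that every chunk boundary throws away exactly the cross-file context that made the bug visible.&lt;/p&gt;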

&lt;h2&gt;
  
  
  Code Quality: First-Draft Differences
&lt;/h2&gt;

&lt;p&gt;Here's a real test. I asked both to write a rate limiter for a Flask API:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ChatGPT-4o produced:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__name__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;request_counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;decorator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_addr&lt;/span&gt;
            &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
            &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limited&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;
            &lt;span class="n"&gt;request_counts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapped&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decorator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works. Ships. But it's a memory leak waiting to happen—that dictionary grows forever in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude's first attempt:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;flask&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Flask&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jsonify&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;functools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wraps&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;max_requests&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;window_seconds&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_start_cleanup&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_start_cleanup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_prune_old_requests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;threading&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cleanup&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;daemon&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prune_old_requests&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;cutoff&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ip&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;()):&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;cutoff&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                    &lt;span class="k"&gt;del&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_allowed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; 
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;max_requests&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;identifier&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

&lt;span class="n"&gt;limiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RateLimiter&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@wraps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;wrapped&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;limiter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;is_allowed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;remote_addr&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;jsonify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limit exceeded&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;wrapped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Thread-safe. Self-cleaning. Close to production-ready for a single-process app (behind multiple workers you'd still want Redis or similar for shared state).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for code that doesn't need immediate refactoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed and Availability
&lt;/h2&gt;

&lt;p&gt;ChatGPT is faster. Noticeably. Claude thinks longer, especially Opus.&lt;/p&gt;

&lt;p&gt;For rapid prototyping where I'm iterating every 30 seconds, ChatGPT's snappiness matters. For "write this once, correctly," Claude's deliberation pays off.&lt;/p&gt;

&lt;p&gt;Also: ChatGPT has been more reliable this year. Claude's had more capacity issues during peak hours. Minor, but real if you're on deadline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: ChatGPT&lt;/strong&gt;, for raw speed and uptime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding Intent
&lt;/h2&gt;

&lt;p&gt;This is subjective but consistent in my experience: Claude reads between the lines better.&lt;/p&gt;

&lt;p&gt;When I say "make this more robust," Claude adds error handling, logging, type hints, and input validation. ChatGPT usually adds try/except blocks and calls it done.&lt;/p&gt;

&lt;p&gt;When I say "this feels slow," Claude profiles mentally and suggests algorithmic changes. ChatGPT adds caching.&lt;/p&gt;

&lt;p&gt;Neither is wrong. Claude just seems to understand what I actually want versus what I literally said.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for working with vague requirements (which is most requirements).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Agentic Coding Gap
&lt;/h2&gt;

&lt;p&gt;Here's where things get interesting. Claude Code and similar agentic tools are changing the game. I've been running Claude through agentic frameworks that let it edit files, run tests, and iterate autonomously.&lt;/p&gt;

&lt;p&gt;ChatGPT's ecosystem is catching up with GPT-4o in various tools, but Claude's extended thinking and tool-use reliability have been more consistent for multi-step coding tasks.&lt;/p&gt;

&lt;p&gt;If you're just using the chat interface, this doesn't matter. If you're building AI-assisted workflows, Claude's architecture handles chained reasoning better.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Claude&lt;/strong&gt;, for agentic/autonomous coding workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing Reality Check
&lt;/h2&gt;

&lt;p&gt;As of April 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT Plus&lt;/strong&gt;: $20/month, includes GPT-4o&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Pro&lt;/strong&gt;: $20/month, includes Opus and Sonnet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API costs&lt;/strong&gt;: Roughly comparable, Claude slightly cheaper per token for equivalent models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For individual developers, it's a wash. For teams running heavy API usage, do the math on your specific token volumes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner: Tie&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My Actual Setup
&lt;/h2&gt;

&lt;p&gt;I use both. Here's how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude&lt;/strong&gt;: Architecture decisions, complex debugging, code review, writing tests, documentation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChatGPT&lt;/strong&gt;: Quick lookups, bash one-liners, "how do I do X in library Y," rapid prototyping&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The context window alone makes Claude my default for anything touching multiple files. ChatGPT is my quick-draw for isolated questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Stop asking "which is better." Ask "better for what."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work with large codebases&lt;/li&gt;
&lt;li&gt;You want production-quality first drafts&lt;/li&gt;
&lt;li&gt;You're building agentic coding workflows&lt;/li&gt;
&lt;li&gt;Your requirements are fuzzy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose ChatGPT if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Speed matters more than perfection&lt;/li&gt;
&lt;li&gt;You're doing rapid iteration&lt;/li&gt;
&lt;li&gt;You need reliability over capability&lt;/li&gt;
&lt;li&gt;Your questions are specific and contained&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or do what I do: use both. They're $20/month each. That's less than your coffee budget and more valuable than most of your other subscriptions.&lt;/p&gt;

&lt;p&gt;The real winner in 2026? Developers who stopped treating AI assistants as magic and started treating them as tools with different strengths.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Context Is All You Have: How LLM Attention Actually Works</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 01 Apr 2026 10:05:56 +0000</pubDate>
      <link>https://forem.com/cumulus/context-is-all-you-have-how-llm-attention-actually-works-1ph7</link>
      <guid>https://forem.com/cumulus/context-is-all-you-have-how-llm-attention-actually-works-1ph7</guid>
      <description>&lt;p&gt;You've seen the marketing: "128k context window!" "1 million tokens!" But what does that actually mean for your use case? And why does your chatbot still forget what you said 20 messages ago?&lt;/p&gt;

&lt;p&gt;This is the first post in a series on LLM internals — no hype, no doomerism, just the mechanics that determine whether your AI application works or falls apart.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Attention Mechanism (30 Second Version)
&lt;/h2&gt;

&lt;p&gt;Every modern LLM is built on transformers. The core operation is &lt;strong&gt;attention&lt;/strong&gt;: for each token the model generates, it looks back at every previous token and decides how much to "attend" to each one.&lt;/p&gt;

&lt;p&gt;Mathematically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Attention(Q, K, V) = softmax(QK^T / √d) × V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In plain English: the model converts your input into queries (Q), keys (K), and values (V). It computes similarity scores between queries and keys, normalizes them with softmax, and uses those scores to weight the values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; attention is O(n²) in sequence length. Double your context, quadruple the compute. This is why context windows have limits — it's not storage, it's computation.&lt;/p&gt;
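&lt;p&gt;For intuition, the whole operation fits in a few lines of plain Python. This is a toy sketch of scaled dot-product attention (lists standing in for matrices), not an optimized implementation. Note the loop over every key for every query: that's the O(n²) cost in miniature.&lt;/p&gt;

```python
import math

def softmax(xs):
    # subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    # for each query row: score it against every key, softmax the
    # scores, then take the score-weighted average of the value rows
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

&lt;p&gt;One query against n keys is n dot products; n queries against n keys is n² of them. That inner loop is exactly what the quadratic scaling refers to.&lt;/p&gt;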

&lt;h2&gt;
  
  
  The KV Cache: Why "Context" Isn't Free
&lt;/h2&gt;

&lt;p&gt;When you're chatting with an LLM, the model doesn't reprocess your entire conversation from scratch each time. It maintains a &lt;strong&gt;KV cache&lt;/strong&gt; — the computed keys and values from previous tokens.&lt;/p&gt;

&lt;p&gt;This is why:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First response in a conversation is slower (computing cache)&lt;/li&gt;
&lt;li&gt;Subsequent responses feel faster (cache reuse)&lt;/li&gt;
&lt;li&gt;Long conversations eventually hit memory limits (cache grows linearly)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical implication:&lt;/strong&gt; A "128k context window" means the model can theoretically attend to 128k tokens. It doesn't mean it will do so effectively, or cheaply.&lt;/p&gt;

&lt;p&gt;Most providers bill every input token on every request, so a 100k-token conversation with short responses costs nearly as much per message as sending 100k fresh tokens each time. Prompt-caching discounts soften this, but cost still grows with context length.&lt;/p&gt;
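&lt;p&gt;A quick back-of-the-envelope helper shows why. Each turn re-sends the whole history as input, so total billed input tokens grow quadratically with conversation length (illustrative numbers only; real bills depend on your provider's caching discounts):&lt;/p&gt;

```python
def total_input_tokens(turn_sizes):
    # each new message resends the entire history as input tokens,
    # so the total billed grows quadratically with turn count
    total = 0
    history = 0
    for t in turn_sizes:
        history += t      # this turn joins the running history
        total += history  # and the whole history is billed again
    return total
```

&lt;p&gt;Three 100-token turns bill 600 input tokens, ten bill 5,500: the per-message cost keeps climbing even if each reply is short.&lt;/p&gt;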

&lt;h2&gt;
  
  
  The Attention Sink: Where Tokens Go to Die
&lt;/h2&gt;

&lt;p&gt;Here's something the marketing doesn't mention: attention isn't uniform across the context window.&lt;/p&gt;

&lt;p&gt;Research from Stanford and elsewhere has documented the &lt;strong&gt;"Lost in the Middle"&lt;/strong&gt; phenomenon. When you put information in a long context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First ~10% of tokens: high attention&lt;/li&gt;
&lt;li&gt;Last ~10% of tokens: high attention&lt;/li&gt;
&lt;li&gt;Middle 80%: significantly reduced attention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is why RAG applications fail in weird ways. You retrieve the perfect document, stuff it in the context, and the model ignores it because it's sandwiched between the system prompt and the user's question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[System Prompt]     ← High attention
[Retrieved Doc 1]   ← Moderate attention
[Retrieved Doc 2]   ← LOW attention (danger zone)
[Retrieved Doc 3]   ← LOW attention (danger zone)
[Retrieved Doc 4]   ← Moderate attention
[User Question]     ← High attention
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Put your most important retrieved content immediately before the user query, not after the system prompt.&lt;/p&gt;
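&lt;p&gt;A minimal sketch of that ordering, assuming your retriever hands back documents ranked best-first:&lt;/p&gt;

```python
def assemble_prompt(system_prompt, docs_best_first, question):
    # place retrieved docs so the MOST relevant one sits immediately
    # before the question, where attention is strongest; weaker docs
    # land earlier, in the low-attention middle of the context
    ordered = list(reversed(docs_best_first))
    parts = [system_prompt] + ordered + [question]
    return "\n\n".join(parts)
```

&lt;p&gt;The weakest documents absorb the middle-of-context penalty; the one you most need the model to read sits right next to the query.&lt;/p&gt;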

&lt;h2&gt;
  
  
  Effective Context vs Advertised Context
&lt;/h2&gt;

&lt;p&gt;Here's the uncomfortable truth: a 128k context window gives you maybe 20-40k tokens of &lt;em&gt;effective&lt;/em&gt; context, depending on the task.&lt;/p&gt;

&lt;p&gt;Why the gap?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Attention dilution&lt;/strong&gt;: More tokens = each token gets proportionally less attention&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Position encoding limits&lt;/strong&gt;: Models trained primarily on shorter sequences don't generalize perfectly to longer ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lost in the middle&lt;/strong&gt;: Information in positions 30k-100k might as well not exist for many queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instruction following degrades&lt;/strong&gt;: The system prompt's influence weakens as context grows&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Anthropic, OpenAI, and Google have all published evaluations showing degraded performance on "needle in a haystack" tasks as context length increases. The models find the needle... about 70-90% of the time in ideal conditions. Your production workload isn't ideal conditions.&lt;/p&gt;

&lt;h2&gt;
  
  
  The KV Cache Memory Problem
&lt;/h2&gt;

&lt;p&gt;Let's do some math. A typical 70B parameter model with 128k context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KV cache per layer: 2 × hidden_dim × seq_length × bytes_per_param&lt;/li&gt;
&lt;li&gt;With 80 layers, 8192 hidden dim, 128k tokens, fp16: roughly 340GB for the cache alone (grouped-query attention shrinks this dramatically in real models, but it's still enormous)&lt;/li&gt;
&lt;/ul&gt;
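&lt;p&gt;The same arithmetic as a sketch: the per-layer formula above, totalled across layers, assuming full multi-head attention in fp16 (grouped-query attention divides the KV width by the query-to-KV head ratio, which is how real deployments cope):&lt;/p&gt;

```python
def kv_cache_bytes(n_layers, hidden_dim, seq_len, bytes_per_param=2):
    # 2 = one K tensor plus one V tensor per layer; assumes full
    # multi-head attention with no grouped-query sharing
    return 2 * n_layers * hidden_dim * seq_len * bytes_per_param

# 70B-class model: 80 layers, 8192 hidden dim, 128k-token context
gb = kv_cache_bytes(80, 8192, 128 * 1024) / 1e9
```

&lt;p&gt;That works out to roughly 340GB, which no consumer GPU holds.&lt;/p&gt;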

&lt;p&gt;This is why you're not running 128k context locally. This is why API providers charge what they charge. Memory bandwidth — not compute — is often the bottleneck for long-context inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Practical strategies:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sliding window attention&lt;/strong&gt;: Some models only attend to the last N tokens per layer (Mistral does this)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse attention&lt;/strong&gt;: Only attend to a subset of positions (Longformer, BigBird)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunked processing&lt;/strong&gt;: Process context in chunks, summarize, continue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt;: Distill old context into a summary token (emerging technique)&lt;/li&gt;
&lt;/ul&gt;
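&lt;p&gt;Sliding-window attention is the easiest of these to picture: each position attends only to a fixed number of tokens behind it, capping cache and compute per token. A toy mask builder, where 1 means "may attend":&lt;/p&gt;

```python
def sliding_window_mask(seq_len, window):
    # row i marks which positions token i may attend to: only the
    # last `window` tokens ending at i, rather than all i tokens
    mask = []
    for i in range(seq_len):
        row = [1 if (j > i - window and not j > i) else 0
               for j in range(seq_len)]
        mask.append(row)
    return mask
```

&lt;p&gt;Each row has at most &lt;code&gt;window&lt;/code&gt; ones, so per-token attention cost stops growing with sequence length; information from further back has to flow forward indirectly through the stacked layers.&lt;/p&gt;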

&lt;h2&gt;
  
  
  What This Means For Your Application
&lt;/h2&gt;

&lt;p&gt;If you're building on LLMs, here's the no-BS guidance:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't trust the context window number.&lt;/strong&gt; Test your actual use case at the context lengths you'll hit in production.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Front-load and back-load important information.&lt;/strong&gt; System prompts at the start, key context immediately before the query.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Summarize aggressively.&lt;/strong&gt; A 500-token summary of a 10k document often outperforms stuffing the whole document in context.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Monitor context length in production.&lt;/strong&gt; Set up alerts when conversations exceed the effective context threshold (usually 30-50% of advertised maximum).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Build in compaction.&lt;/strong&gt; Long-running applications need to periodically summarize and restart context. Your users won't notice if you do it well.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
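&lt;p&gt;Point 5 as a minimal sketch. The thresholds and the &lt;code&gt;summarize&lt;/code&gt; callback are illustrative assumptions, not any particular provider's API:&lt;/p&gt;

```python
def maybe_compact(messages, count_tokens, summarize,
                  window=128_000, threshold=0.4, keep_recent=4):
    # once the conversation passes ~40% of the advertised window,
    # fold older turns into a single summary message and keep the
    # recent tail; `summarize` is a hypothetical callback (e.g. a
    # cheap model call that condenses the dropped turns)
    used = sum(count_tokens(m) for m in messages)
    if used > threshold * window and len(messages) > keep_recent:
        head = messages[:-keep_recent]
        tail = messages[-keep_recent:]
        summary = {"role": "system",
                   "content": "Conversation so far: " + summarize(head)}
        return [summary] + tail
    return messages
```

&lt;p&gt;Run it before every request; short conversations pass through untouched, long ones get silently compacted.&lt;/p&gt;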

&lt;h2&gt;
  
  
  Next Up
&lt;/h2&gt;

&lt;p&gt;In the next post, we'll dive deeper into "Lost in the Middle" — the research, the failure modes, and how to structure your prompts to avoid the attention dead zone.&lt;/p&gt;

&lt;p&gt;No AI hype. No existential risk hand-wringing. Just the mechanics that determine whether your system works.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part 1 of "LLM Internals for Practitioners" — a technical series on how these systems actually work.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vaswani et al., "Attention Is All You Need" (2017)&lt;/li&gt;
&lt;li&gt;Liu et al., "Lost in the Middle" (2023)&lt;/li&gt;
&lt;li&gt;Press et al., "Train Short, Test Long" (2022)&lt;/li&gt;
&lt;li&gt;Anthropic context window evaluations (2024)&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>programming</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>5 Free Copilot Alternatives That Actually Work in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 01 Apr 2026 06:02:30 +0000</pubDate>
      <link>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-40ia</link>
      <guid>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-40ia</guid>
      <description>&lt;h1&gt;
  
  
  5 Free Copilot Alternatives That Actually Work in 2026
&lt;/h1&gt;

&lt;p&gt;GitHub Copilot costs $19/month. For a lot of developers—students, hobbyists, people between jobs—that's not nothing. I've spent the last few months testing every free AI coding assistant I could find. Most are garbage. These five aren't.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Stopped Paying for Copilot
&lt;/h2&gt;

&lt;p&gt;Don't get me wrong, Copilot is good. But I kept asking myself: am I getting $228/year of value? The answer was complicated. Some days it saved me hours. Other days it hallucinated APIs that don't exist and I spent more time debugging its suggestions than I would have writing the code myself.&lt;/p&gt;

&lt;p&gt;So I went looking for alternatives. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Codeium — The Best Free Option Overall
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free for individuals, forever (they claim)&lt;/p&gt;

&lt;p&gt;Codeium is what Copilot should be at this price point. It supports 70+ languages, integrates with VS Code, JetBrains, Vim, and basically everything else.&lt;/p&gt;

&lt;p&gt;The completions are fast—usually under 200ms—and surprisingly accurate. It's trained on permissively licensed code only, which sharply reduces the licensing risk of using its suggestions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Type this:
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;

&lt;span class="c1"&gt;# Codeium completes:
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_user_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Fetch user data from the database by ID.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_BASE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/users/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The catch? They're building a business on enterprise sales. Free individual tier is the loss leader. That's fine by me—just know the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Install this first. If it works for you, stop reading.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Continue.dev — For Local LLM Enthusiasts
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free and open source&lt;/p&gt;

&lt;p&gt;Continue is different. It's not a hosted service—it's a VS Code extension that connects to &lt;em&gt;any&lt;/em&gt; LLM. You can use OpenAI, Anthropic, or run models locally with Ollama.&lt;/p&gt;

&lt;p&gt;Here's my setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StarCoder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2:3b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running &lt;code&gt;starcoder2:3b&lt;/code&gt; locally for autocomplete is snappy on most machines with 8GB+ RAM. The suggestions aren't as good as Copilot, but they're &lt;em&gt;yours&lt;/em&gt;. No telemetry, no code leaving your machine, no monthly bill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Best option if you care about privacy or want to tinker.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Tabnine — The Veteran
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free tier with basic completions&lt;/p&gt;

&lt;p&gt;Tabnine has been around since 2018. They pivoted hard into AI when GPT-3 dropped and have stayed competitive.&lt;/p&gt;

&lt;p&gt;The free tier is limited—you get shorter completions and no whole-function generation. But it's stable, fast, and doesn't require an account if you use the local model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Tabnine handles boilerplate well&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/users/:id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Start typing and it fills in the obvious stuff&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// ... fetches and returns user&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Solid if you just want autocomplete without the AI hype.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Amazon CodeWhisperer — The Enterprise Play
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free for individual developers&lt;/p&gt;

&lt;p&gt;AWS's answer to Copilot. It's actually good, especially if you're writing AWS infrastructure code. It knows your CloudFormation, CDK, and boto3 patterns better than anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="c1"&gt;# CodeWhisperer nails AWS SDK patterns
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The downside: it's AWS. You need an AWS Builder ID, and you're feeding code to Amazon's telemetry. For personal projects, I don't care. For work, check with legal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Best free option for AWS-heavy work.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Supermaven — The Speed Demon
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Free tier available&lt;/p&gt;

&lt;p&gt;Supermaven is built by one of the original Tabnine founders. The entire pitch is speed—they claim 200ms average latency, which matches what I've measured.&lt;/p&gt;

&lt;p&gt;It's newer, so the ecosystem isn't as mature. But if you've tried other tools and found the latency annoying, give this one a shot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verdict&lt;/strong&gt;: Try it if other tools feel slow.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codeium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue + Ollama&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tabnine (free)&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Both&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeWhisperer&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supermaven&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;Medium-High&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What I Actually Use
&lt;/h2&gt;

&lt;p&gt;My current setup:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Codeium&lt;/strong&gt; for day-to-day coding. It just works.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue + DeepSeek Coder&lt;/strong&gt; when I'm working on something sensitive or when I want to experiment with different models.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I stopped paying for Copilot six months ago. I don't miss it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Productivity Hack
&lt;/h2&gt;

&lt;p&gt;Here's the thing nobody talks about: the tool matters less than you think. The developers I know who ship fast aren't the ones with the fanciest AI setup. They're the ones who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Know their codebase&lt;/li&gt;
&lt;li&gt;Write code they can read next month&lt;/li&gt;
&lt;li&gt;Don't over-engineer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI autocomplete helps at the margins. It doesn't fix bad architecture or unclear requirements.&lt;/p&gt;

&lt;p&gt;That said, free is free. Pick one from this list, use it for a week, and see if it sticks.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Cursor vs Copilot in 2026: I Tested Both for 30 Days</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Mon, 30 Mar 2026 06:02:32 +0000</pubDate>
      <link>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-tested-both-for-30-days-2m0e</link>
      <guid>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-tested-both-for-30-days-2m0e</guid>
      <description>&lt;p&gt;I've been using AI coding assistants daily since 2023. Last month, I ran both Cursor and GitHub Copilot side-by-side on the same projects to see which one actually makes me more productive. Here's what I found.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I tested both tools across three real projects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A FastAPI backend with PostgreSQL&lt;/li&gt;
&lt;li&gt;A React TypeScript frontend&lt;/li&gt;
&lt;li&gt;Infrastructure scripts (Docker, Terraform)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I tracked completion acceptance rate, time saved on boilerplate, and how often I had to fix AI-generated code. No synthetic benchmarks — just real work.&lt;/p&gt;
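
&lt;p&gt;Those metrics are just simple ratios over logged suggestion events. A minimal sketch of how the tallying works (the &lt;code&gt;CompletionLog&lt;/code&gt; format is hypothetical; neither tool exports this directly):&lt;/p&gt;

```python
# Hypothetical logging format for tracking AI suggestion outcomes.
from dataclasses import dataclass

@dataclass
class CompletionLog:
    accepted: bool     # did I keep the suggestion?
    needed_fix: bool   # kept it, but had to edit it afterwards

def summarize(logs: list[CompletionLog]) -> dict[str, float]:
    """Acceptance rate over all suggestions; fix rate over accepted ones."""
    total = len(logs)
    accepted = [log for log in logs if log.accepted]
    fixed = [log for log in accepted if log.needed_fix]
    return {
        "acceptance_rate": len(accepted) / total if total else 0.0,
        "fix_rate": len(fixed) / len(accepted) if accepted else 0.0,
    }
```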

&lt;h2&gt;
  
  
  Copilot: The Incumbent
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot has been the default for most developers. It lives in your editor as an extension, suggests completions inline, and now has Copilot Chat for Q&amp;amp;A.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot's inline completions are fast and unobtrusive. For standard patterns, it's nearly telepathic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_by_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Copilot completes this instantly
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;first&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The VS Code integration is mature. It doesn't fight with other extensions, and the ghost text is easy to accept or dismiss.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot struggles with project context. It doesn't understand your codebase architecture. Ask it to "add error handling like the other endpoints" and it guesses rather than looking at your actual patterns.&lt;/p&gt;

&lt;p&gt;The chat feature feels bolted on. It opens in a sidebar, disconnected from your code flow. Useful for explaining code, less useful for refactoring.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor: The Challenger
&lt;/h2&gt;

&lt;p&gt;Cursor is a fork of VS Code rebuilt around AI. The editor &lt;em&gt;is&lt;/em&gt; the AI interface — there's no separation between coding and AI assistance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does well:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context awareness is Cursor's killer feature. It indexes your entire codebase and uses it for every suggestion. When I asked it to add a new endpoint, it matched my existing patterns exactly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@router.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;UserResponse&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;UserCreate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;current_user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_current_active_user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Cursor looked at my other endpoints and matched:
&lt;/span&gt;    &lt;span class="c1"&gt;# - The response_model pattern
&lt;/span&gt;    &lt;span class="c1"&gt;# - My dependency injection style  
&lt;/span&gt;    &lt;span class="c1"&gt;# - The async/await convention I use
&lt;/span&gt;    &lt;span class="n"&gt;existing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_user_by_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Email already registered&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;create_user_in_db&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Cmd+K inline editing is genuinely faster than writing code manually for anything over 10 lines. Select code, describe the change, review the diff.&lt;/p&gt;

&lt;p&gt;Multi-file edits work. Ask Cursor to "rename the User model to Account and update all references" and it actually finds and updates the imports, type hints, and database queries across files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it falls short:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It's a separate app. If you've customized VS Code heavily, you're rebuilding that setup. Most extensions work, but not all.&lt;/p&gt;

&lt;p&gt;The AI can be overconfident. It makes changes that look right but break subtle things. You need to actually review the diffs, not just accept them.&lt;/p&gt;

&lt;p&gt;Pricing is higher — $20/month vs Copilot's $10. You get more features, but it's double the cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  Head-to-Head Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Task&lt;/th&gt;
&lt;th&gt;Copilot&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single-line completions&lt;/td&gt;
&lt;td&gt;Fast, accurate&lt;/td&gt;
&lt;td&gt;Fast, more context-aware&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Boilerplate generation&lt;/td&gt;
&lt;td&gt;Good patterns&lt;/td&gt;
&lt;td&gt;Matches your patterns&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Refactoring&lt;/td&gt;
&lt;td&gt;Manual + chat&lt;/td&gt;
&lt;td&gt;Inline, multi-file&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Code explanation&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Tie&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Test generation&lt;/td&gt;
&lt;td&gt;Generic&lt;/td&gt;
&lt;td&gt;Matches your test style&lt;/td&gt;
&lt;td&gt;Cursor&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Learning curve&lt;/td&gt;
&lt;td&gt;None (just an extension)&lt;/td&gt;
&lt;td&gt;Low (it's still VS Code)&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Price&lt;/td&gt;
&lt;td&gt;$10/mo&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;Copilot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  My Recommendation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Copilot if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want minimal disruption to your workflow&lt;/li&gt;
&lt;li&gt;You primarily need inline completions&lt;/li&gt;
&lt;li&gt;You're cost-conscious&lt;/li&gt;
&lt;li&gt;Your projects are small or you work across many unrelated codebases&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work on large codebases with established patterns&lt;/li&gt;
&lt;li&gt;You do frequent refactoring&lt;/li&gt;
&lt;li&gt;You want AI to understand your architecture, not just syntax&lt;/li&gt;
&lt;li&gt;The $10/month difference is negligible to you&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For my work — maintaining several medium-to-large projects with specific conventions — Cursor wins. The context awareness alone saves me 30+ minutes daily that I used to spend making AI suggestions match my codebase style.&lt;/p&gt;

&lt;p&gt;But I still keep Copilot active for quick scripts and throwaway code where I don't need project context. The best tool depends on what you're building.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Free Alternatives?
&lt;/h2&gt;

&lt;p&gt;If you're not ready to pay, check out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Codeium&lt;/strong&gt; — Free tier with solid completions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continue&lt;/strong&gt; — Open source, works with local models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tabby&lt;/strong&gt; — Self-hosted, no data leaves your machine&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None match Cursor's context awareness yet, but they're improving fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Copilot is a good tool that makes coding faster. Cursor is a different way of coding where AI is the primary interface. &lt;/p&gt;

&lt;p&gt;After 30 days, I'm staying with Cursor for serious work. The $20/month pays for itself in the first hour of a workday.&lt;/p&gt;

&lt;p&gt;Your mileage will vary based on project size and how much you value codebase-aware suggestions. Try both — Cursor has a free trial, and you probably already have Copilot access through GitHub.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>vscode</category>
    </item>
    <item>
      <title>This Week in AI (March 2026): Claude vs ChatGPT Gets Personal, Cursor vs Copilot Heats Up</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Sun, 29 Mar 2026 06:02:32 +0000</pubDate>
      <link>https://forem.com/cumulus/this-week-in-ai-march-2026-claude-vs-chatgpt-gets-personal-cursor-vs-copilot-heats-up-3n4i</link>
      <guid>https://forem.com/cumulus/this-week-in-ai-march-2026-claude-vs-chatgpt-gets-personal-cursor-vs-copilot-heats-up-3n4i</guid>
      <description>&lt;p&gt;The AI coding assistant wars reached a fever pitch this week. If you're still trying to decide between Claude vs ChatGPT for coding, or weighing Cursor vs Copilot for your editor, this week delivered some clarity.&lt;/p&gt;

&lt;p&gt;Here's what actually mattered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude 4.5 Opus Goes Enterprise
&lt;/h2&gt;

&lt;p&gt;Anthropic dropped Claude 4.5 Opus into their enterprise tier this week, and the benchmarks are impressive. The model now handles 200k+ token contexts without the degradation we saw in earlier versions.&lt;/p&gt;

&lt;p&gt;What this means for developers: if you're building with Claude's API, long-form code analysis just got viable. I've been using it to review entire codebases—something that was sketchy six months ago.&lt;/p&gt;

&lt;p&gt;The Claude vs ChatGPT debate shifts again. GPT-4.5 still has the edge on certain reasoning tasks, but Claude's context handling makes it the better choice for large projects. If you're working with monorepos or doing architectural reviews, Claude wins this round.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor's Agent Mode Exits Beta
&lt;/h2&gt;

&lt;p&gt;Cursor pushed their agent mode to stable this week. For those keeping score in the Cursor vs Copilot battle: this is significant.&lt;/p&gt;

&lt;p&gt;The agent can now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spawn terminal sessions&lt;/li&gt;
&lt;li&gt;Run tests and iterate on failures&lt;/li&gt;
&lt;li&gt;Create files and manage project structure&lt;/li&gt;
&lt;li&gt;Chain multiple edits with context awareness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's what a simple agent task looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@agent Create a FastAPI endpoint for user authentication with JWT tokens, 
write tests, and make sure they pass.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor's agent will scaffold the code, create test files, run pytest, and fix failures. It took about 90 seconds to generate working auth code for a side project.&lt;/p&gt;

&lt;p&gt;Copilot's Workspace feature is similar, but it's still tethered to GitHub's ecosystem. Cursor works with any git remote. For teams not locked into GitHub, that flexibility matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Winner this week: Cursor.&lt;/strong&gt; The agent mode is genuinely useful, not just a demo feature.&lt;/p&gt;

&lt;h2&gt;
  
  
  Local LLMs Hit a Milestone
&lt;/h2&gt;

&lt;p&gt;Ollama 0.6 shipped with first-class function calling support. If you've been wondering how to run LLMs locally for real work, this is the release that makes it practical.&lt;/p&gt;

&lt;p&gt;The setup is dead simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull llama3.2:8b
ollama pull codellama:34b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;codellama:34b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Write a Python retry decorator with exponential backoff&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;save_file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Save code to a file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;parameters&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{...}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Function calling means your local LLM can now interact with tools—run shell commands, write files, call APIs. This was the missing piece for building local coding agents.&lt;/p&gt;
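
&lt;p&gt;The loop on your side is small: read the tool calls out of the response and dispatch them to local functions. A sketch against a mocked response in the shape Ollama's chat API returns (&lt;code&gt;message.tool_calls&lt;/code&gt; with &lt;code&gt;function.name&lt;/code&gt; and &lt;code&gt;function.arguments&lt;/code&gt;):&lt;/p&gt;

```python
# The response shape mirrors Ollama's chat API; the response below is
# mocked so the sketch runs without a model loaded.
def run_tool_calls(response: dict, registry: dict) -> list:
    """Execute each tool call the model requested, via a name-to-function registry."""
    results = []
    for call in response["message"].get("tool_calls", []):
        fn = registry[call["function"]["name"]]
        results.append(fn(**call["function"]["arguments"]))
    return results

def add(a: int, b: int) -> int:
    return a + b

mock_response = {
    "message": {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "add", "arguments": {"a": 2, "b": 3}}},
        ],
    }
}

print(run_tool_calls(mock_response, {"add": add}))  # [5]
```

In a real agent you'd feed each result back to the model as a &lt;code&gt;tool&lt;/code&gt; message and let it continue.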

&lt;p&gt;Why does this matter? Privacy, cost, and latency. Running a 34B model locally, given enough GPU memory, gives you sub-second first-token latency and zero API costs. For iteration-heavy work like debugging or refactoring, that adds up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best free Copilot alternative?&lt;/strong&gt; CodeLlama 34B running locally through Continue.dev comes close. It's not Copilot-level for autocomplete, but for chat-based coding assistance, it's surprisingly competent.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best AI Coding Assistant Right Now
&lt;/h2&gt;

&lt;p&gt;People keep asking me: what's the best AI coding assistant in 2026?&lt;/p&gt;

&lt;p&gt;Here's my honest take:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For autocomplete:&lt;/strong&gt; Copilot still wins. The training data and GitHub integration give it an edge for inline suggestions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For chat and reasoning:&lt;/strong&gt; Claude Opus or GPT-4.5, depending on context length needs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For full agent workflows:&lt;/strong&gt; Cursor. The agent mode is ahead of everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For privacy/cost-conscious work:&lt;/strong&gt; Local LLMs via Ollama + Continue.dev.&lt;/p&gt;

&lt;p&gt;There's no single winner. I use all of them depending on the task. The real skill is knowing when to reach for each tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Windsurf vs Cursor comparison:&lt;/strong&gt; Windsurf added multi-model support this week. You can now route different tasks to different models automatically. Still behind Cursor on agent capabilities, but the gap is closing.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI code review tools:&lt;/strong&gt; GitHub's AI code review shipped to all repos. It's... fine. Catches obvious issues but misses architectural problems. Better than nothing, not a replacement for human review.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;DeepSeek-V3 benchmarks:&lt;/strong&gt; The new DeepSeek model matched GPT-4 on coding benchmarks while being fully open-weights. Download it, run it locally, no restrictions. The open-source AI movement is winning.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Watching Next Week
&lt;/h2&gt;

&lt;p&gt;Anthropic's rumored Claude "Computer Use" improvements. The current version can control your desktop but it's clunky. If they ship reliable browser and terminal control, the agent landscape changes completely.&lt;/p&gt;

&lt;p&gt;Also watching the Cursor vs Windsurf race. Both are iterating fast, and developers benefit from the competition.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Weekend Project: Run a Local LLM for Coding (Zero Cloud, Zero API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Sat, 28 Mar 2026 07:02:42 +0000</pubDate>
      <link>https://forem.com/cumulus/weekend-project-run-a-local-llm-for-coding-zero-cloud-zero-api-keys-47ld</link>
      <guid>https://forem.com/cumulus/weekend-project-run-a-local-llm-for-coding-zero-cloud-zero-api-keys-47ld</guid>
      <description>&lt;p&gt;I spent last weekend ditching cloud AI for coding. No more API rate limits, no more sending proprietary code to external servers, no more surprise bills. Just a local LLM running on my machine, integrated with my editor.&lt;/p&gt;

&lt;p&gt;Here's exactly how to set it up in an afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Coding?
&lt;/h2&gt;

&lt;p&gt;Three reasons I made the switch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; — My client code never leaves my machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; — $0/month after initial setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; — No network latency, works offline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The trade-off? You need decent hardware and the models aren't quite GPT-4 level. But for code completion, refactoring, and explaining code? They're surprisingly good.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;RAM&lt;/strong&gt;: 16GB minimum, 32GB recommended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU&lt;/strong&gt;: Optional but helps (NVIDIA with 8GB+ VRAM ideal)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage&lt;/strong&gt;: 10-50GB depending on models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OS&lt;/strong&gt;: Linux, macOS, or Windows with WSL2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No GPU? CPU inference works fine — just slower. I ran this on a 2-year-old laptop with no dedicated GPU and it was usable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; is the easiest way to run local LLMs. One binary, no Python environment hell.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux/WSL&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.ai/install.sh | sh

&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Ollama runs as a local API server on port 11434.&lt;/p&gt;
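
&lt;p&gt;Because it's plain HTTP, you can talk to it with nothing but the standard library. A sketch that builds a request for the &lt;code&gt;/api/generate&lt;/code&gt; endpoint (actually sending it requires &lt;code&gt;ollama serve&lt;/code&gt; to be running, so that part is left as a comment):&lt;/p&gt;

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST for Ollama's /api/generate endpoint on the default port."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# To actually send it, with `ollama serve` running and the model pulled:
#   req = build_generate_request("deepseek-coder:6.7b", "Write a merge sort")
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["response"])
```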

&lt;h2&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h2&gt;

&lt;p&gt;Not all models are equal for code. Here's what actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best balance of speed and quality (7B params, ~4GB)&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# Faster, smaller, good for completions (3B params, ~2GB)&lt;/span&gt;
ollama pull starcoder2:3b

&lt;span class="c"&gt;# Heavy hitter if you have the RAM (33B params, ~20GB)&lt;/span&gt;
ollama pull codellama:34b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use &lt;code&gt;deepseek-coder:6.7b&lt;/code&gt; daily. It handles Python, TypeScript, Go, and Rust well. For quick completions, &lt;code&gt;starcoder2:3b&lt;/code&gt; is snappier.&lt;/p&gt;

&lt;p&gt;Test that it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"Write a Python function to merge two sorted lists"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Editor Integration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  VS Code with Continue
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://continue.dev" rel="noopener noreferrer"&gt;Continue&lt;/a&gt; is my pick. Open source, actively maintained, works offline.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install the Continue extension from VS Code marketplace&lt;/li&gt;
&lt;li&gt;Open Continue settings (Cmd/Ctrl + Shift + P → "Continue: Open config.json")&lt;/li&gt;
&lt;li&gt;Add your Ollama model:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"StarCoder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"starcoder2:3b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with code context (highlight code → ask questions)&lt;/li&gt;
&lt;li&gt;Tab completions as you type&lt;/li&gt;
&lt;li&gt;Inline edits (Cmd+I to refactor selected code)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Neovim with Ollama.nvim
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- lazy.nvim&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"nomnivore/ollama.nvim"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;dependencies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"nvim-lua/plenary.nvim"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s2"&gt;"Ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"OllamaModel"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="n"&gt;opts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"deepseek-coder:6.7b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"http://127.0.0.1:11434"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Map it to a key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight lua"&gt;&lt;code&gt;&lt;span class="n"&gt;vim&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keymap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;leader&amp;gt;oo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;":&amp;lt;c-u&amp;gt;lua require('ollama').prompt()&amp;lt;cr&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Terminal Integration
&lt;/h2&gt;

&lt;p&gt;Sometimes I just want to ask a quick question without leaving the terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to .bashrc/.zshrc&lt;/span&gt;
ask&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Usage&lt;/span&gt;
ask &lt;span class="s2"&gt;"What's the time complexity of Python's sorted()?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For piping code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;broken_script.py | ollama run deepseek-coder:6.7b &lt;span class="s2"&gt;"Fix the bugs in this code"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
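&lt;p&gt;I keep that pipe wrapped in a small function so I don't retype the model name every time (my own helper, nothing Ollama-specific):&lt;/p&gt;

```shell
# Pipe any file through the model with a fixed instruction
fixcode() {
  cat "$1" | ollama run deepseek-coder:6.7b "Fix the bugs in this code"
}

# Usage: fixcode broken_script.py
```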



&lt;h2&gt;
  
  
  Performance Tuning
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GPU Acceleration (NVIDIA)
&lt;/h3&gt;

&lt;p&gt;Ollama auto-detects CUDA. Verify it's using your GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder:6.7b &lt;span class="nt"&gt;--verbose&lt;/span&gt;
&lt;span class="c"&gt;# Look for "using CUDA" in output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the model lands on CPU, make sure you have current NVIDIA drivers installed; the nvidia-container-toolkit is only needed if you run Ollama inside Docker.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduce Memory Usage
&lt;/h3&gt;

&lt;p&gt;Loading multiple models eats RAM. Ollama keeps models in memory by default. To unload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# List loaded models&lt;/span&gt;
curl http://localhost:11434/api/tags

&lt;span class="c"&gt;# Ollama auto-unloads after 5 min idle&lt;/span&gt;
&lt;span class="c"&gt;# Or restart the service to clear everything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Speed vs Quality
&lt;/h3&gt;

&lt;p&gt;For faster responses with slight quality drop, use quantized models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# q4 = 4-bit quantization, faster, less accurate&lt;/span&gt;
ollama pull deepseek-coder:6.7b-instruct-q4_0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I use full precision for complex refactoring, quantized for quick completions.&lt;/p&gt;
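&lt;p&gt;A rough way to see why quantization helps: weight memory is roughly parameter count times bits per weight. This back-of-envelope estimator is my own rule of thumb, not an Ollama API — real usage adds context/KV-cache overhead on top:&lt;/p&gt;

```python
def approx_weight_gb(params_billion, bits):
    # params (in billions) * bits per weight / 8 bits per byte = GB
    return params_billion * bits / 8

print(approx_weight_gb(6.7, 16))  # fp16: ~13.4 GB
print(approx_weight_gb(6.7, 4))   # q4:   ~3.4 GB
```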

&lt;h2&gt;
  
  
  Real-World Usage
&lt;/h2&gt;

&lt;p&gt;After a month with this setup, here's what works well:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Great for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code completion and boilerplate&lt;/li&gt;
&lt;li&gt;Explaining unfamiliar code&lt;/li&gt;
&lt;li&gt;Writing tests for existing functions&lt;/li&gt;
&lt;li&gt;Regex and SQL generation&lt;/li&gt;
&lt;li&gt;Git commit messages&lt;/li&gt;
&lt;/ul&gt;
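&lt;p&gt;The commit-message case is easy to automate. Here's the helper I use (my own function — adjust the model name to whatever you have pulled):&lt;/p&gt;

```shell
# Draft a commit message from whatever is currently staged
aicommit() {
  git diff --cached | ollama run deepseek-coder:6.7b "Write a one-line commit message for this diff"
}
```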

&lt;p&gt;&lt;strong&gt;Still use cloud AI for:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architectural decisions&lt;/li&gt;
&lt;li&gt;Multi-file refactoring&lt;/li&gt;
&lt;li&gt;Debugging truly weird issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The local setup handles 80% of my daily AI coding needs. That's a win.&lt;/p&gt;

&lt;h2&gt;
  
  
  Troubleshooting
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Model not found"&lt;/strong&gt; — Run &lt;code&gt;ollama list&lt;/code&gt; to see installed models. Pull again if missing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Slow responses&lt;/strong&gt; — Try a smaller model or a quantized version. Check whether the model is actually running on the GPU with &lt;code&gt;ollama ps&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out of memory&lt;/strong&gt; — Close other apps, use a smaller model, or add swap space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Connection refused&lt;/strong&gt; — Ensure &lt;code&gt;ollama serve&lt;/code&gt; is running. Check nothing else is on port 11434.&lt;/p&gt;
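&lt;p&gt;If you're unsure whether anything is listening on 11434, a short stdlib socket check settles it (no Ollama required to run this):&lt;/p&gt;

```python
import socket

def port_open(host, port, timeout=0.5):
    # True if something accepts a TCP connection, e.g. `ollama serve`
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("127.0.0.1", 11434))
```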

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Once you're comfortable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Try different models&lt;/strong&gt; — Mistral, Phi-3, Llama 3 all have coding variants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tune on your codebase&lt;/strong&gt; — Ollama supports custom Modelfiles&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build custom tools&lt;/strong&gt; — The Ollama API is dead simple to script against&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The local LLM ecosystem is moving fast. Models that needed 64GB RAM two years ago now run on laptops. It's only getting better.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>tutorial</category>
      <category>productivity</category>
    </item>
    <item>
      <title>How to Run Local LLMs for Coding (No Cloud, No API Keys)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Fri, 27 Mar 2026 07:02:17 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-2lao</link>
      <guid>https://forem.com/cumulus/how-to-run-local-llms-for-coding-no-cloud-no-api-keys-2lao</guid>
      <description>&lt;p&gt;I got tired of paying for API calls. Every time I wanted an AI coding assistant, it was another subscription, another API key, another company reading my code. So I went local. Here's exactly how to do it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Coding?
&lt;/h2&gt;

&lt;p&gt;Three reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Privacy&lt;/strong&gt; - Your code never leaves your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost&lt;/strong&gt; - Zero ongoing fees after initial setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt; - No network latency, works offline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The tradeoff? You need decent hardware. But if you've got 16GB+ RAM and a GPU from the last few years, you're set.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Ollama + Continue
&lt;/h2&gt;

&lt;p&gt;Forget complicated setups. Ollama makes running local models trivially easy, and Continue gives you a VS Code/Cursor-style experience without the cloud dependency.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Install Ollama
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS/Linux&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Windows - download from ollama.com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No Docker, no Python environments, no dependency hell.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Pull a Coding Model
&lt;/h3&gt;

&lt;p&gt;Not all models are equal for code. Here's what actually works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best overall for coding (needs 16GB+ RAM)&lt;/span&gt;
ollama pull deepseek-coder-v2:16b

&lt;span class="c"&gt;# Lighter option (8GB RAM)&lt;/span&gt;
ollama pull codellama:7b

&lt;span class="c"&gt;# For code review and explanations&lt;/span&gt;
ollama pull mistral:7b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DeepSeek Coder v2 is genuinely impressive - it rivals GPT-4 for most coding tasks. If you're RAM-constrained, CodeLlama 7B still handles autocomplete and simple generations well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Test It
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run deepseek-coder-v2:16b
&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; Write a Python &lt;span class="k"&gt;function &lt;/span&gt;to parse JSON from a file safely
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a response in seconds. If it's slow, you're probably swapping to disk - try a smaller model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Connect to Your Editor
&lt;/h3&gt;

&lt;p&gt;Here's where it gets good. Install the &lt;strong&gt;Continue&lt;/strong&gt; extension for VS Code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open VS Code&lt;/li&gt;
&lt;li&gt;Extensions → Search "Continue"&lt;/li&gt;
&lt;li&gt;Install it&lt;/li&gt;
&lt;li&gt;Open Continue sidebar (Cmd/Ctrl + L)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Configure it to use Ollama. Create &lt;code&gt;~/.continue/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"DeepSeek Coder Local"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"deepseek-coder-v2:16b"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"tabAutocompleteModel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CodeLlama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ollama"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"codellama:7b"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you've got:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Chat with your codebase (Cmd+L)&lt;/li&gt;
&lt;li&gt;Inline edits (Cmd+I)&lt;/li&gt;
&lt;li&gt;Tab autocomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All running locally. Zero API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World Performance
&lt;/h2&gt;

&lt;p&gt;I've been using this setup for three months. Here's the honest assessment:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What works great:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Autocomplete (feels like Copilot)&lt;/li&gt;
&lt;li&gt;Explaining code&lt;/li&gt;
&lt;li&gt;Writing boilerplate&lt;/li&gt;
&lt;li&gt;Simple refactoring&lt;/li&gt;
&lt;li&gt;Regex and SQL generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's mediocre:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex multi-file changes&lt;/li&gt;
&lt;li&gt;Understanding large codebases&lt;/li&gt;
&lt;li&gt;Subtle bug detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What still needs cloud models:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cutting-edge reasoning (I still reach for Claude for architecture)&lt;/li&gt;
&lt;li&gt;Very large context windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For 80% of daily coding tasks, local is enough. For the other 20%, I still use Claude - but my API bill dropped from $80/month to under $15.&lt;/p&gt;

&lt;h2&gt;
  
  
  Optimizing Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  GPU Acceleration
&lt;/h3&gt;

&lt;p&gt;If you have an NVIDIA GPU:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check if Ollama detects your GPU&lt;/span&gt;
ollama ps

&lt;span class="c"&gt;# Should show CUDA if working&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For AMD GPUs on Linux, Ollama supports ROCm. M1/M2/M3 Macs get Metal acceleration automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  Multiple Models
&lt;/h3&gt;

&lt;p&gt;I keep two running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 - for chat&lt;/span&gt;
ollama serve

&lt;span class="c"&gt;# Terminal 2 - load models&lt;/span&gt;
ollama run deepseek-coder-v2:16b  &lt;span class="c"&gt;# stays in memory&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;First load takes 10-30 seconds. After that, it's instant.&lt;/p&gt;
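&lt;p&gt;You can also control how long a model stays resident per request via the API's &lt;code&gt;keep_alive&lt;/code&gt; field. This sketch only builds the request body — POST it to &lt;code&gt;/api/generate&lt;/code&gt; with an empty prompt to pre-warm the model:&lt;/p&gt;

```python
import json

def warm_request(model, keep_alive="30m"):
    # An empty prompt with keep_alive loads the model and keeps it in
    # memory for the given duration ("5m" is the default; -1 = forever)
    return json.dumps({"model": model, "prompt": "", "keep_alive": keep_alive})

print(warm_request("deepseek-coder-v2:16b"))
```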

&lt;h3&gt;
  
  
  Memory Management
&lt;/h3&gt;

&lt;p&gt;Models stay loaded in RAM. To unload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama stop deepseek-coder-v2:16b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or set automatic unloading in the Ollama config.&lt;/p&gt;
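&lt;p&gt;The relevant setting is the &lt;code&gt;OLLAMA_KEEP_ALIVE&lt;/code&gt; environment variable, read by the server at startup:&lt;/p&gt;

```shell
# Keep models resident for an hour instead of the 5-minute default
# (set this in the environment of the `ollama serve` process)
export OLLAMA_KEEP_ALIVE=1h
```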

&lt;h2&gt;
  
  
  Free Copilot Alternative? Yes, Actually
&lt;/h2&gt;

&lt;p&gt;This setup is a legitimate &lt;strong&gt;free Copilot alternative&lt;/strong&gt;. The autocomplete is comparable, the chat is sometimes better (DeepSeek Coder handles Python and TypeScript particularly well), and you own your data.&lt;/p&gt;

&lt;p&gt;Is it as good as Copilot Enterprise or Claude? No. But it's free, private, and works offline. For indie devs and privacy-conscious teams, that's the right tradeoff.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Copilot&lt;/th&gt;
&lt;th&gt;This Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$10-19/mo&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Privacy&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Offline&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quality&lt;/td&gt;
&lt;td&gt;Better&lt;/td&gt;
&lt;td&gt;Good enough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Setup&lt;/td&gt;
&lt;td&gt;2 min&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Local models are improving fast. Six months ago this wasn't viable. Now it's my daily driver. In another year, the gap with cloud models will shrink further.&lt;/p&gt;

&lt;p&gt;Start with Ollama + Continue. See if it fits your workflow. Worst case, you've lost 15 minutes. Best case, you've cut your AI coding costs to zero.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>The Best AI Coding Assistant in 2026 Isn't What You Think</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Thu, 26 Mar 2026 07:02:30 +0000</pubDate>
      <link>https://forem.com/cumulus/the-best-ai-coding-assistant-in-2026-isnt-what-you-think-4ncg</link>
      <guid>https://forem.com/cumulus/the-best-ai-coding-assistant-in-2026-isnt-what-you-think-4ncg</guid>
      <description>&lt;p&gt;The AI coding assistant market has exploded. GitHub Copilot dominated 2023-2024. Cursor emerged as the darling of 2025. Now we're three months into 2026, and the landscape looks completely different.&lt;/p&gt;

&lt;p&gt;I've spent the last year testing every major AI coding tool on real production code. Not toy examples—actual systems with authentication, database migrations, and the kind of legacy code that makes you question career choices.&lt;/p&gt;

&lt;p&gt;Here's my honest assessment of where things stand.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Current Players
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub Copilot&lt;/strong&gt; remains the safe corporate choice. It's everywhere, it's integrated into VS Code, and it rarely produces anything catastrophically wrong. The problem? It rarely produces anything exceptional either. Copilot in 2026 feels like autocomplete with better marketing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cursor&lt;/strong&gt; changed the game by making the AI context-aware of your entire codebase. You could ask it to refactor across multiple files, and it actually understood the relationships. This was revolutionary 18 months ago.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claude (via API or Claude Code)&lt;/strong&gt; brought genuine reasoning to code generation. It doesn't just pattern-match—it thinks through problems. The tradeoff is latency and cost.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Windsurf&lt;/strong&gt; arrived late 2025 promising Cursor's features at half the price. And honestly? It delivers. The VSCode fork works, the multi-file editing is solid, and the price is hard to argue with.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local LLMs (Ollama + DeepSeek/Qwen)&lt;/strong&gt; are the wildcard nobody expected. Running a 32B parameter model locally for code assistance was science fiction two years ago. Now it's an &lt;code&gt;ollama pull&lt;/code&gt; away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Actually Matters in 2026
&lt;/h2&gt;

&lt;p&gt;After thousands of hours with these tools, I've identified three factors that separate useful from gimmicky:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Context Window, Not Just Size
&lt;/h3&gt;

&lt;p&gt;Copilot's context window is embarrassingly small. It sees your current file and makes educated guesses about the rest. This works for isolated functions. It fails spectacularly for anything architectural.&lt;/p&gt;

&lt;p&gt;Cursor and Windsurf index your codebase and inject relevant context. This means when you ask "refactor the authentication flow," they actually know what your authentication flow looks like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Copilot sees this function in isolation
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# It has no idea how this connects to your middleware,
&lt;/span&gt;    &lt;span class="c1"&gt;# your session store, or your refresh token logic
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="c1"&gt;# Cursor/Windsurf can trace the entire flow:
# middleware.py -&amp;gt; auth/validate.py -&amp;gt; models/user.py -&amp;gt; redis_session.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The difference in output quality is night and day.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Edit vs Generate
&lt;/h3&gt;

&lt;p&gt;The best AI coding assistant in 2026 isn't the one that generates the most code. It's the one that edits existing code correctly.&lt;/p&gt;

&lt;p&gt;Generating a new function is easy. Modifying a 500-line file without breaking the 47 other things that depend on it? That's where most tools fall apart.&lt;/p&gt;

&lt;p&gt;Claude excels here. Its ability to understand "change X but preserve Y" consistently beats the competition. Cursor is close behind. Copilot still struggles with anything beyond single-function changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Knowing When to Stop
&lt;/h3&gt;

&lt;p&gt;The worst AI coding assistants are the ones that confidently produce garbage. Copilot will autocomplete into obvious errors. Some tools will refactor your code into something that looks clean but subtly breaks business logic.&lt;/p&gt;

&lt;p&gt;The best tools either get it right or clearly indicate uncertainty. Claude will often say "I'd need to see the implementation of X to be confident about this change." That honesty saves debugging hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup in 2026
&lt;/h2&gt;

&lt;p&gt;After all this testing, here's what I actually use daily:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary: Cursor with Claude 3.5/Opus API&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cursor's interface plus Claude's reasoning is the sweet spot. The codebase indexing means Claude has context it wouldn't otherwise have. The multi-file editing means I'm not copy-pasting between chat windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secondary: Local DeepSeek-Coder 33B via Ollama&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For anything sensitive—client code, proprietary algorithms, that embarrassing legacy system—I run everything locally. DeepSeek-Coder is surprisingly capable. Not Claude-level, but 80% of the quality with zero data leaving my machine.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# My local setup&lt;/span&gt;
ollama pull deepseek-coder:33b-instruct-q4_K_M
&lt;span class="c"&gt;# 20GB download, runs on 24GB VRAM or 32GB RAM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Occasional: GitHub Copilot&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Still useful for quick completions when I don't need intelligence, just speed. Writing boilerplate, filling in obvious patterns, auto-completing imports.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Best AI Coding Assistant Is...
&lt;/h2&gt;

&lt;p&gt;Context-dependent. I know that's not the definitive answer the headline promised, but it's the truth.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;For corporations with security requirements&lt;/strong&gt;: Local LLMs or nothing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For indie developers&lt;/strong&gt;: Cursor + Claude API or Windsurf if budget matters
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For quick prototyping&lt;/strong&gt;: Copilot is fine, it's fast and cheap&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For complex refactoring&lt;/strong&gt;: Claude with full codebase context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The "best" isn't a single tool. It's knowing which tool fits which problem.&lt;/p&gt;

&lt;p&gt;What surprised me most in 2026 is how viable local LLMs have become. Two years ago, suggesting someone run their own coding assistant on consumer hardware would get you laughed out of the room. Now I know developers running DeepSeek locally who refuse to go back to cloud tools.&lt;/p&gt;

&lt;p&gt;The market is fragmenting. That's good for developers—more options, more competition, better tools. The monoculture of "just use Copilot" is over.&lt;/p&gt;

&lt;p&gt;Pick the tool that matches your constraints. Test it on real code, not demo projects. And don't be afraid to combine multiple tools for different tasks.&lt;/p&gt;

&lt;p&gt;The best coding assistant is the one that helps you ship.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>devtools</category>
    </item>
    <item>
      <title>5 Free Copilot Alternatives That Actually Work in 2026</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Wed, 25 Mar 2026 07:02:26 +0000</pubDate>
      <link>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-58gm</link>
      <guid>https://forem.com/cumulus/5-free-copilot-alternatives-that-actually-work-in-2026-58gm</guid>
      <description>&lt;p&gt;GitHub Copilot costs $19/month. For hobbyists, students, or anyone building side projects, that adds up fast. I've spent the last few months testing every free AI coding assistant I could find, and most of them are garbage.&lt;/p&gt;

&lt;p&gt;But five of them aren't. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Codeium — The Closest Thing to Free Copilot
&lt;/h2&gt;

&lt;p&gt;Codeium is the obvious first pick. It's free for individuals, supports 70+ languages, and works in VS Code, JetBrains, Vim, and basically everything else.&lt;/p&gt;

&lt;p&gt;The autocomplete is fast. Not quite Copilot-fast, but close enough that you won't notice in practice. Where it really shines is multi-line completions — it understands context surprisingly well.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Type this comment and Codeium completes the function
# Function to validate email addresses using regex
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
    &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;match&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Daily coding, general-purpose autocomplete&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Chat features are basic compared to paid tools&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Continue.dev — Open Source and Local-First
&lt;/h2&gt;

&lt;p&gt;If you care about privacy or want to run models locally, Continue is the answer. It's open source, connects to local LLMs via Ollama, and integrates directly into VS Code.&lt;/p&gt;

&lt;p&gt;The setup takes 10 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull a coding model&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# Install Continue extension in VS Code&lt;/span&gt;
&lt;span class="c"&gt;# Configure to use your local model&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have AI code assistance that never leaves your machine. No API keys, no subscriptions, no telemetry.&lt;/p&gt;
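&lt;p&gt;The "configure" step is a small JSON file. A minimal &lt;code&gt;~/.continue/config.json&lt;/code&gt; pointing at the model pulled above looks like this (a sketch — swap in whichever model you pulled):&lt;/p&gt;

```json
{
  "models": [
    {
      "title": "DeepSeek Coder Local",
      "provider": "ollama",
      "model": "deepseek-coder:6.7b"
    }
  ]
}
```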

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Privacy-conscious developers, offline work, learning how LLMs work&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Local models are slower than cloud APIs (unless you have a beefy GPU)&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Cursor (Free Tier) — 2000 Completions/Month
&lt;/h2&gt;

&lt;p&gt;Yes, Cursor has a paid tier, but the free version gives you 2000 completions per month. For side projects and learning, that's plenty.&lt;/p&gt;

&lt;p&gt;What makes Cursor different is the integrated chat. You can select code, hit Cmd+K, and ask it to refactor, explain, or fix bugs. The AI understands your entire codebase, not just the current file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Select this function and ask Cursor to add error handling&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Cursor rewrites it with try/catch, type checking, and retry logic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Full-featured IDE experience, codebase-aware assistance&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Free tier has monthly limits; resets on billing cycle&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Tabby — Self-Hosted Copilot Clone
&lt;/h2&gt;

&lt;p&gt;Tabby is what you deploy when you want your own Copilot server. It's open source, runs on your hardware, and supports team usage.&lt;/p&gt;

&lt;p&gt;The killer feature: you can fine-tune it on your codebase. After indexing your repos, Tabby learns your patterns, naming conventions, and internal APIs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml for Tabby&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tabby&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tabbyml/tabby&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;serve --model StarCoder-1B --device cuda&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/data&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
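
&lt;p&gt;Before wiring up an editor, smoke-test the server. The &lt;code&gt;/v1/health&lt;/code&gt; path below is an assumption based on Tabby's default HTTP API; if it 404s on your version, check &lt;code&gt;docker compose logs tabby&lt;/code&gt; for the actual routes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Wrap the check in a function so you can re-run it while the model loads
tabby_up() {
  curl -fsS "http://localhost:8080/v1/health" &amp;gt;/dev/null 2&amp;gt;&amp;amp;1
}

if tabby_up; then
  echo "Tabby is serving"
else
  echo "not ready yet"
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;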



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Teams, enterprises, anyone with spare GPU capacity&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Requires self-hosting; smaller models = less capable completions&lt;/p&gt;
&lt;h2&gt;
  
  
  5. Amazon CodeWhisperer (Free Tier) — The Enterprise Sleeper
&lt;/h2&gt;

&lt;p&gt;CodeWhisperer (since folded into Amazon Q Developer, though the free individual tier survives) is AWS's answer to Copilot: unlimited completions, security scanning, and reference tracking that tells you when a suggestion matches open-source code.&lt;/p&gt;

&lt;p&gt;The catch: it's best for AWS-heavy codebases. If you're writing Lambda functions, CDK stacks, or anything AWS-adjacent, CodeWhisperer knows the patterns better than anything else.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# CodeWhisperer excels at AWS boilerplate
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;upload_to_s3&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;s3_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: AWS developers, serverless projects, compliance-focused teams&lt;br&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Requires AWS account; less impressive outside AWS ecosystem&lt;/p&gt;

&lt;h2&gt;
  
  
  The Honest Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Privacy&lt;/th&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Codeium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Continue&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cursor Free&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tabby&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Self-host&lt;/td&gt;
&lt;td&gt;Hard&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CodeWhisperer&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Good (AWS)&lt;/td&gt;
&lt;td&gt;Cloud&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  My Actual Setup
&lt;/h2&gt;

&lt;p&gt;I use Cursor for main projects (the free tier covers my side project usage), Continue with Ollama for anything sensitive, and Codeium as a fallback in terminals and remote environments.&lt;/p&gt;

&lt;p&gt;The combination costs me exactly $0/month and covers 95% of what I'd use Copilot for.&lt;/p&gt;

&lt;h2&gt;
  
  
  What About Claude and ChatGPT?
&lt;/h2&gt;

&lt;p&gt;They're not autocomplete tools, but for complex refactoring or architecture questions, I paste code into Claude. It's slower but handles nuanced problems better than any inline assistant.&lt;/p&gt;

&lt;p&gt;The point isn't finding one tool. It's building a workflow that matches how you actually code.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Automate Code Reviews with Local LLMs (No API Keys Required)</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Tue, 24 Mar 2026 07:02:21 +0000</pubDate>
      <link>https://forem.com/cumulus/how-to-automate-code-reviews-with-local-llms-no-api-keys-required-2839</link>
      <guid>https://forem.com/cumulus/how-to-automate-code-reviews-with-local-llms-no-api-keys-required-2839</guid>
      <description>&lt;p&gt;I got tired of waiting for PR reviews. My team's across three timezones, and sometimes a simple "is this logic right?" question sits for 12 hours.&lt;/p&gt;

&lt;p&gt;So I built an automated pre-commit code review using Ollama and git hooks. It runs entirely local—no API keys, no usage limits, no sending proprietary code to external servers.&lt;/p&gt;

&lt;p&gt;Here's the setup that's been running on my machine for two months.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Local LLMs for Code Review?
&lt;/h2&gt;

&lt;p&gt;Cloud APIs are great until:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're working with sensitive code&lt;/li&gt;
&lt;li&gt;You hit rate limits at 2 AM debugging&lt;/li&gt;
&lt;li&gt;Your company's security policy says no external AI&lt;/li&gt;
&lt;li&gt;You don't want to pay per token for every commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running local LLMs for coding tasks solves all of this. The quality isn't GPT-4, but for catching obvious bugs and suggesting improvements? It's surprisingly good.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You'll Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt; - Dead simple local LLM runner&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A decent GPU&lt;/strong&gt; - 8GB VRAM minimum, 16GB recommended&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Git&lt;/strong&gt; - Obviously&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 minutes&lt;/strong&gt; - That's genuinely it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Install Ollama
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Linux/WSL&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# macOS&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;ollama

&lt;span class="c"&gt;# Start the service&lt;/span&gt;
ollama serve
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull a coding-focused model. I've tested several; here's what works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Best balance of speed and quality&lt;/span&gt;
ollama pull deepseek-coder:6.7b

&lt;span class="c"&gt;# If you have 16GB+ VRAM&lt;/span&gt;
ollama pull codellama:13b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
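
&lt;p&gt;One option worth knowing before we script anything: besides &lt;code&gt;ollama run&lt;/code&gt;, Ollama exposes an HTTP API on &lt;code&gt;localhost:11434&lt;/code&gt;, which is handy for structured output or CI use. A minimal non-streaming helper (a sketch; it needs &lt;code&gt;jq&lt;/code&gt;, and the &lt;code&gt;ai_generate&lt;/code&gt; name is mine):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# ai_generate MODEL PROMPT: one-shot, non-streaming call to Ollama's /api/generate
ai_generate() {
  jq -n --arg m "$1" --arg p "$2" '{model: $m, prompt: $p, stream: false}' |
    curl -s http://localhost:11434/api/generate -d @- |
    jq -r '.response'
}

# usage: ai_generate deepseek-coder:6.7b "Explain this diff in one sentence"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;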



&lt;h2&gt;
  
  
  Step 2: Create the Review Script
&lt;/h2&gt;

&lt;p&gt;Save this as &lt;code&gt;~/.local/bin/ai-review&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="nv"&gt;MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;AI_REVIEW_MODEL&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;deepseek&lt;/span&gt;&lt;span class="p"&gt;-coder&lt;/span&gt;:6.7b&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;DIFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--diff-filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACMR&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nt"&gt;-z&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"No staged changes to review"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi

&lt;/span&gt;&lt;span class="nv"&gt;PROMPT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Review this code diff. Be concise. Flag:
1. Obvious bugs or logic errors
2. Security issues (SQL injection, XSS, hardcoded secrets)
3. Performance problems
4. Missing error handling

If the code looks fine, just say 'LGTM'.

Diff:
&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔍 Running local code review..."&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

ollama run &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MODEL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; 2&amp;gt;/dev/null

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Review complete. Commit? [y/N]"&lt;/span&gt;
&lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; response
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$response&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ ^[Yy]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;else
    &lt;/span&gt;&lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make it executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x ~/.local/bin/ai-review
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
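
&lt;p&gt;The hook in the next step calls the script by absolute path, so this isn't strictly required, but if you want to invoke &lt;code&gt;ai-review&lt;/code&gt; by name, confirm &lt;code&gt;~/.local/bin&lt;/code&gt; is on your &lt;code&gt;PATH&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Portable PATH membership check (no external commands)
path_has() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has "$PATH" "$HOME/.local/bin"; then
  echo "ai-review is callable by name"
else
  echo 'add export PATH="$HOME/.local/bin:$PATH" to your shell rc file'
fi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;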



&lt;h2&gt;
  
  
  Step 3: Set Up the Git Hook
&lt;/h2&gt;

&lt;p&gt;Create a pre-commit hook in your repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# In your project directory&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .git/hooks/pre-commit &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;'
#!/bin/bash
~/.local/bin/ai-review
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x .git/hooks/pre-commit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Want this globally? Use git templates (note: templates only apply to repos you &lt;code&gt;init&lt;/code&gt; or &lt;code&gt;clone&lt;/code&gt; afterward; re-run &lt;code&gt;git init&lt;/code&gt; inside existing repos to copy the hook in):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.git-templates/hooks
&lt;span class="nb"&gt;cp&lt;/span&gt; ~/.local/bin/ai-review ~/.git-templates/hooks/pre-commit
git config &lt;span class="nt"&gt;--global&lt;/span&gt; init.templateDir ~/.git-templates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Test It
&lt;/h2&gt;

&lt;p&gt;Stage some code and commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add suspicious-code.py
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"add feature"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;🔍 Running local code review...

Issues found:

1. **SQL Injection** (line 23): User input passed directly to query.
   Use parameterized queries instead.

2. **Missing null check** (line 45): `user.profile` accessed without
   verifying user exists.

3. **Hardcoded credential** (line 12): API key in source code.
   Move to environment variable.

---
Review complete. Commit? [y/N]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Making It Actually Useful
&lt;/h2&gt;

&lt;p&gt;The basic setup works, but here's how I've tuned mine:&lt;/p&gt;

&lt;h3&gt;
  
  
  Skip Reviews for Trivial Commits
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add to the script, after getting DIFF&lt;/span&gt;
&lt;span class="nv"&gt;LINES_CHANGED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"^+"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$LINES_CHANGED&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 5 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Small change, skipping review"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
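
&lt;p&gt;One gotcha with counting added lines: the unified-diff header &lt;code&gt;+++ b/file&lt;/code&gt; also starts with &lt;code&gt;+&lt;/code&gt;, so a bare &lt;code&gt;grep -c "^+"&lt;/code&gt; overcounts by one per file. The stricter &lt;code&gt;'^+[^+]'&lt;/code&gt; pattern counts only real additions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# A three-line sample diff: one header pair, one real added line
DIFF_SAMPLE='--- a/app.py
+++ b/app.py
+print("hello")'

NAIVE=$(printf '%s\n' "$DIFF_SAMPLE" | grep -c '^+')      # header counted too
STRICT=$(printf '%s\n' "$DIFF_SAMPLE" | grep -c '^+[^+]') # real additions only

echo "naive=$NAIVE strict=$STRICT"   # naive=2 strict=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;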



&lt;h3&gt;
  
  
  Focus on Specific File Types
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only review Python and JavaScript&lt;/span&gt;
&lt;span class="nv"&gt;DIFF&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;git diff &lt;span class="nt"&gt;--cached&lt;/span&gt; &lt;span class="nt"&gt;--diff-filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ACMR &lt;span class="nt"&gt;--&lt;/span&gt; &lt;span class="s1"&gt;'*.py'&lt;/span&gt; &lt;span class="s1"&gt;'*.js'&lt;/span&gt; &lt;span class="s1"&gt;'*.ts'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
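
&lt;p&gt;Pathspecs also support exclusions (&lt;code&gt;:(exclude)&lt;/code&gt; is standard git pathspec magic), which keeps generated files out of the review:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Same filter, but skip minified bundles and lockfiles
DIFF=$(git diff --cached --diff-filter=ACMR -- \
  '*.py' '*.js' '*.ts' \
  ':(exclude)*.min.js' ':(exclude)package-lock.json')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;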



&lt;h3&gt;
  
  
  Bypass When Needed
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Skip the hook for quick fixes&lt;/span&gt;
git commit &lt;span class="nt"&gt;--no-verify&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"typo fix"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Log Reviews for Later
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Append to script before the prompt&lt;/span&gt;
&lt;span class="nv"&gt;REVIEW_LOG&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;~/.local/share/ai-reviews/&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt; +%Y-%m-%d&lt;span class="si"&gt;)&lt;/span&gt;.log
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;dirname&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; ==="&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DIFF&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$REVIEW_LOG&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
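
&lt;p&gt;Those logs grow without bound. A cron-friendly one-liner keeps the last 30 days (the path matches the &lt;code&gt;REVIEW_LOG&lt;/code&gt; location above):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Prune review logs older than 30 days
find ~/.local/share/ai-reviews -name '*.log' -mtime +30 -delete 2&amp;gt;/dev/null || true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;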



&lt;h2&gt;
  
  
  Performance Notes
&lt;/h2&gt;

&lt;p&gt;On my RTX 3080:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;deepseek-coder:6.7b&lt;/code&gt; - ~3 seconds for typical diffs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codellama:13b&lt;/code&gt; - ~8 seconds, slightly better catches&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;codellama:34b&lt;/code&gt; - ~25 seconds, overkill for pre-commit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The 6.7B model catches 80% of what the larger models find. For pre-commit automation, speed matters more than catching edge cases.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Won't Catch
&lt;/h2&gt;

&lt;p&gt;Be realistic. Local LLMs miss:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex architectural issues&lt;/li&gt;
&lt;li&gt;Business logic errors (it doesn't know your domain)&lt;/li&gt;
&lt;li&gt;Subtle race conditions&lt;/li&gt;
&lt;li&gt;Whether your code actually solves the right problem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This isn't a replacement for human review. It's a first pass that catches the embarrassing stuff before your teammates see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Actual Value
&lt;/h2&gt;

&lt;p&gt;Two months in, here's what I've noticed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fewer "oops" commits&lt;/strong&gt; - It catches the dumb mistakes I make at midnight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faster PR reviews&lt;/strong&gt; - Human reviewers focus on architecture, not typos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better habits&lt;/strong&gt; - Knowing there's a check makes me write cleaner first drafts&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The whole setup took 10 minutes. The ROI has been significant.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Cursor vs Copilot in 2026: I Switched After 2 Years—Here's What Happened</title>
      <dc:creator>Chappie</dc:creator>
      <pubDate>Mon, 23 Mar 2026 07:02:15 +0000</pubDate>
      <link>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-switched-after-2-years-heres-what-happened-362p</link>
      <guid>https://forem.com/cumulus/cursor-vs-copilot-in-2026-i-switched-after-2-years-heres-what-happened-362p</guid>
      <description>&lt;p&gt;I was a GitHub Copilot loyalist. Two years of daily use, hundreds of accepted suggestions, a workflow I thought was optimized. Then I tried Cursor for a week. I haven't gone back.&lt;/p&gt;

&lt;p&gt;This isn't a feature checklist. It's what actually matters when you're shipping code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Core Difference
&lt;/h2&gt;

&lt;p&gt;Copilot treats AI as autocomplete on steroids. Cursor treats AI as a pair programmer who can see your entire codebase.&lt;/p&gt;

&lt;p&gt;That distinction changes everything.&lt;/p&gt;

&lt;p&gt;When I ask Copilot to refactor a function, it sees the current file. Maybe some context from open tabs. When I ask Cursor the same thing, it understands how that function connects to my services, my types, my tests. It suggests changes I'd actually make.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Copilot suggestion: technically correct, misses context&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/api/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Cursor suggestion: knows my codebase uses the ApiClient pattern&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getUserData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;apiClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;get&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;User&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`/users/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheStrategy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;retry&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor knew about &lt;code&gt;ApiClient&lt;/code&gt; because it indexed my project. Copilot was guessing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Speed vs Intelligence
&lt;/h2&gt;

&lt;p&gt;Copilot is faster. No contest. The ghost text appears almost instantly. Cursor takes a beat longer, especially for complex suggestions.&lt;/p&gt;

&lt;p&gt;But I've stopped caring about milliseconds. What matters is how often I accept the suggestion versus how often I have to fix it.&lt;/p&gt;

&lt;p&gt;My Copilot acceptance rate: ~40%&lt;br&gt;
My Cursor acceptance rate: ~70%&lt;/p&gt;

&lt;p&gt;That 30% difference compounds. Fewer corrections. Fewer context switches. Less cognitive load.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Composer Changed My Workflow
&lt;/h2&gt;

&lt;p&gt;Cursor's Composer feature lets you describe changes across multiple files in natural language. "Add error handling to all API endpoints and update the corresponding tests."&lt;/p&gt;

&lt;p&gt;It generates a diff. You review it. Accept or reject per-file.&lt;/p&gt;

&lt;p&gt;I refactored an entire authentication module in 20 minutes. With Copilot, that's a morning of manual work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# What I typed in Composer:&lt;/span&gt;
&lt;span class="s2"&gt;"Replace all instances of the legacy AuthService with the new 
AuthProvider pattern. Update imports. Fix any type errors."&lt;/span&gt;

&lt;span class="c"&gt;# What I got:&lt;/span&gt;
&lt;span class="c"&gt;# - 14 files modified&lt;/span&gt;
&lt;span class="c"&gt;# - All imports updated  &lt;/span&gt;
&lt;span class="c"&gt;# - Type definitions fixed&lt;/span&gt;
&lt;span class="c"&gt;# - One edge case flagged for manual review&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot Chat exists, but it operates in a separate panel. It doesn't understand your project structure the same way. It's a chatbot that happens to know about code. Cursor is an IDE that happens to have AI woven through every interaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Copilot Still Does Better
&lt;/h2&gt;

&lt;p&gt;Inline completions for boilerplate. Writing standard loops, imports, basic CRUD operations—Copilot nails these instantly. Cursor sometimes overthinks simple tasks.&lt;/p&gt;

&lt;p&gt;GitHub integration is seamless if your team lives in the GitHub ecosystem. PR descriptions, issue references, Actions workflows. Copilot understands GitHub because it &lt;em&gt;is&lt;/em&gt; GitHub.&lt;/p&gt;

&lt;p&gt;Enterprise compliance. If your company already pays for GitHub Enterprise, Copilot slots in without procurement headaches. Cursor requires a separate vendor relationship.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Question
&lt;/h2&gt;

&lt;p&gt;Copilot: $10/month (Individual) or $19/month (Business)&lt;br&gt;
Cursor: $20/month (Pro) or $40/month (Business)&lt;/p&gt;

&lt;p&gt;Cursor costs more. For me, it's worth it. The multi-file refactoring alone saves hours per week.&lt;/p&gt;

&lt;p&gt;But if you're writing straightforward code—standard web apps, CRUD APIs, scripts—Copilot delivers 80% of the value at half the price.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Setup Now
&lt;/h2&gt;

&lt;p&gt;I run both. Cursor as my primary editor for complex projects. Copilot in VS Code for quick scripts and one-off files.&lt;/p&gt;

&lt;p&gt;This sounds wasteful. It's not. Different tools for different contexts.&lt;/p&gt;

&lt;p&gt;For greenfield projects, architecture decisions, refactoring legacy code: Cursor.&lt;br&gt;
For quick fixes, small scripts, config files: Copilot in VS Code.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Verdict
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Choose Copilot if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want fast, cheap, good-enough completions&lt;/li&gt;
&lt;li&gt;Your team is locked into GitHub&lt;/li&gt;
&lt;li&gt;You write mostly straightforward code&lt;/li&gt;
&lt;li&gt;Enterprise compliance matters more than features&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Cursor if:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You work on complex, interconnected codebases&lt;/li&gt;
&lt;li&gt;Multi-file refactoring is part of your week&lt;/li&gt;
&lt;li&gt;You want AI that understands your project, not just your file&lt;/li&gt;
&lt;li&gt;You'll pay more for fewer context switches&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I switched because I got tired of AI suggestions that were technically correct but contextually wrong. Cursor understands my codebase. Copilot understands code.&lt;/p&gt;

&lt;p&gt;That's the difference that mattered.&lt;/p&gt;




&lt;p&gt;More at dev.to/cumulus&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>vscode</category>
    </item>
  </channel>
</rss>
