Forem

# llm

Posts

Local AI in 2026: Ollama Benchmarks, $0 Inference, and the End of Per-Token Pricing
6 min read

Google's Gemma 4 Explained: The Open-Source Agent Toolkit We've Been Waiting For
3 min read

The 12 approaches I tested before finding one that works
5 min read

I Benchmarked 5 File Editing Strategies for AI Coding Agents. Here's What Actually Works.
2 min read

How I Built Persistent Memory for Claude Code

Uses all six Claude Code hooks
9 min read

RAG in the Wild: What I Learned After Two Weeks of Chunking Experiments
7 min read

How to Reduce OpenAI Bill Without Hurting Quality: A Practical Audit Framework
6 min read

Running 1M-token context on a single GPU (the math)
2 min read

I Read a Paper That Genuinely Made Me Stop and Think — AI is Now Jailbreaking Other AI
3 min read

One line of Python to extend your LLM's context window 10x
1 min read

KV cache memory calculator: how much does your LLM actually use?
3 min read

Build Your Own AI-Powered Knowledge Base with LLMs and Obsidian
6 min read

How Much GPU Memory Does NexusQuant Actually Save?
4 min read

What I Learned Testing 12 Compression Approaches That Failed
6 min read

The Math Behind E8 Lattice Quantization (with Code)
6 min read