Forem

# llm

Posts

Harness Engineering in Practice: Building a 6-Agent System That Runs Itself
8 min read
Your LLM budget alerts won't save you if you can't map costs to users
1 min read
I repurposed Karpathy's LLM Wiki for product discovery. It worked surprisingly well.
2 min read
Why I Built a Self-Hosted Alternative to Helicone (and What I Learned)
5 min read
Your Documentation Has Two Audiences Now (And One Is an AI)
4 min read
Connecting LLMs to the Real World: A Deep Dive into OpenClaw and Nexconn APIs
6 min read
How to Stop Drowning in Open Model Releases and Actually Run One Locally
5 min read
I Raised a “Lobster” Assistant: It Burned Tokens, Not Electricity
7 min read
Same GPT, Different ROI: Why Many AI Failures Are Not Model Failures
3 min read
Stop Wasting Days on RAG Setup: How uv + pyseekdb Cut Your Development Time by 90%
5 min read
I Fixed My LLM OOM Crashes by Shrinking the Draft Model (Speculative Decoding on Real Hardware)
3 min read
Claude Code's Prompt Cache TTL Dropped From 1h to 5m
6 min read
DeepSeek V4 Pro and Flash Hit Open Source. Should You Self-Host Now?
7 min read
Why Your AI Character Keeps Breaking Under Pressure (And What I Built Instead of Yet Another System Prompt)
8 min read
Google's TurboQuant: 6x KV Cache Compression Without Retraining
8 min read