Forem

# llm

Posts

I built an LLM eval rig in a weekend. Most of it was wrong.

1
Comments
4 min read
How to Detect Prompt Injection in Your LLM Agent — Python, 5 Minutes

Comments
5 min read
Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

Comments
13 min read
Harness Engineering with Nothing but Markdown

Comments
10 min read
GPT-5 vs Claude Sonnet 4: real per-task cost and benchmark comparison for production workloads

Comments
7 min read
Skills for eval-driven agent optimization

1
Comments
1 min read
62.2% on Aider Polyglot from a MacBook Pro. Then the other model we tried scored 4%. Here's what actually happened, with a working cost loop attached.

Comments
16 min read
DeepSeek-V4 Changes the Context Game for Agents — And Your Memory Architecture Should Adapt

1
Comments 1
3 min read
What If You Compressed Your Prompts Into Chinese Emoji? (A Token-Saving Thought Experiment)

Comments
3 min read
Your RAG Eval Set Is Probably Wrong. The Test That Catches It.

Comments
7 min read
GEO / AI Search Thread

Comments
5 min read
Hybrid Search Is the Phrase You'll Hear at Every RAG Talk in 2026

Comments
7 min read
The JSON-Mode Prompt Pattern That Survives Claude Version Bumps

Comments
7 min read
The 3 Alerts Every LLM Team Should Have Set Up by Tomorrow

Comments
7 min read
Stop Caching the Whole LLM Response. Cache the Embedding.

Comments
8 min read