Forem

# evals

David Aronchick

May 5

The Loop Is Only as Good as the Metric

#ai #evals #machinelearning #data

7 min read

aasawari sahasrabuddhe

Apr 23

Why Most AI Teams Are Flying Blind: And What to Do About It

#ai #evals #genai #womenintech

13 min read

Apr 22

Wait, you guys run evals?

#ai #evals #llm

1 min read

Scarlett Attensil for LaunchDarkly

Mar 26

Evaluate LLM code generation with LLM-as-judge evaluators

#ai #evals #llm #agents

12 min read

Manouk Draisma for LangWatch

Mar 24

From zero evals to a working multimodal evaluation in 30 minutes using LangWatch Skills

#ai #agents #evals #claudecode

7 min read

Mar 23

Your coding agent already knows how to test your AI agent (we just turned it into a Skill)

#agents #agentskills #evals #simulations

4 min read

Mar 30

Build an eval harness for 184 AI agent prompts with promptfoo

#promptfoo #evals #aiagents #llm

8 min read

Mar 27

Self-improving Coding Agents

#agents #harness #ai #evals

5 min read
