Forem

# evaluation

Posts

How to build a self-improving agent that updates your UI in real time
9 min read
Evaluating AI Agents: Performance, Reliability, and Real-World Impact
4 min read
Debiasing LLM Judges: Understanding and correcting AI Evaluation Bias
5 min read
Case Study: How Junie Uses TeamCity to Evaluate Coding Agents
5 min read
Evaluation Metrics for Summarization
6 min read
Retrieval Metrics Demystified: From BM25 Baselines to EM@5 & Answer F1
4 min read
Top Open Source Tools for LLM Observability in 2025
11 min read
Code evaluation as a debugging tool
3 min read
Evaluation for Regression Models in Machine Learning
6 min read
A Checklist to Quickly Evaluate SaaS Security
4 min read
The Web Design Process
3 min read
Destructuring Tweets - Episode 10 - Short && Circuit && Evaluation
2 min read
What I learned from running a user study
7 min read
Object detection Part 5 - Evaluation and Tensorboard [Tensorflow]
1 min read