<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Kenneth Mckrola</title>
    <description>The latest articles on Forem by Kenneth Mckrola (@mackoverflow).</description>
    <link>https://forem.com/mackoverflow</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3891762%2F6c560b7e-4be6-467c-b325-df54e6e8cda4.jpg</url>
      <title>Forem: Kenneth Mckrola</title>
      <link>https://forem.com/mackoverflow</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mackoverflow"/>
    <language>en</language>
    <item>
      <title>Easily benchmark all your app's endpoints at once</title>
      <dc:creator>Kenneth Mckrola</dc:creator>
      <pubDate>Wed, 29 Apr 2026 21:09:11 +0000</pubDate>
      <link>https://forem.com/mackoverflow/easily-benchmark-all-your-apps-endpoints-at-once-2fod</link>
      <guid>https://forem.com/mackoverflow/easily-benchmark-all-your-apps-endpoints-at-once-2fod</guid>
      <description>&lt;p&gt;Most "load tests" in real codebases are a &lt;code&gt;curl&lt;/code&gt; pasted into a Slack thread. Someone runs it before a release, eyeballs the latency, and we ship. There's nothing version-controlled, nothing repeatable, and the next person to touch the service has no idea which endpoints are actually fast paths.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://benchmarkr-1.onrender.com" rel="noopener noreferrer"&gt;benchmarkr&lt;/a&gt; is a powerful and easy-to-use CLI and MCP tool that fixes that part of the workflow specifically. The thing I want to talk about in this post is the piece that makes it click: a YAML config that lives in your repo and describes every endpoint you care about, the same way a &lt;code&gt;package.json&lt;/code&gt; describes your dependencies.&lt;/p&gt;

&lt;h2&gt;
  
  
  The config
&lt;/h2&gt;

&lt;p&gt;First, install the benchmarkr CLI if you haven't already:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;brew tap mack-overflow/tap
brew &lt;span class="nb"&gt;install &lt;/span&gt;benchmarkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That covers Homebrew. On Debian or Ubuntu:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [trusted=yes] https://apt.fury.io/mack-overflow/ /"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/benchmarkr.list
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;benchmarkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;(More installation options are covered in &lt;a href="https://benchmarkr-1.onrender.com/docs" rel="noopener noreferrer"&gt;the docs&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;Next, run &lt;code&gt;benchmarkr endpoints init&lt;/code&gt; in your project root and you get a &lt;code&gt;benchmarkr.yaml&lt;/code&gt; you can commit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;

&lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;list-users&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${API_BASE:-http://localhost:8080}/users&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bearer ${API_TOKEN}&lt;/span&gt;
    &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;duration_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;search-users&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GET&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${API_BASE}/users/search&lt;/span&gt;
    &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;q&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;test"&lt;/span&gt;
      &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;50"&lt;/span&gt;
    &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;duration_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;15&lt;/span&gt;

  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;create-order&lt;/span&gt;
    &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;POST&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${API_BASE}/orders&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Authorization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bearer ${API_TOKEN}&lt;/span&gt;
      &lt;span class="na"&gt;Content-Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application/json&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;sku&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ABC-123"&lt;/span&gt;
      &lt;span class="na"&gt;quantity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;defaults&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;duration_seconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to notice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Env var substitution.&lt;/strong&gt; &lt;code&gt;${API_BASE}&lt;/code&gt; and &lt;code&gt;${API_BASE:-default}&lt;/code&gt; work the way they do in shell. A sibling &lt;code&gt;.env&lt;/code&gt; file is auto-loaded but never overrides what's already in the environment, so the same file works on a laptop, in CI, and in staging (a sketch follows this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defaults travel with the endpoint.&lt;/strong&gt; &lt;code&gt;create-order&lt;/code&gt; runs at concurrency 2 for 10 seconds because that's what makes sense for a write path. &lt;code&gt;list-users&lt;/code&gt; runs at concurrency 10. You set this once in the file you already review.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery walks up from CWD.&lt;/strong&gt; Run the CLI from any subdirectory and it finds the file, like &lt;code&gt;git&lt;/code&gt; does.&lt;/li&gt;
&lt;/ul&gt;
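
&lt;p&gt;For instance, a sibling &lt;code&gt;.env&lt;/code&gt; for local runs might look like the following; the values are illustrative placeholders, not anything benchmarkr requires:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# .env: auto-loaded by the CLI, but never overriding
# variables already set in the environment (e.g. by CI)
API_BASE=http://localhost:8080
API_TOKEN=dev-only-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;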

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw0c3qoj334x8hhqusls.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnw0c3qoj334x8hhqusls.png" alt="CLI endpoints list output" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Running one endpoint
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="nt"&gt;-e&lt;/span&gt; list-users
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Saved defaults apply. Any flag you pass on the command line wins; headers and params are merged. So when you're poking at production specifically, you can do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="nt"&gt;-e&lt;/span&gt; list-users &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"X-Trace: debug-2026-04-28"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 50
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…without editing the committed file.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running all of them
&lt;/h2&gt;

&lt;p&gt;This is where the YAML pays for itself. Because every endpoint is named and self-describing, you can hand the entire file to the CLI in one shot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="nt"&gt;--all&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That walks every endpoint in &lt;code&gt;benchmarkr.yaml&lt;/code&gt; in succession, applying each endpoint's saved defaults (concurrency, duration, headers, body — the whole config). Between runs you get a &lt;code&gt;[i/N] &amp;lt;name&amp;gt;&lt;/code&gt; header so it's obvious where you are; live p50/p95/p99 streams in for the active endpoint and a final summary prints when it finishes. &lt;code&gt;--all&lt;/code&gt; is mutually exclusive with &lt;code&gt;--url&lt;/code&gt; and &lt;code&gt;--endpoint&lt;/code&gt;, and any flags you do pass (e.g. &lt;code&gt;--store&lt;/code&gt;, &lt;code&gt;--json&lt;/code&gt;, &lt;code&gt;--rate-limit&lt;/code&gt;) apply to every run in the sweep.&lt;/p&gt;

&lt;p&gt;For CI, this collapses the workflow step to one line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/perf.yml&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Benchmark every endpoint&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;API_BASE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://api.staging.example.com&lt;/span&gt;
    &lt;span class="na"&gt;API_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.STAGING_API_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;BENCH_CLOUD_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.BENCHMARKR_TOKEN }}&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;benchmarkr run --all --store --json &amp;gt; perf-results.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--json&lt;/code&gt; with &lt;code&gt;--all&lt;/code&gt; emits an array — one entry per endpoint, with the same &lt;code&gt;result&lt;/code&gt; shape as a single run — so you can pipe it straight into a regression check or upload it as a CI artifact:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"list-users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30.001s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stored"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12483&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p50_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p95_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p99_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;23&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"errors_total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"search-users"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"15.002s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stored"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4127&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p50_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p95_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;47&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p99_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"errors_total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"create-order"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"completed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10.001s"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stored"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"result"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;312&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p50_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;41&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p95_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"p99_ms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;121&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"errors_total"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
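
&lt;p&gt;A minimal sketch of that regression check, assuming &lt;code&gt;jq&lt;/code&gt; is on the runner and using an illustrative 100 ms p95 budget (neither is something benchmarkr mandates):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Fail the CI step if any endpoint breaches the (illustrative) budget
jq -e 'all(.[]; .result.p95_ms &amp;lt; 100 and .result.errors_total == 0)' perf-results.json \
  || { echo "perf budget exceeded" &amp;gt;&amp;amp;2; exit 1; }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;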



&lt;p&gt;You're not maintaining a separate list of "endpoints to benchmark" in your CI workflow and a list in your config. There's one list. Add a new endpoint to &lt;code&gt;benchmarkr.yaml&lt;/code&gt; in the same PR that adds the route, and the next CI run picks it up automatically — no workflow edits, no shell loop to babysit.&lt;/p&gt;

&lt;h2&gt;
  
  
  Round-tripping with the cloud dashboard
&lt;/h2&gt;

&lt;p&gt;The CLI gives you fast feedback. The dashboard gives you the long view — historical p95 charts, regression detection across versions, the kind of thing that's painful to wire up yourself.&lt;/p&gt;

&lt;p&gt;The newest piece is import/export, so the YAML in your repo and the endpoints in the dashboard stay in sync without anyone having to maintain both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Export from the dashboard.&lt;/strong&gt; Open any endpoint and click &lt;strong&gt;Export&lt;/strong&gt; for YAML or JSON. Or click &lt;strong&gt;Export all&lt;/strong&gt; in the endpoints nav to dump every endpoint to one file you can drop into a fresh repo.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Import to the dashboard.&lt;/strong&gt; Click &lt;strong&gt;Import&lt;/strong&gt;, pick a &lt;code&gt;benchmarkr.yaml&lt;/code&gt;, and endpoints upsert by &lt;code&gt;(user, name)&lt;/code&gt;. If the config changed, a new version is recorded — so you get a history of how each endpoint's load shape evolved.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2klkwgl4a48htpx3lg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb2klkwgl4a48htpx3lg2.png" alt="Import in Benchmarkr UI Nav" width="800" height="74"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fochel2kdfj1i2qvfo1up.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fochel2kdfj1i2qvfo1up.png" alt="Export as YAML in UI" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmf8nwq6sw8oxba12jbr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqmf8nwq6sw8oxba12jbr.png" alt="endpoint history" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A workflow I've been using:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Define endpoints in &lt;code&gt;benchmarkr.yaml&lt;/code&gt;, commit them.&lt;/li&gt;
&lt;li&gt;CI runs the &lt;code&gt;--all&lt;/code&gt; sweep above on every PR with &lt;code&gt;--store&lt;/code&gt; and the cloud token, persisting results to the dashboard.&lt;/li&gt;
&lt;li&gt;Open an endpoint in the dashboard to see its trend line across the last N PRs.&lt;/li&gt;
&lt;li&gt;If somebody adds an endpoint via the dashboard UI for ad-hoc poking, &lt;strong&gt;Export&lt;/strong&gt; → drop the file into the repo → it's now part of the CI matrix.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  A note on the cloud dashboard
&lt;/h2&gt;

&lt;p&gt;The cloud platform is currently in &lt;strong&gt;closed beta&lt;/strong&gt;. We're planning to open it up to the public on a per-token basis in &lt;strong&gt;spring 2026&lt;/strong&gt; — if you'd like access at launch, you can &lt;a href="https://benchmarkr-1.onrender.com/waitlist" rel="noopener noreferrer"&gt;join the waitlist&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The CLI itself is open source and works without the cloud — &lt;code&gt;benchmarkr run&lt;/code&gt;, the YAML config, and even local result persistence don't require an account or a token. The dashboard, history charts, version pinning, and import/export are the parts gated behind beta access for now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is worth doing
&lt;/h2&gt;

&lt;p&gt;The shift that matters isn't "run benchmarks in CI" — plenty of tools do that. It's having a single, reviewable file that says &lt;em&gt;here are this service's endpoints and how we expect them to behave under load&lt;/em&gt;, sitting next to the code in the same PR.&lt;/p&gt;

&lt;p&gt;Once that file exists:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New endpoints get a perf budget at the same moment they get a route handler.&lt;/li&gt;
&lt;li&gt;Reviewers can see in the diff that a new write path is being benchmarked at concurrency 2, not 100, and push back if that's wrong.&lt;/li&gt;
&lt;li&gt;CI gets a free regression signal across every endpoint, not just the one someone remembered to add to a script.&lt;/li&gt;
&lt;li&gt;The dashboard gives you the historical view without anyone manually re-entering endpoints.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repo already describes your API. This is just letting it benchmark itself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;benchmarkr is open source — &lt;code&gt;brew install mack-overflow/tap/benchmarkr&lt;/code&gt; or grab it from &lt;a href="https://benchmarkr-1.onrender.com" rel="noopener noreferrer"&gt;benchmarkr&lt;/a&gt;. Cloud dashboard beta access opens publicly per-token in spring 2026.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>api</category>
      <category>performance</category>
      <category>devops</category>
      <category>go</category>
    </item>
    <item>
      <title>Benchmarkr - cURL, built for concurrency, MCP, and real performance benchmarking</title>
      <dc:creator>Kenneth Mckrola</dc:creator>
      <pubDate>Wed, 22 Apr 2026 06:48:19 +0000</pubDate>
      <link>https://forem.com/mackoverflow/benchmarkr-curl-built-for-concurrency-mcp-and-real-performance-benchmarking-17c6</link>
      <guid>https://forem.com/mackoverflow/benchmarkr-curl-built-for-concurrency-mcp-and-real-performance-benchmarking-17c6</guid>
      <description>&lt;p&gt;&lt;a href="https://benchmarkr-1.onrender.com" rel="noopener noreferrer"&gt;Link to benchmarkr homepage&lt;/a&gt;&lt;br&gt;
What if cURL let you easily run concurrent requests and benchmark your&lt;br&gt;
endpoints? Where you had an executable that exports tools for your coding&lt;br&gt;
agents (Claude, Cursor) via MCP, streams live performance updates, and&lt;br&gt;
auto-exports benchmark runs to JSON or SQL?&lt;/p&gt;

&lt;p&gt;Development speed is increasing rapidly, and testing and benchmarking are becoming even more crucial for validating AI-generated and AI-assisted code. &lt;strong&gt;Benchmarkr&lt;/strong&gt; lets you easily orchestrate performance testing on your API endpoints — whether you're catching regressions, sanity-checking a refactor, or letting an agent validate the code it just wrote.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5p6w5nk7dy48ep4t7sli.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5p6w5nk7dy48ep4t7sli.png" alt="Hero image of benchmarkr run" width="800" height="588"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Why another HTTP tool?
&lt;/h2&gt;

&lt;p&gt;There's no shortage of load-test tools — &lt;code&gt;hey&lt;/code&gt;, &lt;code&gt;ab&lt;/code&gt;, &lt;code&gt;wrk&lt;/code&gt;, &lt;code&gt;bombardier&lt;/code&gt;, &lt;code&gt;k6&lt;/code&gt;. They're great at what they do, but they live in a pre-agent world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No live feedback&lt;/strong&gt; — you wait 30 seconds, then read a wall of text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No structured persistence&lt;/strong&gt; — you pipe to a file, grep, repeat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No agent integration&lt;/strong&gt; — your coding agent can't call them without
shelling out and parsing free-form output.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Benchmarkr is a single Go binary that does three things well, with little configuration required out of the box:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Runs concurrent HTTP benchmarks with live metrics in the terminal.&lt;/li&gt;
&lt;li&gt;Stores runs as JSON files, Postgres, or MySQL — configured once, reused
forever.&lt;/li&gt;
&lt;li&gt;Ships an MCP server so Claude Code, Cursor, and any other MCP-compatible
agent can benchmark your endpoints by name.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://github.com/Mack-Overflow/api-bench" rel="noopener noreferrer"&gt;Github Repo&lt;/a&gt;&lt;br&gt;
&lt;a href="https://benchmarkr-1.onrender.com/docs" rel="noopener noreferrer"&gt;Docs&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS / Linux&lt;/span&gt;
brew tap mack-overflow/tap
brew &lt;span class="nb"&gt;install &lt;/span&gt;benchmarkr

&lt;span class="c"&gt;# Debian / Ubuntu&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [trusted=yes] https://apt.fury.io/mack-overflow/ /"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/benchmarkr.list
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install &lt;/span&gt;benchmarkr

&lt;span class="c"&gt;# RHEL / Fedora&lt;/span&gt;
&lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/yum.repos.d/benchmarkr.repo &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
[benchmarkr]
name=Benchmarkr
baseurl=https://yum.fury.io/mack-overflow/
enabled=1
gpgcheck=0
&lt;/span&gt;&lt;span class="no"&gt;EOF
&lt;/span&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;yum &lt;span class="nb"&gt;install &lt;/span&gt;benchmarkr
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Or grab a binary from the &lt;a href="https://github.com/Mack-Overflow/api-bench/releases/tag/v0.1.2" rel="noopener noreferrer"&gt;releases page&lt;/a&gt; if you'd rather skip a package manager.&lt;/p&gt;
&lt;h2&gt;
  
  
  Your first benchmark
&lt;/h2&gt;

&lt;p&gt;The smallest useful command:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;benchmarkr run --url https://api.example.com/health&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That fires a single worker for 10 seconds at a GET endpoint. You'll see requests, errors, and P50/P95 update live, then a final summary with throughput, latency percentiles, status-code breakdown, response sizes, and cache hit/miss counts.&lt;/p&gt;

&lt;p&gt;Add concurrency and duration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://api.example.com/users &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 50 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--duration&lt;/span&gt; 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;POST with headers and a body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://api.example.com/users &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--method&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer tok_xxx"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--body&lt;/span&gt; &lt;span class="s1"&gt;'{"name":"test"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rate-limit so you don't accidentally DDoS staging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://api.example.com/search &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--concurrency&lt;/span&gt; 5 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--duration&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--rate-limit&lt;/span&gt; 100    &lt;span class="c"&gt;# max 100 req/s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bypass the CDN cache to measure origin latency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;benchmarkr run &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--url&lt;/span&gt; https://cdn.example.com/asset.js &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cache-mode&lt;/span&gt; bypass &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--duration&lt;/span&gt; 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The MCP server — letting agents benchmark for you
&lt;/h2&gt;

&lt;p&gt;This is the part I'm most excited about. Install the MCP companion binary:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;brew install mack-overflow/tap/benchmarkr-mcp&lt;/code&gt;&lt;br&gt;
Then wire it into your agent. For Claude Code, drop a &lt;code&gt;.mcp.json&lt;/code&gt; in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"benchmarkr"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"benchmarkr-mcp"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And a short &lt;code&gt;CLAUDE.md&lt;/code&gt; so the agent prefers it over reinventing the wheel:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Benchmarking&lt;/span&gt;

Use the benchmarkr MCP tools (run_benchmark, get_benchmark_status,
stop_benchmark, list_endpoints) for all API benchmarking tasks.
Do not install or use external tools like hey, ab, or bombardier.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cursor uses the same config under &lt;code&gt;~/.cursor/mcp.json&lt;/code&gt;, plus a &lt;code&gt;.cursorrules&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;Now you can say things like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Benchmark the /api/users endpoint at 20 concurrent workers for 30
seconds and tell me if the P95 is above 200ms."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and the agent calls the &lt;code&gt;run_benchmark&lt;/code&gt; tool directly, reads the structured result, and answers in the context of your actual code. No more "let me write a shell script for you" dance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedy55l177m6bjfy96ssq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fedy55l177m6bjfy96ssq.png" alt="Claude benchmarkr mcp run" width="616" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Where this is heading
&lt;/h2&gt;

&lt;p&gt;A few things on the roadmap I'm looking for feedback on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;compare_endpoints&lt;/code&gt; — run two URLs side-by-side and diff the metrics.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;regression_test&lt;/code&gt; — assert against a previous run's P95/P99 and return a pass/fail the agent can reason about.&lt;/li&gt;
&lt;li&gt;Scenario files — YAML-defined multi-step flows (login → fetch → post) instead of single-URL runs.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>performance</category>
      <category>go</category>
      <category>cli</category>
    </item>
  </channel>
</rss>
