<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tobrun Van Nuland</title>
    <description>The latest articles on Forem by Tobrun Van Nuland (@tobrun).</description>
    <link>https://forem.com/tobrun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3715059%2F4675b451-a843-4dff-a547-1f2ef72097b5.jpg</url>
      <title>Forem: Tobrun Van Nuland</title>
      <link>https://forem.com/tobrun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tobrun"/>
    <language>en</language>
    <item>
      <title>I Benchmarked How Claude Code Consumes APIs. MCP Won and It Wasn't Close.</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 27 Feb 2026 20:16:16 +0000</pubDate>
      <link>https://forem.com/tobrun/i-benchmarked-how-claude-code-consumes-apis-mcp-won-and-it-wasnt-close-4k1</link>
      <guid>https://forem.com/tobrun/i-benchmarked-how-claude-code-consumes-apis-mcp-won-and-it-wasnt-close-4k1</guid>
      <description>&lt;p&gt;There's been a lot of noise lately in the community about MCPs being overhyped. They take too much context, they can be replaced with a spec, CLIs are more effective, etc. But all of those claims didn't come with any proof, so I decided to measure it.&lt;/p&gt;

&lt;p&gt;I used a benchmark harness that runs an AI coding agent against the same API task six different ways, captures every tool call through hooks, classifies each one, and compares the results. I ran it against two completely different APIs, 36 total runs, and the data tells a clear story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The tasks are simple. For the first API: convert a dataset to another representation and return the result. For the second: generate a large PNG and save it to disk. Each task runs through six different interfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;no-context&lt;/strong&gt; — zero guidance, just the task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;openapi-spec&lt;/strong&gt; — the full OpenAPI YAML spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;openapi-mcp&lt;/strong&gt; — the API exposed as an MCP tool via FastMCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;generated-python&lt;/strong&gt; — a hand-crafted Python client library&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vibe-cli&lt;/strong&gt; — a minimal argparse CLI wrapping the API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pypi-sdk&lt;/strong&gt; — told to use the official SDK from PyPI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each scenario runs the agent in headless mode with &lt;code&gt;--max-turns 10&lt;/code&gt;. Agent hooks capture every tool call as JSONL telemetry: which tool was used, what the input was, and whether it succeeded. A regex classifier then tags each call by interface type and error category. Three iterations per scenario, per API. No cherry-picking.&lt;/p&gt;
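&lt;p&gt;The classifier itself can be tiny. A minimal sketch in Python (the tag names and patterns here are illustrative, not the harness's actual rules):&lt;/p&gt;

```python
import re

# Illustrative interface classifier: tag each captured tool call
# by the interface it exercises. Patterns are examples, not the
# benchmark harness's real rule set.
PATTERNS = {
    "mcp": re.compile(r"^mcp__"),
    "curl": re.compile(r"\bcurl\b"),
    "cli": re.compile(r"\bconvert-dataset\b"),
    "python": re.compile(r"\bpython3?\b"),
}

def classify(tool: str, tool_input: str) -> str:
    """Return an interface tag for one JSONL telemetry record."""
    for tag, pattern in PATTERNS.items():
        if pattern.search(tool) or pattern.search(tool_input):
            return tag
    return "other"
```

&lt;p&gt;A second pass with error-category patterns (auth failures, encoding retries, 4xx responses) works the same way over the recorded outputs.&lt;/p&gt;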

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Conversion API
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Success&lt;/th&gt;
&lt;th&gt;Avg Turns&lt;/th&gt;
&lt;th&gt;Avg Cost&lt;/th&gt;
&lt;th&gt;vs MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vibe-cli&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;3.0&lt;/td&gt;
&lt;td&gt;$0.06&lt;/td&gt;
&lt;td&gt;1.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pypi-sdk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.0&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;generated-python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.3&lt;/td&gt;
&lt;td&gt;$0.11&lt;/td&gt;
&lt;td&gt;3.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no-context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;$0.12&lt;/td&gt;
&lt;td&gt;4.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-spec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2/3&lt;/td&gt;
&lt;td&gt;8.3&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;td&gt;5.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Image API
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Success&lt;/th&gt;
&lt;th&gt;Avg Turns&lt;/th&gt;
&lt;th&gt;Avg Cost&lt;/th&gt;
&lt;th&gt;vs MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no-context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.04&lt;/td&gt;
&lt;td&gt;1.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vibe-cli&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.0&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.2x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-spec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;3.7&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;generated-python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;4.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pypi-sdk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2/3&lt;/td&gt;
&lt;td&gt;9.7&lt;/td&gt;
&lt;td&gt;$0.21&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCP wins both benchmarks: a 100% success rate, two turns every time, perfectly deterministic across all iterations. Everything else is 2x to 7x more expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Wins
&lt;/h2&gt;

&lt;p&gt;Looking at the raw telemetry makes it obvious. Here's what happens when an agent tries to call an HTTP API with no context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bash  curl -s "https://api.example.com/v1/resource?q=1600%20..."
Bash  echo "Token set: ${API_TOKEN:+yes}"
Bash  curl -s "https://api.example.com/v1/resource?q=1600+..."
Bash  curl -s --get "https://api.example.com/v1/resource..."
Bash  TOKEN="$API_TOKEN" &amp;amp;&amp;amp; curl -sv "https://api.example.com/..."
Bash  printenv API_TOKEN | wc -c
Bash  printenv API_TOKEN | cat -A | head -1
Bash  TOKEN=$(printenv API_TOKEN) &amp;amp;&amp;amp; curl -s "https://api..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eight tool calls. The agent is building URLs by hand, fighting shell expansion of the access token, trying different encoding schemes for spaces and commas, debugging why the token isn't being passed correctly. It gets there eventually, but it burns turns figuring out the plumbing.&lt;/p&gt;

&lt;p&gt;Here's MCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp__conversion_api__convert_dataset  input=dataset.json format=csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One call. Done. The agent doesn't construct URLs, doesn't handle auth, doesn't encode parameters, doesn't parse response formats. It calls a typed function with structured arguments and gets structured data back.&lt;/p&gt;

&lt;p&gt;MCP eliminates every source of friction: URL construction, authentication handling, parameter encoding, API version discovery, response parsing. The agent goes straight from intent to result.&lt;/p&gt;

&lt;h2&gt;
  
  
  But MCP Isn't the Whole Story
&lt;/h2&gt;

&lt;p&gt;Here's where I want to push back against both sides of the debate. Yes, MCP dominates in a clean greenfield setup. But clean greenfield isn't where most work happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLIs compose.&lt;/strong&gt; A CLI like &lt;code&gt;convert-dataset --input dataset.json&lt;/code&gt; pipes naturally into other tools. An agent can chain commands or redirect output to a file. MCP tools return structured data into the conversation context. That data has to go somewhere, and when you're chaining multiple operations, it starts bloating the context window. The vibe-cli scenario landed near the top of both benchmarks because the agent reads the script once, runs it, and the output stays in the terminal where it belongs.&lt;/p&gt;
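&lt;p&gt;A vibe CLI really can be this small. A sketch in Python's &lt;code&gt;argparse&lt;/code&gt; (the flag names are illustrative, not the benchmark's actual script):&lt;/p&gt;

```python
import argparse
import json

def main(argv=None):
    """Minimal CLI wrapper: parse flags, then hand off to the API."""
    parser = argparse.ArgumentParser(prog="convert-dataset")
    parser.add_argument("--input", required=True, help="path to the source dataset")
    parser.add_argument("--format", default="csv", help="target representation")
    args = parser.parse_args(argv)
    # A real wrapper would call the API here; this sketch just echoes
    # the resolved plan as JSON, so the output pipes cleanly into jq etc.
    print(json.dumps({"input": args.input, "format": args.format}))
```

&lt;p&gt;In a real script you'd call &lt;code&gt;main()&lt;/code&gt; under an &lt;code&gt;if __name__ == "__main__"&lt;/code&gt; guard so it reads &lt;code&gt;sys.argv&lt;/code&gt;; the agent only ever sees the flags and the stdout.&lt;/p&gt;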

&lt;p&gt;&lt;strong&gt;CLIs evolve with your project.&lt;/strong&gt; This is the angle that matters most. When you're actively developing, your CLI is a living artifact. You add a flag, the agent discovers it, uses it. The feedback loop is immediate. An MCP server is more of a fixed contract; you define the tool interface upfront and the agent consumes it as-is. That rigidity is a feature in production but a constraint during development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "no-context" result for the Image API is telling.&lt;/strong&gt; The Image API is simple: a single URL with a path parameter, and the agent nailed it in 2 turns with zero guidance. For simple APIs, MCP doesn't add much because the agent's built-in knowledge is already sufficient. The value of MCP scales with API complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAPI specs can hurt more than they help.&lt;/strong&gt; This surprised me. Giving the agent a full OpenAPI YAML for the Conversion API actually produced the worst results of any scenario: 2/3 success rate, 5.6x the cost of MCP. The agent spent turns reading the spec, then still struggled with the same curl/token issues. The spec added information without reducing ambiguity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Actually Recommend
&lt;/h2&gt;

&lt;p&gt;After running 36 experiments and staring at the telemetry, my mental model is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use MCP for stable, well-defined APIs.&lt;/strong&gt; If you have an API that doesn't change often and you want deterministic, minimal-cost agent interactions, wrap it in MCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use CLIs for APIs you're actively building.&lt;/strong&gt; If the interface is still evolving, a CLI gives you a faster iteration loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't bother with generated client libraries for agent consumption.&lt;/strong&gt; The generated-python scenario was consistently one of the most expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't give agents raw OpenAPI specs for complex APIs.&lt;/strong&gt; Either wrap the API in MCP (which encodes the spec into a typed tool) or write a CLI (which encodes it into flags).&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Deploying vLLM on your Linux Server</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Mon, 16 Feb 2026 04:43:32 +0000</pubDate>
      <link>https://forem.com/tobrun/deploying-vllm-on-your-linux-server-4pcf</link>
      <guid>https://forem.com/tobrun/deploying-vllm-on-your-linux-server-4pcf</guid>
      <description>&lt;h1&gt;
  
  
  🚀 Deploying vLLM on Your Linux Server
&lt;/h1&gt;

&lt;p&gt;Running &lt;strong&gt;vLLM&lt;/strong&gt; as a persistent, reliable background service is one of the best ways to expose a fast local LLM API on your Linux machine.&lt;br&gt;&lt;br&gt;
This guide walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing dependencies&lt;/li&gt;
&lt;li&gt;Creating a virtual environment&lt;/li&gt;
&lt;li&gt;Setting up a &lt;strong&gt;systemd&lt;/strong&gt; service&lt;/li&gt;
&lt;li&gt;Running vLLM from a fixed directory (&lt;code&gt;/home/nurbot/ws/models&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Checking logs and debugging&lt;/li&gt;
&lt;li&gt;Enabling auto-start on boot&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  🧰 1. Install System Dependencies
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3-pip python3-venv docker.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Docker is optional but useful if you want containerized workflows.&lt;/p&gt;


&lt;h1&gt;
  
  
  🎮 2. Verify NVIDIA GPU Support (Optional but Recommended)
&lt;/h1&gt;

&lt;p&gt;Check whether the machine has working NVIDIA drivers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the command is missing, install drivers before running GPU-backed vLLM.&lt;/p&gt;




&lt;h1&gt;
  
  
  🐍 3. Create the vLLM Virtual Environment
&lt;/h1&gt;

&lt;p&gt;We place it in &lt;code&gt;/opt/vllm-env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv /opt/vllm-env
&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nv"&gt;$USER&lt;/span&gt;:&lt;span class="nv"&gt;$USER&lt;/span&gt; /opt/vllm-env
&lt;span class="nb"&gt;source&lt;/span&gt; /opt/vllm-env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install vLLM + OpenAI API compatibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;vllm openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  📁 4. Configure where vLLM Runs From
&lt;/h1&gt;

&lt;p&gt;We want vLLM to run from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/home/nurbot/ws/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This directory will contain the &lt;code&gt;start_vllm.sh&lt;/code&gt; script.&lt;/p&gt;
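&lt;p&gt;The script itself isn't shown here, so as an illustrative sketch of what &lt;code&gt;start_vllm.sh&lt;/code&gt; might contain (the entrypoint, host, and port are assumptions; adjust to your setup):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative start script: activate the venv, then launch the
# OpenAI-compatible vLLM server. MODEL_NAME comes from the systemd unit.
set -euo pipefail
source /opt/vllm-env/bin/activate
exec python -m vllm.entrypoints.openai.api_server \
  --model "${MODEL_NAME:-facebook/opt-125m}" \
  --host 0.0.0.0 \
  --port 8000
```

&lt;p&gt;Using &lt;code&gt;exec&lt;/code&gt; replaces the shell with the server process, so systemd tracks and restarts the right PID.&lt;/p&gt;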

&lt;p&gt;Ensure the start script is executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🧩 5. Create the Systemd Service
&lt;/h1&gt;

&lt;p&gt;Create the service file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/systemd/system/vllm.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;vLLM Inference Server&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network.target&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;simple&lt;/span&gt;
&lt;span class="py"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;nurbot&lt;/span&gt;
&lt;span class="py"&gt;WorkingDirectory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/home/nurbot/ws/models&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh&lt;/span&gt;
&lt;span class="py"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;always&lt;/span&gt;
&lt;span class="py"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;MODEL_NAME=facebook/opt-125m&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reload systemd:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  ▶️ 6. Starting, Stopping, and Enabling the Service
&lt;/h1&gt;

&lt;p&gt;Start vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check its status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable auto-start on boot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  📡 7. Checking Logs
&lt;/h1&gt;

&lt;p&gt;To see the real-time logs from vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see historical logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see recent errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-xe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🛠 8. Troubleshooting
&lt;/h1&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Service says “failed”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status vllm
journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-xe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong &lt;code&gt;ExecStart&lt;/code&gt; path&lt;/li&gt;
&lt;li&gt;Missing execute permission&lt;/li&gt;
&lt;li&gt;Python crash inside vLLM&lt;/li&gt;
&lt;li&gt;GPU not available / out of memory&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  🎯 Conclusion
&lt;/h1&gt;

&lt;p&gt;You now have a fully functional &lt;strong&gt;vLLM OpenAI-compatible server&lt;/strong&gt; running as a background service on Linux. It's stable, auto-starts on reboot, logs to systemd, and uses a clean virtual environment with GPU acceleration.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>linux</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Building Interactive Programs inside Claude Code</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Sat, 14 Feb 2026 06:45:44 +0000</pubDate>
      <link>https://forem.com/tobrun/building-interactive-programs-inside-claude-code-3ca5</link>
      <guid>https://forem.com/tobrun/building-interactive-programs-inside-claude-code-3ca5</guid>
      <description>&lt;p&gt;This is something I've been discovering as I go, and I thought it was worth sharing more broadly. The pattern is simple but surprisingly powerful: build a CLI that Claude can reason about, and let it decide how to invoke it based on your natural language prompt.&lt;/p&gt;

&lt;p&gt;I stumbled into this while building an Android QA agent: you describe a test scenario in natural language and Claude executes it on a device. But the patterns I found apply far beyond mobile testing. They're general-purpose building blocks for making any CLI tool feel like an intelligent, interactive program.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Claude as Your CLI's User
&lt;/h2&gt;

&lt;p&gt;The idea is to build a simple CLI and put Claude in front of it. Your CLI doesn't need to be smart. It just needs to accept flags and do its job. The intelligence lives in a skill, a markdown file that tells Claude how to map natural language to CLI invocations.&lt;/p&gt;

&lt;p&gt;In my case, the CLI wraps &lt;code&gt;adb&lt;/code&gt; and records commands. Claude uses it like a human would, except it reads a skill file first to decide which flags to pass. The user never thinks about flags. They just describe what they want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt-Driven Feature Activation
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Instead of exposing flags to the user, you teach Claude to detect intent from the prompt and activate features automatically.&lt;/p&gt;

&lt;p&gt;The skill file is just a markdown document with simple rules:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Check the user's prompt for any of these keywords (case-insensitive): "track performance", "frame rate", "fps", "rendering".&lt;br&gt;
If any keyword matches, add &lt;code&gt;--perf&lt;/code&gt; to the command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq7p2xqjadk343p7lbr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq7p2xqjadk343p7lbr.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's the entire mechanism. Claude reads the skill, scans the user's prompt, and adjusts the CLI invocation. The user says "measure performance while scrolling through the list" and the right flags get passed: no documentation to read, no syntax to remember.&lt;/p&gt;

&lt;p&gt;You can stack these. In my project, saying "track performance and enable tracing" activates two independent features from a single sentence. Each feature has its own keyword list in the skill file, and Claude composes them naturally.&lt;/p&gt;

&lt;p&gt;The underlying CLI stays simple: it accepts &lt;code&gt;--perf&lt;/code&gt; and &lt;code&gt;--trace&lt;/code&gt; flags, writes the config to a lock file, and the teardown script reads that lock file to know what to capture. The skill layer is what turns this mechanical flag-passing into something that feels conversational.&lt;/p&gt;
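&lt;p&gt;The rule from the skill file is simple enough to express in a few lines. A Python sketch (the &lt;code&gt;--perf&lt;/code&gt; keywords come from the quoted skill; the &lt;code&gt;--trace&lt;/code&gt; list is made up for illustration):&lt;/p&gt;

```python
# Keyword-to-flag activation, mirroring the skill-file rule quoted above.
FEATURES = {
    "--perf": ("track performance", "frame rate", "fps", "rendering"),
    "--trace": ("enable tracing", "tracing", "systrace"),  # illustrative list
}

def flags_for(prompt: str) -> list[str]:
    """Return every CLI flag whose keywords appear in the prompt (case-insensitive)."""
    text = prompt.lower()
    return [flag for flag, keywords in FEATURES.items()
            if any(keyword in text for keyword in keywords)]
```

&lt;p&gt;In practice Claude does this matching itself by reading the skill; the point of the sketch is that each feature is just a keyword list plus a flag, so stacking features costs nothing.&lt;/p&gt;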

&lt;h2&gt;
  
  
  Human-in-the-Loop Decisions
&lt;/h2&gt;

&lt;p&gt;Claude Code's &lt;code&gt;AskUserQuestionTool&lt;/code&gt; lets you build programs that pause for user input when they hit a genuine ambiguity and continue autonomously when there's nothing to ask.&lt;/p&gt;

&lt;p&gt;For example: my tool needs to know which Android device to target. If one device is connected, it just picks it. If there are multiple, it shows a dropdown and asks. This is a pattern you can apply anywhere: selecting a deploy target, choosing a database, picking a branch. The tool stays autonomous by default but defers to the user exactly when it should.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f2ylmhy6t8dsx63liwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f2ylmhy6t8dsx63liwi.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Session Control: Skills Start It, Hooks Guarantee the Stop
&lt;/h2&gt;

&lt;p&gt;A useful pattern for any tool that needs setup and teardown: use a skill to start the process, and a Claude Code hook to guarantee cleanup.&lt;/p&gt;

&lt;p&gt;The skill tells Claude to call a start script before doing any work. This script creates a lock file that tracks the session state. When Claude finishes, it calls a stop script that reads the lock file, does the teardown, and cleans up.&lt;/p&gt;

&lt;p&gt;But what if the user hits Ctrl+C, or Claude forgets? A &lt;code&gt;Stop&lt;/code&gt; hook in &lt;code&gt;.claude/settings.json&lt;/code&gt; catches that.&lt;/p&gt;
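&lt;p&gt;A minimal sketch of such a hook (the command path is an assumption; check the Claude Code hooks documentation for the exact schema in your version):&lt;/p&gt;

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/stop_session.sh"
          }
        ]
      }
    ]
  }
}
```

&lt;p&gt;The stop script runs whenever the session ends, regardless of how it ended.&lt;/p&gt;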

&lt;p&gt;The lock file does double duty: it's a mutex preventing overlapping sessions, and a state store telling the stop script what to clean up. If Claude already stopped gracefully, the lock file is gone and the hook is a no-op. This pattern works for anything with lifecycle management — recording sessions, server processes, temporary resources.&lt;/p&gt;
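&lt;p&gt;The whole lifecycle fits in a few lines. A Python sketch (the lock path and field names are assumptions, not the project's actual scripts):&lt;/p&gt;

```python
import json
import os
import tempfile
from pathlib import Path

# Illustrative lock-file lifecycle: mutex plus state store.
LOCK = Path(tempfile.gettempdir()) / "qa-session.lock"

def start_session(features: list[str]) -> None:
    """Refuse to start when a session is already live, then record its state."""
    if LOCK.exists():
        raise RuntimeError("another session is already running")
    LOCK.write_text(json.dumps({"pid": os.getpid(), "features": features}))

def stop_session() -> list[str]:
    """Read what to clean up and release the lock. No-op if already stopped."""
    if not LOCK.exists():
        return []
    state = json.loads(LOCK.read_text())
    LOCK.unlink()
    return state["features"]
```

&lt;p&gt;The second call to &lt;code&gt;stop_session()&lt;/code&gt; returning an empty list is exactly the no-op behavior the &lt;code&gt;Stop&lt;/code&gt; hook relies on.&lt;/p&gt;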

&lt;h2&gt;
  
  
  Building the Tool from Within
&lt;/h2&gt;

&lt;p&gt;Here's the part that still surprises me. I build this tool from the same Claude Code session I use to run it. Claude is smart enough to distinguish between "run this test on the device" and "add a new feature to the tool."&lt;/p&gt;

&lt;p&gt;I haven't manually created any of the skill files in this project. They've all been generated by Claude as a byproduct of iterating on the CLI. You describe a behavior, Claude implements the script, then writes the skill that teaches itself how to use it. It's a self-reinforcing cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Technically, everything I've described maps to existing Claude Code features: skills, hooks, &lt;code&gt;AskUserQuestionTool&lt;/code&gt;. But the way you arrive at them matters. You don't design a skill spec upfront. You build a CLI interactively, discover the interaction patterns through use, and let the skills emerge.&lt;/p&gt;

&lt;p&gt;The recipe:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build a simple CLI&lt;/strong&gt; that accepts flags and does one thing well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a skill&lt;/strong&gt; that maps natural language keywords to those flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;AskUserQuestion&lt;/code&gt;&lt;/strong&gt; for genuine ambiguities that need human input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a hook&lt;/strong&gt; for lifecycle guarantees (cleanup, finalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterate from within&lt;/strong&gt;, let Claude build the next feature while you use the current one&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're looking for an idea, think about a manual process that could benefit from automation: pulling data from JIRA, running a deployment checklist, performing QA on a mobile device, auditing accessibility. Anything where you follow a series of steps that a CLI could drive.&lt;/p&gt;

&lt;p&gt;Build the CLI first, keep it simple. Then let Claude use it. You'll be surprised how quickly the skills emerge from real usage, and how naturally the tool evolves when your primary user can reason about what it does.&lt;/p&gt;

&lt;p&gt;The project I built with this approach is open source at &lt;a href="https://github.com/tobrun/android-qa-agent" rel="noopener noreferrer"&gt;github.com/tobrun/android-qa-agent&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mobile Push Notifications With Opencode</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Sat, 24 Jan 2026 06:53:55 +0000</pubDate>
      <link>https://forem.com/tobrun/mobile-push-notifications-with-opencode-1agg</link>
      <guid>https://forem.com/tobrun/mobile-push-notifications-with-opencode-1agg</guid>
      <description>&lt;p&gt;Lately, I’ve been very deliberately splitting my work into two distinct modes.&lt;/p&gt;

&lt;p&gt;The first is a more curated, quality-driven workflow where I use coding agents with line-by-line review. This is the mode I rely on when correctness and maintainability matter most, and it’s where I primarily work with Claude Code.&lt;/p&gt;

&lt;p&gt;The second mode is closer to vibe-coding: experimenting with more speculative ideas, exploring “crazy” concepts, and building small proofs of concept quickly. For this, I run local LLMs on my own server and connect via SSH to run OpenCode directly on the machine.&lt;/p&gt;

&lt;p&gt;By leveraging &lt;a href="https://github.com/code-yeongyu/oh-my-opencode" rel="noopener noreferrer"&gt;oh-my-opencode&lt;/a&gt;, I can run long-running plans autonomously. The downside, however, is that I often don’t notice when an agent has finished executing. To solve this, I put together a minimal setup that sends me a push notification whenever an OpenCode coding agent becomes idle.&lt;/p&gt;

&lt;p&gt;So I ended up building &lt;strong&gt;opencode-notify&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What opencode-notify Does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;opencode-notify&lt;/code&gt; is a tiny OpenCode plugin that sends &lt;strong&gt;push notifications to your phone&lt;/strong&gt; using Pushover when a session finishes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Set up Pushover
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://pushover.net/" rel="noopener noreferrer"&gt;Pushover&lt;/a&gt; on your phone&lt;/li&gt;
&lt;li&gt;Create an account and note your &lt;strong&gt;User Key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pushover.net/apps/build" rel="noopener noreferrer"&gt;Create an application&lt;/a&gt; and note the &lt;strong&gt;API Token&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  2. Install the plugin
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.config/opencode/plugins
curl &lt;span class="nt"&gt;-o&lt;/span&gt; ~/.config/opencode/plugins/opencode-notify.js &lt;span class="se"&gt;\&lt;/span&gt;
  https://raw.githubusercontent.com/tobrun/opencode-notify/main/index.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Set environment variables
&lt;/h3&gt;

&lt;p&gt;Add to your shell profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PUSHOVER_APP_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-app-token"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PUSHOVER_USER_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-user-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Restart OpenCode
&lt;/h3&gt;

&lt;p&gt;Done. The plugin loads automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PUSHOVER_APP_TOKEN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Pushover application token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PUSHOVER_USER_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Pushover user key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENCODE_NOTIFY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Set to &lt;code&gt;0&lt;/code&gt; to disable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
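The kill switch is handy when you're pairing or running many short sessions. For example, to launch a single session without notifications (assuming the `opencode` binary is on your PATH):

```shell
# Disable notifications for this one OpenCode session only;
# the environment variable does not persist past the command.
OPENCODE_NOTIFY=0 opencode
```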




&lt;p&gt;Repo: &lt;a href="https://github.com/tobrun/opencode-notify" rel="noopener noreferrer"&gt;https://github.com/tobrun/opencode-notify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opencode</category>
      <category>vibecoding</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Typing ssh user@ip</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 23 Jan 2026 18:05:12 +0000</pubDate>
      <link>https://forem.com/tobrun/stop-typing-ssh-userip-13lb</link>
      <guid>https://forem.com/tobrun/stop-typing-ssh-userip-13lb</guid>
      <description>&lt;p&gt;I'm facepalming that I'm only learning this now, but my whole life I've been connecting to other computers with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh admin@192.168.1.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over. And over. And over again.&lt;/p&gt;

&lt;p&gt;If that’s you too: no judgment, but we can do better.&lt;/p&gt;

&lt;p&gt;This is one of those &lt;em&gt;once you know it, you can’t unsee it&lt;/em&gt; things.&lt;/p&gt;




&lt;h2&gt;
  
  
  The right solution: &lt;code&gt;~/.ssh/config&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Instead of aliases, scripts, or shell hacks, let &lt;strong&gt;SSH itself&lt;/strong&gt; do the work.&lt;/p&gt;

&lt;p&gt;Create (or edit) your SSH config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;code ~/.ssh/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host server
    HostName 192.168.1.2
    User admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From now on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;No usernames. No IPs. No mental overhead.&lt;/p&gt;
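This scales to as many machines as you manage, and a wildcard `Host *` block can carry shared defaults. A sketch (the second host and the key path are made-up examples; adapt them to your setup):

```
Host server
    HostName 192.168.1.2
    User admin

Host nas
    HostName 192.168.1.3
    User admin

# Defaults applied to every host
Host *
    IdentityFile ~/.ssh/id_ed25519
    ServerAliveInterval 60
```

`IdentityFile` saves you from passing `-i` every time, and `ServerAliveInterval` keeps idle connections from silently dropping.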




&lt;h2&gt;
  
  
  Bonus: auto‑attach tmux on login
&lt;/h2&gt;

&lt;p&gt;If you live in tmux (and if you don’t… you probably will), this is gold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host server
    HostName 192.168.1.2
    User admin
    RequestTTY yes
    RemoteCommand tmux attach || tmux new
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This instantly drops you into your session. Disconnect your computer, reconnect later — state intact.&lt;/p&gt;

&lt;p&gt;This is &lt;em&gt;especially&lt;/em&gt; nice for long‑running agents or builds.&lt;/p&gt;
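A slightly tidier variant of the same idea: `tmux new -A` attaches to a named session if it exists and creates it otherwise, so the `attach || new` chain collapses into one command (the session name `main` is arbitrary):

```
Host server
    HostName 192.168.1.2
    User admin
    RequestTTY yes
    RemoteCommand tmux new -A -s main
```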

</description>
      <category>ssh</category>
      <category>linux</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Configure Local LLM with OpenCode</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 16 Jan 2026 18:49:34 +0000</pubDate>
      <link>https://forem.com/tobrun/configure-local-llm-with-opencode-1gdb</link>
      <guid>https://forem.com/tobrun/configure-local-llm-with-opencode-1gdb</guid>
      <description>&lt;h2&gt;
  
  
  Add any OpenAI compatible endpoint to OpenCode
&lt;/h2&gt;

&lt;p&gt;OpenCode doesn’t currently expose a simple “bring your own endpoint” option in its UI. Instead, it ships with a predefined list of cloud providers.  &lt;/p&gt;

&lt;p&gt;OpenCode fully supports &lt;strong&gt;OpenAI-compatible APIs&lt;/strong&gt;, which means you can plug in &lt;em&gt;any&lt;/em&gt; compatible endpoint, including &lt;strong&gt;vLLM&lt;/strong&gt;, LM Studio, Ollama (with a proxy), or your own custom server.&lt;/p&gt;

&lt;p&gt;This post shows how to wire up a &lt;strong&gt;local vLLM server&lt;/strong&gt; as a provider, but the same approach works for &lt;em&gt;any&lt;/em&gt; OpenAI-compatible endpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenCode installed and working&lt;/li&gt;
&lt;li&gt;A running OpenAI-compatible endpoint
(for example: a local vLLM server on &lt;code&gt;http://&amp;lt;host&amp;gt;:8000/v1&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;vLLM exposes a &lt;code&gt;/v1&lt;/code&gt; API that matches OpenAI’s Chat Completions API, which makes it an ideal drop-in backend.&lt;/p&gt;
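You can confirm the endpoint is reachable, and capture the exact model IDs it serves, before touching any OpenCode config (adjust the host and port to your server):

```shell
# List the models served by an OpenAI-compatible endpoint; the "id"
# fields are the exact strings OpenCode's "models" block must use.
curl -s http://localhost:8000/v1/models
```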




&lt;h2&gt;
  
  
  Step 1 – Register the provider in OpenCode auth
&lt;/h2&gt;

&lt;p&gt;OpenCode stores provider authentication details in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.local/share/opencode/auth.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the file does not exist yet, create it.&lt;/p&gt;

&lt;p&gt;Add the following entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"vllm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-local"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;vLLM does &lt;strong&gt;not&lt;/strong&gt; require an API key, but OpenCode expects one to exist.&lt;/li&gt;
&lt;li&gt;Any placeholder value works (&lt;code&gt;sk-local&lt;/code&gt; is a common convention).&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;auth.json&lt;/code&gt; already exists, merge the &lt;code&gt;vllm&lt;/code&gt; block into the existing JSON.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2 – Define the OpenAI-compatible provider
&lt;/h2&gt;

&lt;p&gt;Now define the provider itself in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.config/opencode/opencode.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the file if it doesn’t exist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vllm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@ai-sdk/openai-compatible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vLLM (local)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://100.108.174.26:8000/v1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"My vLLM model"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm/Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"small_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm/Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key fields explained
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm&lt;/code&gt;&lt;/strong&gt;
Set to &lt;code&gt;@ai-sdk/openai-compatible&lt;/code&gt; so OpenCode treats this provider as OpenAI-compatible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;baseURL&lt;/code&gt;&lt;/strong&gt;
Must point to the &lt;code&gt;/v1&lt;/code&gt; endpoint of your server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;models&lt;/code&gt;&lt;/strong&gt;
The key must exactly match the model ID exposed by the backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;model&lt;/code&gt; / &lt;code&gt;small_model&lt;/code&gt;&lt;/strong&gt;
Sets the default model used by OpenCode.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Selecting your model
&lt;/h2&gt;

&lt;p&gt;After these steps, restart OpenCode if it’s running.&lt;/p&gt;

&lt;p&gt;You can now use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your custom provider and model will appear in the selection list.&lt;/p&gt;
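To double-check end to end, you can also send one chat completion straight to the backend using the same model ID as in the config (swap in your own host, port, and model ID):

```shell
# One-shot chat completion against the local OpenAI-compatible server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder-30B-A3B-Instruct",
       "messages": [{"role": "user", "content": "Say hello"}]}'
```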

</description>
      <category>opencode</category>
      <category>llm</category>
      <category>linux</category>
    </item>
  </channel>
</rss>
