<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tobrun Van Nuland</title>
    <description>The latest articles on Forem by Tobrun Van Nuland (@tobrun).</description>
    <link>https://forem.com/tobrun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3715059%2F4675b451-a843-4dff-a547-1f2ef72097b5.jpg</url>
      <title>Forem: Tobrun Van Nuland</title>
      <link>https://forem.com/tobrun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tobrun"/>
    <language>en</language>
    <item>
      <title>I Benchmarked How Claude Code Consumes APIs. MCP Won and It Wasn't Close.</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 27 Feb 2026 20:16:16 +0000</pubDate>
      <link>https://forem.com/tobrun/i-benchmarked-how-claude-code-consumes-apis-mcp-won-and-it-wasnt-close-4k1</link>
      <guid>https://forem.com/tobrun/i-benchmarked-how-claude-code-consumes-apis-mcp-won-and-it-wasnt-close-4k1</guid>
      <description>&lt;p&gt;There's been a lot of noise lately in the community about MCPs being overhyped. They take too much context, they can be replaced with a spec, CLIs are more effective, etc. But all of those claims didn't come with any proof, so I decided to measure it.&lt;/p&gt;

&lt;p&gt;I used a benchmark harness that runs an AI coding agent against the same API task six different ways, captures every tool call through hooks, classifies each one, and compares the results. I ran it against two completely different APIs, 36 total runs, and the data tells a clear story.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;The tasks are simple. For the first API: convert a dataset to another representation and return the result. For the second: generate a large PNG and save it to disk. Each task runs through six different interfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;no-context&lt;/strong&gt; — zero guidance, just the task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;openapi-spec&lt;/strong&gt; — the full OpenAPI YAML spec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;openapi-mcp&lt;/strong&gt; — the API exposed as an MCP tool via FastMCP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;generated-python&lt;/strong&gt; — a hand-crafted Python client library&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;vibe-cli&lt;/strong&gt; — a minimal argparse CLI wrapping the API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;pypi-sdk&lt;/strong&gt; — told to use the official SDK from PyPI&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each scenario runs the agent in headless mode with &lt;code&gt;--max-turns 10&lt;/code&gt;. Agent hooks capture every tool call as JSONL telemetry: which tool was used, what the input was, and whether it succeeded. A regex classifier then tags each call by interface type and error category. Three iterations per scenario, per API. No cherry-picking.&lt;/p&gt;
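&lt;p&gt;The classifier itself can be tiny. A minimal sketch in Python (the tag names and patterns here are illustrative, not the harness's actual rules):&lt;/p&gt;

```python
import re

# Illustrative interface classifier: tag each captured tool call
# by the interface it exercises. Patterns are examples, not the
# benchmark harness's real rule set.
PATTERNS = {
    "mcp": re.compile(r"^mcp__"),
    "curl": re.compile(r"\bcurl\b"),
    "cli": re.compile(r"\bconvert-dataset\b"),
    "python": re.compile(r"\bpython3?\b"),
}

def classify(tool: str, tool_input: str) -> str:
    """Return an interface tag for one JSONL telemetry record."""
    for tag, pattern in PATTERNS.items():
        if pattern.search(tool) or pattern.search(tool_input):
            return tag
    return "other"
```

&lt;p&gt;A second pass with error-category patterns (auth failures, encoding retries, 4xx responses) works the same way over the recorded outputs.&lt;/p&gt;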

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Conversion API
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Success&lt;/th&gt;
&lt;th&gt;Avg Turns&lt;/th&gt;
&lt;th&gt;Avg Cost&lt;/th&gt;
&lt;th&gt;vs MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vibe-cli&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;3.0&lt;/td&gt;
&lt;td&gt;$0.06&lt;/td&gt;
&lt;td&gt;1.9x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pypi-sdk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.0&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.4x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;generated-python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.3&lt;/td&gt;
&lt;td&gt;$0.11&lt;/td&gt;
&lt;td&gt;3.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no-context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;$0.12&lt;/td&gt;
&lt;td&gt;4.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-spec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2/3&lt;/td&gt;
&lt;td&gt;8.3&lt;/td&gt;
&lt;td&gt;$0.16&lt;/td&gt;
&lt;td&gt;5.6x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Image API
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Success&lt;/th&gt;
&lt;th&gt;Avg Turns&lt;/th&gt;
&lt;th&gt;Avg Cost&lt;/th&gt;
&lt;th&gt;vs MCP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-mcp&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.03&lt;/td&gt;
&lt;td&gt;1.0x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;no-context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;2.0&lt;/td&gt;
&lt;td&gt;$0.04&lt;/td&gt;
&lt;td&gt;1.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;vibe-cli&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;4.0&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.2x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;openapi-spec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;3.7&lt;/td&gt;
&lt;td&gt;$0.07&lt;/td&gt;
&lt;td&gt;2.3x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;generated-python&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3/3&lt;/td&gt;
&lt;td&gt;6.3&lt;/td&gt;
&lt;td&gt;$0.14&lt;/td&gt;
&lt;td&gt;4.7x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pypi-sdk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2/3&lt;/td&gt;
&lt;td&gt;9.7&lt;/td&gt;
&lt;td&gt;$0.21&lt;/td&gt;
&lt;td&gt;7.1x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;MCP wins both benchmarks: a 100% success rate, two turns every time, perfectly deterministic across all iterations. Everything else is 2x to 7x more expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why MCP Wins
&lt;/h2&gt;

&lt;p&gt;Looking at the raw telemetry makes it obvious. Here's what happens when an agent tries to call an HTTP API with no context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Bash  curl -s "https://api.example.com/v1/resource?q=1600%20..."
Bash  echo "Token set: ${API_TOKEN:+yes}"
Bash  curl -s "https://api.example.com/v1/resource?q=1600+..."
Bash  curl -s --get "https://api.example.com/v1/resource..."
Bash  TOKEN="$API_TOKEN" &amp;amp;&amp;amp; curl -sv "https://api.example.com/..."
Bash  printenv API_TOKEN | wc -c
Bash  printenv API_TOKEN | cat -A | head -1
Bash  TOKEN=$(printenv API_TOKEN) &amp;amp;&amp;amp; curl -s "https://api..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Eight tool calls. The agent is building URLs by hand, fighting shell expansion of the access token, trying different encoding schemes for spaces and commas, debugging why the token isn't being passed correctly. It gets there eventually, but it burns turns figuring out the plumbing.&lt;/p&gt;

&lt;p&gt;Here's MCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mcp__conversion_api__convert_dataset  input=dataset.json format=csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One call. Done. The agent doesn't construct URLs, doesn't handle auth, doesn't encode parameters, doesn't parse response formats. It calls a typed function with structured arguments and gets structured data back.&lt;/p&gt;

&lt;p&gt;MCP eliminates every source of friction: URL construction, authentication handling, parameter encoding, API version discovery, response parsing. The agent goes straight from intent to result.&lt;/p&gt;

&lt;h2&gt;
  
  
  But MCP Isn't the Whole Story
&lt;/h2&gt;

&lt;p&gt;Here's where I want to push back against both sides of the debate. Yes, MCP dominates in a clean greenfield setup. But clean greenfield isn't where most work happens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CLIs compose.&lt;/strong&gt; A CLI like &lt;code&gt;convert-dataset --input dataset.json&lt;/code&gt; pipes naturally into other tools. An agent can chain commands or redirect output to a file. MCP tools return structured data into the conversation context. That data has to go somewhere, and when you're chaining multiple operations, it starts bloating the context window. The vibe-cli scenario landed near the top of both benchmarks because the agent reads the script once, runs it, and the output stays in the terminal where it belongs.&lt;/p&gt;
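&lt;p&gt;A vibe CLI really can be this small. A sketch in Python's &lt;code&gt;argparse&lt;/code&gt; (the flag names are illustrative, not the benchmark's actual script):&lt;/p&gt;

```python
import argparse
import json

def main(argv=None):
    """Minimal CLI wrapper: parse flags, then hand off to the API."""
    parser = argparse.ArgumentParser(prog="convert-dataset")
    parser.add_argument("--input", required=True, help="path to the source dataset")
    parser.add_argument("--format", default="csv", help="target representation")
    args = parser.parse_args(argv)
    # A real wrapper would call the API here; this sketch just echoes
    # the resolved plan as JSON, so the output pipes cleanly into jq etc.
    print(json.dumps({"input": args.input, "format": args.format}))
```

&lt;p&gt;In a real script you'd call &lt;code&gt;main()&lt;/code&gt; under an &lt;code&gt;if __name__ == "__main__"&lt;/code&gt; guard so it reads &lt;code&gt;sys.argv&lt;/code&gt;; the agent only ever sees the flags and the stdout.&lt;/p&gt;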

&lt;p&gt;&lt;strong&gt;CLIs evolve with your project.&lt;/strong&gt; This is the angle that matters most. When you're actively developing, your CLI is a living artifact. You add a flag, the agent discovers it, uses it. The feedback loop is immediate. An MCP server is more of a fixed contract; you define the tool interface upfront and the agent consumes it as-is. That rigidity is a feature in production but a constraint during development.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "no-context" result for the Image API is telling.&lt;/strong&gt; The Image API is simple: a single URL with a path parameter, and the agent nailed it in 2 turns with zero guidance. For simple APIs, MCP doesn't add much because the agent's built-in knowledge is already sufficient. The value of MCP scales with API complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAPI specs can hurt more than they help.&lt;/strong&gt; This surprised me. Giving the agent a full OpenAPI YAML for the Conversion API actually produced the worst results of any scenario: 2/3 success rate, 5.6x the cost of MCP. The agent spent turns reading the spec, then still struggled with the same curl/token issues. The spec added information without reducing ambiguity.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Actually Recommend
&lt;/h2&gt;

&lt;p&gt;After running 36 experiments and staring at the telemetry, my mental model is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use MCP for stable, well-defined APIs.&lt;/strong&gt; If you have an API that doesn't change often and you want deterministic, minimal-cost agent interactions, wrap it in MCP.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use CLIs for APIs you're actively building.&lt;/strong&gt; If the interface is still evolving, a CLI gives you a faster iteration loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't bother with generated client libraries for agent consumption.&lt;/strong&gt; The generated-python scenario was consistently one of the most expensive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Don't give agents raw OpenAPI specs for complex APIs.&lt;/strong&gt; Either wrap the API in MCP (which encodes the spec into a typed tool) or write a CLI (which encodes it into flags).&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
      <category>mcp</category>
    </item>
    <item>
      <title>Deploying vLLM on your Linux Server</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Mon, 16 Feb 2026 04:43:32 +0000</pubDate>
      <link>https://forem.com/tobrun/deploying-vllm-on-your-linux-server-4pcf</link>
      <guid>https://forem.com/tobrun/deploying-vllm-on-your-linux-server-4pcf</guid>
      <description>&lt;h1&gt;
  
  
  🚀 Deploying vLLM on Your Linux Server
&lt;/h1&gt;

&lt;p&gt;Running &lt;strong&gt;vLLM&lt;/strong&gt; as a persistent, reliable background service is one of the best ways to expose a fast local LLM API on your Linux machine.&lt;br&gt;&lt;br&gt;
This guide walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Installing dependencies&lt;/li&gt;
&lt;li&gt;Creating a virtual environment&lt;/li&gt;
&lt;li&gt;Setting up a &lt;strong&gt;systemd&lt;/strong&gt; service&lt;/li&gt;
&lt;li&gt;Running vLLM from a fixed directory (&lt;code&gt;/home/nurbot/ws/models&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Checking logs and debugging&lt;/li&gt;
&lt;li&gt;Enabling auto-start on boot&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;
  
  
  🧰 1. Install System Dependencies
&lt;/h1&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; python3-pip python3-venv docker.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Docker is optional but useful if you want containerized workflows.&lt;/p&gt;


&lt;h1&gt;
  
  
  🎮 2. Verify NVIDIA GPU Support (Optional but Recommended)
&lt;/h1&gt;

&lt;p&gt;Check whether the machine has working NVIDIA drivers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the command is missing, install drivers before running GPU-backed vLLM.&lt;/p&gt;




&lt;h1&gt;
  
  
  🐍 3. Create the vLLM Virtual Environment
&lt;/h1&gt;

&lt;p&gt;We place it in &lt;code&gt;/opt/vllm-env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv /opt/vllm-env
&lt;span class="nb"&gt;sudo chown&lt;/span&gt; &lt;span class="nt"&gt;-R&lt;/span&gt; &lt;span class="nv"&gt;$USER&lt;/span&gt;:&lt;span class="nv"&gt;$USER&lt;/span&gt; /opt/vllm-env
&lt;span class="nb"&gt;source&lt;/span&gt; /opt/vllm-env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install vLLM + OpenAI API compatibility:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;vllm openai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  📁 4. Configure where vLLM Runs From
&lt;/h1&gt;

&lt;p&gt;We want vLLM to run from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/home/nurbot/ws/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This directory will contain the &lt;code&gt;start_vllm.sh&lt;/code&gt; script.&lt;/p&gt;
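&lt;p&gt;The script itself isn't shown here, so as an illustrative sketch of what &lt;code&gt;start_vllm.sh&lt;/code&gt; might contain (the entrypoint, host, and port are assumptions; adjust to your setup):&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative start script: activate the venv, then launch the
# OpenAI-compatible vLLM server. MODEL_NAME comes from the systemd unit.
set -euo pipefail
source /opt/vllm-env/bin/activate
exec python -m vllm.entrypoints.openai.api_server \
  --model "${MODEL_NAME:-facebook/opt-125m}" \
  --host 0.0.0.0 \
  --port 8000
```

&lt;p&gt;Using &lt;code&gt;exec&lt;/code&gt; replaces the shell with the server process, so systemd tracks and restarts the right PID.&lt;/p&gt;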

&lt;p&gt;Ensure the start script is executable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🧩 5. Create the Systemd Service
&lt;/h1&gt;

&lt;p&gt;Create the service file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;nano /etc/systemd/system/vllm.service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[Unit]&lt;/span&gt;
&lt;span class="py"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;vLLM Inference Server&lt;/span&gt;
&lt;span class="py"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;network.target&lt;/span&gt;

&lt;span class="nn"&gt;[Service]&lt;/span&gt;
&lt;span class="py"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;simple&lt;/span&gt;
&lt;span class="py"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;nurbot&lt;/span&gt;
&lt;span class="py"&gt;WorkingDirectory&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/home/nurbot/ws/models&lt;/span&gt;
&lt;span class="py"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/home/nurbot/ws/models/infrastructure/scripts/start_vllm.sh&lt;/span&gt;
&lt;span class="py"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;always&lt;/span&gt;
&lt;span class="py"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;MODEL_NAME=facebook/opt-125m&lt;/span&gt;

&lt;span class="nn"&gt;[Install]&lt;/span&gt;
&lt;span class="py"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;multi-user.target&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reload systemd:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  ▶️ 6. Starting, Stopping, and Enabling the Service
&lt;/h1&gt;

&lt;p&gt;Start vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check its status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable auto-start on boot:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  📡 7. Checking Logs
&lt;/h1&gt;

&lt;p&gt;To see the real-time logs from vLLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-f&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see historical logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To see recent errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-xe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  🛠 8. Troubleshooting
&lt;/h1&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Service says “failed”&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;systemctl status vllm
journalctl &lt;span class="nt"&gt;-u&lt;/span&gt; vllm &lt;span class="nt"&gt;-xe&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common issues:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong &lt;code&gt;ExecStart&lt;/code&gt; path&lt;/li&gt;
&lt;li&gt;Missing execute permission&lt;/li&gt;
&lt;li&gt;Python crash inside vLLM&lt;/li&gt;
&lt;li&gt;GPU not available / out of memory&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  🎯 Conclusion
&lt;/h1&gt;

&lt;p&gt;You now have a fully functional &lt;strong&gt;vLLM OpenAI-compatible server&lt;/strong&gt; running as a background service on Linux. It's stable, auto-starts on reboot, logs to systemd, and uses a clean virtual environment with GPU acceleration.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>linux</category>
      <category>infrastructure</category>
    </item>
    <item>
      <title>Building Interactive Programs inside Claude Code</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Sat, 14 Feb 2026 06:45:44 +0000</pubDate>
      <link>https://forem.com/tobrun/building-interactive-programs-inside-claude-code-3ca5</link>
      <guid>https://forem.com/tobrun/building-interactive-programs-inside-claude-code-3ca5</guid>
      <description>&lt;p&gt;This is something I've been discovering as I go, and I thought it was worth sharing more broadly. The pattern is simple but surprisingly powerful: build a CLI that Claude can reason about, and let it decide how to invoke it based on your natural language prompt.&lt;/p&gt;

&lt;p&gt;I stumbled into this while building an Android QA agent: you describe a test scenario in natural language and Claude executes it on a device. But the patterns I found apply far beyond mobile testing. They're general-purpose building blocks for making any CLI tool feel like an intelligent, interactive program.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern: Claude as Your CLI's User
&lt;/h2&gt;

&lt;p&gt;The idea is to build a simple CLI and put Claude in front of it. Your CLI doesn't need to be smart. It just needs to accept flags and do its job. The intelligence lives in a skill, a markdown file that tells Claude how to map natural language to CLI invocations.&lt;/p&gt;

&lt;p&gt;In my case, the CLI wraps &lt;code&gt;adb&lt;/code&gt; and records commands. Claude uses it like a human would, except it reads a skill file first to decide which flags to pass. The user never thinks about flags. They just describe what they want.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt-Driven Feature Activation
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Instead of exposing flags to the user, you teach Claude to detect intent from the prompt and activate features automatically.&lt;/p&gt;

&lt;p&gt;The skill file is just a markdown document with simple rules:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Check the user's prompt for any of these keywords (case-insensitive): "track performance", "frame rate", "fps", "rendering".&lt;br&gt;
If any keyword matches, add &lt;code&gt;--perf&lt;/code&gt; to the command.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq7p2xqjadk343p7lbr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fruq7p2xqjadk343p7lbr.png" alt=" " width="800" height="376"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's the entire mechanism. Claude reads the skill, scans the user's prompt, and adjusts the CLI invocation. The user says "measure performance while scrolling through the list" and the right flags get passed: no documentation to read, no syntax to remember.&lt;/p&gt;

&lt;p&gt;You can stack these. In my project, saying "track performance and enable tracing" activates two independent features from a single sentence. Each feature has its own keyword list in the skill file, and Claude composes them naturally.&lt;/p&gt;

&lt;p&gt;The underlying CLI stays simple: it accepts &lt;code&gt;--perf&lt;/code&gt; and &lt;code&gt;--trace&lt;/code&gt; flags, writes the config to a lock file, and the teardown script reads that lock file to know what to capture. The skill layer is what turns this mechanical flag-passing into something that feels conversational.&lt;/p&gt;
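&lt;p&gt;The rule from the skill file is simple enough to express in a few lines. A Python sketch (the &lt;code&gt;--perf&lt;/code&gt; keywords come from the quoted skill; the &lt;code&gt;--trace&lt;/code&gt; list is made up for illustration):&lt;/p&gt;

```python
# Keyword-to-flag activation, mirroring the skill-file rule quoted above.
FEATURES = {
    "--perf": ("track performance", "frame rate", "fps", "rendering"),
    "--trace": ("enable tracing", "tracing", "systrace"),  # illustrative list
}

def flags_for(prompt: str) -> list[str]:
    """Return every CLI flag whose keywords appear in the prompt (case-insensitive)."""
    text = prompt.lower()
    return [flag for flag, keywords in FEATURES.items()
            if any(keyword in text for keyword in keywords)]
```

&lt;p&gt;In practice Claude does this matching itself by reading the skill; the point of the sketch is that each feature is just a keyword list plus a flag, so stacking features costs nothing.&lt;/p&gt;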

&lt;h2&gt;
  
  
  Human-in-the-Loop Decisions
&lt;/h2&gt;

&lt;p&gt;Claude Code's &lt;code&gt;AskUserQuestionTool&lt;/code&gt; lets you build programs that pause for user input when they hit a genuine ambiguity and continue autonomously when there's nothing to ask.&lt;/p&gt;

&lt;p&gt;For example: my tool needs to know which Android device to target. If one device is connected, it just picks it. If there are multiple, it shows a dropdown and asks. This is a pattern you can apply anywhere: selecting a deploy target, choosing a database, picking a branch. The tool stays autonomous by default but defers to the user exactly when it should.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f2ylmhy6t8dsx63liwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f2ylmhy6t8dsx63liwi.png" alt=" " width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Session Control: Skills Start It, Hooks Guarantee the Stop
&lt;/h2&gt;

&lt;p&gt;A useful pattern for any tool that needs setup and teardown: use a skill to start the process, and a Claude Code hook to guarantee cleanup.&lt;/p&gt;

&lt;p&gt;The skill tells Claude to call a start script before doing any work. This script creates a lock file that tracks the session state. When Claude finishes, it calls a stop script that reads the lock file, does the teardown, and cleans up.&lt;/p&gt;

&lt;p&gt;But what if the user hits Ctrl+C, or Claude forgets? A &lt;code&gt;Stop&lt;/code&gt; hook in &lt;code&gt;.claude/settings.json&lt;/code&gt; catches that.&lt;/p&gt;
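&lt;p&gt;A minimal sketch of such a hook (the command path is an assumption; check the Claude Code hooks documentation for the exact schema in your version):&lt;/p&gt;

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/stop_session.sh"
          }
        ]
      }
    ]
  }
}
```

&lt;p&gt;The stop script runs whenever the session ends, regardless of how it ended.&lt;/p&gt;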

&lt;p&gt;The lock file does double duty: it's a mutex preventing overlapping sessions, and a state store telling the stop script what to clean up. If Claude already stopped gracefully, the lock file is gone and the hook is a no-op. This pattern works for anything with lifecycle management — recording sessions, server processes, temporary resources.&lt;/p&gt;
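&lt;p&gt;The whole lifecycle fits in a few lines. A Python sketch (the lock path and field names are assumptions, not the project's actual scripts):&lt;/p&gt;

```python
import json
import os
import tempfile
from pathlib import Path

# Illustrative lock-file lifecycle: mutex plus state store.
LOCK = Path(tempfile.gettempdir()) / "qa-session.lock"

def start_session(features: list[str]) -> None:
    """Refuse to start when a session is already live, then record its state."""
    if LOCK.exists():
        raise RuntimeError("another session is already running")
    LOCK.write_text(json.dumps({"pid": os.getpid(), "features": features}))

def stop_session() -> list[str]:
    """Read what to clean up and release the lock. No-op if already stopped."""
    if not LOCK.exists():
        return []
    state = json.loads(LOCK.read_text())
    LOCK.unlink()
    return state["features"]
```

&lt;p&gt;The second call to &lt;code&gt;stop_session()&lt;/code&gt; returning an empty list is exactly the no-op behavior the &lt;code&gt;Stop&lt;/code&gt; hook relies on.&lt;/p&gt;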

&lt;h2&gt;
  
  
  Building the Tool from Within
&lt;/h2&gt;

&lt;p&gt;Here's the part that still surprises me. I build this tool from the same Claude Code session I use to run it. Claude is smart enough to distinguish between "run this test on the device" and "add a new feature to the tool."&lt;/p&gt;

&lt;p&gt;I haven't manually created any of the skill files in this project. They've all been generated by Claude as a byproduct of iterating on the CLI. You describe a behavior, Claude implements the script, then writes the skill that teaches itself how to use it. It's a self-reinforcing cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;Technically, everything I've described maps to existing Claude Code features: skills, hooks, &lt;code&gt;AskUserQuestionTool&lt;/code&gt;. But the way you arrive at them matters. You don't design a skill spec upfront. You build a CLI interactively, discover the interaction patterns through use, and let the skills emerge.&lt;/p&gt;

&lt;p&gt;The recipe:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build a simple CLI&lt;/strong&gt; that accepts flags and does one thing well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write a skill&lt;/strong&gt; that maps natural language keywords to those flags&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;AskUserQuestion&lt;/code&gt;&lt;/strong&gt; for genuine ambiguities that need human input&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a hook&lt;/strong&gt; for lifecycle guarantees (cleanup, finalization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Iterate from within&lt;/strong&gt;, let Claude build the next feature while you use the current one&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're looking for an idea, think about a manual process that could benefit from automation: pulling data from JIRA, running a deployment checklist, performing QA on a mobile device, auditing accessibility. Anything where you follow a series of steps that a CLI could drive.&lt;/p&gt;

&lt;p&gt;Build the CLI first, keep it simple. Then let Claude use it. You'll be surprised how quickly the skills emerge from real usage, and how naturally the tool evolves when your primary user can reason about what it does.&lt;/p&gt;

&lt;p&gt;The project I built with this approach is open source at &lt;a href="https://github.com/tobrun/android-qa-agent" rel="noopener noreferrer"&gt;github.com/tobrun/android-qa-agent&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>claudecode</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Mobile Push Notifications With Opencode</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Sat, 24 Jan 2026 06:53:55 +0000</pubDate>
      <link>https://forem.com/tobrun/mobile-push-notifications-with-opencode-1agg</link>
      <guid>https://forem.com/tobrun/mobile-push-notifications-with-opencode-1agg</guid>
      <description>&lt;p&gt;Lately, I’ve been very deliberately splitting my work into two distinct modes.&lt;/p&gt;

&lt;p&gt;The first is a more curated, quality-driven workflow where I use coding agents with line-by-line review. This is the mode I rely on when correctness and maintainability matter most, and it’s where I primarily work with Claude Code.&lt;/p&gt;

&lt;p&gt;The second mode is closer to vibe-coding: experimenting with more speculative ideas, exploring “crazy” concepts, and building small proofs of concept quickly. For this, I run local LLMs on my own server and connect via SSH to run OpenCode directly on the machine.&lt;/p&gt;

&lt;p&gt;By leveraging &lt;a href="https://github.com/code-yeongyu/oh-my-opencode" rel="noopener noreferrer"&gt;oh-my-opencode&lt;/a&gt;, I can run long-running plans autonomously. The downside, however, is that I often don’t notice when an agent has finished executing. To solve this, I put together a minimal setup that sends me a push notification whenever an OpenCode coding agent becomes idle.&lt;/p&gt;

&lt;p&gt;So I ended up building &lt;strong&gt;opencode-notify&lt;/strong&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  What opencode-notify Does
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;opencode-notify&lt;/code&gt; is a tiny OpenCode plugin that sends &lt;strong&gt;push notifications to your phone&lt;/strong&gt; using Pushover when a session finishes.&lt;/p&gt;


&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;
&lt;h3&gt;
  
  
  1. Set up Pushover
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Install &lt;a href="https://pushover.net/" rel="noopener noreferrer"&gt;Pushover&lt;/a&gt; on your phone&lt;/li&gt;
&lt;li&gt;Create an account and note your &lt;strong&gt;User Key&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://pushover.net/apps/build" rel="noopener noreferrer"&gt;Create an application&lt;/a&gt; and note the &lt;strong&gt;API Token&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  2. Install the plugin
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.config/opencode/plugins
curl &lt;span class="nt"&gt;-o&lt;/span&gt; ~/.config/opencode/plugins/opencode-notify.js &lt;span class="se"&gt;\&lt;/span&gt;
  https://raw.githubusercontent.com/tobrun/opencode-notify/main/index.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  3. Set environment variables
&lt;/h3&gt;

&lt;p&gt;Add to your shell profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PUSHOVER_APP_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-app-token"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PUSHOVER_USER_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-user-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Restart OpenCode
&lt;/h3&gt;

&lt;p&gt;Done. The plugin loads automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variable&lt;/th&gt;
&lt;th&gt;Required&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PUSHOVER_APP_TOKEN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Pushover application token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;PUSHOVER_USER_KEY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Pushover user key&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;OPENCODE_NOTIFY&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Set to &lt;code&gt;0&lt;/code&gt; to disable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
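The kill switch is handy when you're pairing or running many short sessions. For example, to launch a single session without notifications (assuming the `opencode` binary is on your PATH):

```shell
# Disable notifications for this one OpenCode session only;
# the environment variable does not persist past the command.
OPENCODE_NOTIFY=0 opencode
```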




&lt;p&gt;Repo: &lt;a href="https://github.com/tobrun/opencode-notify" rel="noopener noreferrer"&gt;https://github.com/tobrun/opencode-notify&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opencode</category>
      <category>vibecoding</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Stop Typing ssh user@ip</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 23 Jan 2026 18:05:12 +0000</pubDate>
      <link>https://forem.com/tobrun/stop-typing-ssh-userip-13lb</link>
      <guid>https://forem.com/tobrun/stop-typing-ssh-userip-13lb</guid>
      <description>&lt;p&gt;I'm facepalming that I'm only learning this now, but my whole life I've been connecting to other computers with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh admin@192.168.1.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over. And over. And over again.&lt;/p&gt;

&lt;p&gt;If that’s you too: no judgment, but we can do better.&lt;/p&gt;

&lt;p&gt;This is one of those &lt;em&gt;once you know it, you can’t unsee it&lt;/em&gt; things.&lt;/p&gt;




&lt;h2&gt;
  
  
  The right solution: &lt;code&gt;~/.ssh/config&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Instead of aliases, scripts, or shell hacks, let &lt;strong&gt;SSH itself&lt;/strong&gt; do the work.&lt;/p&gt;

&lt;p&gt;Create (or edit) your SSH config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;code ~/.ssh/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host server
    HostName 192.168.1.2
    User admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From now on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s it.&lt;/p&gt;

&lt;p&gt;No usernames. No IPs. No mental overhead.&lt;/p&gt;
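This scales to as many machines as you manage, and a wildcard `Host *` block can carry shared defaults. A sketch (the second host and the key path are made-up examples; adapt them to your setup):

```
Host server
    HostName 192.168.1.2
    User admin

Host nas
    HostName 192.168.1.3
    User admin

# Defaults applied to every host
Host *
    IdentityFile ~/.ssh/id_ed25519
    ServerAliveInterval 60
```

`IdentityFile` saves you from passing `-i` every time, and `ServerAliveInterval` keeps idle connections from silently dropping.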




&lt;h2&gt;
  
  
  Bonus: auto‑attach tmux on login
&lt;/h2&gt;

&lt;p&gt;If you live in tmux (and if you don’t… you probably will), this is gold:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Host server
    HostName 192.168.1.2
    User admin
    RequestTTY yes
    RemoteCommand tmux attach || tmux new
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This instantly drops you into your session. Disconnect your computer, reconnect later — state intact.&lt;/p&gt;

&lt;p&gt;This is &lt;em&gt;especially&lt;/em&gt; nice for long‑running agents or builds.&lt;/p&gt;
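A slightly tidier variant of the same idea: `tmux new -A` attaches to a named session if it exists and creates it otherwise, so the `attach || new` chain collapses into one command (the session name `main` is arbitrary):

```
Host server
    HostName 192.168.1.2
    User admin
    RequestTTY yes
    RemoteCommand tmux new -A -s main
```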

</description>
      <category>ssh</category>
      <category>linux</category>
      <category>devtools</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Configure Local LLM with OpenCode</title>
      <dc:creator>Tobrun Van Nuland</dc:creator>
      <pubDate>Fri, 16 Jan 2026 18:49:34 +0000</pubDate>
      <link>https://forem.com/tobrun/configure-local-llm-with-opencode-1gdb</link>
      <guid>https://forem.com/tobrun/configure-local-llm-with-opencode-1gdb</guid>
      <description>&lt;h2&gt;
  
  
  Add any OpenAI compatible endpoint to OpenCode
&lt;/h2&gt;

&lt;p&gt;OpenCode doesn’t currently expose a simple “bring your own endpoint” option in its UI. Instead, it ships with a predefined list of cloud providers.  &lt;/p&gt;

&lt;p&gt;OpenCode fully supports &lt;strong&gt;OpenAI-compatible APIs&lt;/strong&gt;, which means you can plug in &lt;em&gt;any&lt;/em&gt; compatible endpoint, including &lt;strong&gt;vLLM&lt;/strong&gt;, LM Studio, Ollama (with a proxy), or your own custom server.&lt;/p&gt;

&lt;p&gt;This post shows how to wire up a &lt;strong&gt;local vLLM server&lt;/strong&gt; as a provider, but the same approach works for &lt;em&gt;any&lt;/em&gt; OpenAI-compatible endpoint.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OpenCode installed and working&lt;/li&gt;
&lt;li&gt;A running OpenAI-compatible endpoint
(for example: a local vLLM server on &lt;code&gt;http://&amp;lt;host&amp;gt;:8000/v1&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;vLLM exposes a &lt;code&gt;/v1&lt;/code&gt; API that matches OpenAI’s Chat Completions API, which makes it an ideal drop-in backend.&lt;/p&gt;
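You can confirm the endpoint is reachable, and capture the exact model IDs it serves, before touching any OpenCode config (adjust the host and port to your server):

```shell
# List the models served by an OpenAI-compatible endpoint; the "id"
# fields are the exact strings OpenCode's "models" block must use.
curl -s http://localhost:8000/v1/models
```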




&lt;h2&gt;
  
  
  Step 1 – Register the provider in OpenCode auth
&lt;/h2&gt;

&lt;p&gt;OpenCode stores provider authentication details in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.local/share/opencode/auth.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the file does not exist yet, create it.&lt;/p&gt;

&lt;p&gt;Add the following entry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"vllm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"api"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk-local"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notes
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;vLLM does &lt;strong&gt;not&lt;/strong&gt; require an API key, but OpenCode expects one to exist.&lt;/li&gt;
&lt;li&gt;Any placeholder value works (&lt;code&gt;sk-local&lt;/code&gt; is a common convention).&lt;/li&gt;
&lt;li&gt;If &lt;code&gt;auth.json&lt;/code&gt; already exists, merge the &lt;code&gt;vllm&lt;/code&gt; block into the existing JSON.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 2 – Define the OpenAI-compatible provider
&lt;/h2&gt;

&lt;p&gt;Now define the provider itself in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/.config/opencode/opencode.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the file if it doesn’t exist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"$schema"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://opencode.ai/config.json"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vllm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"npm"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@ai-sdk/openai-compatible"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vLLM (local)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"options"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"baseURL"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://100.108.174.26:8000/v1"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"My vLLM model"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm/Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"small_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vllm/Qwen3-Coder-30B-A3B-Instruct"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key fields explained
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;npm&lt;/code&gt;&lt;/strong&gt;
Set to &lt;code&gt;@ai-sdk/openai-compatible&lt;/code&gt; so OpenCode treats this provider as OpenAI-compatible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;baseURL&lt;/code&gt;&lt;/strong&gt;
Must point to the &lt;code&gt;/v1&lt;/code&gt; endpoint of your server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;models&lt;/code&gt;&lt;/strong&gt;
The key must exactly match the model ID exposed by the backend.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;model&lt;/code&gt; / &lt;code&gt;small_model&lt;/code&gt;&lt;/strong&gt;
Sets the default model used by OpenCode.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Selecting your model
&lt;/h2&gt;

&lt;p&gt;After these steps, restart OpenCode if it’s running.&lt;/p&gt;

&lt;p&gt;You can now use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your custom provider and model will appear in the selection list.&lt;/p&gt;
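To double-check end to end, you can also send one chat completion straight to the backend using the same model ID as in the config (swap in your own host, port, and model ID):

```shell
# One-shot chat completion against the local OpenAI-compatible server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-Coder-30B-A3B-Instruct",
       "messages": [{"role": "user", "content": "Say hello"}]}'
```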

</description>
      <category>opencode</category>
      <category>llm</category>
      <category>linux</category>
    </item>
  </channel>
</rss>
