<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Stanley Yang</title>
    <description>The latest articles on Forem by Stanley Yang (@stanleycyang).</description>
    <link>https://forem.com/stanleycyang</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3799116%2Fe8913bb1-3c3f-4141-b65b-ce126873bbb6.jpeg</url>
      <title>Forem: Stanley Yang</title>
      <link>https://forem.com/stanleycyang</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/stanleycyang"/>
    <language>en</language>
    <item>
      <title>How to Benchmark LLMs From Your Terminal in One Command</title>
      <dc:creator>Stanley Yang</dc:creator>
      <pubDate>Mon, 02 Mar 2026 05:44:38 +0000</pubDate>
      <link>https://forem.com/stanleycyang/how-to-benchmark-llms-from-your-terminal-in-one-command-20gj</link>
      <guid>https://forem.com/stanleycyang/how-to-benchmark-llms-from-your-terminal-in-one-command-20gj</guid>
      <description>&lt;h2&gt;
  
  
  How to Benchmark LLMs From Your Terminal in One Command
&lt;/h2&gt;

&lt;p&gt;With 40+ LLMs worth considering in 2026, picking the right model for your project means actually comparing them. In this tutorial, I'll show you how to benchmark LLMs directly from your terminal using &lt;a href="https://github.com/stanleycyang/yardstiq" rel="noopener noreferrer"&gt;yardstiq&lt;/a&gt;, an open-source CLI tool.&lt;/p&gt;

&lt;p&gt;No web UI, no notebooks, no setup. Just one command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 18+&lt;/li&gt;
&lt;li&gt;At least one API key (OpenAI, Anthropic, Google, etc.) or &lt;a href="https://vercel.com/ai-gateway" rel="noopener noreferrer"&gt;Vercel AI Gateway&lt;/a&gt; key&lt;/li&gt;
&lt;li&gt;Optional: &lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for local models&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Run Your First Comparison
&lt;/h2&gt;

&lt;p&gt;No install needed. npx handles it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Explain the difference between TCP and UDP"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you'd rather install it globally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm i &lt;span class="nt"&gt;-g&lt;/span&gt; yardstiq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;yardstiq will prompt you for API keys on first run (&lt;code&gt;yardstiq setup&lt;/code&gt;), or you can set them as environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-ant-...
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use a single key for all models:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AI_GATEWAY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your_gateway_key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see responses stream side-by-side in real time, followed by a performance table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Model              Time     TTFT     Tokens     Tok/sec   Cost
Claude Sonnet ⚡   1.24s    432ms    18→86      69.4 t/s  $0.0013
GPT-4o             1.89s    612ms    18→91      48.1 t/s  $0.0010
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
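&lt;p&gt;A quick sanity check on that table: the Tok/sec column lines up with output tokens divided by total wall time (assuming that is how yardstiq computes throughput; TTFT is reported separately):&lt;/p&gt;

```shell
# 86 output tokens over 1.24s matches the 69.4 t/s shown for Claude Sonnet
awk 'BEGIN { printf "%.1f\n", 86 / 1.24 }'
```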



&lt;h2&gt;
  
  
  Step 2: Compare More Models
&lt;/h2&gt;

&lt;p&gt;Add as many models as you want with &lt;code&gt;-m&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Write a Python function to merge two sorted lists"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-flash &lt;span class="nt"&gt;-m&lt;/span&gt; deepseek
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This sends the same prompt to all four models simultaneously and streams all responses in parallel.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Add an AI Judge
&lt;/h2&gt;

&lt;p&gt;Want an automated second opinion on quality? Add &lt;code&gt;--judge&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Implement a thread-safe singleton in Java"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-pro &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--judge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The judge (defaults to a strong model) evaluates each response and gives scored verdicts with reasoning. You can customize it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Write unit tests for this function"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--judge&lt;/span&gt; &lt;span class="nt"&gt;--judge-model&lt;/span&gt; gpt-4.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--judge-criteria&lt;/span&gt; &lt;span class="s2"&gt;"Focus on edge case coverage and test readability"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Compare Local Models
&lt;/h2&gt;

&lt;p&gt;If you have Ollama running, prefix models with &lt;code&gt;local:&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Explain CORS in simple terms"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt;:llama3.2 &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt;:mistral
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Mix local and cloud models to compare cost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Parse this JSON and extract emails"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt;:llama3.2 &lt;span class="nt"&gt;-m&lt;/span&gt; claude-haiku &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o-mini
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Use System Prompts and File Input
&lt;/h2&gt;

&lt;p&gt;Add context with system prompts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Review this code for security issues"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"You are a senior security engineer"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read prompts from files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="nt"&gt;-f&lt;/span&gt; ./my-prompt.txt &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pipe from stdin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;code.py | npx yardstiq &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="s2"&gt;"Find bugs in this code"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 6: Export Results
&lt;/h2&gt;

&lt;p&gt;Save comparisons in different formats:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# JSON (great for scripting and analysis)&lt;/span&gt;
npx yardstiq &lt;span class="s2"&gt;"Explain monads"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;--json&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; results.json

&lt;span class="c"&gt;# Markdown (for documentation)&lt;/span&gt;
npx yardstiq &lt;span class="s2"&gt;"Explain monads"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;--markdown&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; comparison.md

&lt;span class="c"&gt;# HTML (self-contained report with dark theme)&lt;/span&gt;
npx yardstiq &lt;span class="s2"&gt;"Explain monads"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;--html&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; report.html
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
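&lt;p&gt;The JSON export is handy for scripting. Here is a sketch of post-processing it with &lt;code&gt;jq&lt;/code&gt;; the field names below are assumptions for illustration, so inspect a real &lt;code&gt;results.json&lt;/code&gt; from your own run before relying on them:&lt;/p&gt;

```shell
# Hypothetical results.json shape -- check an actual export first
cat > results.json <<'EOF'
{"results":[{"model":"claude-sonnet","cost":0.0013},{"model":"gpt-4o","cost":0.0010}]}
EOF

# Pick the cheapest model from the run
jq -r '.results | min_by(.cost) | .model' results.json
```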



&lt;h2&gt;
  
  
  Step 7: Run Benchmark Suites
&lt;/h2&gt;

&lt;p&gt;For systematic evaluation, create a YAML benchmark file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# coding-benchmark.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coding-eval&lt;/span&gt;
&lt;span class="na"&gt;prompts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;algorithms&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;LRU&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;cache&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Python&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;O(1)&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;operations"&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debugging&lt;/span&gt;  
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Find&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fix&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;race&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;condition&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Go&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;code:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;..."&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;category&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;refactoring&lt;/span&gt;
    &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Refactor&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;200-line&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;function&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;into&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;clean,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;testable&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;modules"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq benchmark run ./coding-benchmark.yaml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-m&lt;/span&gt; deepseek &lt;span class="nt"&gt;-m&lt;/span&gt; codestral
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs every prompt against every model and gives you aggregate scores.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 8: Save and Review History
&lt;/h2&gt;

&lt;p&gt;Save comparisons for later:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Explain quicksort"&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;--save&lt;/span&gt; quicksort-compare

&lt;span class="c"&gt;# Later...&lt;/span&gt;
npx yardstiq &lt;span class="nb"&gt;history &lt;/span&gt;list
npx yardstiq &lt;span class="nb"&gt;history &lt;/span&gt;show quicksort-compare
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Useful Patterns
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Model selection for a project:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Test your actual use case across budget tiers&lt;/span&gt;
npx yardstiq &lt;span class="nt"&gt;-f&lt;/span&gt; ./real-prompt.txt &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; claude-haiku &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o-mini &lt;span class="nt"&gt;-m&lt;/span&gt; deepseek &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--judge&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Quick cost comparison:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Same task, different price points&lt;/span&gt;
npx yardstiq &lt;span class="s2"&gt;"Summarize this article: ..."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-haiku &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o-mini &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-flash-lite &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt;:llama3.2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tuning temperature:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx yardstiq &lt;span class="s2"&gt;"Write a creative product name for a sleep tracking app"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; claude-sonnet &lt;span class="nt"&gt;-m&lt;/span&gt; gpt-4o &lt;span class="nt"&gt;-t&lt;/span&gt; 0.9
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;yardstiq gives you a fast feedback loop for model comparison without leaving your terminal. It's not a replacement for rigorous evaluation frameworks, but it covers 90% of the "which model should I use?" decisions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/stanleycyang/yardstiq" rel="noopener noreferrer"&gt;github.com/stanleycyang/yardstiq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;a href="https://npmjs.com/package/yardstiq" rel="noopener noreferrer"&gt;npmjs.com/package/yardstiq&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>cli</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
