<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Manas Mudbari</title>
    <description>The latest articles on Forem by Manas Mudbari (@manasmudbari).</description>
    <link>https://forem.com/manasmudbari</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3482465%2F83858ffd-59ec-4d05-885e-87ba3587c3a3.JPG</url>
      <title>Forem: Manas Mudbari</title>
      <link>https://forem.com/manasmudbari</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/manasmudbari"/>
    <language>en</language>
    <item>
      <title>Can AI Remember the Market? Teaching LLMs to Detect When the Rules Change</title>
      <dc:creator>Manas Mudbari</dc:creator>
      <pubDate>Sun, 08 Mar 2026 17:06:02 +0000</pubDate>
      <link>https://forem.com/manasmudbari/can-ai-remember-the-market-teaching-llms-to-detect-when-the-rules-change-3g5f</link>
      <guid>https://forem.com/manasmudbari/can-ai-remember-the-market-teaching-llms-to-detect-when-the-rules-change-3g5f</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;We built a memory system for LLMs to track Bitcoin market regimes. The LLM can't predict tomorrow's price any better than a coin flip (nobody can, honestly). But it &lt;em&gt;can&lt;/em&gt; detect major market regime changes with zero false alarms, and unlike every statistical method, it tells you &lt;em&gt;why&lt;/em&gt; the regime changed in plain English. That explainability is the real contribution.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem: AI Models Have Amnesia
&lt;/h2&gt;

&lt;p&gt;Imagine you trained an AI model to predict Bitcoin prices during the 2020-2021 bull run, a period when institutional investors were piling in, central banks were printing money, and everything was going up. The model learns the rules of that world pretty well.&lt;/p&gt;

&lt;p&gt;Then 2022 arrives. The Fed starts aggressively hiking interest rates. Crypto exchange FTX collapses. Luna implodes. The entire market enters a prolonged bear phase.&lt;/p&gt;

&lt;p&gt;Your model, still operating on the old rules, has no idea what hit it.&lt;/p&gt;

&lt;p&gt;This is called &lt;strong&gt;concept drift&lt;/strong&gt;: when the underlying patterns that a model learned no longer reflect reality. It's one of the most underappreciated problems in applied machine learning, especially in financial markets where the "rules" can change overnight.&lt;/p&gt;

&lt;p&gt;Traditional fixes are crude: either retrain the model on new data (expensive and reactive), or use statistical alarms that fire when something looks statistically unusual (they tell you &lt;em&gt;that&lt;/em&gt; something changed, but never &lt;em&gt;why&lt;/em&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  Our Idea: Give the AI a Memory
&lt;/h2&gt;

&lt;p&gt;Large language models (LLMs) like GPT-4 have been trained on enormous amounts of text, including financial news, market commentary, earnings reports, and macroeconomic analysis. They already "know" things like "when the Fed raises rates, risk assets tend to fall" or "Bitcoin historically rallies in the months before a halving."&lt;/p&gt;

&lt;p&gt;What if we could structure that knowledge into a formal &lt;strong&gt;memory system&lt;/strong&gt; that the LLM consults before making predictions? Instead of treating every 24-hour window as if it exists in isolation, the model would have context: &lt;em&gt;what regime is the market in right now, what has happened before in similar conditions, and what did the model itself predict recently?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's the core idea of this paper. We built four types of adaptive memory and tested them on seven years of Bitcoin data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Memory Types
&lt;/h2&gt;

&lt;p&gt;Think of each memory type as a different "cheat sheet" the AI gets to read before making its prediction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Regime Memory&lt;/strong&gt;&lt;br&gt;
The AI is told what "mode" the market is currently in (e.g., "Macro Bear Market") and what characteristics define that mode. Like giving a student a study guide that says: "Right now we're in a period defined by Fed tightening, exchange failures, and risk-off sentiment."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. News Memory&lt;/strong&gt;&lt;br&gt;
Recent headlines are ranked by importance and fed to the model, along with any major events that happened during the same calendar window in previous years. Think of it as saying: "Here are the most important things happening right now, and here's what happened at this time of year historically."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Similarity Memory&lt;/strong&gt;&lt;br&gt;
The current market conditions (price momentum, volatility, volume) are compared against every similar-looking period in the past seven years. The top five most similar historical windows are retrieved, along with what actually happened next. Essentially: "The last five times the market looked like this, here's what followed."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Relative Memory&lt;/strong&gt;&lt;br&gt;
The AI is shown a log of its own recent predictions: how accurate it's been, whether it's been systematically biased toward UP or DOWN, and what its last seven predictions were. This lets it self-correct: "I've been wrong six times in a row predicting UP, maybe I should reconsider."&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Tests
&lt;/h2&gt;

&lt;p&gt;We ran the system on two tasks:&lt;/p&gt;

&lt;h3&gt;
  
  
  Task 1: Predict Tomorrow's Price Direction
&lt;/h3&gt;

&lt;p&gt;Given the last 7 days of Bitcoin price data, predict whether the price will be higher or lower 24 hours from now.&lt;/p&gt;
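&lt;p&gt;Concretely, the ground-truth labels come straight from consecutive daily closes (a sketch; the real pipeline presumably works on the full OHLCV series):&lt;/p&gt;

```python
def direction_labels(closes):
    """Binary next-day direction labels from a list of daily closing prices.

    The label for day t is "UP" if the close 24 hours later is higher,
    else "DOWN". The final day has no next close, so it gets no label.
    """
    labels = []
    for today, tomorrow in zip(closes, closes[1:]):
        labels.append("UP" if tomorrow > today else "DOWN")
    return labels
```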

&lt;h3&gt;
  
  
  Task 2: Detect Regime Changes
&lt;/h3&gt;

&lt;p&gt;Given the current market conditions and recent news, determine whether Bitcoin has transitioned into a fundamentally new market regime.&lt;/p&gt;

&lt;p&gt;We tested against 6 real historical regime transitions that occurred between 2017 and 2024.&lt;/p&gt;




&lt;h2&gt;
  
  
  What We Found
&lt;/h2&gt;

&lt;h3&gt;
  
  
  On Price Prediction: Nobody Wins
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LSTM (traditional neural net)&lt;/td&gt;
&lt;td&gt;50.8%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM with no memory&lt;/td&gt;
&lt;td&gt;50.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM + Similarity Memory&lt;/td&gt;
&lt;td&gt;51.3%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM + News Memory&lt;/td&gt;
&lt;td&gt;48.6%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM + Regime Memory&lt;/td&gt;
&lt;td&gt;47.1%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM + Relative Memory&lt;/td&gt;
&lt;td&gt;49.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every method lands within a whisker of the 50% coin-flip baseline. This is an important and honest result: short-term Bitcoin price prediction is genuinely hard, no matter how sophisticated the model. Nobody has cracked it, and we didn't pretend to either.&lt;/p&gt;

&lt;p&gt;The statistical analysis confirmed that none of the differences between methods are statistically significant. In plain terms: the margin of error swallows all the differences.&lt;/p&gt;
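&lt;p&gt;A back-of-envelope check makes the point (the 500-day sample size below is a hypothetical stand-in, not the paper's actual test-set length): the worst-case margin of error on an accuracy estimate is wider than the whole spread in the table.&lt;/p&gt;

```python
import math

def coinflip_margin(n, z=1.96):
    """95% margin of error for an accuracy estimate near 50%.

    The worst-case standard error of a proportion is sqrt(0.25 / n).
    """
    return z * math.sqrt(0.25 / n)

# With a hypothetical 500 test days, the margin is about +/- 4.4
# percentage points, which swallows the 47.1%-51.3% spread entirely.
margin = coinflip_margin(500)
```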

&lt;h3&gt;
  
  
  On Regime Detection: The LLM Has a Unique Edge
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Method&lt;/th&gt;
&lt;th&gt;Detected&lt;/th&gt;
&lt;th&gt;False Alarm Rate&lt;/th&gt;
&lt;th&gt;Can Explain Why?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CUSUM (statistical)&lt;/td&gt;
&lt;td&gt;5/6 (83%)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LLM&lt;/td&gt;
&lt;td&gt;3/6 (50%)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BinSeg (statistical)&lt;/td&gt;
&lt;td&gt;2/6 (33%)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bollinger Bands&lt;/td&gt;
&lt;td&gt;1/6 (17%)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The LLM doesn't win on raw detection rate; CUSUM beats it handily. But two things stand out:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zero false alarms.&lt;/strong&gt; The LLM never incorrectly flagged a regime change when there wasn't one. It only raised its hand when it was genuinely confident.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It can explain itself.&lt;/strong&gt; When CUSUM fires, it just says "something changed." When the LLM fires, it says things like: &lt;em&gt;"Fed tightening beginning in Q1 2022, combined with the collapse of the Terra/Luna ecosystem in May, has fundamentally altered risk appetite. The current regime shows classic bear market characteristics: declining volume, high correlation with equities, and consistent negative news flow from exchange insolvencies."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That explanation has real practical value. A risk manager doesn't just want to know the alarm went off; they want to know why, so they can decide what to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the LLM Struggled
&lt;/h2&gt;

&lt;p&gt;The most interesting failure was the &lt;strong&gt;Institutional Accumulation&lt;/strong&gt; regime (April 2019 to February 2020). This was a quiet period of slow, steady accumulation by institutional players like Grayscale, with no dramatic headlines, no price explosions, and no obvious trigger.&lt;/p&gt;

&lt;p&gt;The LLM scored 0% on detecting this transition. It relies heavily on news hooks and dramatic price movements. Slow, structural, low-noise regime changes are essentially invisible to it.&lt;/p&gt;

&lt;p&gt;This reveals a genuine limitation: &lt;strong&gt;LLMs reason from narrative, and quiet regimes have no narrative.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The paper makes a case that LLMs and traditional statistical methods are &lt;strong&gt;complementary, not competing&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use CUSUM as a cheap, fast first-stage detector (it's great at catching that &lt;em&gt;something&lt;/em&gt; changed)&lt;/li&gt;
&lt;li&gt;Use the LLM as a second stage to interpret &lt;em&gt;what&lt;/em&gt; changed and &lt;em&gt;why&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Neither alone is the full answer. Together, they cover each other's weaknesses.&lt;/p&gt;
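&lt;p&gt;The first stage really is cheap. A minimal CUSUM sketch on standardized returns (the slack and threshold values are illustrative defaults, not tuned to the paper's setup):&lt;/p&gt;

```python
def cusum_alarm(returns, k=0.5, h=5.0):
    """Two-sided CUSUM change detector on standardized returns.

    Accumulates drift above and below the mean; fires when either sum
    exceeds the threshold h. k is the slack (drift allowance) and h the
    decision threshold, both in standard-deviation units; tuning them
    trades detection speed against false alarms.
    """
    s_pos = s_neg = 0.0
    for t, x in enumerate(returns):
        s_pos = max(0.0, s_pos + x - k)
        s_neg = max(0.0, s_neg - x - k)
        if s_pos > h or s_neg > h:
            return t  # first alarm index; hand off to the LLM to explain why
    return None
```

&lt;p&gt;When the alarm fires, the second stage asks the LLM to interpret the window around the alarm index and produce the plain-English regime explanation.&lt;/p&gt;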




&lt;h2&gt;
  
  
  What We Released
&lt;/h2&gt;

&lt;p&gt;Everything is open source:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The full Bitcoin OHLCV dataset (2017-2024) with labeled regimes&lt;/li&gt;
&lt;li&gt;50 annotated news events&lt;/li&gt;
&lt;li&gt;All model code, prompts, and raw LLM responses&lt;/li&gt;
&lt;li&gt;A reproducibility checklist so anyone can replicate every number in the paper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The total API cost to run every LLM experiment in the paper was about &lt;strong&gt;$4.40&lt;/strong&gt;. The entire pipeline is accessible to any individual researcher without institutional compute budgets.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The full paper is available on engrXiv. Code and data are on &lt;a href="https://github.com/manasmudbari/bitcoin-llm-regime-analysis" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>bitcoin</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Building a Multi-Provider LLM Benchmark with Automated GitHub Actions</title>
      <dc:creator>Manas Mudbari</dc:creator>
      <pubDate>Sun, 25 Jan 2026 17:10:29 +0000</pubDate>
      <link>https://forem.com/manasmudbari/building-a-multi-provider-llm-benchmark-with-automated-github-actions-hk6</link>
      <guid>https://forem.com/manasmudbari/building-a-multi-provider-llm-benchmark-with-automated-github-actions-hk6</guid>
      <description>&lt;p&gt;A core problem we tackled when building realtime LLM based signal analysis is LLM token efficiency: when you're feeding time-series data (stock prices, IoT sensors, blockchain events) into LLMs, the serialization format matters. A lot.&lt;/p&gt;

&lt;p&gt;We needed hard numbers to prove it. So we built an automated benchmark system that runs every two weeks, tests four data formats across four major LLM providers, and publishes live results on our website.&lt;/p&gt;

&lt;p&gt;Here's how we built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Proving Token Efficiency at Scale
&lt;/h2&gt;

&lt;p&gt;Time-series data is structurally simple but verbose. JSON, the industry default, repeats keys on every row. CSV is better, but still repeats full timestamps and values. For LLMs, this repetition translates directly into tokens, and tokens cost money.&lt;/p&gt;

&lt;p&gt;We developed &lt;strong&gt;TSLN&lt;/strong&gt; (Time-Series Lean Notation), a format that exploits temporal regularity and delta encoding to reduce token count by up to 87%. But claiming efficiency isn't enough. We needed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reproducible benchmarks&lt;/strong&gt; across multiple LLM providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated execution&lt;/strong&gt; so results stay current&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Public transparency&lt;/strong&gt; so developers can verify our claims&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result: an open-source benchmark suite that runs automatically via GitHub Actions and displays live results on &lt;a href="https://www.turboline.ai#benchmark" rel="noopener noreferrer"&gt;turboline.ai&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────┐
│  GitHub Actions (Bi-weekly cron + manual trigger)   │
│  ┌───────────────────────────────────────────────┐  │
│  │  1. Checkout repo                             │  │
│  │  2. Install Python deps (openai, anthropic...) │  │
│  │  3. Run benchmark script                      │  │
│  │  4. Commit results to public/data/*.json      │  │
│  │  5. Push to main branch                       │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                        │
                        │ Git push triggers Railway
                        ▼
┌─────────────────────────────────────────────────────┐
│  Railway (Automated CI/CD)                          │
│  ┌───────────────────────────────────────────────┐  │
│  │  1. Detect commit to main                     │  │
│  │  2. Build Next.js site                        │  │
│  │  3. Deploy to production                      │  │
│  └───────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────┘
                        │
                        ▼
          Website auto-loads /data/benchmark-results.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key components:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python benchmark runner&lt;/strong&gt; (&lt;code&gt;benchmark/run_full_benchmark.py&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Actions workflow&lt;/strong&gt; (&lt;code&gt;.github/workflows/run-benchmark.yml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Next.js frontend&lt;/strong&gt; (React component with Recharts visualization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Railway deployment&lt;/strong&gt; (automatic on git push)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Benchmark Script
&lt;/h2&gt;

&lt;p&gt;The core script tests four serialization formats:&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Formats Tested
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; - Baseline format with full object notation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV&lt;/strong&gt; - Header row with comma-separated values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TSLN&lt;/strong&gt; - Time-Series Lean Notation (our format)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TOON&lt;/strong&gt; - Token-Oriented Object Notation (pipe-delimited)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Sample Data Generation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_sample_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;format_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Generate 100 data points in different formats.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;format_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-01T09:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;zfill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:00Z&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                 &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;150.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;format_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp,value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-01T09:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;zfill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:00Z,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mf"&gt;150.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;format_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tsln&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Delta-encoded compact format
&lt;/span&gt;        &lt;span class="n"&gt;values&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;150.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t:2024-01-01T09:00:00Z|i:60|v:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;format_name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;toon&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Pipe-delimited format
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp|value&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-01-01T09:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;zfill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:00Z|&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mf"&gt;150.0&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Token Counting &amp;amp; Cost Calculation
&lt;/h3&gt;

&lt;p&gt;Each benchmark:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Generates sample data&lt;/strong&gt; (100 stock price data points)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counts tokens&lt;/strong&gt; using a simple heuristic (~4 chars/token)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculates costs&lt;/strong&gt; using provider-specific pricing:

&lt;ul&gt;
&lt;li&gt;OpenAI GPT-4o-mini: $0.15/1M tokens&lt;/li&gt;
&lt;li&gt;Anthropic Claude Haiku: $0.80/1M tokens&lt;/li&gt;
&lt;li&gt;Google Gemini 1.5 Flash: $0.075/1M tokens&lt;/li&gt;
&lt;li&gt;Deepseek: $0.14/1M tokens
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_single_benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                        &lt;span class="n"&gt;format_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze this time-series data and summarize trends.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;full_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Provider-specific cost rates
&lt;/span&gt;    &lt;span class="n"&gt;cost_per_1m_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;openai&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.15&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-1.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.075&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deepseek-chat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.14&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cost_per_1m_tokens&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;cost_usd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1_000_000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt;
    &lt;span class="n"&gt;cost_per_100k_datapoints&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cost_usd&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;provider&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;format_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_per_100k_datapoints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;cost_per_100k_datapoints&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# ... more metadata
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Summary Statistics
&lt;/h3&gt;

&lt;p&gt;After running all combinations (4 formats × 4 providers = 16 tests), we aggregate results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_summary&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Calculate per-format averages and savings vs JSON.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;format_groups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;fmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;format_groups&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;format_groups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;format_groups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;format_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;format_groups&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;avg_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;avg_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cost_per_100k_datapoints&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;format_stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_input_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_tokens&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_cost_per_100k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;avg_cost&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sample_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;savings_vs_json_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate savings relative to JSON baseline
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;format_stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;json_cost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;format_stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_cost_per_100k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;format_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;savings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;json_cost&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_cost_per_100k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;json_cost&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
                &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;savings_vs_json_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;savings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;format_stats&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  GitHub Actions Automation
&lt;/h2&gt;

&lt;p&gt;The workflow runs on a schedule every two weeks and can also be triggered manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run Benchmark Bi-weekly&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Every 2 weeks on Sunday at 00:00 UTC&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*/14&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;0'&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# Manual trigger via GitHub UI&lt;/span&gt;

&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
  &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;write&lt;/span&gt;  &lt;span class="c1"&gt;# Required to commit results&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;run-benchmark&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout repository&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
          &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;persist-credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Python&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.11'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install Python dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;pip install openai anthropic google-generativeai&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run benchmark script&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.OPENAI_API_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.ANTHROPIC_API_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GOOGLE_API_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;DEEPSEEK_API_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEEPSEEK_API_KEY }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;python benchmark/run_full_benchmark.py&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Commit and push results&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;git config --local user.email "github-actions[bot]@users.noreply.github.com"&lt;/span&gt;
          &lt;span class="s"&gt;git config --local user.name "GitHub Actions"&lt;/span&gt;
          &lt;span class="s"&gt;git add public/data/benchmark-results.json&lt;/span&gt;
          &lt;span class="s"&gt;git diff --staged --quiet || git commit -m "Update benchmark results [automated]"&lt;/span&gt;
          &lt;span class="s"&gt;git push&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;API Secrets Management&lt;/strong&gt;: API keys are stored as GitHub repository secrets and injected as environment variables during workflow execution.&lt;/p&gt;
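&lt;p&gt;On the Python side, a fail-fast check makes a missing secret obvious at the top of the run instead of halfway through 16 API calls. A minimal sketch — the key names match the workflow above, but &lt;code&gt;missing_keys&lt;/code&gt; itself is illustrative, not from the repo:&lt;/p&gt;

```python
import os

# The four provider keys the workflow injects as environment variables.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
]

def missing_keys(env=None):
    """Return the provider keys that are unset (or empty) in the environment."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```

&lt;p&gt;Calling this at startup and aborting when the list is non-empty turns a cryptic mid-run 401 into a clear error in the first seconds of the job.&lt;/p&gt;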

&lt;p&gt;&lt;strong&gt;Conditional Commits&lt;/strong&gt;: The &lt;code&gt;git diff --staged --quiet ||&lt;/code&gt; pattern ensures we only commit when results actually change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automated Deployment&lt;/strong&gt;: After pushing to main, Railway automatically detects the change and redeploys the Next.js site within ~2 minutes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Results from Latest Run
&lt;/h2&gt;

&lt;p&gt;Here's what our latest benchmark (January 20, 2026) shows for &lt;strong&gt;100 stock price data points&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Avg Tokens&lt;/th&gt;
&lt;th&gt;Cost/100k Points&lt;/th&gt;
&lt;th&gt;Savings vs JSON&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JSON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,397&lt;/td&gt;
&lt;td&gt;$0.0404&lt;/td&gt;
&lt;td&gt;— (baseline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CSV&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;698&lt;/td&gt;
&lt;td&gt;$0.0202&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TOON&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;698&lt;/td&gt;
&lt;td&gt;$0.0202&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50.0%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TSLN&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;177&lt;/td&gt;
&lt;td&gt;$0.0052&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;87.3%&lt;/strong&gt; ✨&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key findings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TSLN uses 87.3% fewer tokens&lt;/strong&gt; than JSON across all providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CSV and TOON are equivalent&lt;/strong&gt; at ~50% savings (both avoid JSON's key repetition)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Savings are consistent&lt;/strong&gt; across OpenAI, Anthropic, Google, and Deepseek&lt;/li&gt;
&lt;li&gt;For 100k data points, JSON costs &lt;strong&gt;~$4&lt;/strong&gt; while TSLN costs &lt;strong&gt;~$0.52&lt;/strong&gt; (average across providers)&lt;/li&gt;
&lt;/ul&gt;
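&lt;p&gt;The headline percentages are easy to verify from the token counts alone — savings is just the relative token reduction versus the JSON baseline:&lt;/p&gt;

```python
# Average input tokens from the January 2026 run (100 stock data points).
tokens = {"JSON": 1397, "CSV": 698, "TOON": 698, "TSLN": 177}
baseline = tokens["JSON"]

for fmt, t in tokens.items():
    savings = (baseline - t) / baseline * 100
    print(f"{fmt}: {savings:.1f}% fewer tokens than JSON")
```

&lt;p&gt;This reproduces the table: 50.0% for CSV and TOON, 87.3% for TSLN.&lt;/p&gt;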




&lt;h2&gt;
  
  
  Frontend Visualization
&lt;/h2&gt;

&lt;p&gt;The benchmark results are visualized on our homepage using React + Recharts:&lt;/p&gt;

&lt;h3&gt;
  
  
  Features
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Provider Tabs&lt;/strong&gt;: Switch between aggregated view and provider-specific breakdowns (OpenAI, Anthropic, Google, Deepseek).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Interactive Table&lt;/strong&gt;: Shows format comparison with highlighting for best performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost Comparison Chart&lt;/strong&gt;: Bar chart using Recharts with color-coded formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔴 JSON (red) - baseline&lt;/li&gt;
&lt;li&gt;🟠 CSV (orange)&lt;/li&gt;
&lt;li&gt;🔵 TOON (blue)
&lt;/li&gt;
&lt;li&gt;🟢 TSLN (green) - most efficient&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stats Cards&lt;/strong&gt;: Display best format, max savings %, and test success rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementation Snippet
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;LLMBenchmark&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;useState&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;BenchmarkData&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;activeTab&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;setActiveTab&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;average&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="nf"&gt;useEffect&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Load from static JSON generated by GitHub Actions&lt;/span&gt;
    &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/data/benchmark-results.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;

  &lt;span class="c1"&gt;// Calculate provider-specific or averaged stats&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getProviderStats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;average&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format_stats&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Filter and aggregate by provider&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;providerResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;providerId&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;// ... aggregate by format&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;section&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* Provider tabs */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt; &lt;span class="na"&gt;className&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"flex gap-2"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;PROVIDERS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt; &lt;span class="na"&gt;onClick&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;setActiveTab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logo&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt; &lt;span class="na"&gt;src&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;logo&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;button&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;div&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;

      &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="cm"&gt;/* Results table and chart */&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;ResponsiveContainer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;BarChart&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;chartData&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Bar&lt;/span&gt; &lt;span class="na"&gt;dataKey&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"cost"&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
            &lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;chartData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
              &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Cell&lt;/span&gt; &lt;span class="na"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;formatColors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;entry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt; &lt;span class="p"&gt;/&amp;gt;&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;Bar&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
        &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;BarChart&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nc"&gt;ResponsiveContainer&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="p"&gt;&amp;lt;/&lt;/span&gt;&lt;span class="nt"&gt;section&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  CI/CD Pipeline: GitHub Actions → Railway
&lt;/h2&gt;

&lt;p&gt;Our deployment pipeline is fully automated:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. GitHub Actions Runs Benchmark
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Trigger&lt;/strong&gt;: Cron schedule (bi-weekly) or manual dispatch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt;: Run Python benchmark, commit JSON results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt;: &lt;code&gt;public/data/benchmark-results.json&lt;/code&gt; committed to main&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Railway Detects Git Push
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Connected to GitHub&lt;/strong&gt;: Railway monitors our main branch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-build&lt;/strong&gt;: Detects commit, runs &lt;code&gt;npm run build&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-deploy&lt;/strong&gt;: Ships new build to production&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Next.js Loads Static JSON
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Static file&lt;/strong&gt;: Results JSON is in &lt;code&gt;/public&lt;/code&gt;, served directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Client-side fetch&lt;/strong&gt;: React component loads on mount&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fast &amp;amp; cacheable&lt;/strong&gt;: No backend needed for benchmark data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This architecture is &lt;strong&gt;serverless-friendly&lt;/strong&gt;: the benchmark results are just static JSON, so we avoid database costs and API rate limits.&lt;/p&gt;




&lt;h2&gt;
  
  
  TypeScript Type Safety
&lt;/h2&gt;

&lt;p&gt;We maintain TypeScript type definitions that mirror the Python output schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/benchmark-types.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;BenchmarkResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;provider&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;output_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;cost_usd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;cost_per_100k_datapoints&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;FormatSummaryStats&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;avg_input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;avg_cost_per_100k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;sample_count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="nx"&gt;savings_vs_json_percent&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;BenchmarkData&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;benchmark_date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;formats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="nl"&gt;providers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[];&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nl"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;BenchmarkResult&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt;
  &lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nl"&gt;format_stats&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;FormatSummaryStats&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
    &lt;span class="nx"&gt;best_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
    &lt;span class="nx"&gt;max_savings_percent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures the Python output schema matches what the React component expects.&lt;/p&gt;
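&lt;p&gt;Since TypeScript interfaces are erased at compile time, nothing enforces that contract when the JSON is actually fetched. A lightweight runtime guard is one way to close the gap; the &lt;code&gt;isBenchmarkData&lt;/code&gt; function below is an illustrative sketch (a schema library like zod would be more robust), not code from our repo:&lt;/p&gt;

```typescript
// Hypothetical runtime guard for the fetched benchmark JSON. Interfaces
// are erased at compile time, so a structural check like this (or a
// schema library such as zod) is what actually protects the component.
function isBenchmarkData(x: unknown): boolean {
  if (typeof x !== 'object' || x === null) return false
  const o = x as Record<string, unknown>
  const summary = o.summary as Record<string, unknown> | undefined
  return (
    typeof o.benchmark_date === 'string' &&
    typeof o.job_id === 'string' &&
    Array.isArray(o.results) &&
    typeof summary === 'object' && summary !== null &&
    typeof summary.best_format === 'string' &&
    typeof summary.max_savings_percent === 'number'
  )
}
```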




&lt;h2&gt;
  
  
  Live Visualization Deep Dive
&lt;/h2&gt;

&lt;p&gt;The frontend uses &lt;strong&gt;Framer Motion&lt;/strong&gt; for animations and &lt;strong&gt;Recharts&lt;/strong&gt; for data visualization:&lt;/p&gt;

&lt;h3&gt;
  
  
  Provider Switching
&lt;/h3&gt;

&lt;p&gt;Users can toggle between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Average&lt;/strong&gt; - Aggregated stats across all providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI&lt;/strong&gt; - GPT-4o-mini specific results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic&lt;/strong&gt; - Claude Haiku specific results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google&lt;/strong&gt; - Gemini Flash specific results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deepseek&lt;/strong&gt; - Deepseek Chat specific results&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When switching tabs, we filter &lt;code&gt;data.results&lt;/code&gt; by provider and recalculate format averages dynamically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tsx"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getProviderStats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;providerId&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;average&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format_stats&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;providerResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;provider&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nx"&gt;providerId&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Group by format and calculate averages&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formatGroups&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="nx"&gt;providerResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fmt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;formatGroups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="nx"&gt;formatGroups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="nx"&gt;formatGroups&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;formatGroups&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;fmt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;avg_input_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;input_tokens&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;avg_cost_per_100k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;avg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cost_per_100k_datapoints&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;savings_vs_json_percent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="cm"&gt;/* calculated vs JSON */&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;stats&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
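&lt;p&gt;The &lt;code&gt;avg&lt;/code&gt; helper referenced above isn't shown; a minimal version (the app's exact signature may differ) is just the mean of one numeric field across result rows:&lt;/p&gt;

```typescript
// Minimal version of the avg() helper used in getProviderStats above:
// the mean of one numeric field across result rows.
function avg(rows: Array<Record<string, number>>, field: string): number {
  if (rows.length === 0) return 0
  const sum = rows.reduce((acc, row) => acc + (row[field] ?? 0), 0)
  return sum / rows.length
}
```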



&lt;h3&gt;
  
  
  Color Coding
&lt;/h3&gt;

&lt;p&gt;Each format has a semantic color:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔴 &lt;strong&gt;JSON (red)&lt;/strong&gt; - Most expensive baseline&lt;/li&gt;
&lt;li&gt;🟠 &lt;strong&gt;CSV (orange)&lt;/strong&gt; - Moderate efficiency&lt;/li&gt;
&lt;li&gt;🔵 &lt;strong&gt;TOON (blue)&lt;/strong&gt; - Moderate efficiency&lt;/li&gt;
&lt;li&gt;🟢 &lt;strong&gt;TSLN (green)&lt;/strong&gt; - Highest efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Responsive Design
&lt;/h3&gt;

&lt;p&gt;The chart uses &lt;code&gt;ResponsiveContainer&lt;/code&gt; from Recharts to adapt to mobile/tablet/desktop. Table columns stack on smaller screens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Static JSON &amp;gt; Database for Benchmark Results&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We initially considered storing results in a database, but realized:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Results only update bi-weekly&lt;/li&gt;
&lt;li&gt;No user-specific data&lt;/li&gt;
&lt;li&gt;Static files are faster and free&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;GitHub Actions Auto-Commit is Powerful&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The pattern of running a script, committing output, and pushing back to the repo unlocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated data pipelines&lt;/li&gt;
&lt;li&gt;Version-controlled results&lt;/li&gt;
&lt;li&gt;GitOps-style transparency&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Railway's GitHub Integration is Seamless&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;We didn't write a single line of deploy config. Railway just watches our main branch and redeploys on every push. Perfect for small teams.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Token Efficiency Compounds Quickly&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At 100 data points, TSLN saves ~$0.035 per run. At 10,000 data points, that becomes $3.50 per run. For production systems ingesting millions of time-series events, the savings are &lt;strong&gt;material&lt;/strong&gt;.&lt;/p&gt;
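&lt;p&gt;A quick sanity check on that scaling (the linear projection is an assumption; real savings depend on the tokenizer and data shape):&lt;/p&gt;

```typescript
// Projecting the quoted ~$0.035 saving per 100 data points to larger
// runs. Linear scaling is an assumption, not a measured guarantee.
const SAVINGS_PER_100_POINTS_USD = 0.035

function projectedSavingsUsd(dataPoints: number): number {
  return (dataPoints / 100) * SAVINGS_PER_100_POINTS_USD
}
```

&lt;p&gt;At a million data points a day, that projection works out to roughly $350/day.&lt;/p&gt;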

</description>
      <category>llm</category>
      <category>githubactions</category>
      <category>python</category>
    </item>
    <item>
      <title>A Token-Efficient Way to Send Time-Series Data into LLMs</title>
      <dc:creator>Manas Mudbari</dc:creator>
      <pubDate>Wed, 31 Dec 2025 21:14:08 +0000</pubDate>
      <link>https://forem.com/manasmudbari/a-token-efficient-way-to-send-time-series-data-into-llms-2h80</link>
      <guid>https://forem.com/manasmudbari/a-token-efficient-way-to-send-time-series-data-into-llms-2h80</guid>
      <description>&lt;p&gt;If you’ve ever pushed time-series data (metrics, logs, network streams, sensor readings) into an LLM, you’ve probably noticed that even small datasets can get expensive and slow very quickly.&lt;/p&gt;

&lt;p&gt;Not because the data is huge, but because of how it gets tokenized.&lt;/p&gt;

&lt;p&gt;This post is about a representation experiment we’ve been running to reduce that overhead, what didn’t work, and what seems to help.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem We Ran Into
&lt;/h2&gt;

&lt;p&gt;Time-series data is repetitive by nature:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;timestamps move forward predictably&lt;/li&gt;
&lt;li&gt;values often change slowly&lt;/li&gt;
&lt;li&gt;schema repeats on every row&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Humans immediately see the pattern, but LLMs don’t.&lt;/p&gt;

&lt;p&gt;Most LLM tokenizers are optimized for natural language, not numerical streams. Two numbers that look almost identical to us can tokenize very differently. Repeating structure (timestamps, keys, braces) quietly eats context and cost.&lt;/p&gt;

&lt;p&gt;At small scale, it’s annoying, but at scale, it becomes an infrastructure problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Tried (and What Didn’t Help Much)
&lt;/h2&gt;

&lt;p&gt;Before building anything new, we tried existing formats:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JSON (baseline)&lt;/li&gt;
&lt;li&gt;CSV (more compact, but still verbose)&lt;/li&gt;
&lt;li&gt;TOON (interesting idea, but still text that gets re-tokenized)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, TOON didn’t materially reduce token usage, because everything was still passed as plain text into the LLM. The structure was different, but the tokenizer’s behavior didn’t improve much.&lt;/p&gt;

&lt;p&gt;That was the key realization: compression alone isn’t the problem — tokenization is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Math Was Our Intuition
&lt;/h2&gt;

&lt;p&gt;If you’ve taken calculus or physics, this will feel familiar.&lt;/p&gt;

&lt;p&gt;Think about motion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Position → where something is&lt;/li&gt;
&lt;li&gt;Velocity → how position changes&lt;/li&gt;
&lt;li&gt;Acceleration → how velocity changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now map that to time-series data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw values = position&lt;/li&gt;
&lt;li&gt;differences between values = velocity (delta)&lt;/li&gt;
&lt;li&gt;differences between deltas = acceleration (delta-of-delta)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most real-world time-series data has low acceleration. Values drift; timestamps tick forward regularly.&lt;/p&gt;

&lt;p&gt;So instead of repeating full values and timestamps, we started experimenting with representing changes.&lt;/p&gt;
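&lt;p&gt;In code, the idea looks something like this (illustrative only; this is not TSLN’s actual wire format):&lt;/p&gt;

```typescript
// Delta-encode values (velocity), then delta-of-delta for regularly
// spaced timestamps (acceleration). Illustrative sketch only.
function deltas(values: number[]): number[] {
  return values.slice(1).map((v, i) => v - values[i])
}

// Readings sampled every 60 s: the timestamp column collapses to zeros.
const timestamps = [1700000000, 1700000060, 1700000120, 1700000180]
const dod = deltas(deltas(timestamps)) // → [0, 0]
```

&lt;p&gt;Small, mostly-zero integers like these tokenize far more cheaply than repeated ten-digit timestamps.&lt;/p&gt;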

&lt;h2&gt;
  
  
  TSLN: A Token-Aware Representation
&lt;/h2&gt;

&lt;p&gt;That experiment turned into &lt;strong&gt;TSLN (Time-Series Lean Notation)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At a high level, it’s a text-based serialization that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stores deltas instead of repeating full values&lt;/li&gt;
&lt;li&gt;stores delta-of-delta for regular timestamps&lt;/li&gt;
&lt;li&gt;declares schema once instead of repeating it&lt;/li&gt;
&lt;li&gt;stays human-readable and streamable&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The key difference from “just compression” is that it’s designed to be tokenization-aware. Smaller, bounded numbers and less repeated syntax lead to far fewer tokens once the model sees the input.&lt;/p&gt;

&lt;p&gt;In early benchmarks, the same datasets used up to ~80% fewer tokens than JSON. That translated directly into lower cost and a larger effective context window when calling LLMs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code and Next Steps
&lt;/h2&gt;

&lt;p&gt;We’ve open-sourced &lt;a href="https://github.com/turboline-ai/tsln-golang" rel="noopener noreferrer"&gt;Go&lt;/a&gt; and &lt;a href="https://github.com/turboline-ai/tsln-node" rel="noopener noreferrer"&gt;Node.js&lt;/a&gt; implementations under the MIT license so it’s easy to experiment or drop into existing pipelines.&lt;/p&gt;

&lt;p&gt;I’m currently expanding the benchmarks across more datasets, tokenizers, and workloads, and plan to publish a more formal preprint once that’s done.&lt;/p&gt;

&lt;p&gt;If you work with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;time-series data&lt;/li&gt;
&lt;li&gt;LLM pipelines&lt;/li&gt;
&lt;li&gt;serialization or streaming systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d genuinely love feedback — especially edge cases, comparisons we should run, or prior art I may have missed.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>iot</category>
      <category>timeseries</category>
      <category>datascience</category>
    </item>
    <item>
      <title>I Benchmarked LLM APIs on Live BGP Streams. Here’s What Actually Matters.</title>
      <dc:creator>Manas Mudbari</dc:creator>
      <pubDate>Sun, 28 Dec 2025 17:01:36 +0000</pubDate>
      <link>https://forem.com/manasmudbari/i-benchmarked-llm-apis-on-live-bgp-streams-heres-what-actually-matters-2c3c</link>
      <guid>https://forem.com/manasmudbari/i-benchmarked-llm-apis-on-live-bgp-streams-heres-what-actually-matters-2c3c</guid>
      <description>&lt;p&gt;Most LLM benchmarks are polite.&lt;/p&gt;

&lt;p&gt;They run clean prompts on static text, measure token speed, and declare a winner. That’s fine if you’re building a chatbot. It’s almost useless if you’re building a real-time system.&lt;/p&gt;

&lt;p&gt;I wanted to see what happens when LLMs are exposed to something messier: live, high-velocity network telemetry.&lt;/p&gt;

&lt;p&gt;So I wired multiple LLM APIs directly into a live BGP stream and measured how they behaved when the data never stopped.&lt;/p&gt;

&lt;p&gt;This post is about what broke, what worked, and why “smartest model” is often the wrong question.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup (Simple, No Tricks)
&lt;/h2&gt;

&lt;p&gt;The data source was a live BGP feed from RIPE RIS:&lt;/p&gt;

&lt;p&gt;WebSocket endpoint:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;wss://ris-live.ripe.net/v1/ws/?client=turbomart-test&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Subscription message:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "type": "ris_subscribe",
  "data": { "host": "rrc21" }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a continuous firehose of routing updates. No batching. No backpressure help.&lt;/p&gt;
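&lt;p&gt;For reference, the handshake is just one JSON message over the socket. Building the message is the only part testable offline; the send itself (sketched in the comment, using Node’s &lt;code&gt;ws&lt;/code&gt; package) is an assumption about your client setup:&lt;/p&gt;

```typescript
// Build the ris_subscribe payload shown above. Sending it requires a
// WebSocket client (e.g. the 'ws' package in Node), sketched below.
function buildRisSubscribe(host: string): string {
  return JSON.stringify({ type: 'ris_subscribe', data: { host } })
}

// const ws = new WebSocket('wss://ris-live.ripe.net/v1/ws/?client=turbomart-test')
// ws.on('open', () => ws.send(buildRisSubscribe('rrc21')))
```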

&lt;p&gt;Each update was sent to five LLM APIs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OpenAI&lt;/li&gt;
&lt;li&gt;Anthropic&lt;/li&gt;
&lt;li&gt;Azure OpenAI&lt;/li&gt;
&lt;li&gt;Gemini&lt;/li&gt;
&lt;li&gt;Grok&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same prompts. Same parameters. No model-specific tuning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;System prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You are an expert network engineer who analyzes BGP feeds for a living…&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;User prompt:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Summarize the following BGP update in under 140 characters for a real-time network alert. Include ASN owner, prefix, and region if known.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If a model failed, truncated its output, or rambled, that was counted as part of the result.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Measured (The Stuff That Actually Hurts in Production)
&lt;/h2&gt;

&lt;p&gt;I didn’t care about abstract “intelligence.” I cared about things that break pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time to First Token (TTFT)&lt;/li&gt;
&lt;li&gt;Total completion latency&lt;/li&gt;
&lt;li&gt;Tokens in vs tokens out&lt;/li&gt;
&lt;li&gt;Compression ratio (output tokens divided by input tokens)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These metrics determine:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how stale your alerts are&lt;/li&gt;
&lt;li&gt;whether your buffers explode&lt;/li&gt;
&lt;li&gt;whether you burn money on filler text&lt;/li&gt;
&lt;/ul&gt;
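&lt;p&gt;Each sample can be scored with simple arithmetic; the field names below are mine, not any provider SDK’s:&lt;/p&gt;

```typescript
// Scoring one streamed response. Timestamps are in milliseconds; these
// field names are illustrative, not from any provider's SDK.
interface Sample {
  sentAt: number
  firstTokenAt: number
  doneAt: number
  inputTokens: number
  outputTokens: number
}

function score(s: Sample) {
  return {
    ttftMs: s.firstTokenAt - s.sentAt,
    totalLatencyMs: s.doneAt - s.sentAt,
    compressionRatio: s.outputTokens / s.inputTokens,
  }
}
```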




&lt;h2&gt;
  
  
  The Averages (Across All Samples)
&lt;/h2&gt;

&lt;p&gt;Here’s what the numbers looked like when averaged per provider:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;TTFT&lt;/th&gt;
&lt;th&gt;Total latency&lt;/th&gt;
&lt;th&gt;Tokens out&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;~830 ms&lt;/td&gt;
&lt;td&gt;~1.8 s&lt;/td&gt;
&lt;td&gt;~45&lt;/td&gt;
&lt;td&gt;~0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;td&gt;~2.1 s&lt;/td&gt;
&lt;td&gt;~6.3 s&lt;/td&gt;
&lt;td&gt;~137&lt;/td&gt;
&lt;td&gt;~0.03&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure OpenAI&lt;/td&gt;
&lt;td&gt;~2.8 s&lt;/td&gt;
&lt;td&gt;~2.8 s&lt;/td&gt;
&lt;td&gt;~9,400&lt;/td&gt;
&lt;td&gt;~1.85&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini&lt;/td&gt;
&lt;td&gt;~3.0 s&lt;/td&gt;
&lt;td&gt;~3.4 s&lt;/td&gt;
&lt;td&gt;~9,600&lt;/td&gt;
&lt;td&gt;~1.84&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Grok&lt;/td&gt;
&lt;td&gt;~19 s&lt;/td&gt;
&lt;td&gt;~19.7 s&lt;/td&gt;
&lt;td&gt;~33&lt;/td&gt;
&lt;td&gt;~0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Even without context, some things should already look alarming.&lt;/p&gt;


&lt;h2&gt;
  
  
  Model-by-Model: What Actually Happened
&lt;/h2&gt;
&lt;h3&gt;
  
  
  OpenAI
&lt;/h3&gt;

&lt;p&gt;OpenAI behaved exactly how you’d want in a streaming system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;fast first token&lt;/li&gt;
&lt;li&gt;short, clean summaries&lt;/li&gt;
&lt;li&gt;almost no wasted output&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It followed the prompt closely and didn’t try to be clever. That’s a feature, not a bug.&lt;/p&gt;

&lt;p&gt;If you’re building dashboards, alerts, or anything user-facing in real time, OpenAI was the most predictable option.&lt;/p&gt;

&lt;p&gt;

&lt;iframe src="https://player.vimeo.com/video/1149869786" width="710" height="399"&gt;
&lt;/iframe&gt;


&lt;/p&gt;




&lt;h3&gt;
  
  
  Anthropic
&lt;/h3&gt;

&lt;p&gt;Anthropic did something different.&lt;/p&gt;

&lt;p&gt;It didn’t just summarize updates. It tried to interpret them. Sometimes it flagged anomalies. Sometimes it suggested what might be happening.&lt;/p&gt;

&lt;p&gt;That extra reasoning came at a cost:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;slower responses&lt;/li&gt;
&lt;li&gt;significantly more tokens&lt;/li&gt;
&lt;li&gt;longer completions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not an alerting engine. It’s closer to a junior analyst reading the feed.&lt;/p&gt;

&lt;p&gt;Great for offline analysis. Dangerous for live alerts.&lt;/p&gt;

&lt;p&gt;

&lt;iframe src="https://player.vimeo.com/video/1149869998" width="710" height="399"&gt;
&lt;/iframe&gt;


&lt;/p&gt;




&lt;h3&gt;
  
  
  Azure OpenAI
&lt;/h3&gt;

&lt;p&gt;Azure OpenAI struggled in this setup.&lt;/p&gt;

&lt;p&gt;It often behaved as if it only partially understood the incoming data. Output was verbose, repetitive, and sometimes ignored the summarization constraint entirely.&lt;/p&gt;

&lt;p&gt;The compression ratio tells the story: output was often larger than input.&lt;/p&gt;

&lt;p&gt;That’s a red flag in any streaming system.&lt;/p&gt;

&lt;p&gt;I suspect this can be fixed with tighter controls, but out of the box it wasn’t stream-safe.&lt;/p&gt;

&lt;p&gt;

&lt;iframe src="https://player.vimeo.com/video/1149869956" width="710" height="399"&gt;
&lt;/iframe&gt;


&lt;/p&gt;




&lt;h3&gt;
  
  
  Gemini
&lt;/h3&gt;

&lt;p&gt;Gemini responses were usually fast enough, but often incomplete.&lt;/p&gt;

&lt;p&gt;Some outputs were truncated. Others were short but low-value. Many wasted tokens without adding useful signal.&lt;/p&gt;

&lt;p&gt;It felt optimized for short Q&amp;amp;A, not for interpreting structured telemetry.&lt;/p&gt;

&lt;p&gt;If you’re processing logs or metrics streams, Gemini isn’t there yet.&lt;/p&gt;

&lt;p&gt;

&lt;iframe src="https://player.vimeo.com/video/1149869909" width="710" height="399"&gt;
&lt;/iframe&gt;


&lt;/p&gt;




&lt;h3&gt;
  
  
  Grok
&lt;/h3&gt;

&lt;p&gt;Grok was the strangest.&lt;/p&gt;

&lt;p&gt;Responses were extremely slow to start, but very short once they arrived. Often it just signaled that something changed, without explaining what or why.&lt;/p&gt;

&lt;p&gt;Think of it as a “delta detector,” not a summarizer.&lt;/p&gt;

&lt;p&gt;If your use case is “ping me when anything changes,” maybe.&lt;br&gt;
If you need explanation, no.&lt;/p&gt;

&lt;p&gt;

&lt;iframe src="https://player.vimeo.com/video/1149869836" width="710" height="399"&gt;
&lt;/iframe&gt;


&lt;/p&gt;




&lt;h2&gt;
  
  
  The Big Lesson
&lt;/h2&gt;

&lt;p&gt;LLM APIs are not interchangeable components.&lt;/p&gt;

&lt;p&gt;They encode assumptions about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;how fast answers should arrive&lt;/li&gt;
&lt;li&gt;how verbose responses should be&lt;/li&gt;
&lt;li&gt;how much reasoning is appropriate&lt;/li&gt;
&lt;li&gt;how strictly prompts should be followed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In real-time systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;latency beats intelligence&lt;/li&gt;
&lt;li&gt;consistency beats creativity&lt;/li&gt;
&lt;li&gt;token efficiency beats verbosity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An answer that arrives late is indistinguishable from noise.&lt;/p&gt;




&lt;h2&gt;
  
  
  If You’re Building a Streaming System
&lt;/h2&gt;

&lt;p&gt;Based on this experiment:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use OpenAI for real-time alerts and dashboards&lt;/li&gt;
&lt;li&gt;Use Anthropic for offline analysis or investigations&lt;/li&gt;
&lt;li&gt;Be cautious with Azure OpenAI unless you tightly constrain it&lt;/li&gt;
&lt;li&gt;Avoid Gemini for structured stream summarization&lt;/li&gt;
&lt;li&gt;Use Grok only if you care about “something changed,” not details&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Building a Terminal UI Broke My Brain</title>
      <dc:creator>Manas Mudbari</dc:creator>
      <pubDate>Mon, 15 Dec 2025 18:22:19 +0000</pubDate>
      <link>https://forem.com/manasmudbari/building-a-terminal-ui-broke-my-brain-hpc</link>
      <guid>https://forem.com/manasmudbari/building-a-terminal-ui-broke-my-brain-hpc</guid>
      <description>&lt;p&gt;I’ve spent most of my career building things for the browser.&lt;/p&gt;

&lt;p&gt;If something looks wrong, you open DevTools.&lt;br&gt;
If spacing is off, you inspect the DOM.&lt;br&gt;
If the layout is cursed, you tweak CSS until it stops yelling at you.&lt;/p&gt;

&lt;p&gt;So naturally, I thought building a Terminal UI (TUI) would be… similar.&lt;/p&gt;

&lt;p&gt;It was not!&lt;/p&gt;

&lt;p&gt;This post is part of me building in public while working on a project called &lt;a href="https://github.com/turboline-ai/turbostream" rel="noopener noreferrer"&gt;TurboStream&lt;/a&gt; — a developer tool that lets you connect high-velocity WebSocket streams (blockchains, BGP, finance feeds, etc.) to LLMs without draining tokens or crashing your system. &lt;/p&gt;

&lt;p&gt;Think:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WebSocket → Cache → Triggers → LLM → Short human-readable alerts.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend was the easy part.&lt;br&gt;
The Terminal UI nearly ended me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdy63ikmn2nj1hthrvms.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqdy63ikmn2nj1hthrvms.png" alt="TUI Dashboard" width="800" height="482"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Coming From the Browser World
&lt;/h2&gt;

&lt;p&gt;In the web world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figma → HTML/CSS is mostly mechanical&lt;/li&gt;
&lt;li&gt;Layout is visual&lt;/li&gt;
&lt;li&gt;Debugging is interactive&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Bubble Tea land:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Layout is math&lt;/li&gt;
&lt;li&gt;Padding is vibes&lt;/li&gt;
&lt;li&gt;“Why is this box 2 columns wider?” is a philosophical question&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There’s no DOM inspector.&lt;br&gt;
There’s no “hover to see bounding box.”&lt;br&gt;
You change one &lt;code&gt;lipgloss.Style()&lt;/code&gt; and suddenly everything shifts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Hardest Screen: AI Analysis
&lt;/h2&gt;

&lt;p&gt;The screen that caused the most pain was the AI Analysis panel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbzwfdq8rttsk3qxyh8t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxbzwfdq8rttsk3qxyh8t.png" alt="AI Analysis Window" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Conceptually, it’s simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;show LLM context size&lt;/li&gt;
&lt;li&gt;show token usage&lt;/li&gt;
&lt;li&gt;show generation timing&lt;/li&gt;
&lt;li&gt;stream output text&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In reality:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;content height changes constantly&lt;/li&gt;
&lt;li&gt;widths need to stay aligned with sibling panels&lt;/li&gt;
&lt;li&gt;scrolling text + borders + padding fight each other&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I kept ending up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;truncated text&lt;/li&gt;
&lt;li&gt;panels overflowing by 1 character&lt;/li&gt;
&lt;li&gt;borders misaligned depending on terminal width&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It looked fine at one size, then completely broke when resized.&lt;/p&gt;

&lt;p&gt;If you’ve used Bubble Tea, you know the feeling:&lt;br&gt;
&lt;strong&gt;“This should work… why doesn’t it?”&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned (So Far)
&lt;/h2&gt;

&lt;p&gt;A few lessons that finally started to click:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Stop thinking in pixels
&lt;/h3&gt;

&lt;p&gt;Terminal layout is about constraints, not visuals.&lt;br&gt;
Everything is rows × columns. Nothing is free.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Measure everything explicitly
&lt;/h3&gt;

&lt;p&gt;If you don’t calculate width/height yourself, Bubble Tea will happily surprise you.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Borders lie
&lt;/h3&gt;

&lt;p&gt;Borders and padding count.&lt;br&gt;
That “one extra column” is always your fault.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Debugging TUIs requires instrumentation
&lt;/h3&gt;

&lt;p&gt;I started adding:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;temporary background colors&lt;/li&gt;
&lt;li&gt;width/height labels inside boxes&lt;/li&gt;
&lt;li&gt;fake content to stress layouts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It felt ugly — but it worked.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I’m Doing Next
&lt;/h2&gt;

&lt;p&gt;To make this sane long-term, my next steps are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Create a layout debug mode&lt;/strong&gt;: Toggleable overlays that show component boundaries and sizes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write small layout test harnesses:&lt;/strong&gt; Instead of debugging inside the full app, isolate one view at a time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standardize layout contracts&lt;/strong&gt;: Every panel declares what it needs instead of “figuring it out.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accept that TUI UX ≠ Web UX&lt;/strong&gt;: Different medium, different rules.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>cli</category>
      <category>tui</category>
    </item>
  </channel>
</rss>
