<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: divyaprakash D</title>
    <description>The latest articles on Forem by divyaprakash D (@divyaprakash_d_2d5d085bd4).</description>
    <link>https://forem.com/divyaprakash_d_2d5d085bd4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1761017%2Ff2cb9107-da61-4982-977d-2099598f1e5d.jpg</url>
      <title>Forem: divyaprakash D</title>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/divyaprakash_d_2d5d085bd4"/>
    <language>en</language>
    <item>
      <title>Vitreus: Local-First Spreadsheet Intelligence with Gemma 4</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sun, 24 May 2026 23:03:30 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/vitreus-local-first-spreadsheet-intelligence-with-gemma-4-5aio</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/vitreus-local-first-spreadsheet-intelligence-with-gemma-4-5aio</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Vitreus is a spreadsheet agent that lets you ask natural-language questions of CSV and XLSX workbooks, then uses Gemma 4 to return an auditable JSON action manifest. The manifest can highlight rows, write values, and add formulas; Vitreus then applies those changes to a CSV snapshot or a real &lt;code&gt;.xlsx&lt;/code&gt; file with colors and formulas preserved.&lt;/p&gt;

&lt;p&gt;I built it because spreadsheets are where a lot of real business logic lives: budgets, project trackers, sales forecasts, invoices, HR reviews, and messy exported reports. But the moment you ask an AI assistant to "just update the sheet," you run into a trust problem: what exactly did it change, and why?&lt;/p&gt;

&lt;p&gt;Vitreus answers that by splitting the job in two:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Gemma 4 reasons about the workbook.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A deterministic driver applies only structured, reviewable actions.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model never mutates the workbook directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Vitreus is a local-first spreadsheet intelligence tool for developers, analysts, and teams who want AI help inside sensitive workbook workflows without giving the model unrestricted control.&lt;/p&gt;

&lt;p&gt;The current project includes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Why it matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;CSV workbook snapshots&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Fast, portable test format for spreadsheet data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;XLSX input and output&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Preserves cell values, formulas, and highlight colors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-shot &lt;code&gt;analyze --output&lt;/code&gt; flow&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;No separate "generate JSON" and "apply JSON" steps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;JSON action manifests&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Every model action is inspectable before execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local Gemma 4 via Ollama&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Private local inference path&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google AI Studio backend&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;API-key path for users without local GPU access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deterministic fallback planner&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;CI and demos can run without a model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chart/receipt image payload prep&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Foundation for multimodal spreadsheet workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bash and Nushell command references&lt;/td&gt;
&lt;td&gt;Working&lt;/td&gt;
&lt;td&gt;Easy manual testing on Linux/Nushell systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At a high level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CSV/XLSX workbook
    |
    v
WorkbookSnapshot
    |
    v
compact JSON sheet context
    |
    v
Gemma 4 planner
    |
    v
JSON manifest: highlight / write_value / formula
    |
    v
InMemoryCalcDriver
    |
    v
CSV + sidecar JSON, or XLSX with colors/formulas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not a chatbot bolted onto a spreadsheet. It is a small, testable agent pipeline where the model is responsible for reasoning and the application is responsible for safe execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Demo 1: Ask Gemma 4 to find budget problems
&lt;/h3&gt;

&lt;p&gt;Input file: &lt;code&gt;examples/sample_workbook.csv&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It contains employee/project rows with columns like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Name,Department,Q1_Target,Q1_Actual,Q2_Target,Q2_Actual,Score,Status,Budget,Spent,Notes
Ada Lovelace,Research,120000,128000,130000,135000,94,On Track,200000,184000,
Alan Turing,Security,90000,87000,95000,91000,72,Under Review,110000,135000,
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight all rows where Spent exceeds Budget, and for each over-budget row write OVER BUDGET in the Notes column"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Actual output from the live API-key run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:31b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drafter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:4b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Identified rows where Spent exceeds Budget: Alan Turing (Row 3) and Dennis Ritchie (Row 9). Applied highlights to these rows and updated the Notes column to 'OVER BUDGET'."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!A3:K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#f97316"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent (135000) exceeds Budget (110000)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write_value"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cell"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OVER BUDGET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent exceeds Budget"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!A9:K9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#f97316"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent (108000) exceeds Budget (90000)"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write_value"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cell"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!K9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OVER BUDGET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent exceeds Budget"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the core Vitreus pattern: Gemma 4 does the semantic reasoning, but the output is still machine-checkable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo 2: One command, real XLSX output
&lt;/h3&gt;

&lt;p&gt;CSV is useful, but CSV cannot store background colors or Excel formulas. So Vitreus supports one-shot XLSX export:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight rows where Spent exceeds Budget in orange, write OVER BUDGET in the Notes column"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/vitreus_result.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The result is a real Excel workbook. Open it in LibreOffice Calc or Excel and the highlighted rows are actually colored.&lt;/p&gt;

&lt;h3&gt;
  
  
  Demo 3: XLSX in, XLSX out
&lt;/h3&gt;

&lt;p&gt;I also created a larger workbook for testing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;examples/test_workbook.xlsx
├── Sales        25 data rows, quota and commission formulas
├── Expenses     24 data rows, annual budget/actual formulas
└── HR_Reviews   18 data rows, rating/bonus formulas
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/test_workbook.xlsx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"In the Expenses sheet, highlight rows where Annual_Actual exceeds Annual_Budget"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sheet&lt;/span&gt; Expenses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/vitreus_expenses_result.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Recent smoke-test result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"applied"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"saved"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/tmp/vitreus_xlsx_test.xlsx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"errors"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://github.com/divyaprakash0426/vitreus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important files:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;File&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;core/driver.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Workbook snapshots, CSV/XLSX loading, XLSX saving, manifest execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;core/reasoning.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Gemma 4 model policy, Ollama backend, Google AI Studio backend, manifest parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;core/vision.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Image metadata and multimodal prompt payload preparation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;interfaces/cli.py&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Typer CLI commands: &lt;code&gt;models&lt;/code&gt;, &lt;code&gt;analyze&lt;/code&gt;, &lt;code&gt;apply-manifest&lt;/code&gt;, &lt;code&gt;vision&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;examples/sample_workbook.csv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Small CSV test dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;examples/test_workbook.xlsx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Larger multi-sheet XLSX test workbook&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;examples/test_commands.sh&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Bash demo/test command reference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;examples/test_commands.nu&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Nushell demo/test command reference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;tests/&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;37 automated tests covering backends, CLI, driver, output, and vision&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Try it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/divyaprakash0426/vitreus.git
&lt;span class="nb"&gt;cd &lt;/span&gt;vitreus

uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; dev
uv run pytest &lt;span class="nt"&gt;-q&lt;/span&gt;
uv run vitreus models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with Google AI Studio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; integrations
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key_here"&lt;/span&gt;

uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Flag rows where Spent exceeds Budget and write OVER BUDGET in Notes"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run locally with Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; integrations
ollama pull gemma4:31b

uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Summarise department-level spending risks"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the lighter local model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight rows that need review"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; gemma4:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;Vitreus is designed around &lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt; as the primary model.&lt;/p&gt;

&lt;p&gt;Spreadsheet intelligence is a long-context reasoning task. A useful spreadsheet agent needs to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Read many rows without losing the column semantics.&lt;/li&gt;
&lt;li&gt;Understand user intent expressed in plain English.&lt;/li&gt;
&lt;li&gt;Compare related fields like &lt;code&gt;Budget&lt;/code&gt; and &lt;code&gt;Spent&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Decide which cells or rows need attention.&lt;/li&gt;
&lt;li&gt;Generate valid spreadsheet references like &lt;code&gt;Sheet1!A3:K3&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Explain why each action is needed.&lt;/li&gt;
&lt;li&gt;Return strict JSON instead of prose.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is why I chose the 31B Dense model as the default planner. It is the best fit for "read this workbook, understand the pattern, and produce a reliable action plan."&lt;/p&gt;

&lt;p&gt;The project still supports smaller Gemma 4 models as a deliberate secondary path:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model family&lt;/th&gt;
&lt;th&gt;Role in Vitreus&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 31B Dense&lt;/td&gt;
&lt;td&gt;Primary planner&lt;/td&gt;
&lt;td&gt;Best fit for long-context workbook reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 4B&lt;/td&gt;
&lt;td&gt;Drafter / edge assistant&lt;/td&gt;
&lt;td&gt;Lower latency for quick previews and constrained hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma 4 26B MoE&lt;/td&gt;
&lt;td&gt;Future throughput path&lt;/td&gt;
&lt;td&gt;Useful when many independent workbook requests need efficient routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The model policy is encoded directly in the project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GemmaModelChoice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;drafter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;rationale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GemmaModelChoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:31b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;drafter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rationale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemma 4 31B Dense is the default because Vitreus needs local, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;long-context workbook reasoning and stronger multimodal planning; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemma 4 4B remains useful as a low-latency drafter on edge hardware.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The prompt requires JSON only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Vitreus, a spreadsheet intelligence agent running gemma4:31b.
Analyze the spreadsheet data below and respond with ONLY a valid JSON manifest.

Task: Highlight rows where Spent exceeds Budget.

Required JSON response shape:
{
  "model": {"primary": "gemma4:31b", "drafter": "gemma4:4b", "rationale": "..."},
  "actions": [
    {
      "type": "highlight|write_value|formula",
      "range": "Sheet1!A1:B2",
      "cell": "Sheet1!C2",
      "value": "...",
      "formula": "=SUM(A1:A10)",
      "color": "#f97316",
      "reason": "why this action is needed"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That contract is the heart of the project. Gemma 4 is not asked to "edit a spreadsheet." It is asked to produce a plan that Vitreus can inspect and execute.&lt;/p&gt;




&lt;h2&gt;
  
  
  How It Works Internally
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. WorkbookSnapshot: turn sheets into model context
&lt;/h3&gt;

&lt;p&gt;Vitreus reads CSV and XLSX files into a simple in-memory representation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The snapshot can load:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scores.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_xlsx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workbook.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;workbook.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then it exports the requested range as compact JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alan Turing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;110000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Spent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;135000&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"Name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dennis Ritchie"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Budget"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;90000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"Spent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;108000&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives Gemma 4 useful semantic structure: headers become keys, rows become records, and the model does not have to infer everything from raw cell coordinates.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Backends: local-first, cloud-optional
&lt;/h3&gt;

&lt;p&gt;Vitreus supports three execution modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fallback&lt;/td&gt;
&lt;td&gt;default&lt;/td&gt;
&lt;td&gt;CI, demos, no model required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ollama&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--backend ollama&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Local Gemma 4 inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Google AI Studio&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--backend google&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API-key run when local GPU is unavailable&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The backends are small adapter classes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OllamaBackend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GoogleAIBackend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lazy imports are intentional. The core package can be installed and tested without Ollama, Google GenAI, LibreOffice, or cloud credentials.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Manifest execution: structured actions only
&lt;/h3&gt;

&lt;p&gt;The executor supports three action types:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!A3:K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#f97316"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write_value"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"cell"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OVER BUDGET"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"formula"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"cell"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!J11"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"formula"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"=SUM(J2:J10)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Unsupported action types are rejected instead of silently ignored. This keeps the model inside a narrow, auditable capability boundary.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. CSV vs XLSX: the boring detail that mattered
&lt;/h3&gt;

&lt;p&gt;CSV cannot store background colors or formulas as spreadsheet formulas. Early testing made that painfully obvious: a manifest could say "highlight this row," but a CSV output could only store text.&lt;/p&gt;

&lt;p&gt;So Vitreus handles both formats explicitly:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Output&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.csv&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saves values and writes highlights to &lt;code&gt;&amp;lt;name&amp;gt;_highlights.json&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;.xlsx&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Saves values, formulas, and real cell background colors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;That means users get a clear warning for CSV:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CSV format cannot store cell colors or formulas.
write_value changes are saved in result.csv
Highlight colors -&amp;gt; result_highlights.json
Tip: use --output result.xlsx to preserve everything in one file.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And they get real spreadsheet formatting when they choose XLSX.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Technical Problems That Shaped the Project
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Problem 1: "AI changed my spreadsheet" is not good enough
&lt;/h3&gt;

&lt;p&gt;If a spreadsheet agent directly mutates a workbook, the user has to trust a black box.&lt;/p&gt;

&lt;p&gt;The fix was the manifest contract. Every action contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the action type,&lt;/li&gt;
&lt;li&gt;the exact cell or range,&lt;/li&gt;
&lt;li&gt;the value/formula/color,&lt;/li&gt;
&lt;li&gt;and the reason.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That makes it possible to log, review, diff, test, or reject model output before execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 2: Local-first should not mean "local-only"
&lt;/h3&gt;

&lt;p&gt;My preferred path is Ollama with &lt;code&gt;gemma4:31b&lt;/code&gt;, but not every developer has a GPU available. I hit this myself while testing away from my GPU profile.&lt;/p&gt;

&lt;p&gt;So Vitreus supports both:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Local&lt;/span&gt;
uv run vitreus analyze examples/sample_workbook.csv &lt;span class="s2"&gt;"..."&lt;/span&gt; &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama

&lt;span class="c"&gt;# API key&lt;/span&gt;
uv run vitreus analyze examples/sample_workbook.csv &lt;span class="s2"&gt;"..."&lt;/span&gt; &lt;span class="nt"&gt;--backend&lt;/span&gt; google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model interface stays the same. Only the backend changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 3: Multi-sheet XLSX files are the real spreadsheet format
&lt;/h3&gt;

&lt;p&gt;CSV was useful for early tests, but real workbooks have sheets, formulas, styles, and business structure.&lt;/p&gt;

&lt;p&gt;The latest version added:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_xlsx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_xlsx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_sheets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The CLI now accepts &lt;code&gt;.xlsx&lt;/code&gt; as input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/test_workbook.xlsx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"In the Sales sheet, highlight reps below quota"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sheet&lt;/span&gt; Sales &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/sales_review.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 4: Tests need to run without secret keys or local models
&lt;/h3&gt;

&lt;p&gt;The project has 37 automated tests. They cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;backend adapter construction,&lt;/li&gt;
&lt;li&gt;CLI flags,&lt;/li&gt;
&lt;li&gt;missing API key behavior,&lt;/li&gt;
&lt;li&gt;manifest parsing,&lt;/li&gt;
&lt;li&gt;CSV save behavior,&lt;/li&gt;
&lt;li&gt;XLSX values,&lt;/li&gt;
&lt;li&gt;XLSX formulas,&lt;/li&gt;
&lt;li&gt;XLSX cell colors,&lt;/li&gt;
&lt;li&gt;XLSX input loading,&lt;/li&gt;
&lt;li&gt;and the multi-sheet test workbook shape.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The deterministic fallback planner is not a replacement for Gemma 4. It exists so the execution pipeline can be tested without depending on a network call or a local model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Is a Good Gemma 4 Use Case
&lt;/h2&gt;

&lt;p&gt;Gemma 4 is doing real work here. It is not decorative.&lt;/p&gt;

&lt;p&gt;The model is responsible for the part that is hard to encode as rules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;understanding workbook headers,&lt;/li&gt;
&lt;li&gt;mapping natural-language requests to spreadsheet operations,&lt;/li&gt;
&lt;li&gt;comparing values across columns,&lt;/li&gt;
&lt;li&gt;deciding which rows need attention,&lt;/li&gt;
&lt;li&gt;generating formulas,&lt;/li&gt;
&lt;li&gt;and explaining each action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The surrounding application does the parts software should do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;loading files,&lt;/li&gt;
&lt;li&gt;constraining the action schema,&lt;/li&gt;
&lt;li&gt;validating JSON,&lt;/li&gt;
&lt;li&gt;applying known operations,&lt;/li&gt;
&lt;li&gt;preserving output formats,&lt;/li&gt;
&lt;li&gt;and keeping the workflow auditable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That division is what makes the project useful. Gemma 4 supplies reasoning; Vitreus supplies guardrails.&lt;/p&gt;




&lt;h2&gt;
  
  
  Current Limitations
&lt;/h2&gt;

&lt;p&gt;Vitreus is already useful for CSV/XLSX workflows, but there are areas I would keep improving:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Current state&lt;/th&gt;
&lt;th&gt;Next step&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LibreOffice live control&lt;/td&gt;
&lt;td&gt;Adapter planned from blueprint&lt;/td&gt;
&lt;td&gt;Wire PyUNO to a running Calc socket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal receipts/charts&lt;/td&gt;
&lt;td&gt;Payload prep implemented&lt;/td&gt;
&lt;td&gt;Feed images into Gemma 4 multimodal backend&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-sheet reasoning&lt;/td&gt;
&lt;td&gt;Sheet-specific input works&lt;/td&gt;
&lt;td&gt;Add whole-workbook summarization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Formula safety&lt;/td&gt;
&lt;td&gt;Formula strings are written&lt;/td&gt;
&lt;td&gt;Add formula linting and policy controls&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Review UI&lt;/td&gt;
&lt;td&gt;Terminal-first&lt;/td&gt;
&lt;td&gt;Add a small manifest review screen&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The design intentionally keeps these as separable layers. The workbook reader, reasoning engine, and manifest executor can evolve independently.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;The biggest lesson was that a good spreadsheet agent is less about "letting AI use Excel" and more about designing a trustworthy boundary between reasoning and execution.&lt;/p&gt;

&lt;p&gt;Gemma 4 is capable enough to understand messy tabular context and produce useful spreadsheet plans. But the application still needs to say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;here is the allowed action vocabulary,&lt;/li&gt;
&lt;li&gt;here is the exact JSON shape,&lt;/li&gt;
&lt;li&gt;here is how output will be applied,&lt;/li&gt;
&lt;li&gt;and here is what happens when a format cannot represent an action.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the difference between an impressive demo and a tool I would trust with a real workbook.&lt;/p&gt;




&lt;h2&gt;
  
  
  Acknowledgements
&lt;/h2&gt;

&lt;p&gt;Built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gemma 4 31B Dense as the primary reasoning model&lt;/li&gt;
&lt;li&gt;Gemma 4 4B as the drafter/edge model path&lt;/li&gt;
&lt;li&gt;Ollama for local model execution&lt;/li&gt;
&lt;li&gt;Google AI Studio for API-key testing&lt;/li&gt;
&lt;li&gt;Typer for the CLI&lt;/li&gt;
&lt;li&gt;openpyxl for XLSX input/output&lt;/li&gt;
&lt;li&gt;pytest for the test suite&lt;/li&gt;
&lt;li&gt;LibreOffice Calc as the target spreadsheet environment&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>What I Learned Building a Local-First Spreadsheet Agent with Gemma 4</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sun, 24 May 2026 22:41:57 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/what-i-learned-building-a-local-first-spreadsheet-agent-with-gemma-4-ijb</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/what-i-learned-building-a-local-first-spreadsheet-agent-with-gemma-4-ijb</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Write About Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I built a small project called &lt;strong&gt;Vitreus&lt;/strong&gt; to answer a practical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;What does a useful local-first AI agent look like when the data is not a chat history, but a spreadsheet?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Spreadsheets are deceptively hard. They look simple because they are rows and columns, but real spreadsheet work mixes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;business rules,&lt;/li&gt;
&lt;li&gt;messy headers,&lt;/li&gt;
&lt;li&gt;formulas,&lt;/li&gt;
&lt;li&gt;visual formatting,&lt;/li&gt;
&lt;li&gt;multiple sheets,&lt;/li&gt;
&lt;li&gt;hidden assumptions,&lt;/li&gt;
&lt;li&gt;and private data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made the project a good test case for Gemma 4. I did not want to use the model as a novelty wrapper. I wanted Gemma 4 to do the part that is genuinely hard: read tabular context, understand the user's intent, and produce a safe action plan.&lt;/p&gt;

&lt;p&gt;This post is the technical write-up behind that build: how I chose the Gemma 4 model, how I wired local and API backends, and the pattern I recommend if you want to let an LLM work with sensitive documents.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Idea: Reasoning Is Not Execution
&lt;/h2&gt;

&lt;p&gt;The most important design decision came before any code:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemma 4 should reason about the spreadsheet, but it should not directly mutate the spreadsheet.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Instead of giving the model a tool like "edit cell" and letting it operate freely, Vitreus asks Gemma 4 to return a JSON manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:31b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drafter"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gemma4:4b"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Why this model and these actions were selected."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"actions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!A3:K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#f97316"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent exceeds Budget."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"write_value"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"cell"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"OVER BUDGET"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The row needs review."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application then applies only the actions it understands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;highlight&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;write_value&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;formula&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation makes the system easier to trust. The model can still be creative and semantic, but the executor stays deterministic and testable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Gemma 4 31B Dense Was the Right Default
&lt;/h2&gt;

&lt;p&gt;The Gemma 4 family spans different hardware and throughput needs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Gemma 4 variant&lt;/th&gt;
&lt;th&gt;Best fit&lt;/th&gt;
&lt;th&gt;How I think about it&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2B / 4B small models&lt;/td&gt;
&lt;td&gt;Edge, browser, mobile, fast drafts&lt;/td&gt;
&lt;td&gt;Great for low-latency helpers and constrained devices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;31B Dense&lt;/td&gt;
&lt;td&gt;Local/server reasoning&lt;/td&gt;
&lt;td&gt;Best fit when the task needs stronger reasoning over larger context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;26B MoE&lt;/td&gt;
&lt;td&gt;High-throughput reasoning&lt;/td&gt;
&lt;td&gt;Attractive when many requests need efficient expert routing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For Vitreus, I picked &lt;strong&gt;Gemma 4 31B Dense&lt;/strong&gt; as the primary model.&lt;/p&gt;

&lt;p&gt;Why? Because spreadsheet work is context-heavy.&lt;/p&gt;

&lt;p&gt;A spreadsheet request like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Highlight all rows where Spent exceeds Budget, and for each over-budget row
write OVER BUDGET in the Notes column.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;requires the model to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the relevant columns.&lt;/li&gt;
&lt;li&gt;Compare values row by row.&lt;/li&gt;
&lt;li&gt;Preserve the row identity.&lt;/li&gt;
&lt;li&gt;Produce valid spreadsheet ranges.&lt;/li&gt;
&lt;li&gt;Write a clear reason for each action.&lt;/li&gt;
&lt;li&gt;Return strict JSON without extra prose.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That is not just classification. It is structured reasoning over tabular data.&lt;/p&gt;

&lt;p&gt;The 4B model still has a useful role. I treat it as a &lt;strong&gt;drafter&lt;/strong&gt;: good for previews, edge use, or lightweight suggestions. But for the final workbook plan, the 31B Dense model is the safer default.&lt;/p&gt;

&lt;p&gt;The model policy in Vitreus is explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@dataclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frozen&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GemmaModelChoice&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;drafter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;rationale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

    &lt;span class="nd"&gt;@classmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GemmaModelChoice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;cls&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;primary&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:31b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;drafter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:4b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;rationale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemma 4 31B Dense is the default because Vitreus needs local, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;long-context workbook reasoning and stronger multimodal planning; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemma 4 4B remains useful as a low-latency drafter on edge hardware.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That rationale is not just documentation. It is returned in the manifest so the user can see which model policy was used.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running Gemma 4 Locally with Ollama
&lt;/h2&gt;

&lt;p&gt;The local path is the preferred Vitreus path because spreadsheets often contain sensitive data: salaries, invoices, forecasts, customer exports, sales reports, and internal review notes.&lt;/p&gt;

&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; integrations
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pull the local model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull gemma4:31b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run Vitreus with local Gemma 4:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Summarise department-level spending risks"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the smaller drafter model when you want a lighter local run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight rows that need review"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; gemma4:4b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The backend adapter is deliberately tiny:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OllamaBackend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma4:31b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the dependency boundary clean. The rest of the project does not need to know whether the model is local or remote.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running Gemma 4 with an API Key
&lt;/h2&gt;

&lt;p&gt;Local inference is ideal for privacy, but it is not always practical. I was not always running in my GPU profile while testing, so I also added a Google AI Studio backend.&lt;/p&gt;

&lt;p&gt;Set the key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key_here"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with the cloud backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight the top 3 performers by Score in green"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or pass the key inline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Flag rows where Spent exceeds Budget"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--api-key&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$GEMINI_API_KEY&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The adapter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GoogleAIBackend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemma-4-31b-it&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives users two deployment modes:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;When to use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--backend ollama&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Sensitive data, local GPU/CPU available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API key&lt;/td&gt;
&lt;td&gt;&lt;code&gt;--backend google&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;No local model available, quick testing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The prompt and manifest contract stay the same in both modes.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Prompt Contract
&lt;/h2&gt;

&lt;p&gt;The prompt is intentionally strict. It tells Gemma 4 that it is a spreadsheet intelligence agent and that the output must be JSON only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are Vitreus, a spreadsheet intelligence agent running gemma4:31b.
Analyze the spreadsheet data below and respond with ONLY a valid JSON manifest
(no markdown, no explanation).

Task: Highlight rows where Spent exceeds Budget.

Sheet data:
[
  {
    "Name": "Alan Turing",
    "Budget": 110000,
    "Spent": 135000,
    "Notes": ""
  }
]

Required JSON response shape:
{
  "model": {"primary": "gemma4:31b", "drafter": "gemma4:4b", "rationale": "..."},
  "actions": [
    {
      "type": "highlight|write_value|formula",
      "range": "Sheet1!A1:B2",
      "cell": "Sheet1!C2",
      "value": "...",
      "formula": "=SUM(A1:A10)",
      "color": "#f97316",
      "reason": "why this action is needed"
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response is parsed as JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;parse_manifest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;fenced&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;```

(?:json)?\s*(\{.*?\})\s*

```&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fenced&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;fenced&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;candidate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a production system I would harden this further with schema validation, but even this first version shows the important pattern: the LLM output is data, not authority.&lt;/p&gt;




&lt;h2&gt;
  
  
  CSV Taught Me a Product Lesson
&lt;/h2&gt;

&lt;p&gt;The first version worked with CSV files. That was useful for tests, but then a practical issue surfaced:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CSV cannot store colors.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A manifest can say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"range"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sheet1!A3:K3"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"color"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#f97316"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But if the output is CSV, there is nowhere to put that color.&lt;/p&gt;

&lt;p&gt;So Vitreus handles the limitation explicitly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CSV output:
  result.csv
  result_highlights.json

XLSX output:
  result.xlsx with values, formulas, and cell fills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That led to the one-shot output flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight rows where Spent exceeds Budget in orange, write OVER BUDGET in Notes"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/vitreus_result.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No separate "generate manifest" and "apply manifest" commands required.&lt;/p&gt;

&lt;p&gt;The command still uses the manifest internally; it just applies it immediately and writes the output workbook.&lt;/p&gt;




&lt;h2&gt;
  
  
  XLSX Made It Feel Real
&lt;/h2&gt;

&lt;p&gt;CSV is a good interchange format. XLSX is what spreadsheet users actually expect.&lt;/p&gt;

&lt;p&gt;The current Vitreus snapshot layer can now load Excel workbooks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_xlsx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;examples/test_workbook.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expenses&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_xlsx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;examples/test_workbook.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;all_sheets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;WorkbookSnapshot&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;examples/test_workbook.xlsx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sheet_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the CLI can round-trip XLSX:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/test_workbook.xlsx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"In the Expenses sheet, highlight rows where Annual_Actual exceeds Annual_Budget"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sheet&lt;/span&gt; Expenses &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/vitreus_expenses_result.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The generated test workbook has three sheets:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;examples/test_workbook.xlsx
├── Sales
├── Expenses
└── HR_Reviews
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This mattered because it moved the project from "LLM over CSV" toward a more realistic spreadsheet workflow.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Gemma 4 Is Good At in This Pattern
&lt;/h2&gt;

&lt;p&gt;From this project, I found Gemma 4 most useful for:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Header-aware reasoning
&lt;/h3&gt;

&lt;p&gt;The user does not have to say "compare column I and column J." They can say:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Find rows where Spent exceeds Budget.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemma 4 maps that request to the right fields.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Spreadsheet reference generation
&lt;/h3&gt;

&lt;p&gt;The output needs actual ranges:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Sheet1!A3:K3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is small but important. If the model cannot translate semantic findings back to spreadsheet coordinates, it cannot drive a workbook.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Explanations attached to actions
&lt;/h3&gt;

&lt;p&gt;Every non-obvious action can include a &lt;code&gt;reason&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Spent (135000) exceeds Budget (110000)"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That makes the manifest useful for audit logs and review UIs.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Flexible user intent
&lt;/h3&gt;

&lt;p&gt;The same pipeline can handle:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Highlight the top 3 performers by Score in green.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Add a SUM formula below the Spent column.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;or:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;In the Expenses sheet, highlight rows where Annual_Actual exceeds Annual_Budget.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The executor is narrow, but the language interface is flexible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I Would Use Each Gemma 4 Variant
&lt;/h2&gt;

&lt;p&gt;If you are deciding which Gemma 4 model to use, here is the mental model I ended up with:&lt;/p&gt;

&lt;h3&gt;
  
  
  Use 2B / 4B when latency and hardware matter most
&lt;/h3&gt;

&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;browser or mobile helpers,&lt;/li&gt;
&lt;li&gt;preview suggestions,&lt;/li&gt;
&lt;li&gt;small table summaries,&lt;/li&gt;
&lt;li&gt;command suggestions,&lt;/li&gt;
&lt;li&gt;offline edge workflows.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Vitreus, the 4B model is the drafter path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use 31B Dense when reasoning quality matters most
&lt;/h3&gt;

&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;long-context document analysis,&lt;/li&gt;
&lt;li&gt;spreadsheet reasoning,&lt;/li&gt;
&lt;li&gt;multi-step planning,&lt;/li&gt;
&lt;li&gt;structured JSON generation,&lt;/li&gt;
&lt;li&gt;local-first professional tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the default Vitreus model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use 26B MoE when throughput matters
&lt;/h3&gt;

&lt;p&gt;Good for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;many independent requests,&lt;/li&gt;
&lt;li&gt;server workloads,&lt;/li&gt;
&lt;li&gt;high-volume classification/routing,&lt;/li&gt;
&lt;li&gt;batch analysis where expert routing can improve efficiency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not make MoE the default because Vitreus is currently optimized around careful workbook reasoning, not high-throughput request serving. But it is an obvious future backend option.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Checklist for LLMs Working on Private Files
&lt;/h2&gt;

&lt;p&gt;If you are building something similar with Gemma 4, I recommend this checklist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Do not let the model directly mutate private files.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make the model return structured data.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep action types small and explicit.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Require reasons for destructive or non-obvious actions.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Make local inference the default path when privacy matters.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Offer an API-key path for users without local hardware.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Test the executor without requiring the model.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Tell users when the output format cannot preserve an action.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Prefer reviewable manifests over invisible automation.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The model should be powerful, but the boundary around it should be boring.&lt;/p&gt;

&lt;p&gt;That is a compliment. Boring boundaries are what make AI tools safe enough to use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try the Pattern Yourself
&lt;/h2&gt;

&lt;p&gt;Clone and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/divyaprakash0426/vitreus.git
&lt;span class="nb"&gt;cd &lt;/span&gt;vitreus
uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; dev
uv run pytest &lt;span class="nt"&gt;-q&lt;/span&gt;
uv run vitreus models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run without any model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight rows that need review"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with Google AI Studio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; integrations
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your_key_here"&lt;/span&gt;

uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Highlight all rows where Spent exceeds Budget and write OVER BUDGET in Notes"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run with local Ollama:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--extra&lt;/span&gt; integrations
ollama pull gemma4:31b

uv run vitreus analyze examples/sample_workbook.csv &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"Summarise budget risks by department"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write a real XLSX file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv run vitreus analyze examples/test_workbook.xlsx &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"In the Sales sheet, highlight reps whose Quota_Attainment is below 80 percent"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--backend&lt;/span&gt; google &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sheet&lt;/span&gt; Sales &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--output&lt;/span&gt; /tmp/vitreus_sales_review.xlsx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;The most interesting thing about Gemma 4 for me is not just that it can run locally or reason over longer context. It is that those capabilities change the shape of applications developers can build.&lt;/p&gt;

&lt;p&gt;With a strong local model, a spreadsheet assistant does not have to be a SaaS upload box. It can be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;local-first,&lt;/li&gt;
&lt;li&gt;auditable,&lt;/li&gt;
&lt;li&gt;scriptable,&lt;/li&gt;
&lt;li&gt;privacy-aware,&lt;/li&gt;
&lt;li&gt;and still useful.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the direction I want AI tooling to move: powerful models at the center, but surrounded by software engineering boundaries that users can understand.&lt;/p&gt;

&lt;p&gt;Vitreus is my first pass at that pattern for spreadsheets.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>ClawForge — A Trustworthy, Cross-Variant Skill Hub for the OpenClaw Ecosystem</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sun, 26 Apr 2026 19:42:38 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/clawforge-a-trustworthy-cross-variant-skill-hub-for-the-openclaw-ecosystem-518h</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/clawforge-a-trustworthy-cross-variant-skill-hub-for-the-openclaw-ecosystem-518h</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ClawForge&lt;/strong&gt; is a curated-from-scratch skill hub for the OpenClaw ecosystem — 30 hand-engineered skills, 6 variant installers, 3 persona "souls", a security trio, and a local cross-architecture evidence harness, all behind a single &lt;code&gt;SKILL.md&lt;/code&gt; standard.&lt;/p&gt;

&lt;p&gt;It exists because the obvious place to get OpenClaw skills today — &lt;strong&gt;ClawHub&lt;/strong&gt;, with its ~13,729 community uploads — has a trust problem. Roughly 1-in-5 of those skills ship patterns I wouldn't run on a borrowed laptop, and the &lt;strong&gt;ClawHavoc&lt;/strong&gt; campaign already published ~300 outright trojan skills before takedown. "Just &lt;code&gt;claw install whatever&lt;/code&gt;" is the new &lt;code&gt;curl | sudo bash&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;ClawForge is the boring, opinionated alternative:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;30 skills, not 30,000.&lt;/strong&gt; Every skill is engineered from scratch against one schema — no re-uploads, no forks of forks, no anonymous maintainers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;6 variants, one installer.&lt;/strong&gt; OpenClaw, ZeroClaw, PicoClaw, NullClaw, NanoBot, IronClaw — each with different runtimes (Node, Rust, Go, Zig, Python, WASM) and different blast radii. One &lt;code&gt;install.sh&lt;/code&gt; figures out what your target actually supports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static security tiers, not vibes.&lt;/strong&gt; Every skill declares an &lt;code&gt;L0/L1/L2/L3&lt;/code&gt; tier in frontmatter — read-only, scoped-write, network, or privileged. Scanners reject anything that lies about its tier.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A local QEMU evidence ladder.&lt;/strong&gt; I don't claim "works everywhere." I claim "x86_64 = emulated guest pass, arm64/armv7/riscv64 = usermode smoke pass, i386 = metadata only," with JSON+Markdown receipts.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Repo: &lt;strong&gt;&lt;a href="https://github.com/divyaprakash0426/clawForge" rel="noopener noreferrer"&gt;github.com/divyaprakash0426/clawForge&lt;/a&gt;&lt;/strong&gt;&lt;br&gt;
Release: &lt;strong&gt;&lt;a href="https://github.com/divyaprakash0426/clawForge/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;v1.0.0&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If ClawHub is npm circa 2018 — open, sprawling, and quietly malicious — ClawForge is trying to be the Debian stable of skills: smaller, slower, signed for what it claims to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Used OpenClaw
&lt;/h2&gt;

&lt;p&gt;OpenClaw's whole pitch is "AgentSkills are portable." The &lt;em&gt;interface&lt;/em&gt; really is portable — every variant speaks the same skill protocol. But &lt;strong&gt;execution isn't&lt;/strong&gt;. A skill that works on OpenClaw (1.5GB Node toolchain) will silently die on NullClaw (1MB Zig binary on a RISC-V board). The blueprint for ClawForge fell out of trying to make that real instead of marketing.&lt;/p&gt;

&lt;p&gt;Here's how OpenClaw's design actually shaped the build, and where I had to push back on it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The &lt;code&gt;SKILL.md&lt;/code&gt; standard — front-loading every honest constraint
&lt;/h3&gt;

&lt;p&gt;Every skill in ClawForge is a directory with a single &lt;code&gt;SKILL.md&lt;/code&gt;. The frontmatter is the contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;skill-sentinel&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.2.0&lt;/span&gt;
&lt;span class="na"&gt;tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;L1&lt;/span&gt;                       &lt;span class="c1"&gt;# read-only audit, no network&lt;/span&gt;
&lt;span class="na"&gt;variants&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
  &lt;span class="na"&gt;zeroclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
  &lt;span class="na"&gt;picoclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
  &lt;span class="na"&gt;nullclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;partial&lt;/span&gt;             &lt;span class="c1"&gt;# no JSON schema validator on Zig build&lt;/span&gt;
  &lt;span class="na"&gt;nanobot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
  &lt;span class="na"&gt;ironclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unsupported&lt;/span&gt;         &lt;span class="c1"&gt;# WASM sandbox blocks subprocess scan&lt;/span&gt;
&lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./run.sh&lt;/span&gt;
&lt;span class="na"&gt;permissions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;fs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;read&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;net&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
  &lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;grep&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;awk&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;jq&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two design decisions worth defending:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Per-variant compatibility is &lt;code&gt;full | partial | unsupported&lt;/code&gt;, not a boolean.&lt;/strong&gt; Half the pain in the ecosystem comes from skills that pretend to be universal. &lt;code&gt;partial&lt;/code&gt; forces the author to write a &lt;code&gt;COMPAT.md&lt;/code&gt; explaining what's degraded and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tiers are static, not dynamic.&lt;/strong&gt; I prototyped a "risk score" early on. It was useless — every author rationalizes their own skill into the green zone. &lt;code&gt;L0–L3&lt;/code&gt; are coarse, declarative, and trivially auditable: if you ask for &lt;code&gt;net&lt;/code&gt; permissions in an &lt;code&gt;L1&lt;/code&gt; skill, the scanner just rejects you.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  The variant-aware installer
&lt;/h3&gt;

&lt;p&gt;The installer is plain POSIX shell, but it does the one thing OpenClaw itself can't: it refuses to install skills that won't run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;./install.sh &lt;span class="nt"&gt;--variant&lt;/span&gt; nullclaw
&lt;span class="o"&gt;[&lt;/span&gt;forge] target: nullclaw &lt;span class="o"&gt;(&lt;/span&gt;zig, riscv64, ~1MB&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge] resolving 30 skills against compatibility matrix…
&lt;span class="o"&gt;[&lt;/span&gt;forge]   ✓ 24 full
&lt;span class="o"&gt;[&lt;/span&gt;forge]   ⚠ 4 partial   &lt;span class="o"&gt;(&lt;/span&gt;degrade to L1, see COMPAT.md&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge]   ✗ 2 unsupported  skipped: bedrock-rag, kube-scout
&lt;span class="o"&gt;[&lt;/span&gt;forge] staging to ~/.nullclaw/skills/  &lt;span class="o"&gt;(&lt;/span&gt;atomic&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge] running skill-sentinel on staged tree…
&lt;span class="o"&gt;[&lt;/span&gt;forge] OK — 28 skills installed, 2 skipped, 0 quarantined
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No "install everything and hope." If a skill's frontmatter lies about its variant support, the matching scanner test fails in CI before the skill is ever published.&lt;/p&gt;

&lt;h3&gt;
  
  
  The security trio
&lt;/h3&gt;

&lt;p&gt;Three skills act as the distribution's immune system, and they all run on every PR:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Job&lt;/th&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skill-sentinel&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Static analysis of every &lt;code&gt;SKILL.md&lt;/code&gt; + entrypoint — tier/permission lies, shell injection patterns, suspicious network calls&lt;/td&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;prompt-fence&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Scans souls/personas for prompt-injection payloads and hidden instructions&lt;/td&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;secret-guard&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Pre-commit + CI scanner for committed secrets, mirrored config across local + CI&lt;/td&gt;
&lt;td&gt;L1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Sentinel output is intentionally loud:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[skill-sentinel] scanning ./skills/arxiv-scout
  ✗ FAILED  declares tier=L1 but entrypoint contains: curl https://...
  → tier L1 forbids network egress. Either:
      (a) bump declared tier to L2 and document it in COMPAT.md, or
      (b) move the fetch into a separate L2 helper skill.
exit 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the part OpenClaw doesn't give you out of the box — and the part I'd argue &lt;em&gt;every&lt;/em&gt; claw distribution needs before it goes near a production box.&lt;/p&gt;

&lt;h3&gt;
  
  
  The scaffolder — keeping the schema honest
&lt;/h3&gt;

&lt;p&gt;The whole standard collapses if writing a compliant skill is harder than writing a non-compliant one. So:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;./tools/forge-skill.sh my-new-skill &lt;span class="nt"&gt;--tier&lt;/span&gt; L1 &lt;span class="nt"&gt;--variants&lt;/span&gt; full,full,partial,unsupported,full,full
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill] generated skills/my-new-skill/
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill]   SKILL.md       &lt;span class="o"&gt;(&lt;/span&gt;frontmatter + checklist&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill]   run.sh         &lt;span class="o"&gt;(&lt;/span&gt;entrypoint stub with safe defaults&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill]   COMPAT.md      &lt;span class="o"&gt;(&lt;/span&gt;one section per non-full variant&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill]   tests/smoke.sh &lt;span class="o"&gt;(&lt;/span&gt;skill-sentinel + prompt-fence dry run&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;forge-skill] new skill ready &lt;span class="k"&gt;in &lt;/span&gt;27s. Run: ./tools/validate_skills.py my-new-skill
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A new compliant skill, end-to-end, in under 30 seconds. The validators (&lt;code&gt;tools/validate_skills.py&lt;/code&gt;, &lt;code&gt;tools/validate_souls.py&lt;/code&gt;) are stricter than the scaffolder — so the only path that passes CI is the one that passes the scaffolder &lt;em&gt;and&lt;/em&gt; the validators.&lt;/p&gt;

&lt;h3&gt;
  
  
  The cross-architecture evidence harness
&lt;/h3&gt;

&lt;p&gt;This is the part I'm proudest of, and the part that's deliberately &lt;strong&gt;not&lt;/strong&gt; in the public repo.&lt;/p&gt;

&lt;p&gt;OpenClaw runs on a deeply heterogeneous fleet — laptops, Pi-class boards, RISC-V dev kits, sandboxed WASM. "It works on my Arch x86_64 box" is a useless claim for a skill hub. So I built a local-only harness under &lt;code&gt;.local/clawforge-vm/&lt;/code&gt; (gitignored) that produces honest receipts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.local/clawforge-vm/
├── run-host-lane.sh             # 21 deterministic checks on the host
├── bootstrap-e2e-vm.sh          # disposable Ubuntu Jammy KVM guest, reruns the host lane inside
├── run-usermode-arch-lane.sh    # qemu-*-static smoke for arm64, armv7, riscv64
└── report-arch-evidence.sh      # emits JSON + Markdown summary
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The host lane runs all 6 variant installers, every validator, every scanner, the soul installer, scaffolder round-trip, and mock flows for &lt;code&gt;kube-scout&lt;/code&gt;, &lt;code&gt;permission-lens&lt;/code&gt;, &lt;code&gt;docker-hygiene&lt;/code&gt;, &lt;code&gt;arxiv-scout&lt;/code&gt;, &lt;code&gt;bedrock-rag&lt;/code&gt;, and &lt;code&gt;tf-copilot&lt;/code&gt; — 21 checks, deterministic, on every host change.&lt;/p&gt;

&lt;p&gt;The arch lane does what it can do honestly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[arch-evidence] summary
  x86_64   : emulated-guest  ✓  21/21 checks (Ubuntu Jammy KVM)
  arm64    : usermode-smoke  ✓  installers + sentinel  (Ubuntu Jammy rootfs)
  armv7    : usermode-smoke  ✓  installers + sentinel  (Alpine 3.20 rootfs)
  riscv64  : usermode-smoke  ✓  installers + sentinel  (Ubuntu Jammy rootfs)
  i386     : metadata-only   —  no rootfs staged, frontmatter-only
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. No "fully supported" claims I can't back up. The same script writes a JSON receipt I can check into the next release notes.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/fc9VKQE1Xtk"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  End-to-end proof — running inside a disposable QEMU VM
&lt;/h3&gt;

&lt;p&gt;The GIF below is an unedited asciinema recording of the full ClawForge pipeline running &lt;strong&gt;inside a fresh Ubuntu Jammy KVM guest&lt;/strong&gt; — not on my host machine. 21 deterministic checks. Zero edits.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7n6rysnhcpepfrmg85c.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7n6rysnhcpepfrmg85c.gif" alt="ClawForge end-to-end test inside QEMU VM — 21 checks passing" width="600" height="354"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the meantime, the project is fully runnable today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/clawForge" rel="noopener noreferrer"&gt;github.com/divyaprakash0426/clawForge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/clawForge/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;v1.0.0&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Topics:&lt;/strong&gt; &lt;code&gt;openclaw&lt;/code&gt; · &lt;code&gt;claw-ecosystem&lt;/code&gt; · &lt;code&gt;ai-tools&lt;/code&gt; · &lt;code&gt;skill-hub&lt;/code&gt; · &lt;code&gt;security&lt;/code&gt; · &lt;code&gt;shell&lt;/code&gt; · &lt;code&gt;open-source&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try it in 60 seconds
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone&lt;/span&gt;
git clone https://github.com/divyaprakash0426/clawForge.git
&lt;span class="nb"&gt;cd &lt;/span&gt;clawForge

&lt;span class="c"&gt;# Install the full skill set against your detected variant&lt;/span&gt;
./install.sh &lt;span class="nt"&gt;--variant&lt;/span&gt; openclaw     &lt;span class="c"&gt;# or zeroclaw, picoclaw, nullclaw, nanobot, ironclaw&lt;/span&gt;

&lt;span class="c"&gt;# Or pull a single skill into an existing claw install&lt;/span&gt;
./install-skill.sh skill-sentinel &lt;span class="nt"&gt;--variant&lt;/span&gt; zeroclaw

&lt;span class="c"&gt;# Scaffold your own compliant skill in under 30 seconds&lt;/span&gt;
./tools/forge-skill.sh my-skill &lt;span class="nt"&gt;--tier&lt;/span&gt; L1

&lt;span class="c"&gt;# Validate the entire repo (what CI runs on every PR)&lt;/span&gt;
python3 tools/validate_skills.py
python3 tools/validate_souls.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. AgentSkills are portable. Skill &lt;em&gt;execution&lt;/em&gt; isn't.
&lt;/h3&gt;

&lt;p&gt;The single biggest mismatch between OpenClaw's marketing and reality: every variant agrees on the skill &lt;em&gt;interface&lt;/em&gt;, but the skills themselves silently assume Node, or &lt;code&gt;bash&lt;/code&gt;, or &lt;code&gt;curl&lt;/code&gt;, or 200MB of RAM. The &lt;code&gt;variants: full | partial | unsupported&lt;/code&gt; field only exists because I tried to claim "universal" once and got humiliated by NullClaw.&lt;/p&gt;

&lt;p&gt;If you're building anything for a multi-variant ecosystem: front-load the compatibility table in machine-readable frontmatter, then make your installer &lt;em&gt;refuse&lt;/em&gt; to lie.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Static tiers beat dynamic risk scores every time.
&lt;/h3&gt;

&lt;p&gt;I genuinely tried the dynamic-scoring approach first — heuristics over the entrypoint, weighted permission analysis, "danger floats." It was a beautiful dashboard producing useless numbers. Authors gamed it within an afternoon.&lt;/p&gt;

&lt;p&gt;Four flat tiers (&lt;code&gt;L0&lt;/code&gt; read-only metadata, &lt;code&gt;L1&lt;/code&gt; read-only fs, &lt;code&gt;L2&lt;/code&gt; scoped write + network, &lt;code&gt;L3&lt;/code&gt; privileged) are coarse enough that arguing about them is pointless and fine-grained enough that the scanner can mechanically reject lies. Boring, declarative, &lt;em&gt;correct&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. The harness finds the bugs that demos hide.
&lt;/h3&gt;

&lt;p&gt;Three real bugs the local lane caught that I would 100% have shipped otherwise:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;bedrock-rag&lt;/code&gt; mock path bug&lt;/strong&gt; — relative-path assumption that worked from the repo root, broke instantly when the installer staged it under &lt;code&gt;~/.openclaw/skills/&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;arxiv-scout&lt;/code&gt; non-determinism&lt;/strong&gt; — model-eval cache used a hash of the wall-clock minute as a salt. Passed once locally, failed every CI re-run inside the QEMU guest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alpine busybox absolute-symlink trap on armv7&lt;/strong&gt; — busybox's &lt;code&gt;ln -s&lt;/code&gt; resolves absolute symlinks against the host root, not the rootfs. The arm64 Ubuntu lane was happy. The armv7 Alpine lane was a pile of dangling links until I switched to relative symlinks everywhere.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these would have been caught by "looks fine on my machine." Build the harness early. Even a usermode-smoke lane is worth ten "trust me" tweets.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. "30 engineered" beats "300 curated" for a security story.
&lt;/h3&gt;

&lt;p&gt;The first plan was bigger — pull in the most-starred ClawHub skills, audit them, re-ship. I killed that within a week. Auditing someone else's skill is roughly the same effort as writing one from scratch, and you inherit their bus factor and their git history. 30 skills I wrote, against one schema, with one threat model, is a story I can defend. 300 curated ones is a story I'd have to keep apologizing for.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. The scaffolder is the standard.
&lt;/h3&gt;

&lt;p&gt;A schema is just a wishlist until the easiest path to a new skill is also the &lt;em&gt;only&lt;/em&gt; compliant path. &lt;code&gt;forge-skill.sh&lt;/code&gt; exists because every skill I added to the repo by hand had at least one frontmatter typo. Now the scaffolder produces compliant skills, and the validators reject anything that drifts. The standard enforces itself.&lt;/p&gt;




&lt;h2&gt;
  
  
  ClawCon Michigan
&lt;/h2&gt;

&lt;p&gt;I didn't make it to ClawCon Michigan in person this round — wrong continent, wrong calendar week — but the energy coming out of it (the talks, the trip reports, the ClawHavoc post-mortems, the variant maintainers comparing notes in public) is most of what made me stop hand-wringing and actually ship ClawForge. If you were there: the unsigned-skill-distribution debate from day two is the reason &lt;code&gt;skill-sentinel&lt;/code&gt; is opinionated instead of advisory. Thank you. I owe you all a drink at the next one.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with stubbornness, a lot of `qemu-system-&lt;/em&gt;`, and a deep suspicion of any skill hub that has more skills than it has reviewers.*&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/clawForge" rel="noopener noreferrer"&gt;github.com/divyaprakash0426/clawForge&lt;/a&gt; · &lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/clawForge/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;v1.0.0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Comments, PRs, and especially "your tier system is wrong because…" arguments are very welcome.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>openclaw</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why the Claw ecosystem needs a skill commons — and how I built one</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sun, 26 Apr 2026 12:08:55 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/why-the-claw-ecosystem-needs-a-skill-commons-and-how-i-built-one-1caa</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/why-the-claw-ecosystem-needs-a-skill-commons-and-how-i-built-one-1caa</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This essay accompanies the &lt;a href=""&gt;ClawForge repository&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The supply chain problem nobody wants to talk about
&lt;/h2&gt;

&lt;p&gt;ClawHub crossed 13,729 community-published skills in April 2026. That number sounds impressive until you learn that security researchers estimate roughly 20% of those skills contain patterns consistent with prompt injection, credential exfiltration, or unsafe shell execution. The ClawHavoc campaign made this concrete: a coordinated actor published ~300 skills across multiple Claw variants, each embedding a silent exfiltration hook behind an otherwise legitimate-looking system automation. Skills were installed by thousands of developers before the campaign was identified.&lt;/p&gt;

&lt;p&gt;The root problem is structural. ClawHub operates like npm circa 2014: publish-and-forget, no manifest enforcement, no per-variant compatibility declaration, no permission tier labeling. When you &lt;code&gt;install&lt;/code&gt; a skill, you are trusting a markdown file you've never read, written by someone you've never met, to run arbitrary shell commands with your agent's credentials.&lt;/p&gt;

&lt;p&gt;ClawForge is my answer to that problem. It is not another skill list. It is an engineered commons: every skill in the catalog was built from scratch with an explicit trust model baked into the schema, the tooling, and the CI pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The variant problem is bigger than it looks
&lt;/h2&gt;

&lt;p&gt;Ask most OpenClaw developers which variants they target and you'll get a blank stare. But the variant landscape is genuinely fragmented:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Idle RAM&lt;/th&gt;
&lt;th&gt;Key constraint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;TypeScript / Node.js&lt;/td&gt;
&lt;td&gt;~1.5 GB&lt;/td&gt;
&lt;td&gt;Full feature set, desktop only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ZeroClaw&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;~7.8 MB&lt;/td&gt;
&lt;td&gt;Production VPS, low resource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PicoClaw&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&amp;lt; 10 MB&lt;/td&gt;
&lt;td&gt;Raspberry Pi, $10 boards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NullClaw&lt;/td&gt;
&lt;td&gt;Zig&lt;/td&gt;
&lt;td&gt;~1 MB&lt;/td&gt;
&lt;td&gt;RISC-V, ultra-embedded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NanoBot&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;~100 MB&lt;/td&gt;
&lt;td&gt;AI research, ML workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IronClaw&lt;/td&gt;
&lt;td&gt;TypeScript + WASM&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;High-security enterprise sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A skill that calls &lt;code&gt;curl | bash&lt;/code&gt; works fine on OpenClaw. It will silently fail or be blocked entirely on NullClaw. A skill that requires &lt;code&gt;aws-cli&lt;/code&gt; in PATH is useless on a PicoClaw board. A skill that writes to &lt;code&gt;~/.config&lt;/code&gt; may violate the IronClaw sandbox policy.&lt;/p&gt;

&lt;p&gt;No existing skill collection tracks this. ClawForge's &lt;code&gt;SKILL.md&lt;/code&gt; frontmatter requires every skill to declare a per-variant compatibility value — &lt;code&gt;full&lt;/code&gt;, &lt;code&gt;partial&lt;/code&gt;, or &lt;code&gt;unsupported&lt;/code&gt; — and document the reason for any non-&lt;code&gt;full&lt;/code&gt; status in a &lt;code&gt;COMPAT.md&lt;/code&gt; file. This makes the compatibility matrix machine-readable: the installer filters skills automatically for the detected variant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Show, Don't Just Tell
&lt;/h3&gt;

&lt;p&gt;Here's an example of what that engineered frontmatter actually looks like in practice. Notice the explicit compatibility matrix and the security tier—this is what the system parses before anything runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arch-sentry&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Arch Linux system health monitor.&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;compat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
    &lt;span class="na"&gt;zeroclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
    &lt;span class="na"&gt;picoclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;partial&lt;/span&gt;     &lt;span class="c1"&gt;# no scheduled heartbeat support&lt;/span&gt;
    &lt;span class="na"&gt;nullclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unsupported&lt;/span&gt; &lt;span class="c1"&gt;# requires sudo bash&lt;/span&gt;
  &lt;span class="na"&gt;security_tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;L1&lt;/span&gt;       &lt;span class="c1"&gt;# L1 (read-only audit)&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because of this machine-readable schema, the Developer Experience (DX) becomes dead simple and inherently secure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Provide a one-command install that detects your variant&lt;/span&gt;
./install.sh &lt;span class="nt"&gt;--variant&lt;/span&gt; zeroclaw

&lt;span class="c"&gt;# 2. Block bad skills automatically in CI or locally&lt;/span&gt;
./skill-sentinel scan ./skills/sketchy-aws-tool
🚨 &lt;span class="o"&gt;[&lt;/span&gt;FAILED] Risk Score: 9/10
- Credential exfiltration pattern detected: &lt;span class="s1"&gt;'curl -d @$AWS_ACCESS_KEY_ID'&lt;/span&gt;
- Install blocked.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What ClawHavoc actually taught us
&lt;/h2&gt;

&lt;p&gt;The ClawHavoc campaign followed a pattern security researchers call a "trojan commons" attack. The attacker published genuinely useful skills — a Docker cleanup tool, an AWS cost monitor, a git log formatter — each with a payload buried in a post-install hook or an obfuscated eval block inside a &lt;code&gt;run.sh&lt;/code&gt;. Because the skills were useful, they accumulated real installs. Because ClawHub had no automated scanning, the payload ran undetected for weeks on developer machines with active AWS, GitHub, and Telegram credentials in the environment.&lt;/p&gt;

&lt;p&gt;Three lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Usefulness and safety are orthogonal.&lt;/strong&gt; A trojan skill can be more useful than a clean one. Utility alone is not a trust signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema enforcement is a prerequisite for trust.&lt;/strong&gt; If the format is loose enough to hide a payload, it will be used to hide a payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense has to be part of the install path.&lt;/strong&gt; A scanner that runs after installation catches nothing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ClawForge responds to all three. Every skill in the catalog has been authored with explicit permission tiers. &lt;code&gt;skill-sentinel&lt;/code&gt; runs as a GitHub Actions workflow on every pull request and is available as a local pre-install scanner. &lt;code&gt;prompt-fence&lt;/code&gt; extends this to instruction and soul files. &lt;code&gt;secret-guard&lt;/code&gt; covers committed secrets in skill configs. The security primitives are not opt-in — they're part of the repository CI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The China signal: what the underground market proved
&lt;/h2&gt;

&lt;p&gt;In early 2026, reports emerged of Chinese developers selling "AI skill packs" for OpenClaw and QClaw on secondary markets, with bundles trading for tens of thousands of RMB. Tencent responded by launching QClaw with 5,000 prebuilt skills and a three-minute deployment story. The pattern was striking: there was clearly enormous appetite for &lt;strong&gt;batteries-included, domain-specific skill bundles&lt;/strong&gt; that a non-technical user could drop in and use immediately.&lt;/p&gt;

&lt;p&gt;No English-language equivalent existed. The closest thing was Andrej Karpathy's personal skill repo — credible and well-reasoned, but intentionally narrow. &lt;code&gt;everything-claude-code&lt;/code&gt; showed that a complete, production-tested system beats a list every time (46K+ stars, Anthropic hackathon winner). But neither project addressed the variant landscape or the supply chain problem.&lt;/p&gt;

&lt;p&gt;ClawForge is the legitimate, open-source, English-language answer to the demand signal the underground market proved. Thirty skills across seven domains, installable in one command, with explicit compatibility metadata and a security scanner in the CI pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AgentSkills portability problem
&lt;/h2&gt;

&lt;p&gt;The emerging AgentSkills spec promises cross-agent skill portability: a skill authored once should be usable by OpenClaw, NullClaw, and a hypothetical future variant that doesn't exist yet. The promise is real. The current implementation has gaps.&lt;/p&gt;

&lt;p&gt;The biggest gap is capability negotiation. The spec defines a skill's &lt;em&gt;interface&lt;/em&gt; but not its &lt;em&gt;execution requirements&lt;/em&gt;. A skill that requires a 1.5 GB Node.js runtime to evaluate can declare itself AgentSkills-compatible while being completely unusable on a NullClaw device. Portability at the interface level without portability at the execution level creates a false sense of compatibility.&lt;/p&gt;

&lt;p&gt;ClawForge's &lt;code&gt;COMPAT.md&lt;/code&gt; pattern is a practical interim solution. It doesn't replace the spec — it augments it with the ground truth that the spec doesn't yet capture: which variants actually run this skill, what breaks on partial variants, and what the operator needs to know before installing on a constrained device.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decisions and trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why 30 skills instead of 300?&lt;/strong&gt; Engineering quality is the product, not catalog size. ClawHub already has 13,729 entries. Adding to the noise isn't useful. The 30 flagship skills were designed and built from scratch — each one with a concrete runbook, domain-specific helper scripts, a per-variant compatibility declaration, and a security tier. The deliberate scope means every skill can be fully owned and maintained rather than scaffolded and abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why a static security tier instead of a dynamic risk score?&lt;/strong&gt; Dynamic scoring invites gaming. A skill author can rewrite a risky shell call to avoid pattern detection. A static permission tier declared by the author and reviewed by a maintainer is harder to fake — it creates a paper trail and puts accountability on the contributor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;souls/&lt;/code&gt; in the same repo?&lt;/strong&gt; Persona packs are not separate from skills — they shape how skills behave. A security operator running &lt;code&gt;skill-sentinel&lt;/code&gt; should get terse, actionable output. A research analyst running &lt;code&gt;arxiv-scout&lt;/code&gt; should get structured summaries with citation context. Shipping persona packs alongside the skill catalog makes the behavioral contract explicit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not build on top of ClawHub's existing infrastructure?&lt;/strong&gt; Because the problem is the infrastructure. ClawHub's permissive publish model is the attack surface. ClawForge is designed to be a curated alternative that coexists with ClawHub, not a replacement for it. The install tooling is intentionally local-first and format-verified.&lt;/p&gt;




&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;ClawForge is a foundation, not a finished product. The work that matters next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-variant install test matrix&lt;/strong&gt;: automated CI that actually runs a representative skill on each variant in a sandboxed container and reports pass/fail against the declared compatibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Soul-to-skill binding&lt;/strong&gt;: a mechanism to declare which skills a persona pack activates by default, reducing per-session configuration friction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community skill review queue&lt;/strong&gt;: a structured contribution path with security checklist, reviewer assignment, and automated &lt;code&gt;skill-sentinel&lt;/code&gt; gate — analogous to a software package maintainer model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Locale packs&lt;/strong&gt;: the demand signal from the China market is real. Non-English skill bodies and locale-adapted persona packs would unlock a substantial user base.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The commons model only works if it stays engineered. That means saying no to volume for the sake of volume, enforcing the schema, and treating every new skill as a potential attack surface until proven otherwise.&lt;/p&gt;

&lt;p&gt;That's the bet ClawForge makes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Over to you
&lt;/h3&gt;

&lt;p&gt;I've open-sourced the entire 30-skill catalog and the scaffolding CLI. If you're building for OpenClaw (or ZeroClaw, or NanoBot), try pulling down the repo. Run &lt;code&gt;./tools/forge-skill.sh&lt;/code&gt; to generate your first secure skill in 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I’d love to know: what’s the biggest variant-compatibility headache you’ve run into? Drop a comment below, and let’s fix the skill commons together.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ClawForge is open source. Contributions follow the schema in &lt;a href="//../CONTRIBUTING.md"&gt;&lt;code&gt;CONTRIBUTING.md&lt;/code&gt;&lt;/a&gt; and the skill format in &lt;a href="//skill-format.md"&gt;&lt;code&gt;docs/skill-format.md&lt;/code&gt;&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ClawCon Michigan
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;While I couldn't attend ClawCon Michigan in person, following the community's energy and the announcements remotely is what inspired me to dive deep into the cross-variant compatibility problem to build ClawForge.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>openclaw</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Why the Claw ecosystem needs a skill commons — and how I built one</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sun, 26 Apr 2026 12:08:55 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/why-the-claw-ecosystem-needs-a-skill-commons-and-how-i-built-one-5ha9</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/why-the-claw-ecosystem-needs-a-skill-commons-and-how-i-built-one-5ha9</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/openclaw-2026-04-16"&gt;OpenClaw Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This essay accompanies the &lt;a href=""&gt;ClawForge repository&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The supply chain problem nobody wants to talk about
&lt;/h2&gt;

&lt;p&gt;ClawHub crossed 13,729 community-published skills in April 2026. That number sounds impressive until you learn that security researchers estimate roughly 20% of those skills contain patterns consistent with prompt injection, credential exfiltration, or unsafe shell execution. The ClawHavoc campaign made this concrete: a coordinated actor published ~300 skills across multiple Claw variants, each embedding a silent exfiltration hook behind an otherwise legitimate-looking system automation. Skills were installed by thousands of developers before the campaign was identified.&lt;/p&gt;

&lt;p&gt;The root problem is structural. ClawHub operates like npm circa 2014: publish-and-forget, no manifest enforcement, no per-variant compatibility declaration, no permission tier labeling. When you &lt;code&gt;install&lt;/code&gt; a skill, you are trusting a markdown file you've never read, written by someone you've never met, to run arbitrary shell commands with your agent's credentials.&lt;/p&gt;

&lt;p&gt;ClawForge is my answer to that problem. It is not another skill list. It is an engineered commons: every skill in the catalog was built from scratch with an explicit trust model baked into the schema, the tooling, and the CI pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The variant problem is bigger than it looks
&lt;/h2&gt;

&lt;p&gt;Ask most OpenClaw developers which variants they target and you'll get a blank stare. But the variant landscape is genuinely fragmented:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Variant&lt;/th&gt;
&lt;th&gt;Runtime&lt;/th&gt;
&lt;th&gt;Idle RAM&lt;/th&gt;
&lt;th&gt;Key constraint&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw&lt;/td&gt;
&lt;td&gt;TypeScript / Node.js&lt;/td&gt;
&lt;td&gt;~1.5 GB&lt;/td&gt;
&lt;td&gt;Full feature set, desktop only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ZeroClaw&lt;/td&gt;
&lt;td&gt;Rust&lt;/td&gt;
&lt;td&gt;~7.8 MB&lt;/td&gt;
&lt;td&gt;Production VPS, low resource&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PicoClaw&lt;/td&gt;
&lt;td&gt;Go&lt;/td&gt;
&lt;td&gt;&amp;lt; 10 MB&lt;/td&gt;
&lt;td&gt;Raspberry Pi, $10 boards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NullClaw&lt;/td&gt;
&lt;td&gt;Zig&lt;/td&gt;
&lt;td&gt;~1 MB&lt;/td&gt;
&lt;td&gt;RISC-V, ultra-embedded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NanoBot&lt;/td&gt;
&lt;td&gt;Python&lt;/td&gt;
&lt;td&gt;~100 MB&lt;/td&gt;
&lt;td&gt;AI research, ML workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IronClaw&lt;/td&gt;
&lt;td&gt;TypeScript + WASM&lt;/td&gt;
&lt;td&gt;Standard&lt;/td&gt;
&lt;td&gt;High-security enterprise sandbox&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A skill that calls &lt;code&gt;curl | bash&lt;/code&gt; works fine on OpenClaw. It will silently fail or be blocked entirely on NullClaw. A skill that requires &lt;code&gt;aws-cli&lt;/code&gt; in PATH is useless on a PicoClaw board. A skill that writes to &lt;code&gt;~/.config&lt;/code&gt; may violate the IronClaw sandbox policy.&lt;/p&gt;

&lt;p&gt;No existing skill collection tracks this. ClawForge's &lt;code&gt;SKILL.md&lt;/code&gt; frontmatter requires every skill to declare a per-variant compatibility value — &lt;code&gt;full&lt;/code&gt;, &lt;code&gt;partial&lt;/code&gt;, or &lt;code&gt;unsupported&lt;/code&gt; — and document the reason for any non-&lt;code&gt;full&lt;/code&gt; status in a &lt;code&gt;COMPAT.md&lt;/code&gt; file. This makes the compatibility matrix machine-readable: the installer filters skills automatically for the detected variant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Show, Don't Just Tell
&lt;/h3&gt;

&lt;p&gt;Here's an example of what that engineered frontmatter actually looks like in practice. Notice the explicit compatibility matrix and the security tier—this is what the system parses before anything runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;arch-sentry&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Arch Linux system health monitor.&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1.0.0&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;compat&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;openclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
    &lt;span class="na"&gt;zeroclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;full&lt;/span&gt;
    &lt;span class="na"&gt;picoclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;partial&lt;/span&gt;     &lt;span class="c1"&gt;# no scheduled heartbeat support&lt;/span&gt;
    &lt;span class="na"&gt;nullclaw&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unsupported&lt;/span&gt; &lt;span class="c1"&gt;# requires sudo bash&lt;/span&gt;
  &lt;span class="na"&gt;security_tier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;L1&lt;/span&gt;       &lt;span class="c1"&gt;# L1 (read-only audit)&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because of this machine-readable schema, the Developer Experience (DX) becomes dead simple and inherently secure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Provide a one-command install that detects your variant&lt;/span&gt;
./install.sh &lt;span class="nt"&gt;--variant&lt;/span&gt; zeroclaw

&lt;span class="c"&gt;# 2. Block bad skills automatically in CI or locally&lt;/span&gt;
./skill-sentinel scan ./skills/sketchy-aws-tool
🚨 &lt;span class="o"&gt;[&lt;/span&gt;FAILED] Risk Score: 9/10
- Credential exfiltration pattern detected: &lt;span class="s1"&gt;'curl -d @$AWS_ACCESS_KEY_ID'&lt;/span&gt;
- Install blocked.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What ClawHavoc actually taught us
&lt;/h2&gt;

&lt;p&gt;The ClawHavoc campaign followed a pattern security researchers call a "trojan commons" attack. The attacker published genuinely useful skills — a Docker cleanup tool, an AWS cost monitor, a git log formatter — each with a payload buried in a post-install hook or an obfuscated eval block inside a &lt;code&gt;run.sh&lt;/code&gt;. Because the skills were useful, they accumulated real installs. Because ClawHub had no automated scanning, the payload ran undetected for weeks on developer machines with active AWS, GitHub, and Telegram credentials in the environment.&lt;/p&gt;

&lt;p&gt;Three lessons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Usefulness and safety are orthogonal.&lt;/strong&gt; A trojan skill can be more useful than a clean one. Utility alone is not a trust signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema enforcement is a prerequisite for trust.&lt;/strong&gt; If the format is loose enough to hide a payload, it will be used to hide a payload.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense has to be part of the install path.&lt;/strong&gt; A scanner that runs after installation catches nothing.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;ClawForge responds to all three. Every skill in the catalog has been authored with explicit permission tiers. &lt;code&gt;skill-sentinel&lt;/code&gt; runs as a GitHub Actions workflow on every pull request and is available as a local pre-install scanner. &lt;code&gt;prompt-fence&lt;/code&gt; extends this to instruction and soul files. &lt;code&gt;secret-guard&lt;/code&gt; covers committed secrets in skill configs. The security primitives are not opt-in — they're part of the repository CI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The China signal: what the underground market proved
&lt;/h2&gt;

&lt;p&gt;In early 2026, reports emerged of Chinese developers selling "AI skill packs" for OpenClaw and QClaw on secondary markets, with bundles trading for tens of thousands of RMB. Tencent responded by launching QClaw with 5,000 prebuilt skills and a three-minute deployment story. The pattern was striking: there was clearly enormous appetite for &lt;strong&gt;batteries-included, domain-specific skill bundles&lt;/strong&gt; that a non-technical user could drop in and use immediately.&lt;/p&gt;

&lt;p&gt;No English-language equivalent existed. The closest thing was Andrej Karpathy's personal skill repo — credible and well-reasoned, but intentionally narrow. &lt;code&gt;everything-claude-code&lt;/code&gt; showed that a complete, production-tested system beats a list every time (46K+ stars, Anthropic hackathon winner). But neither project addressed the variant landscape or the supply chain problem.&lt;/p&gt;

&lt;p&gt;ClawForge is the legitimate, open-source, English-language answer to the demand signal the underground market proved. Thirty skills across seven domains, installable in one command, with explicit compatibility metadata and a security scanner in the CI pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  The AgentSkills portability problem
&lt;/h2&gt;

&lt;p&gt;The emerging AgentSkills spec promises cross-agent skill portability: a skill authored once should be usable by OpenClaw, NullClaw, and a hypothetical future variant that doesn't exist yet. The promise is real. The current implementation has gaps.&lt;/p&gt;

&lt;p&gt;The biggest gap is capability negotiation. The spec defines a skill's &lt;em&gt;interface&lt;/em&gt; but not its &lt;em&gt;execution requirements&lt;/em&gt;. A skill that requires a 1.5 GB Node.js runtime to evaluate can declare itself AgentSkills-compatible while being completely unusable on a NullClaw device. Portability at the interface level without portability at the execution level creates a false sense of compatibility.&lt;/p&gt;

&lt;p&gt;ClawForge's &lt;code&gt;COMPAT.md&lt;/code&gt; pattern is a practical interim solution. It doesn't replace the spec — it augments it with the ground truth that the spec doesn't yet capture: which variants actually run this skill, what breaks on partial variants, and what the operator needs to know before installing on a constrained device.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design decisions and trade-offs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Why 30 skills instead of 300?&lt;/strong&gt; Engineering quality is the product, not catalog size. ClawHub already has 13,729 entries. Adding to the noise isn't useful. The 30 flagship skills were designed and built from scratch — each one with a concrete runbook, domain-specific helper scripts, a per-variant compatibility declaration, and a security tier. The deliberate scope means every skill can be fully owned and maintained rather than scaffolded and abandoned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why a static security tier instead of a dynamic risk score?&lt;/strong&gt; Dynamic scoring invites gaming. A skill author can rewrite a risky shell call to avoid pattern detection. A static permission tier declared by the author and reviewed by a maintainer is harder to fake — it creates a paper trail and puts accountability on the contributor.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;souls/&lt;/code&gt; in the same repo?&lt;/strong&gt; Persona packs are not separate from skills — they shape how skills behave. A security operator running &lt;code&gt;skill-sentinel&lt;/code&gt; should get terse, actionable output. A research analyst running &lt;code&gt;arxiv-scout&lt;/code&gt; should get structured summaries with citation context. Shipping persona packs alongside the skill catalog makes the behavioral contract explicit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why not build on top of ClawHub's existing infrastructure?&lt;/strong&gt; Because the problem is the infrastructure. ClawHub's permissive publish model is the attack surface. ClawForge is designed to be a curated alternative that coexists with ClawHub, not a replacement for it. The install tooling is intentionally local-first and format-verified.&lt;/p&gt;




&lt;h2&gt;
  
  
  What comes next
&lt;/h2&gt;

&lt;p&gt;ClawForge is a foundation, not a finished product. The work that matters next:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-variant install test matrix&lt;/strong&gt;: automated CI that actually runs a representative skill on each variant in a sandboxed container and reports pass/fail against the declared compatibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Soul-to-skill binding&lt;/strong&gt;: a mechanism to declare which skills a persona pack activates by default, reducing per-session configuration friction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community skill review queue&lt;/strong&gt;: a structured contribution path with security checklist, reviewer assignment, and automated &lt;code&gt;skill-sentinel&lt;/code&gt; gate — analogous to a software package maintainer model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Locale packs&lt;/strong&gt;: the demand signal from the China market is real. Non-English skill bodies and locale-adapted persona packs would unlock a substantial user base.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The commons model only works if it stays engineered. That means saying no to volume for the sake of volume, enforcing the schema, and treating every new skill as a potential attack surface until proven otherwise.&lt;/p&gt;

&lt;p&gt;That's the bet ClawForge makes.&lt;/p&gt;




&lt;h3&gt;
  
  
  Over to you
&lt;/h3&gt;

&lt;p&gt;I've open-sourced the entire 30-skill catalog and the scaffolding CLI. If you're building for OpenClaw (or ZeroClaw, or NanoBot), try pulling down the repo. Run &lt;code&gt;./tools/forge-skill.sh&lt;/code&gt; to generate your first secure skill in 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I’d love to know: what’s the biggest variant-compatibility headache you’ve run into? Drop a comment below, and let’s fix the skill commons together.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;ClawForge is open source. Contributions follow the schema in &lt;a href="//../CONTRIBUTING.md"&gt;&lt;code&gt;CONTRIBUTING.md&lt;/code&gt;&lt;/a&gt; and the skill format in &lt;a href="//skill-format.md"&gt;&lt;code&gt;docs/skill-format.md&lt;/code&gt;&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ClawCon Michigan
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;While I couldn't attend ClawCon Michigan in person, following the community's energy and the announcements remotely is what inspired me to dive deep into the cross-variant compatibility problem to build ClawForge.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>openclawchallenge</category>
      <category>openclaw</category>
      <category>architecture</category>
    </item>
    <item>
      <title>[Boost]</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sat, 14 Feb 2026 05:18:06 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/-3pkp</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/-3pkp</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/divyaprakash_d_2d5d085bd4" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1761017%2Ff2cb9107-da61-4982-977d-2099598f1e5d.jpg" alt="divyaprakash_d_2d5d085bd4"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/divyaprakash_d_2d5d085bd4/stop-editing-start-playing-meet-autoshorts-the-ai-gaming-editor-4mbp" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Stop Editing. Start Playing. Meet AutoShorts: The AI Gaming Editor 🎮&lt;/h2&gt;
      &lt;h3&gt;divyaprakash D ・ Feb 13&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#devchallenge&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#githubchallenge&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#cli&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#githubcopilot&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Stop Editing. Start Playing. Meet AutoShorts: The AI Gaming Editor 🎮</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Fri, 13 Feb 2026 09:25:52 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/stop-editing-start-playing-meet-autoshorts-the-ai-gaming-editor-4mbp</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/stop-editing-start-playing-meet-autoshorts-the-ai-gaming-editor-4mbp</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-01-21"&gt;GitHub Copilot CLI Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;AutoShorts&lt;/strong&gt; is an AI-powered pipeline that automatically transforms long-form gameplay footage into viral-ready vertical clips. It uses &lt;strong&gt;Vision AI&lt;/strong&gt; to semantically understand content—distinguishing between "action," "clutch plays," and "WTF moments"—then adds &lt;strong&gt;AI-generated captions&lt;/strong&gt; and &lt;strong&gt;AI voiceovers&lt;/strong&gt; with matching energy and personality.&lt;/p&gt;

&lt;p&gt;The result? Hours of gameplay → polished TikTok/Shorts/Reels-ready clips, with minimal human intervention.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;View Project on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/autoshorts" rel="noopener noreferrer"&gt;Link&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo Video:&lt;/strong&gt; 

  &lt;iframe src="https://www.youtube.com/embed/JZawIDjbxCg"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  🎥 Showcase: Multi-Language &amp;amp; Style Generation
&lt;/h3&gt;

&lt;p&gt;AutoShorts automatically adapts its editing style, captions, and voiceover personality based on the content and target language. Here are some examples generated entirely by the pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;th&gt;Style&lt;/th&gt;
&lt;th&gt;Language&lt;/th&gt;
&lt;th&gt;Video&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fortnite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Story Roast&lt;/td&gt;
&lt;td&gt;🇺🇸 English&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/tTUipTAdBlk" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indiana Jones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GenZ Slang&lt;/td&gt;
&lt;td&gt;🇺🇸 English&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/VAOlR5RAX14" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Battlefield 6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dramatic Story&lt;/td&gt;
&lt;td&gt;🇯🇵 Japanese&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/DYNEr1CzTpY" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indiana Jones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Story News&lt;/td&gt;
&lt;td&gt;🇨🇳 Chinese&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/kGRrpu66fpk" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fortnite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Story Roast&lt;/td&gt;
&lt;td&gt;🇪🇸 Spanish&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/5QcelWS1oSo" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fortnite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Story Roast&lt;/td&gt;
&lt;td&gt;🇷🇺 Russian&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/A06FdnycTYo" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Indiana Jones&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Auto Gameplay&lt;/td&gt;
&lt;td&gt;🇧🇷 Portuguese&lt;/td&gt;
&lt;td&gt;&lt;a href="https://www.youtube.com/shorts/qDFsTnH9qxc" rel="noopener noreferrer"&gt;Watch Part 1&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  📸 Dashboard Interface
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Generate Page&lt;/strong&gt;&lt;br&gt;
The command center for creating new content. Simply drop a video or select an existing one, choose your analysis mode (Local vs. Cloud), and hit "Find Clips."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hlsak6ol3o8kdmuj07k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5hlsak6ol3o8kdmuj07k.png" alt="Generate Page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Settings &amp;amp; Cost Control&lt;/strong&gt;&lt;br&gt;
Full control over which AI models are used and strictly managed API costs. You can toggle between OpenAI, Gemini, or efficient Local Heuristics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6iicn8whq7qav401owe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6iicn8whq7qav401owe.png" alt="Settings Page"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I had a problem that every content creator knows: &lt;strong&gt;hours of gameplay footage, but no time to edit&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Recording gameplay is the easy part. The hard part is scrubbing through 2-hour VODs looking for that one clutch moment, that hilarious fail, or that "wait, what just happened?" clip. Then you need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find the moment&lt;/li&gt;
&lt;li&gt;Crop to vertical (9:16)&lt;/li&gt;
&lt;li&gt;Add captions that match the vibe&lt;/li&gt;
&lt;li&gt;Maybe add commentary or voiceover&lt;/li&gt;
&lt;li&gt;Export and repeat... dozens of times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I was spending 3-4 hours editing for every hour of footage. That's backwards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I wanted a system where I could:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Drop a raw gameplay file&lt;/li&gt;
&lt;li&gt;Walk away&lt;/li&gt;
&lt;li&gt;Come back to ready-to-upload clips with captions and voiceovers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AutoShorts is that system.&lt;/p&gt;


&lt;h2&gt;
  
  
  How I Built It (Technical Deep-Dive)
&lt;/h2&gt;

&lt;p&gt;Building AutoShorts was a rollercoaster of "this is genius" moments immediately followed by "why is everything on fire." Here's the real story — the problems nobody warns you about, and the solutions that made it all work.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Architecture Challenge
&lt;/h3&gt;

&lt;p&gt;When the feature set started growing — Vision AI analysis, TTS voice synthesis, story narration, cross-clip narrative arcs — it became clear that a single orchestration file wasn't going to cut it. Every new feature touched everything else, and debugging felt like untangling christmas lights.&lt;/p&gt;

&lt;p&gt;The fix was &lt;strong&gt;Domain-Driven Design&lt;/strong&gt; — splitting the logic into focused modules, each owning its piece of the pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/
├── shorts.py              # Orchestration &amp;amp; rendering
├── ai_providers.py        # Gemini/OpenAI abstraction
├── tts_generator.py       # Qwen3-TTS voice synthesis
├── subtitle_generator.py  # Caption generation &amp;amp; timing
└── story_narrator.py      # Cross-clip narrative generation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation seemed like overkill at first. Then I discovered I needed to load and unload AI models from GPU memory between pipeline stages — TTS has to yield VRAM for rendering, which has to yield for AI analysis — and suddenly having clean boundaries between modules was the only thing keeping me sane.&lt;/p&gt;

&lt;h3&gt;
  
  
  The VRAM Juggling Act
&lt;/h3&gt;

&lt;p&gt;Here's the thing about running AI models on consumer GPUs: &lt;strong&gt;they don't share nicely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Qwen3-TTS (voice synthesis) needs ~4GB VRAM. Video rendering with PyTorch needs ~2GB. These models don't politely step aside for each other — they sit in VRAM until you physically evict them.&lt;/p&gt;

&lt;p&gt;The solution was &lt;strong&gt;aggressive model lifecycle management&lt;/strong&gt; — singleton patterns with explicit cleanup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After TTS generation completes
&lt;/span&gt;&lt;span class="n"&gt;QwenTTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear_instance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;empty_cache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;gc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;collect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TTS model unloaded — VRAM freed for rendering&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this, the pipeline would OOM (out-of-memory crash) after processing 2-3 clips. Fun times at 2 AM when you're wondering why clip #3 always segfaults.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Qwen3-VL Dead End: When "Local" Goes Too Far
&lt;/h3&gt;

&lt;p&gt;I desperately wanted the entire video analysis to happen locally. I actually got &lt;strong&gt;Qwen3-VL&lt;/strong&gt; (video-language model) integrated and working, but it was a textbook case of &lt;em&gt;"just because you can, doesn't mean you should."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Qwen3-VL is a monster. It’s not just big; it's VRAM-hungry beyond reason. My 12GB RTX 4080 laptop didn't stand a chance, and even on high-end 24GB cards, it would regularly hit the OOM wall during long video sequences.&lt;/p&gt;

&lt;p&gt;I attempted a last-ditch effort using &lt;strong&gt;Qwen3-VL-4B-Instruct-FP8&lt;/strong&gt;, but even with quantization, the stability wasn't there—it still occasionally nuked the pipeline. Worse, the analysis quality didn't justify the struggle; the results were underwhelming compared to the resource cost. It felt like I was trying to race a semi-truck on a go-kart track.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pivot:&lt;/strong&gt; This failure is actually what led to the &lt;strong&gt;Deep Analysis Proxy&lt;/strong&gt; system. I realized that instead of fighting 30GB models locally, I could spend those dev cycles on intelligent preprocessing (the 15MB proxy) and let a cloud model do the heavy lifting for pennies. The result was a pipeline that's actually accessible to people with consumer GPUs, rather than just data center owners.&lt;/p&gt;

&lt;h3&gt;
  
  
  The TTS Timing Nightmare
&lt;/h3&gt;

&lt;p&gt;This was the most infuriating bug I encountered, and it took three separate debugging sessions to crack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem:&lt;/strong&gt; Subtitles and voiceover were drifting out of sync in story mode. By the end of a 60-second clip, subtitles were 3-4 seconds ahead of the voice. Not great when you're going for "professional esports broadcast" and getting "badly dubbed foreign film."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The investigation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Story mode generates a continuous narration (like a broadcaster). The TTS engine reads all sentences as one flowing piece. But subtitles were timed by probing each sentence &lt;em&gt;individually&lt;/em&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Subtitle timing (probed separately):
  "The player approaches" → 2.3s
  "An incredible shot"    → 1.8s
  Total: 4.1s

TTS (generated as merged text):
  "The player approaches an incredible shot" → 3.6s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See the problem? When you join sentences, the TTS naturally flows faster — no pause between them. That 0.5s error &lt;em&gt;accumulated&lt;/em&gt; across every sentence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fix:&lt;/strong&gt; Probe the &lt;em&gt;merged&lt;/em&gt; narration once, then distribute timing proportionally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong: probe each sentence separately
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;probe_tts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Accumulated error!
&lt;/span&gt;
&lt;span class="c1"&gt;# ✅ Right: probe merged text, distribute proportionally
&lt;/span&gt;&lt;span class="n"&gt;full_narration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;total_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;probe_tts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_narration&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentences&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;sentence_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total_duration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_chars&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One of those fixes where you stare at the solution and think &lt;em&gt;"why didn't I see this three days ago?"&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The "TTS Longer Than Video" Problem
&lt;/h3&gt;

&lt;p&gt;Sometimes the AI writes an essay when you asked for a tweet. A 45-second gameplay clip ends up with 52 seconds of narration. Now what?&lt;/p&gt;

&lt;p&gt;Three options on the table:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Option A:&lt;/strong&gt; Truncate the voiceover → Loses content, sounds cut off&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Option B:&lt;/strong&gt; Speed up the voice → Sounds like a chipmunk reading the news&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Option C:&lt;/strong&gt; Extend the video to match → 🤔&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Option C won, but with nuance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tts_duration&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;clip_duration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Big gap: go back to source video, extract more footage
&lt;/span&gt;    &lt;span class="nf"&gt;rerender_clip_for_tts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;render_meta&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tts_duration&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Small gap: freeze last frame using FFmpeg tpad
&lt;/span&gt;    &lt;span class="n"&gt;ffmpeg_filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tpad=stop_mode=clone:stop_duration=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;gap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The re-render logic reaches back into the &lt;em&gt;original source video&lt;/em&gt; and extracts more footage — even beyond the original scene boundaries. This required tracking render metadata (start time, source file, scene duration) through the entire pipeline. Worth it though. No more cut-off narration.&lt;/p&gt;

&lt;h3&gt;
  
  
  FlashAttention: When Your RAM Isn't Enough
&lt;/h3&gt;

&lt;p&gt;Qwen3-TTS performs best with FlashAttention 2 — a CUDA kernel that speeds up attention computation by 3-4x. One problem: building it from source requires compiling CUDA code, which needs &lt;strong&gt;125GB+ RAM&lt;/strong&gt; during compilation. On machines with less than 32GB RAM, the build takes &lt;strong&gt;24 hours or more&lt;/strong&gt; — if the OOM killer doesn't murder it first.&lt;/p&gt;

&lt;p&gt;My machine has 16GB. &lt;code&gt;Killed&lt;/code&gt; — my favorite one-word error message.&lt;/p&gt;

&lt;p&gt;The solution? Prebuilt wheels. Someone lovely had already compiled FlashAttention for various PyTorch + CUDA combinations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="nl"&gt;install_flash_attn&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;@&lt;/span&gt;&lt;span class="nv"&gt;PYVER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;$$(&lt;/span&gt;python &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"import sys; print(f'cp{sys.version_info.major&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;{sys.version_info.minor}')"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    pip &lt;span class="nb"&gt;install &lt;/span&gt;https://github.com/.../flash_attn-2.6.3+cu128torch2.10-&lt;span class="nv"&gt;$$&lt;/span&gt;PYVER-linux_x86_64.whl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One line. No compilation. No 125GB RAM requirement. Installation went from "impossible on my hardware" to "done in 30 seconds."&lt;/p&gt;

&lt;h3&gt;
  
  
  Deep Analysis: Letting AI See the Full Picture
&lt;/h3&gt;

&lt;p&gt;Here's an insight that changed everything: &lt;strong&gt;short clips lack context.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the default mode, each candidate clip is analyzed independently — the AI sees 2 minutes of footage and scores it. But it doesn't know what happened &lt;em&gt;before&lt;/em&gt; or &lt;em&gt;after&lt;/em&gt;. A celebration makes no sense without the clutch play that preceded it.&lt;/p&gt;

&lt;p&gt;Deep Analysis mode fixes this by letting Gemini see the &lt;strong&gt;entire video&lt;/strong&gt; — but we're not about to upload a multi-GB 4K recording raw. That would take forever and burn through API quotas.&lt;/p&gt;

&lt;p&gt;Instead, we generate a &lt;strong&gt;lightweight proxy&lt;/strong&gt; first using GPU-accelerated FFmpeg:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GPU-accelerated proxy: 4K@60fps → 640p@1fps, high compression
&lt;/span&gt;&lt;span class="n"&gt;gpu_cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-hwaccel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-hwaccel_output_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-vf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scale_cuda=640:-2,fps=1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;# 640px wide, 1 frame per second
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c:v&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hevc_nvenc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-qp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;35&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;                         &lt;span class="c1"&gt;# Aggressive compression
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-b:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;32k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Mono 32kbps audio
&lt;/span&gt;    &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp_proxy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A 2-hour 4K gameplay recording (~30GB) becomes a ~15MB proxy. Same content, same timeline, same audio cues — just tiny enough to upload in seconds. The proxy is also cached by file hash, so re-runs skip the generation step entirely.&lt;/p&gt;

&lt;p&gt;The AI can now identify narrative arcs — the setup, the payoff, the aftermath. It finds moments that a clip-by-clip analysis would miss entirely. The quality jump is &lt;em&gt;dramatic&lt;/em&gt;, and all it costs is a ~15MB upload instead of 30GB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice Design: From Text to Personality
&lt;/h3&gt;

&lt;p&gt;The most "wow" feature. Instead of picking from generic preset voices, you describe the voice you want in natural language:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;VOICE_PRESET_MAP&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;story_news&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        gender: Male.
        pitch: Dynamic, high-energy with excitement.
        speed: Brisk, fast-paced, maintaining high momentum.
        emotion: Hype, adrenaline, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unbelievable play&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; excitement.
        personality: Charismatic, knowledgeable, maximum energy.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;story_dramatic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        gender: Female.
        pitch: Rich, resonant mid-range with expressive depth.
        speed: Measured, deliberate pacing with dramatic pauses for impact.
        emotion: Intense, evocative, drawing listeners into the story.
        personality: Wise, commanding, magnetic storyteller presence.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Qwen3-TTS reads this description and synthesizes a matching voice. The same caption sounds completely different between "esports broadcaster" and "creepypasta narrator" — and it all happens locally. No cloud TTS API, no per-word billing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Slang Preprocessing: Making TTS Sound Natural
&lt;/h3&gt;

&lt;p&gt;TTS engines and internet slang do not get along. "rn" becomes "urn." "lol" becomes "loll." "fr fr" sounds like a French car brand.&lt;/p&gt;

&lt;p&gt;The fix is a preprocessing layer that expands slang before TTS sees it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;preprocess_tts_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\brn\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;right now&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\blol\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;L O L&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;\bidk\b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;I don&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;t know&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Qwen3-TTS doesn't pause at dashes, so swap them for ellipses
&lt;/span&gt;    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; -- &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;... &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;... &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small detail, huge impact. GenZ-style captions like "bro that was lowkey insane rn fr fr" actually &lt;em&gt;sound&lt;/em&gt; right when spoken aloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  CJK Subtitle Handling: When Words Don't Have Spaces
&lt;/h3&gt;

&lt;p&gt;English subtitles are easy — split on spaces, chunk into 7-word captions, done. But Japanese, Chinese, and Korean (JCK languages) don't use spaces between words. A sentence is one continuous stream of characters.&lt;/p&gt;

&lt;p&gt;This completely broke the subtitle chunking logic. A 40-character Japanese sentence would appear as one massive wall of text filling the entire screen.&lt;/p&gt;

&lt;p&gt;The fix was &lt;strong&gt;character-based splitting with language detection&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Detect CJK characters in the sentence
&lt;/span&gt;&lt;span class="n"&gt;is_cjk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u4e00&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u9fff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt;  &lt;span class="c1"&gt;# Chinese
&lt;/span&gt;              &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u3040&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\u30ff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;       &lt;span class="c1"&gt;# Japanese
&lt;/span&gt;              &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;char&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MAX_CJK_CHARS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;18&lt;/span&gt;  &lt;span class="c1"&gt;# Characters per line for CJK
&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_cjk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Character-based splitting instead of word-based
&lt;/span&gt;    &lt;span class="n"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;MAX_CJK_CHARS&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
              &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;MAX_CJK_CHARS&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
    &lt;span class="c1"&gt;# Distribute TTS duration proportionally by character count
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;chunk_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;chunk_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tts_duration&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;chunk_ratio&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The sentence splitter also handles CJK punctuation (&lt;code&gt;。！？&lt;/code&gt;) which doesn't follow the English pattern of period-then-whitespace. These characters terminate sentences directly, no space required.&lt;/p&gt;

&lt;p&gt;One of those "obvious in hindsight" fixes that makes multi-language support actually work instead of just being a checkbox on a feature list.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Experience with GitHub Copilot CLI
&lt;/h2&gt;

&lt;p&gt;Everything above? That's the engineering. But I'd be lying if I said I did it alone. GitHub Copilot CLI was my pair programmer through most of this — and here's how it actually helped.&lt;/p&gt;

&lt;p&gt;Copilot CLI wasn't just autocomplete — it was a debugging partner, architecture consultant, and documentation writer rolled into one.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Worked Exceptionally Well
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Plan Mode for Complex Changes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;[[PLAN]]&lt;/code&gt; prefix before major refactors gave me a structured approach:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[PLAN]] Migrate from ChatterBox TTS to Qwen3-TTS VoiceDesign
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copilot generated a 6-phase plan covering dependency changes, API migration, FlashAttention setup, testing checkpoints, and rollback strategies. I could review and edit the plan before implementation started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Debugging Across Sessions&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The checkpoint system was crucial. When investigating the subtitle timing bug, I could reference earlier sessions:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Check checkpoint 012-tts-subtitle-sync for what we tried before"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Copilot would review the history and avoid repeating failed approaches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Parallel Exploration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When I wasn't sure which approach to take, I'd ask Copilot to spin up explore agents to investigate multiple paths simultaneously:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;task agent_type: explore
prompt: "How does generate_for_captions() handle timing in story mode vs normal mode?"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This let me understand the codebase faster than reading linearly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Test Generation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After making changes, Copilot helped write comprehensive tests:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_preprocess_tts_text_em_dash&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;preprocess_tts_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wait — what&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;—&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;50 tests covering subtitle formatting, TTS preprocessing, voice description generation, and scene combination logic — all generated from understanding the code context.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Learned
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Be specific about constraints.&lt;/strong&gt; "Fix the OOM error" is less useful than "We have 10GB VRAM, model A needs 8GB, model B needs 4GB, how do we sequence them?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use checkpoints liberally.&lt;/strong&gt; Complex debugging spans sessions. Good checkpoints save hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let Copilot see the errors.&lt;/strong&gt; Pasting full stack traces and logs gives it the context to diagnose accurately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trust but verify.&lt;/strong&gt; Copilot's suggestions are usually good, but always run the tests.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Pipeline Today
&lt;/h2&gt;

&lt;p&gt;Here's what happens when you drop a gameplay video into AutoShorts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Scene Detection&lt;/strong&gt; — GPU-accelerated analysis finds candidate moments using audio spikes + motion detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Ranking&lt;/strong&gt; — Vision AI (Gemini/OpenAI) watches each clip and scores it across 7 semantic categories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Analysis&lt;/strong&gt; &lt;em&gt;(optional)&lt;/em&gt; — GPU-downscaled proxy uploaded to Gemini for context-aware moment detection&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Selection&lt;/strong&gt; — Diverse category selection ensures variety (not just all "action" clips)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU Rendering&lt;/strong&gt; — NVENC hardware encoding creates vertical crops with blurred backgrounds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caption Generation&lt;/strong&gt; — AI writes contextual captions matching the clip's energy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Voice Synthesis&lt;/strong&gt; — Qwen3-TTS creates matching voiceovers with style-appropriate personalities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timing Sync&lt;/strong&gt; — Subtitle timing synchronized with actual TTS audio duration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Mixing&lt;/strong&gt; — Game audio ducked during voiceover, video extended if TTS runs long&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total processing time: ~5-7 minutes per clip on an RTX 3080.&lt;/p&gt;




&lt;h2&gt;
  
  
  Analysis Modes &amp;amp; Cost
&lt;/h2&gt;

&lt;p&gt;AutoShorts supports four analysis modes, each with different tradeoffs between &lt;strong&gt;cost&lt;/strong&gt;, &lt;strong&gt;accuracy&lt;/strong&gt;, and &lt;strong&gt;speed&lt;/strong&gt;. You choose the mode via environment variables — no code changes needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How Each Mode Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;🔧 Local Heuristics Only&lt;/strong&gt; (&lt;code&gt;AI_PROVIDER=local&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Zero API calls. Scenes are scored purely on GPU-computed signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Audio RMS&lt;/strong&gt; — Loudness spikes (explosions, crowd reactions, voice peaks).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spectral Flux&lt;/strong&gt; — Sudden frequency changes (gunshots, impacts, glass breaking).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Motion&lt;/strong&gt; — Pixel-diff action scoring via GPU-accelerated grayscale diffing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three signals are computed in a single pass using PyTorch on GPU. Scenes are ranked by a combined &lt;strong&gt;&lt;code&gt;0.6 × Audio (RMS + Flux) + 0.4 × Visual Motion&lt;/code&gt;&lt;/strong&gt; score. Fast, free, and surprisingly effective for high-action content — but blind to &lt;em&gt;context&lt;/em&gt; (it can't tell a celebration from a firefight).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🖼️ OpenAI Vision&lt;/strong&gt; (&lt;code&gt;AI_PROVIDER=openai&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Heuristics first narrow the field using &lt;strong&gt;Smart Selection&lt;/strong&gt; (70% top scores + 30% random exploration), then candidates are sent to OpenAI. OpenAI's API doesn't accept video, so we extract &lt;strong&gt;8 keyframe JPEGs&lt;/strong&gt; per clip:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Extract 8 static frames as base64 JPEGs
&lt;/span&gt;&lt;span class="n"&gt;cmd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;clip_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-vf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fps=1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-frames:v&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...]&lt;/span&gt;
&lt;span class="c1"&gt;# Send as image_url content to GPT-4o
&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data:image/jpeg;base64,&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;frame&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The AI scores each clip across 7 semantic categories (action, funny, clutch, wtf, epic_fail, hype, skill). Good accuracy from static frames alone, but it &lt;em&gt;can't hear audio&lt;/em&gt; and misses motion-dependent moments like glitches or physics bugs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🎬 Gemini Per-Clip&lt;/strong&gt; (&lt;code&gt;AI_PROVIDER=gemini&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;Uses the same &lt;strong&gt;Smart Selection&lt;/strong&gt; (mixing high-heuristic clips with &lt;strong&gt;random segments&lt;/strong&gt; for diversity), but uploads each candidate as &lt;strong&gt;actual video&lt;/strong&gt; (downscaled to 640px wide). Gemini sees motion, timing, and audio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Each candidate clip: 640p downscaled, ~30-60s, uploaded as MP4
&lt;/span&gt;&lt;span class="n"&gt;video_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;clip_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video/mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;video_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Significantly better at detecting &lt;em&gt;funny&lt;/em&gt;, &lt;em&gt;wtf&lt;/em&gt;, and &lt;em&gt;clutch&lt;/em&gt; moments that depend on temporal context. Clips are analyzed in parallel (3 concurrent threads) to keep latency manageable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🧠 Gemini Deep Analysis&lt;/strong&gt; (&lt;code&gt;GEMINI_DEEP_ANALYSIS=true&lt;/code&gt;)&lt;/p&gt;

&lt;p&gt;The nuclear option. Instead of pre-filtering with heuristics then analyzing clips, Deep Analysis lets Gemini see the &lt;strong&gt;entire video&lt;/strong&gt; — but not the raw multi-GB 4K file. A GPU-accelerated proxy is generated first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;4K @ 60fps → 640p @ 1fps, QP 35, mono 32kbps audio
~30GB gameplay recording -&amp;gt; ~15MB proxy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini watches the whole thing and returns timestamped moments with categories and scores. No heuristic bias, no missed context. The AI finds narrative arcs — the buildup before a clutch play, the reaction after an epic fail — that clip-by-clip analysis simply can't detect.&lt;/p&gt;

&lt;p&gt;Deep Analysis moments are scored with a &lt;code&gt;+200&lt;/code&gt; bias to ensure they rank above any heuristic candidate. A few high-action heuristic backups are still included as safety net clips.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison Summary (1-hour 4K gameplay)
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;Accuracy&lt;/th&gt;
&lt;th&gt;Analysis Cost&lt;/th&gt;
&lt;th&gt;Creative Cost*&lt;/th&gt;
&lt;th&gt;Total Cost&lt;/th&gt;
&lt;th&gt;Data Uploaded&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local Heuristics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Free&lt;/strong&gt; (Whisper)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0 bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI Vision&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;~\$0.15&lt;/td&gt;
&lt;td&gt;~\$0.15&lt;/td&gt;
&lt;td&gt;~\$0.30&lt;/td&gt;
&lt;td&gt;~6MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Per-Clip&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;~\$0.08&lt;/td&gt;
&lt;td&gt;~\$0.08&lt;/td&gt;
&lt;td&gt;~\$0.16&lt;/td&gt;
&lt;td&gt;~90MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini Deep Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;~\$0.05&lt;/td&gt;
&lt;td&gt;~\$0.08&lt;/td&gt;
&lt;td&gt;~\$0.13&lt;/td&gt;
&lt;td&gt;~60MB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;*Creative Cost:&lt;/strong&gt; Includes AI caption generation (LLM API call) + Voiceover synthesized locally (Free).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The counterintuitive result:&lt;/strong&gt; Deep Analysis is the most cost-effective mode because it replaces 15 individual analysis uploads with one optimized proxy upload, while still delivering superior context-aware detection.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Roadmap &amp;amp; Vision
&lt;/h2&gt;

&lt;p&gt;AutoShorts works today as a local pipeline for content creators. But the underlying engine — scene detection, AI ranking, voice synthesis, smart cropping — is a &lt;strong&gt;general-purpose highlight extraction backend&lt;/strong&gt;. Here's where this is heading:&lt;/p&gt;

&lt;h3&gt;
  
  
  🔮 What's Next
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v2.1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Universal Video Type Support (Podcasts, Sports, Entertainment, etc.)&lt;/td&gt;
&lt;td&gt;🔜 Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v2.2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SFX generation — AI-generated sound effects matched to on-screen action&lt;/td&gt;
&lt;td&gt;🔜 Planned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v2.3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cloud API mode (submit video URL → get clips back)&lt;/td&gt;
&lt;td&gt;📐 Designing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v3.0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Live stream monitoring (detect highlights in real-time)&lt;/td&gt;
&lt;td&gt;🔬 Research&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;v3.x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Multi-platform auto-upload (TikTok, YouTube Shorts, Reels)&lt;/td&gt;
&lt;td&gt;📋 Backlog&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  🎮 Platform Integration Potential
&lt;/h3&gt;

&lt;p&gt;The most exciting future isn't AutoShorts as a standalone tool — it's AutoShorts as a &lt;strong&gt;backend engine&lt;/strong&gt; embedded in platforms millions of gamers already use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Xbox Game Bar&lt;/strong&gt; — The overlay already captures screenshots and gameplay recordings (&lt;code&gt;Win+G&lt;/code&gt;). Imagine a "Generate Highlights" button that takes your captured footage and produces ready-to-share clips with captions and voiceover — &lt;em&gt;without ever leaving the overlay.&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA ShadowPlay&lt;/strong&gt; — ShadowPlay's Instant Replay already silently records the last 30 seconds to 20 minutes of gameplay. Pair that buffer with AutoShorts' AI ranking, and ShadowPlay could &lt;em&gt;automatically identify and export your best moments&lt;/em&gt; with professional-grade overlays and narration. No scrubbing through footage. No editing. Just play.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord Activity Integration&lt;/strong&gt; — Post-session highlight reels generated from screen shares, dropped directly into your server channel.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core thesis: &lt;strong&gt;highlight detection + voice synthesis + smart cropping&lt;/strong&gt; is infrastructure, not an app. Every platform that captures gameplay footage could use this engine to turn passive recording into active content creation.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The best highlight reel is the one you never had to make.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Acknowledgements
&lt;/h2&gt;

&lt;p&gt;This project builds upon:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/artryazanov/shorts-maker-gpu" rel="noopener noreferrer"&gt;artryazanov/shorts-maker-gpu&lt;/a&gt;&lt;/strong&gt; — GPU-accelerated clip extraction using heuristic scoring (audio dB + motion detection).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/Binary-Bytes/Auto-YouTube-Shorts-Maker" rel="noopener noreferrer"&gt;Binary-Bytes/Auto-YouTube-Shorts-Maker&lt;/a&gt;&lt;/strong&gt; — Original concept and inspiration for the automated short-form content pipeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/QwenLM/Qwen3-TTS" rel="noopener noreferrer"&gt;Qwen3-TTS&lt;/a&gt;&lt;/strong&gt; — Voice synthesis with natural language design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/francozanardi/pycaps" rel="noopener noreferrer"&gt;PyCaps&lt;/a&gt;&lt;/strong&gt; — Animated subtitle rendering&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Improvements Over Base Project
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Base Project&lt;/th&gt;
&lt;th&gt;AutoShorts&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Monolithic script&lt;/td&gt;
&lt;td&gt;Modular package with lifecycle management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scene Scoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Audio dB + motion only&lt;/td&gt;
&lt;td&gt;Hybrid: heuristics + Vision AI semantic analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deep Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Full-video Gemini analysis for context-aware detection&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voiceover&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Qwen3-TTS with style-adaptive voice design&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Captions&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;AI-generated, 10+ styles including story modes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CJK Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Character-based subtitle chunking for JCK languages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single model&lt;/td&gt;
&lt;td&gt;VRAM-aware model sequencing (unload between phases)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;TTS Sync&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Per-sentence TTS generation for accurate timing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Overflow Handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Re-render clips when TTS &amp;gt; video length&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;
&lt;span class="c"&gt;# Clone the repository&lt;/span&gt;
git clone https://github.com/divyaprakash0426/autoshorts.git
&lt;span class="nb"&gt;cd &lt;/span&gt;autoshorts

&lt;span class="c"&gt;# Setup environment variables&lt;/span&gt;
&lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
&lt;span class="c"&gt;# Edit .env and add your API keys (Gemini/OpenAI) &lt;/span&gt;

&lt;span class="c"&gt;# Option 1: Using Makefile (Recommended)&lt;/span&gt;

make &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Option 2: Using Shell Script&lt;/span&gt;
./install.sh

&lt;span class="c"&gt;# Drop videos in gameplay/, then run:&lt;/span&gt;
./.venv/bin/python run.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or launch the dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./.venv/bin/streamlit run src/dashboard/About.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🛡️ Battle Tested On
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Asus Zephyrus G16&lt;/strong&gt; (RTX 4080 Mobile, Intel Ultra 9) running Arch Linux.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with frustration, caffeine, and GitHub Copilot CLI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>cli</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>Building AutoShorts: A High-Performance AI Pipeline for Automated Viral Content 🎬🤖</title>
      <dc:creator>divyaprakash D</dc:creator>
      <pubDate>Sat, 24 Jan 2026 14:59:45 +0000</pubDate>
      <link>https://forem.com/divyaprakash_d_2d5d085bd4/building-autoshorts-a-high-performance-ai-pipeline-for-automated-viral-content-g5i</link>
      <guid>https://forem.com/divyaprakash_d_2d5d085bd4/building-autoshorts-a-high-performance-ai-pipeline-for-automated-viral-content-g5i</guid>
      <description>&lt;h2&gt;
  
  
  The Problem: Content Creation is a Bottleneck
&lt;/h2&gt;

&lt;p&gt;Every creator knows the "highlight reel" struggle. You have hours of high-quality gameplay footage, but finding that perfect 30-second clip, cropping it, adding subtitles, and layering a voiceover takes hours of manual labor.&lt;br&gt;
I wanted to see if I could build a &lt;strong&gt;fully automated, high-performance pipeline&lt;/strong&gt; to handle this from start to finish. Today, I'm open-sourcing &lt;strong&gt;AutoShorts&lt;/strong&gt;.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgsxcr48ugtjwf0ibmso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftgsxcr48ugtjwf0ibmso.png" alt="AutoShorts Architecture Architecture" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AutoShorts?
&lt;/h2&gt;

&lt;p&gt;AutoShorts is a GPU-optimized CLI tool that analyzes long-form video, identifies high-engagement scenes using AI, and synthesizes them into ready-to-upload vertical shorts. &lt;br&gt;
It doesn't just "cut" video; it understands it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Deep Dive 🛠️
&lt;/h2&gt;

&lt;p&gt;To keep processing times low and avoid massive cloud API bills, I focused heavily on local processing and hardware acceleration:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. GPU Scene Analysis ⚡
&lt;/h3&gt;

&lt;p&gt;Using &lt;code&gt;decord&lt;/code&gt; and &lt;code&gt;PyTorch&lt;/code&gt;, the pipeline performs frame extraction and visual feature analysis directly on the GPU. We calculate action density and spectral flux to find "loud" or "fast" moments before the text-based AI even sees the clip.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Dual-AI Intelligence 🧠
&lt;/h3&gt;

&lt;p&gt;The pipeline integrates with &lt;strong&gt;OpenAI (GPT-4o)&lt;/strong&gt; and &lt;strong&gt;Google Gemini&lt;/strong&gt;. We pass the metadata and scene descriptions to the LLM to score segments based on:&lt;br&gt;
&lt;strong&gt;Hook Potential&lt;/strong&gt;: Is the start grabby?&lt;br&gt;
&lt;strong&gt;Relevance&lt;/strong&gt;: Does the action make sense?&lt;br&gt;
&lt;strong&gt;Emotional Impact&lt;/strong&gt;: Is it funny, impressive, or a "fail"?&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Smart Subtitles &amp;amp; Neural TTS 🗣️
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Local TTS&lt;/strong&gt;: Instead of paid APIs, we use &lt;strong&gt;ChatterBox&lt;/strong&gt; locally. It supports emotional prosody, so the voiceover doesn't sound like a monotone robot.&lt;br&gt;
&lt;strong&gt;PyCaps Renderer&lt;/strong&gt;: We use a custom Playwright-based renderer to create those "MrBeast style" word-by-word animated captions that are essential for mobile retention.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. NVENC Rendering 🎞️
&lt;/h3&gt;

&lt;p&gt;Final assembly—including audio mixing, blurring backgrounds (for the vertical look), and burning in subtitles—is offloaded to &lt;strong&gt;NVIDIA’s NVENC hardware&lt;/strong&gt;. This keeps the CPU free for other tasks and slashes render times.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚧 What’s Next? (The Roadmap)
&lt;/h2&gt;

&lt;p&gt;This is a v1.0 release, and while the pipeline is robust, the potential for enhancement is huge. I’m looking for contributors to help with:&lt;br&gt;
&lt;strong&gt;Upgrading the Voice Engine&lt;/strong&gt;: Integrating more recent open-source models like &lt;strong&gt;ChatterBoxTurbo&lt;/strong&gt;, &lt;strong&gt;Qwen-TTS&lt;/strong&gt;, or &lt;strong&gt;NVIDIA’s latest TTS&lt;/strong&gt; for even more realistic voice cloning and prosody.&lt;br&gt;
&lt;strong&gt;Intelligent Auto-Zoom&lt;/strong&gt;: Currently, the 9:16 crop is centered. Adding object detection (YOLO/RT-DETR) to &lt;strong&gt;follow the action&lt;/strong&gt;—dynamically moving the crop window to follow a character or a vehicle.&lt;br&gt;
&lt;strong&gt;Advanced Transition Styles&lt;/strong&gt;: Adding AI-generated transitions between merged scenes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Build With Me 🚀
&lt;/h2&gt;

&lt;p&gt;The project is fully dockerized and open for contributions. Whether you're interested in machine learning, computer vision, or just want to automate your own YouTube channel, I'd love to see you in the PRs.&lt;br&gt;
&lt;strong&gt;GitHub Repository:&lt;/strong&gt; &lt;a href="https://github.com/divyaprakash0426/autoshorts" rel="noopener noreferrer"&gt;github.com/divyaprakash0426/autoshorts&lt;/a&gt;&lt;br&gt;
&lt;em&gt;A huge thanks to the original concepts from &lt;a href="https://github.com/artryazanov/shorts-maker-gpu" rel="noopener noreferrer"&gt;artryazanov&lt;/a&gt; and &lt;a href="https://github.com/Binary-Bytes/Auto-YouTube-Shorts-Maker" rel="noopener noreferrer"&gt;Binary-Bytes&lt;/a&gt; which provided the foundation for this refactored release.&lt;/em&gt;&lt;br&gt;
&lt;strong&gt;What features would you add to an AI video pipeline like this? Let's discuss in the comments! 👇&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>opensource</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
