<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Cian</title>
    <description>The latest articles on Forem by Cian (@mutaician).</description>
    <link>https://forem.com/mutaician</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1071931%2F29ff7a98-a20e-4902-884d-00465e4c3e24.png</url>
      <title>Forem: Cian</title>
      <link>https://forem.com/mutaician</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mutaician"/>
    <language>en</language>
    <item>
      <title>I built Persite because I was tired of guessing GPU costs in my head</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:51:10 +0000</pubDate>
      <link>https://forem.com/mutaician/i-built-persite-because-i-was-tired-of-guessing-gpu-costs-in-my-head-5837</link>
      <guid>https://forem.com/mutaician/i-built-persite-because-i-was-tired-of-guessing-gpu-costs-in-my-head-5837</guid>
      <description>&lt;p&gt;When this started, I was not trying to build a hackathon project.&lt;/p&gt;

&lt;p&gt;I was trying to rent GPUs for my ML project.&lt;/p&gt;

&lt;p&gt;What I wanted was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;price per hour&lt;/li&gt;
&lt;li&gt;in my local currency&lt;/li&gt;
&lt;li&gt;visible immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I got on many sites was the opposite: generic hero copy, too many sections, and pricing buried somewhere I had to hunt for. Then I had to mentally convert currency and estimate actual cost.&lt;/p&gt;

&lt;p&gt;That friction became the core idea behind Persite.&lt;/p&gt;

&lt;p&gt;Persite is a locale-aware and intent-aware personalization system. It tries to answer this:&lt;/p&gt;

&lt;p&gt;If someone arrives with clear intent, why are we still forcing everyone through the same static page?&lt;/p&gt;

&lt;h2&gt;The problem I wanted to solve&lt;/h2&gt;

&lt;p&gt;Most websites treat a buyer in Kenya the same as a buyer in the US or Germany. Same message, same CTA, same layout priority.&lt;/p&gt;

&lt;p&gt;I think that is a product problem, not just a translation problem.&lt;/p&gt;

&lt;p&gt;The key point for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;translation alone is not enough&lt;/li&gt;
&lt;li&gt;intent alone is not enough&lt;/li&gt;
&lt;li&gt;locale plus intent is where it starts making sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also wanted a privacy-friendly approach. I did not want profile tracking or long-term behavior graphs. I wanted lightweight signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;locale&lt;/li&gt;
&lt;li&gt;URL params&lt;/li&gt;
&lt;li&gt;UTM data&lt;/li&gt;
&lt;li&gt;referrer&lt;/li&gt;
&lt;/ul&gt;
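&lt;p&gt;As a rough sketch (the function and fallback names here are illustrative, not the actual Persite code), reading those signals can stay this small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: derive lightweight signals from a single request.
const KNOWN_INTENTS = ["judge", "github", "investor", "browse"];

function detectSignals(url: URL, referrer: string, acceptLanguage: string) {
  const intentParam = url.searchParams.get("intent");
  const utmSource = url.searchParams.get("utm_source");

  // An explicit ?intent= param wins; otherwise fall back to UTM/referrer hints.
  let intent = "browse";
  if (intentParam !== null) {
    if (KNOWN_INTENTS.includes(intentParam)) {
      intent = intentParam;
    }
  } else if (utmSource === "github" || referrer.includes("github.com")) {
    intent = "github";
  }

  // Locale: explicit ?locale= override first, then the browser's top preference.
  const localeParam = url.searchParams.get("locale");
  const locale = localeParam ?? acceptLanguage.split(",")[0] ?? "en";
  return { intent, locale };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing here needs storage: every signal is derivable from the request itself, which is what keeps the approach privacy-friendly.&lt;/p&gt;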

&lt;h2&gt;What I built&lt;/h2&gt;

&lt;p&gt;I built two surfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A landing page that adapts full-page content based on landing intent (&lt;code&gt;judge&lt;/code&gt;, &lt;code&gt;github&lt;/code&gt;, &lt;code&gt;investor&lt;/code&gt;, &lt;code&gt;browse&lt;/code&gt;) and locale.&lt;/li&gt;
&lt;li&gt;A demo e-commerce store (&lt;code&gt;/demo&lt;/code&gt;) that adapts hero copy, product ordering behavior, and localized product description content.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both surfaces have a draggable control panel so you can switch locale and intent quickly and see why a variant was selected.&lt;/p&gt;

&lt;h2&gt;High-level architecture&lt;/h2&gt;

&lt;p&gt;The flow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect signals (locale + intent)&lt;/li&gt;
&lt;li&gt;Choose a variant with deterministic rules&lt;/li&gt;
&lt;li&gt;Send selected content to Lingo API for localization&lt;/li&gt;
&lt;li&gt;Render localized result&lt;/li&gt;
&lt;li&gt;Expose decision metadata in panel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I kept the logic explicit and finite on purpose. I wanted a demo that can be explained under time pressure.&lt;/p&gt;
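&lt;p&gt;To make "finite" concrete, the variant step can be a plain lookup table plus a fallback. This is an illustrative sketch, not the real &lt;code&gt;buildStep3Decision&lt;/code&gt;, and the copy strings are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: a finite intent-to-variant table with an explicit reason.
type Variant = { headline: string; ctaLabel: string };

const VARIANTS: { [intent: string]: Variant } = {
  judge: { headline: "Here is exactly what to evaluate", ctaLabel: "Run the demo" },
  github: { headline: "Start with the code", ctaLabel: "Open the repo" },
  investor: { headline: "Why locale-aware pages convert", ctaLabel: "See the numbers" },
  browse: { headline: "Pages that adapt to you", ctaLabel: "Try it" },
};

function chooseVariant(intent: string, source: string) {
  const known = Object.prototype.hasOwnProperty.call(VARIANTS, intent);
  const chosen = known ? intent : "browse";
  // The reason string is what the control panel surfaces as decision metadata.
  const reason = known ? "intent '" + intent + "' via " + source : "unknown intent, fallback";
  return { variant: chosen, template: VARIANTS[chosen], reason: reason };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the table is closed, every possible decision can be enumerated and explained, which is exactly what a time-pressured demo needs.&lt;/p&gt;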

&lt;h2&gt;The code that made the project real&lt;/h2&gt;

&lt;p&gt;This is the core hero localization flow in my API route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildStep3Decision&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;intentDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;intentDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;localizablePayload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LocalizablePayload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;subheadline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subheadline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ctaLabel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctaLabel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;localizationResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;LINGO_LOCALIZE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-API-Key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;engineId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;sourceLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;localeDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;localizablePayload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-store&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That block is where static template content becomes locale-adapted output.&lt;/p&gt;

&lt;p&gt;And this is the part that saved me when product localization became too slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunkProducts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toTranslate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;MAX_PARALLEL_CHUNKS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;MAX_PARALLEL_CHUNKS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;groupResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;localizeChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;groupResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;translatedByKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was a practical turning point.&lt;/p&gt;

&lt;h2&gt;Where the clean design broke&lt;/h2&gt;

&lt;p&gt;My initial, cleaner idea was bigger: ship this as a portable script that works on any website.&lt;/p&gt;

&lt;p&gt;Reality check: every website has a different structure, different content ownership, and different component boundaries.&lt;/p&gt;

&lt;p&gt;So I narrowed scope to a controlled environment where I could show the value clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic variant model&lt;/li&gt;
&lt;li&gt;explainable decisions&lt;/li&gt;
&lt;li&gt;real localization behavior&lt;/li&gt;
&lt;li&gt;fast enough demo interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That scope cut is honestly what made it shippable.&lt;/p&gt;

&lt;h2&gt;Biggest pain point and messy workaround&lt;/h2&gt;

&lt;p&gt;The biggest pain was localization latency when payloads got large.&lt;/p&gt;

&lt;p&gt;I hit situations where requests were taking far too long. I ended up doing a mix of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel chunked localization&lt;/li&gt;
&lt;li&gt;selective localization (only high-impact copy)&lt;/li&gt;
&lt;li&gt;caching by locale + intent + content key&lt;/li&gt;
&lt;li&gt;fallback paths for missing translations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not the purest architecture, but it crossed the finish line and stayed understandable.&lt;/p&gt;
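&lt;p&gt;The caching and fallback pieces are simple to sketch: keys combine locale, intent, and a content key, and a miss degrades to the English source instead of blocking render. Names here are illustrative, not the exact Persite code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: in-memory cache keyed by locale + intent + content key.
const translationCache = new Map();

function cacheKey(locale: string, intent: string, contentKey: string) {
  return locale + "::" + intent + "::" + contentKey;
}

function putLocalized(locale: string, intent: string, contentKey: string, text: string) {
  translationCache.set(cacheKey(locale, intent, contentKey), text);
}

function getLocalized(locale: string, intent: string, contentKey: string, fallback: string) {
  const hit = translationCache.get(cacheKey(locale, intent, contentKey));
  // Fallback path: a missing translation returns the source copy, never an error.
  return hit ?? fallback;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;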

&lt;h2&gt;Trade-offs I accepted&lt;/h2&gt;

&lt;p&gt;I made deliberate trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I did not localize everything through the API.&lt;/li&gt;
&lt;li&gt;I used a hybrid content strategy:

&lt;ul&gt;
&lt;li&gt;dynamic/high-impact copy through API&lt;/li&gt;
&lt;li&gt;repeated UI labels via locale dictionaries&lt;/li&gt;
&lt;li&gt;technical product names/spec tokens kept unchanged&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;I optimized for demo clarity over maximum abstraction.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I think this was the right call for a hackathon MVP.&lt;/p&gt;

&lt;h2&gt;One extra thing I intentionally modeled&lt;/h2&gt;

&lt;p&gt;In the demo store data, I also reflected a real market situation: GPU and RAM prices being elevated due to AI-era supply pressure.&lt;/p&gt;

&lt;p&gt;That was intentional. I wanted the demo to feel like it understands real buyer context, not just UI translation.&lt;/p&gt;

&lt;h2&gt;How to run it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/mutaician/persite
&lt;span class="nb"&gt;cd &lt;/span&gt;persite
pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINGO_API_KEY=your_lingo_api_key
LINGO_ENGINE_ID=your_lingo_engine_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run dev server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm run build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful routes to test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/?intent=judge&amp;amp;locale=de-DE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/?intent=investor&amp;amp;locale=sw-KE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/demo?intent=compare&amp;amp;locale=fr-FR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/demo?intent=budget&amp;amp;locale=pt-BR&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I would build next&lt;/h2&gt;

&lt;p&gt;The missing piece is portability.&lt;/p&gt;

&lt;p&gt;I want a reusable integration layer that can plug into arbitrary websites and personalize key surfaces (especially pricing and plan-selection pages) based on intent and locale, without requiring each team to rewrite their whole frontend.&lt;/p&gt;

&lt;p&gt;That is where this can move from a strong demo to a deployable product.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live demo: &lt;a href="https://persite-seven.vercel.app/" rel="noopener noreferrer"&gt;https://persite-seven.vercel.app/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/mutaician/persite" rel="noopener noreferrer"&gt;https://github.com/mutaician/persite&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hackathon</category>
      <category>lingo</category>
      <category>ai</category>
      <category>localization</category>
    </item>
    <item>
      <title>Gemini Became My Entire Hackathon Team — How a Solo Dev in Kenya Won His First MLH Prize Building RepoX</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Tue, 03 Mar 2026 14:46:48 +0000</pubDate>
      <link>https://forem.com/mutaician/gemini-became-my-entire-hackathon-team-how-a-solo-dev-in-kenya-won-his-first-mlh-prize-building-2k27</link>
      <guid>https://forem.com/mutaician/gemini-became-my-entire-hackathon-team-how-a-solo-dev-in-kenya-won-his-first-mlh-prize-building-2k27</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built with Google Gemini&lt;/h2&gt;

&lt;p&gt;Picture this: It’s 3 a.m. My first-ever MLH hackathon. I’m staring at a blank screen, heart racing, knowing I’m completely outmatched by teams with years of experience.&lt;/p&gt;

&lt;p&gt;Then I opened &lt;strong&gt;Google Antigravity&lt;/strong&gt; — and everything changed.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;RepoX&lt;/strong&gt;: an interactive platform that turns any public GitHub repository into a living, breathing learning adventure.&lt;/p&gt;

&lt;p&gt;No more getting lost in massive codebases. RepoX gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stunning D3.js force-directed graph that maps every file and its relationships like a neural network&lt;/li&gt;
&lt;li&gt;Instant AI-powered explanations for any file (including “Explain Like I’m 5” mode that actually makes sense)&lt;/li&gt;
&lt;li&gt;Smart personalized learning paths — the AI reads the entire repo and tells you the exact smartest order to explore it&lt;/li&gt;
&lt;li&gt;Progress checklists and history so you never lose momentum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The crazy part? The app itself runs &lt;strong&gt;on Gemini&lt;/strong&gt;. Every explanation and learning path is generated live by the Gemini API (securely routed through Cloudflare Workers).&lt;/p&gt;

&lt;p&gt;And yes — this project won &lt;strong&gt;Best AI Application Built with Cloudflare&lt;/strong&gt; at Hacks for Hackers 2026. My very first hackathon… and I took home a prize. I still can’t believe it.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live app&lt;/strong&gt; (paste any GitHub repo and watch the magic): &lt;a href="https://main.repox.pages.dev" rel="noopener noreferrer"&gt;https://main.repox.pages.dev&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full YouTube demo&lt;/strong&gt;:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/8m2kGGZEJTw"&gt;&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Devpost&lt;/strong&gt;: &lt;a href="https://devpost.com/software/repox" rel="noopener noreferrer"&gt;https://devpost.com/software/repox&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mutaician/RepoX" rel="noopener noreferrer"&gt;https://github.com/mutaician/RepoX&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;This wasn’t just a hackathon project — it was my crash course in what happens when you stop coding &lt;em&gt;alone&lt;/em&gt; and start coding &lt;em&gt;with&lt;/em&gt; an AI teammate.&lt;/p&gt;

&lt;p&gt;I went from zero D3.js experience to building a smooth, responsive graph that handles thousands of nodes. I learned secure API proxying on Cloudflare Workers under extreme time pressure. I mastered prompt engineering at a level I never thought possible — crafting system prompts so precise that Gemini would output perfectly formatted learning paths every single time.&lt;/p&gt;

&lt;p&gt;Most importantly, I learned that one determined developer + the right Gemini workflow can outpace entire traditional teams. The confidence this gave me is something no tutorial could ever provide.&lt;/p&gt;

&lt;h2&gt;Google Gemini Feedback&lt;/h2&gt;

&lt;p&gt;Gemini wasn’t a tool. It was my co-founder, my senior dev, my QA tester, and my creative director — all in one.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;Antigravity&lt;/strong&gt; (Google’s agentic IDE) the entire time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Pro&lt;/strong&gt; handled the heavy lifting — autonomously designing the learning-path algorithm, reasoning through complex repo analysis, and even suggesting UI tweaks that made the graph feel alive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Flash&lt;/strong&gt; was my speed demon — instantly generating UI components, ELI5 explanations, and quick fixes while I kept momentum.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5&lt;/strong&gt; was the reliable fallback when context got too big on massive repos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What blew me away:&lt;/strong&gt;&lt;br&gt;
The agentic flow was unreal. I’d describe a feature once, and Antigravity would plan, code, debug, and iterate — often better than I would have done myself. The personalized learning paths Gemini 3.1 created were scarily good — logical, educational, and genuinely helpful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it got messy (keeping it real):&lt;/strong&gt;&lt;br&gt;
Larger repos sometimes overwhelmed the context window and the agent would start hallucinating relationships or going off on wild creative tangents. I had to get surgical with my prompts and occasionally switch models. Response formatting could be inconsistent (markdown breaking in weird places), and yes, the token costs added up during heavy 3.1 sessions.&lt;/p&gt;

&lt;p&gt;But here’s the truth: Without this exact multi-model + Antigravity setup, RepoX would still be a half-finished idea on my laptop. Gemini didn’t just help me finish — it helped me win my first hackathon.&lt;/p&gt;

&lt;p&gt;From a nervous solo dev in Kenya to MLH prize winner in 48 hours. That’s the power of Google Gemini.&lt;/p&gt;

&lt;p&gt;Thanks for reading my story — can’t wait to see what we build next. 🚀&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>I Built an AI That Can See Your Arduino and Write the Code For It</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Fri, 27 Feb 2026 17:49:51 +0000</pubDate>
      <link>https://forem.com/mutaician/i-built-an-ai-that-can-see-your-arduino-and-write-the-code-for-it-558l</link>
      <guid>https://forem.com/mutaician/i-built-an-ai-that-can-see-your-arduino-and-write-the-code-for-it-558l</guid>
      <description>&lt;p&gt;There is a specific frustration anyone who has worked with Arduino knows well.&lt;/p&gt;

&lt;p&gt;You have a breadboard in front of you. Components are wired up. You open a chat window, describe your setup in text — "I have an LED on pin 8 with a 220 ohm resistor" — copy the code the AI gives you, paste it into the Arduino IDE, hit upload, and watch the LED do nothing. You go back to the chat window. You describe what happened. You get a revised version. You copy it again.&lt;/p&gt;

&lt;p&gt;You do this five times before realizing the AI gave you code for pin 9 even though you told it pin 8, and the only warning was a one-line comment ("change this to match your wiring") that you missed.&lt;/p&gt;

&lt;p&gt;Every AI coding assistant has this problem: they are blind to your physical setup.&lt;/p&gt;

&lt;p&gt;ArduinoVision is my attempt to fix that.&lt;/p&gt;




&lt;h2&gt;The Idea&lt;/h2&gt;

&lt;p&gt;The concept is simple enough to state in one sentence: an AI agent that can see your breadboard through a camera, write the correct Arduino code based on what it actually observes, and upload it directly to your board.&lt;/p&gt;

&lt;p&gt;No copy-paste. No IDE switching. No describing your wiring in text. You connect the components. The AI handles everything else.&lt;/p&gt;

&lt;p&gt;I built this for the Vision Possible: Agent Protocol hackathon by WeMakeDevs, and the core of it runs on the VisionAgents SDK by Stream.&lt;/p&gt;




&lt;h2&gt;What VisionAgents Makes Possible&lt;/h2&gt;

&lt;p&gt;Before I get into the build, I want to explain why this project needed VisionAgents specifically — because the answer is not obvious.&lt;/p&gt;

&lt;p&gt;The challenge with building a hardware coding agent is that it needs four things happening simultaneously and tightly integrated: seeing video (your camera), hearing audio (your voice), reasoning about both together (the LLM), and taking external actions (compile, upload). Wiring all of that together manually — WebRTC for the camera feed, a separate STT service, a separate LLM call, a separate TTS for the response — is a significant amount of infrastructure before you write a single line of the actual agent logic.&lt;/p&gt;

&lt;p&gt;VisionAgents collapses all of that into a few lines of Python.&lt;/p&gt;

&lt;p&gt;The relevant part of the agent setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vision_agents.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentLauncher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vision_agents.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getstream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-realtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cedar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;getstream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Edge&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;agent_user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ArduinoVision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arduino-vision-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire transport and LLM setup. &lt;code&gt;getstream.Edge()&lt;/code&gt; handles the WebRTC infrastructure — video/audio in and out, connection management, reconnection logic. &lt;code&gt;openai.Realtime()&lt;/code&gt; handles speech-to-speech natively — no separate STT or TTS services, no intermediate text conversion, just audio in and audio out with video frames attached. Stream's edge network keeps the latency under 30ms, which matters when someone is physically holding a component in front of the camera.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;fps=1&lt;/code&gt; setting deserves a note. I initially had it at &lt;code&gt;fps=3&lt;/code&gt; and the audio quality was noticeably degraded — cutting out, pitch shifts mid-sentence. Dropping to one frame per second freed up the audio pipeline entirely. For identifying breadboard wiring, one frame per second is more than sufficient.&lt;/p&gt;




&lt;h2&gt;Registering Arduino Tools&lt;/h2&gt;

&lt;p&gt;The agent's practical capability comes from tool registration. VisionAgents uses &lt;code&gt;@llm.register_function()&lt;/code&gt; to make Python functions callable by the model during conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm.register_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List all connected Arduino boards. Returns port, board name, and FQBN. ALWAYS call this first to find the port needed for upload.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_boards&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;boards&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list_arduino_boards&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boards&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; board(s). Use the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; for upload operations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boards&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No Arduino boards detected.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I registered six tools in total: &lt;code&gt;list_boards&lt;/code&gt;, &lt;code&gt;write_code&lt;/code&gt;, &lt;code&gt;compile_code&lt;/code&gt;, &lt;code&gt;upload_code&lt;/code&gt;, &lt;code&gt;serial_monitor&lt;/code&gt;, and &lt;code&gt;deploy_code&lt;/code&gt; (which chains the previous three). Each one wraps a call to &lt;code&gt;arduino-cli&lt;/code&gt; on the system.&lt;/p&gt;
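&lt;p&gt;Each wrapper is thin. As a rough sketch of how a tool like &lt;code&gt;deploy_code&lt;/code&gt; could chain compile and upload around &lt;code&gt;arduino-cli&lt;/code&gt; — the function names and the injectable &lt;code&gt;run&lt;/code&gt; parameter here are illustrative, not the project's actual code:&lt;/p&gt;

```python
import subprocess


def _run_cli(*args: str) -> subprocess.CompletedProcess:
    """Invoke an arduino-cli subcommand, capturing stdout/stderr as text."""
    return subprocess.run(["arduino-cli", *args], capture_output=True, text=True)


def deploy(sketch_dir: str, port: str,
           fqbn: str = "arduino:avr:uno", run=_run_cli) -> dict:
    """Compile a sketch directory, then upload it to the given serial port.

    Returns a dict in the same spirit as the list_boards tool above, so the
    model gets a structured result it can reason about on failure.
    """
    compiled = run("compile", "--fqbn", fqbn, sketch_dir)
    if compiled.returncode != 0:
        return {"ok": False, "stage": "compile", "error": compiled.stderr}

    uploaded = run("upload", "-p", port, "--fqbn", fqbn, sketch_dir)
    if uploaded.returncode != 0:
        return {"ok": False, "stage": "upload", "error": uploaded.stderr}

    return {"ok": True, "stage": "done", "error": ""}
```

&lt;p&gt;Returning the failing stage explicitly matters: when the model sees &lt;code&gt;"stage": "compile"&lt;/code&gt; it can fix the sketch, whereas an upload failure usually means a port problem it should report to the user instead.&lt;/p&gt;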

&lt;p&gt;What makes this work well in practice is that the model chains these calls naturally based on the conversation. The user says "make the LED blink." The model calls &lt;code&gt;list_boards&lt;/code&gt; to find the port, calls &lt;code&gt;write_code&lt;/code&gt; to save the sketch, then &lt;code&gt;deploy_code&lt;/code&gt; to compile and upload. The user did not ask it to do those steps in that order — the model inferred the sequence from context and tool descriptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Event System
&lt;/h2&gt;

&lt;p&gt;One thing I found genuinely useful during development was the event subscription API. Every tool call emits a &lt;code&gt;ToolStartEvent&lt;/code&gt; and a &lt;code&gt;ToolEndEvent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.events.subscribe&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolStartEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL START: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Args: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.events.subscribe&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolEndEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL END: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execution_time_ms&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL FAILED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you are building a hardware-in-the-loop agent where failures are physical (LED does not blink, board does not respond), having a structured log of every tool call with arguments and timing is essential. It is also how I caught that the model was calling &lt;code&gt;deploy_code&lt;/code&gt; before the port permissions were correctly set — the error was immediately visible in the log.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Real-time video AI has a different failure mode than text AI.&lt;/strong&gt; With text AI, wrong output is obvious — you read it and fix the prompt. With video AI, wrong output means the board does not respond and you are staring at a stationary LED trying to figure out if the model misidentified the pin, or the code is wrong, or the upload failed, or the LED is wired backwards. Good observability (the event system) is not optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool descriptions are more important than I expected.&lt;/strong&gt; The model's behaviour changed significantly based on how I phrased the tool descriptions. "Detect connected Arduino boards" caused the model to call it inconsistently. "List all connected Arduino boards. ALWAYS call this first to find the port needed for upload." made it call the tool reliably every time, in the right order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware-in-the-loop iteration is slow.&lt;/strong&gt; Software agents can iterate in milliseconds. Hardware agents have a four-second compile-upload cycle. This changes how you design the system — you want the model to be confident before it acts, not to try-and-retry. Good visual grounding (making sure the agent can clearly see the wiring before generating code) matters more than in pure software contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Is Not (Yet)
&lt;/h2&gt;

&lt;p&gt;ArduinoVision is a hackathon prototype. Its scope right now is: AVR boards (Uno, Nano), basic GPIO (digital pins, LEDs, buttons), one board connected at a time. It does not handle I2C sensors, servo control, ESP32/ESP8266, or multi-board setups. These are natural extensions but they are not in this version.&lt;/p&gt;

&lt;p&gt;The interface also relies on the VisionAgents demo UI at demo.visionagents.ai rather than a custom frontend. For a prototype this is fine — building a custom WebRTC client is significant work that would have added nothing to the core idea.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The thing that strikes me about this project is how little code it took to get something genuinely useful working. The Arduino tooling (list boards, write, compile, upload) is maybe 300 lines of Python. The agent setup is another 150. The entire relevant surface area is small.&lt;/p&gt;

&lt;p&gt;What VisionAgents provides is the hard part: real-time video transport, speech-to-speech latency that feels natural, and a clean function calling interface that the model uses reliably. Without that infrastructure being pre-built, this project would have been two weeks of WebRTC work before a single Arduino command got called.&lt;/p&gt;

&lt;p&gt;There is a real category of applications that becomes possible when AI agents can see physical environments and take actions based on what they observe. Hardware debugging is one. Lab automation is another. Physical quality control. Teaching environments where a student shows their circuit and gets immediate, accurate feedback.&lt;/p&gt;

&lt;p&gt;ArduinoVision is a small example of what that category looks like when the infrastructure is available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The code is on GitHub: &lt;a href="https://github.com/mutaician/arduino-vision" rel="noopener noreferrer"&gt;github.com/mutaician/arduino-vision&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need a Stream account (free tier works), an OpenAI API key, Python 3.12, and arduino-cli. The README has full setup instructions. If you are on Windows, there are notes on forwarding the USB serial port to WSL.&lt;/p&gt;
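&lt;p&gt;As a rough guide (the README is the source of truth), a typical &lt;code&gt;arduino-cli&lt;/code&gt; setup for an AVR board like the Uno looks like this:&lt;/p&gt;

```shell
# Fetch the package index, install the AVR core (Uno/Nano),
# and confirm the board shows up on a serial port.
arduino-cli core update-index
arduino-cli core install arduino:avr
arduino-cli board list
```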




&lt;p&gt;&lt;em&gt;Built for the Vision Possible: Agent Protocol hackathon by WeMakeDevs. Powered by VisionAgents SDK by Stream.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>arduino</category>
      <category>visionagents</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
