<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Cian</title>
    <description>The latest articles on Forem by Cian (@mutaician).</description>
    <link>https://forem.com/mutaician</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1071931%2F29ff7a98-a20e-4902-884d-00465e4c3e24.png</url>
      <title>Forem: Cian</title>
      <link>https://forem.com/mutaician</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/mutaician"/>
    <language>en</language>
    <item>
      <title>I built Persite because I was tired of guessing GPU costs in my head</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Fri, 13 Mar 2026 19:51:10 +0000</pubDate>
      <link>https://forem.com/mutaician/i-built-persite-because-i-was-tired-of-guessing-gpu-costs-in-my-head-5837</link>
      <guid>https://forem.com/mutaician/i-built-persite-because-i-was-tired-of-guessing-gpu-costs-in-my-head-5837</guid>
      <description>&lt;p&gt;When this started, I was not trying to build a hackathon project.&lt;/p&gt;

&lt;p&gt;I was trying to rent GPUs for my ML project.&lt;/p&gt;

&lt;p&gt;What I wanted was simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;price per hour&lt;/li&gt;
&lt;li&gt;in my local currency&lt;/li&gt;
&lt;li&gt;visible immediately&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I got on many sites was the opposite: generic hero copy, too many sections, and pricing buried somewhere I had to hunt for. Then I had to mentally convert currency and estimate actual cost.&lt;/p&gt;

&lt;p&gt;That friction became the core idea behind Persite.&lt;/p&gt;

&lt;p&gt;Persite is a locale-aware and intent-aware personalization system. It tries to answer this:&lt;/p&gt;

&lt;p&gt;If someone arrives with clear intent, why are we still forcing everyone through the same static page?&lt;/p&gt;

&lt;h2&gt;The problem I wanted to solve&lt;/h2&gt;

&lt;p&gt;Most websites treat a buyer in Kenya the same as a buyer in the US or Germany. Same message, same CTA, same layout priority.&lt;/p&gt;

&lt;p&gt;I think that is a product problem, not just a translation problem.&lt;/p&gt;

&lt;p&gt;The key point for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;translation alone is not enough&lt;/li&gt;
&lt;li&gt;intent alone is not enough&lt;/li&gt;
&lt;li&gt;locale plus intent is where it starts making sense&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also wanted a privacy-friendly approach. I did not want profile tracking or long-term behavior graphs. I wanted lightweight signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;locale&lt;/li&gt;
&lt;li&gt;URL params&lt;/li&gt;
&lt;li&gt;UTM data&lt;/li&gt;
&lt;li&gt;referrer&lt;/li&gt;
&lt;/ul&gt;
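&lt;p&gt;As a rough sketch (the function and fallback names here are illustrative, not the actual Persite code), reading those signals can stay this small:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: derive lightweight signals from a single request.
const KNOWN_INTENTS = ["judge", "github", "investor", "browse"];

function detectSignals(url: URL, referrer: string, acceptLanguage: string) {
  const intentParam = url.searchParams.get("intent");
  const utmSource = url.searchParams.get("utm_source");

  // An explicit ?intent= param wins; otherwise fall back to UTM/referrer hints.
  let intent = "browse";
  if (intentParam !== null) {
    if (KNOWN_INTENTS.includes(intentParam)) {
      intent = intentParam;
    }
  } else if (utmSource === "github" || referrer.includes("github.com")) {
    intent = "github";
  }

  // Locale: explicit ?locale= override first, then the browser's top preference.
  const localeParam = url.searchParams.get("locale");
  const locale = localeParam ?? acceptLanguage.split(",")[0] ?? "en";
  return { intent, locale };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Nothing here needs storage: every signal is derivable from the request itself, which is what keeps the approach privacy-friendly.&lt;/p&gt;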

&lt;h2&gt;What I built&lt;/h2&gt;

&lt;p&gt;I built two surfaces:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A landing page that adapts full-page content based on landing intent (&lt;code&gt;judge&lt;/code&gt;, &lt;code&gt;github&lt;/code&gt;, &lt;code&gt;investor&lt;/code&gt;, &lt;code&gt;browse&lt;/code&gt;) and locale.&lt;/li&gt;
&lt;li&gt;A demo e-commerce store (&lt;code&gt;/demo&lt;/code&gt;) that adapts hero copy, product ordering behavior, and localized product description content.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both surfaces have a draggable control panel so you can switch locale and intent quickly and see why a variant was selected.&lt;/p&gt;

&lt;h2&gt;High-level architecture&lt;/h2&gt;

&lt;p&gt;The flow is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Detect signals (locale + intent)&lt;/li&gt;
&lt;li&gt;Choose a variant with deterministic rules&lt;/li&gt;
&lt;li&gt;Send selected content to Lingo API for localization&lt;/li&gt;
&lt;li&gt;Render localized result&lt;/li&gt;
&lt;li&gt;Expose decision metadata in panel&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I kept the logic explicit and finite on purpose. I wanted a demo that can be explained under time pressure.&lt;/p&gt;
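&lt;p&gt;To make "finite" concrete, the variant step can be a plain lookup table plus a fallback. This is an illustrative sketch, not the real &lt;code&gt;buildStep3Decision&lt;/code&gt;, and the copy strings are placeholders:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: a finite intent-to-variant table with an explicit reason.
type Variant = { headline: string; ctaLabel: string };

const VARIANTS: { [intent: string]: Variant } = {
  judge: { headline: "Here is exactly what to evaluate", ctaLabel: "Run the demo" },
  github: { headline: "Start with the code", ctaLabel: "Open the repo" },
  investor: { headline: "Why locale-aware pages convert", ctaLabel: "See the numbers" },
  browse: { headline: "Pages that adapt to you", ctaLabel: "Try it" },
};

function chooseVariant(intent: string, source: string) {
  const known = Object.prototype.hasOwnProperty.call(VARIANTS, intent);
  const chosen = known ? intent : "browse";
  // The reason string is what the control panel surfaces as decision metadata.
  const reason = known ? "intent '" + intent + "' via " + source : "unknown intent, fallback";
  return { variant: chosen, template: VARIANTS[chosen], reason: reason };
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the table is closed, every possible decision can be enumerated and explained, which is exactly what a time-pressured demo needs.&lt;/p&gt;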

&lt;h2&gt;The code that made the project real&lt;/h2&gt;

&lt;p&gt;This is the core hero localization flow in my API route:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildStep3Decision&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;intentDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;intent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;intentDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;localizablePayload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LocalizablePayload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;subheadline&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;subheadline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;ctaLabel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;step3Decision&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;template&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;baseContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ctaLabel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;localizationResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;LINGO_LOCALIZE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-API-Key&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;engineId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;sourceLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;en&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;targetLocale&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;localeDetection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;locale&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;localizablePayload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-store&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That block is where static template content becomes locale-adapted output.&lt;/p&gt;

&lt;p&gt;And this is the part that saved me when product localization became too slow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chunkProducts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;toTranslate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;CHUNK_SIZE&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;MAX_PARALLEL_CHUNKS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;group&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;MAX_PARALLEL_CHUNKS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;groupResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;group&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;localizeChunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;

  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;groupResults&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;assign&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;translatedByKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was a practical turning point.&lt;/p&gt;

&lt;h2&gt;Where the clean design broke&lt;/h2&gt;

&lt;p&gt;My initial, cleaner idea was bigger: ship this as a portable script that works on any website.&lt;/p&gt;

&lt;p&gt;Reality check: every website has a different structure, different content ownership, and different component boundaries.&lt;/p&gt;

&lt;p&gt;So I narrowed scope to a controlled environment where I could show the value clearly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;deterministic variant model&lt;/li&gt;
&lt;li&gt;explainable decisions&lt;/li&gt;
&lt;li&gt;real localization behavior&lt;/li&gt;
&lt;li&gt;fast enough demo interactions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That scope cut is honestly what made it shippable.&lt;/p&gt;

&lt;h2&gt;Biggest pain point and messy workaround&lt;/h2&gt;

&lt;p&gt;The biggest pain was localization latency when payloads got large.&lt;/p&gt;

&lt;p&gt;I hit situations where requests were taking far too long. I ended up doing a mix of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;parallel chunked localization&lt;/li&gt;
&lt;li&gt;selective localization (only high-impact copy)&lt;/li&gt;
&lt;li&gt;caching by locale + intent + content key&lt;/li&gt;
&lt;li&gt;fallback paths for missing translations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is not the purest architecture, but it crossed the finish line and stayed understandable.&lt;/p&gt;
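&lt;p&gt;The caching and fallback pieces are simple to sketch: keys combine locale, intent, and a content key, and a miss degrades to the English source instead of blocking render. Names here are illustrative, not the exact Persite code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Illustrative sketch: in-memory cache keyed by locale + intent + content key.
const translationCache = new Map();

function cacheKey(locale: string, intent: string, contentKey: string) {
  return locale + "::" + intent + "::" + contentKey;
}

function putLocalized(locale: string, intent: string, contentKey: string, text: string) {
  translationCache.set(cacheKey(locale, intent, contentKey), text);
}

function getLocalized(locale: string, intent: string, contentKey: string, fallback: string) {
  const hit = translationCache.get(cacheKey(locale, intent, contentKey));
  // Fallback path: a missing translation returns the source copy, never an error.
  return hit ?? fallback;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;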

&lt;h2&gt;Trade-offs I accepted&lt;/h2&gt;

&lt;p&gt;I made deliberate trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I did not localize everything through the API.&lt;/li&gt;
&lt;li&gt;I used a hybrid content strategy:

&lt;ul&gt;
&lt;li&gt;dynamic/high-impact copy through API&lt;/li&gt;
&lt;li&gt;repeated UI labels via locale dictionaries&lt;/li&gt;
&lt;li&gt;technical product names/spec tokens kept unchanged&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;I optimized for demo clarity over maximum abstraction.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;I think this was the right call for a hackathon MVP.&lt;/p&gt;

&lt;h2&gt;One extra thing I intentionally modeled&lt;/h2&gt;

&lt;p&gt;In the demo store data, I also reflected a real market situation: GPU and RAM prices being elevated due to AI-era supply pressure.&lt;/p&gt;

&lt;p&gt;That was intentional. I wanted the demo to feel like it understands real buyer context, not just UI translation.&lt;/p&gt;

&lt;h2&gt;How to run it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/mutaician/persite
&lt;span class="nb"&gt;cd &lt;/span&gt;persite
pnpm &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINGO_API_KEY=your_lingo_api_key
LINGO_ENGINE_ID=your_lingo_engine_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run dev server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Build check:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pnpm run build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Useful routes to test:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/?intent=judge&amp;amp;locale=de-DE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/?intent=investor&amp;amp;locale=sw-KE&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/demo?intent=compare&amp;amp;locale=fr-FR&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;http://localhost:3000/demo?intent=budget&amp;amp;locale=pt-BR&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What I would build next&lt;/h2&gt;

&lt;p&gt;The missing piece is portability.&lt;/p&gt;

&lt;p&gt;I want a reusable integration layer that can plug into arbitrary websites and personalize key surfaces (especially pricing and plan-selection pages) based on intent and locale, without requiring each team to rewrite their whole frontend.&lt;/p&gt;

&lt;p&gt;That is where this can move from a strong demo to a deployable product.&lt;/p&gt;

&lt;h2&gt;Links&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Live demo: &lt;a href="https://persite-seven.vercel.app/" rel="noopener noreferrer"&gt;https://persite-seven.vercel.app/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Repository: &lt;a href="https://github.com/mutaician/persite" rel="noopener noreferrer"&gt;https://github.com/mutaician/persite&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hackathon</category>
      <category>lingo</category>
      <category>ai</category>
      <category>localization</category>
    </item>
    <item>
      <title>Gemini Became My Entire Hackathon Team — How a Solo Dev in Kenya Won His First MLH Prize Building RepoX</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Tue, 03 Mar 2026 14:46:48 +0000</pubDate>
      <link>https://forem.com/mutaician/gemini-became-my-entire-hackathon-team-how-a-solo-dev-in-kenya-won-his-first-mlh-prize-building-2k27</link>
      <guid>https://forem.com/mutaician/gemini-became-my-entire-hackathon-team-how-a-solo-dev-in-kenya-won-his-first-mlh-prize-building-2k27</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/mlh-built-with-google-gemini-02-25-26"&gt;Built with Google Gemini: Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What I Built with Google Gemini&lt;/h2&gt;

&lt;p&gt;Picture this: It’s 3 a.m. My first-ever MLH hackathon. I’m staring at a blank screen, heart racing, knowing I’m completely outmatched by teams with years of experience.&lt;/p&gt;

&lt;p&gt;Then I opened &lt;strong&gt;Google Antigravity&lt;/strong&gt; — and everything changed.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;RepoX&lt;/strong&gt;: an interactive platform that turns any public GitHub repository into a living, breathing learning adventure.&lt;/p&gt;

&lt;p&gt;No more getting lost in massive codebases. RepoX gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A stunning D3.js force-directed graph that maps every file and its relationships like a neural network&lt;/li&gt;
&lt;li&gt;Instant AI-powered explanations for any file (including “Explain Like I’m 5” mode that actually makes sense)&lt;/li&gt;
&lt;li&gt;Smart personalized learning paths — the AI reads the entire repo and tells you the exact smartest order to explore it&lt;/li&gt;
&lt;li&gt;Progress checklists and history so you never lose momentum&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The crazy part? The app itself runs &lt;strong&gt;on Gemini&lt;/strong&gt;. Every explanation and learning path is generated live by the Gemini API (securely routed through Cloudflare Workers).&lt;/p&gt;

&lt;p&gt;And yes — this project won &lt;strong&gt;Best AI Application Built with Cloudflare&lt;/strong&gt; at Hacks for Hackers 2026. My very first hackathon… and I took home a prize. I still can’t believe it.&lt;/p&gt;

&lt;h2&gt;Demo&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Live app&lt;/strong&gt; (paste any GitHub repo and watch the magic): &lt;a href="https://main.repox.pages.dev" rel="noopener noreferrer"&gt;https://main.repox.pages.dev&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Full YouTube demo&lt;/strong&gt;:&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/8m2kGGZEJTw"&gt;&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Devpost&lt;/strong&gt;: &lt;a href="https://devpost.com/software/repox" rel="noopener noreferrer"&gt;https://devpost.com/software/repox&lt;/a&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/mutaician/RepoX" rel="noopener noreferrer"&gt;https://github.com/mutaician/RepoX&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;This wasn’t just a hackathon project — it was my crash course in what happens when you stop coding &lt;em&gt;alone&lt;/em&gt; and start coding &lt;em&gt;with&lt;/em&gt; an AI teammate.&lt;/p&gt;

&lt;p&gt;I went from zero D3.js experience to building a smooth, responsive graph that handles thousands of nodes. I learned secure API proxying on Cloudflare Workers under extreme time pressure. I mastered prompt engineering at a level I never thought possible — crafting system prompts so precise that Gemini would output perfectly formatted learning paths every single time.&lt;/p&gt;

&lt;p&gt;Most importantly, I learned that one determined developer + the right Gemini workflow can outpace entire traditional teams. The confidence this gave me is something no tutorial could ever provide.&lt;/p&gt;

&lt;h2&gt;Google Gemini Feedback&lt;/h2&gt;

&lt;p&gt;Gemini wasn’t a tool. It was my co-founder, my senior dev, my QA tester, and my creative director — all in one.&lt;/p&gt;

&lt;p&gt;I used &lt;strong&gt;Antigravity&lt;/strong&gt; (Google’s agentic IDE) the entire time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Pro&lt;/strong&gt; handled the heavy lifting — autonomously designing the learning-path algorithm, reasoning through complex repo analysis, and even suggesting UI tweaks that made the graph feel alive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3 Flash&lt;/strong&gt; was my speed demon — instantly generating UI components, ELI5 explanations, and quick fixes while I kept momentum.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 2.5&lt;/strong&gt; was the reliable fallback when context got too big on massive repos.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What blew me away:&lt;/strong&gt;&lt;br&gt;
The agentic flow was unreal. I’d describe a feature once, and Antigravity would plan, code, debug, and iterate — often better than I would have done myself. The personalized learning paths Gemini 3.1 created were scarily good — logical, educational, and genuinely helpful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Where it got messy (keeping it real):&lt;/strong&gt;&lt;br&gt;
Larger repos sometimes overwhelmed the context window and the agent would start hallucinating relationships or going off on wild creative tangents. I had to get surgical with my prompts and occasionally switch models. Response formatting could be inconsistent (markdown breaking in weird places), and yes, the token costs added up during heavy 3.1 sessions.&lt;/p&gt;

&lt;p&gt;But here’s the truth: Without this exact multi-model + Antigravity setup, RepoX would still be a half-finished idea on my laptop. Gemini didn’t just help me finish — it helped me win my first hackathon.&lt;/p&gt;

&lt;p&gt;From a nervous solo dev in Kenya to MLH prize winner in 48 hours. That’s the power of Google Gemini.&lt;/p&gt;

&lt;p&gt;Thanks for reading my story — can’t wait to see what we build next. 🚀&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>geminireflections</category>
      <category>gemini</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>I Built an AI That Can See Your Arduino and Write the Code For It</title>
      <dc:creator>Cian</dc:creator>
      <pubDate>Fri, 27 Feb 2026 17:49:51 +0000</pubDate>
      <link>https://forem.com/mutaician/i-built-an-ai-that-can-see-your-arduino-and-write-the-code-for-it-558l</link>
      <guid>https://forem.com/mutaician/i-built-an-ai-that-can-see-your-arduino-and-write-the-code-for-it-558l</guid>
      <description>&lt;p&gt;There is a specific frustration anyone who has worked with Arduino knows well.&lt;/p&gt;

&lt;p&gt;You have a breadboard in front of you. Components are wired up. You open a chat window, describe your setup in text — "I have an LED on pin 8 with a 220 ohm resistor" — copy the code the AI gives you, paste it into the Arduino IDE, hit upload, and watch the LED do nothing. You go back to the chat window. You describe what happened. You get a revised version. You copy it again.&lt;/p&gt;

&lt;p&gt;You do this five times before realizing the AI gave you code for pin 9 even though you told it pin 8, and the only warning was a one-line comment ("change this to match your wiring") that you missed.&lt;/p&gt;

&lt;p&gt;Every AI coding assistant has this problem: they are blind to your physical setup.&lt;/p&gt;

&lt;p&gt;ArduinoVision is my attempt to fix that.&lt;/p&gt;




&lt;h2&gt;The Idea&lt;/h2&gt;

&lt;p&gt;The concept is simple enough to state in one sentence: an AI agent that can see your breadboard through a camera, write the correct Arduino code based on what it actually observes, and upload it directly to your board.&lt;/p&gt;

&lt;p&gt;No copy-paste. No IDE switching. No describing your wiring in text. You connect the components. The AI handles everything else.&lt;/p&gt;

&lt;p&gt;I built this for the Vision Possible: Agent Protocol hackathon by WeMakeDevs, and the core of it runs on the VisionAgents SDK by Stream.&lt;/p&gt;




&lt;h2&gt;What VisionAgents Makes Possible&lt;/h2&gt;

&lt;p&gt;Before I get into the build, I want to explain why this project needed VisionAgents specifically — because the answer is not obvious.&lt;/p&gt;

&lt;p&gt;The challenge with building a hardware coding agent is that it needs four things happening simultaneously and tightly integrated: seeing video (your camera), hearing audio (your voice), reasoning about both together (the LLM), and taking external actions (compile, upload). Wiring all of that together manually — WebRTC for the camera feed, a separate STT service, a separate LLM call, a separate TTS for the response — is a significant amount of infrastructure before you write a single line of the actual agent logic.&lt;/p&gt;

&lt;p&gt;VisionAgents collapses all of that into a few lines of Python.&lt;/p&gt;

&lt;p&gt;The relevant part of the agent setup looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vision_agents.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentLauncher&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Runner&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;vision_agents.plugins&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;getstream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;

&lt;span class="n"&gt;llm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-realtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cedar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;getstream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Edge&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;agent_user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ArduinoVision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arduino-vision-agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the entire transport and LLM setup. &lt;code&gt;getstream.Edge()&lt;/code&gt; handles the WebRTC infrastructure — video/audio in and out, connection management, reconnection logic. &lt;code&gt;openai.Realtime()&lt;/code&gt; handles speech-to-speech natively — no separate STT or TTS services, no intermediate text conversion, just audio in and audio out with video frames attached. Stream's edge network keeps the latency under 30ms, which matters when someone is physically holding a component in front of the camera.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;fps=1&lt;/code&gt; setting deserves a note. I initially had it at &lt;code&gt;fps=3&lt;/code&gt; and the audio quality was noticeably degraded — cutting out, pitch shifts mid-sentence. Dropping to one frame per second freed up the audio pipeline entirely. For identifying breadboard wiring, one frame per second is more than sufficient.&lt;/p&gt;




&lt;h2&gt;Registering Arduino Tools&lt;/h2&gt;

&lt;p&gt;The agent's practical capability comes from tool registration. VisionAgents uses &lt;code&gt;@llm.register_function()&lt;/code&gt; to make Python functions callable by the model during conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@llm.register_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;List all connected Arduino boards. Returns port, board name, and FQBN. ALWAYS call this first to find the port needed for upload.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_boards&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;boards&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list_arduino_boards&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boards&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Found &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;boards&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; board(s). Use the &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;port&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; for upload operations.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;boards&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No Arduino boards detected.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I registered six tools in total: &lt;code&gt;list_boards&lt;/code&gt;, &lt;code&gt;write_code&lt;/code&gt;, &lt;code&gt;compile_code&lt;/code&gt;, &lt;code&gt;upload_code&lt;/code&gt;, &lt;code&gt;serial_monitor&lt;/code&gt;, and &lt;code&gt;deploy_code&lt;/code&gt; (which chains the previous three). Each one wraps a call to &lt;code&gt;arduino-cli&lt;/code&gt; on the system.&lt;/p&gt;
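&lt;p&gt;Each wrapper is thin. As a rough sketch of how a tool like &lt;code&gt;deploy_code&lt;/code&gt; could chain compile and upload around &lt;code&gt;arduino-cli&lt;/code&gt; — the function names and the injectable &lt;code&gt;run&lt;/code&gt; parameter here are illustrative, not the project's actual code:&lt;/p&gt;

```python
import subprocess


def _run_cli(*args: str) -> subprocess.CompletedProcess:
    """Invoke an arduino-cli subcommand, capturing stdout/stderr as text."""
    return subprocess.run(["arduino-cli", *args], capture_output=True, text=True)


def deploy(sketch_dir: str, port: str,
           fqbn: str = "arduino:avr:uno", run=_run_cli) -> dict:
    """Compile a sketch directory, then upload it to the given serial port.

    Returns a dict in the same spirit as the list_boards tool above, so the
    model gets a structured result it can reason about on failure.
    """
    compiled = run("compile", "--fqbn", fqbn, sketch_dir)
    if compiled.returncode != 0:
        return {"ok": False, "stage": "compile", "error": compiled.stderr}

    uploaded = run("upload", "-p", port, "--fqbn", fqbn, sketch_dir)
    if uploaded.returncode != 0:
        return {"ok": False, "stage": "upload", "error": uploaded.stderr}

    return {"ok": True, "stage": "done", "error": ""}
```

&lt;p&gt;Returning the failing stage explicitly matters: when the model sees &lt;code&gt;"stage": "compile"&lt;/code&gt; it can fix the sketch, whereas an upload failure usually means a port problem it should report to the user instead.&lt;/p&gt;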

&lt;p&gt;What makes this work well in practice is that the model chains these calls naturally based on the conversation. The user says "make the LED blink." The model calls &lt;code&gt;list_boards&lt;/code&gt; to find the port, calls &lt;code&gt;write_code&lt;/code&gt; to save the sketch, then &lt;code&gt;deploy_code&lt;/code&gt; to compile and upload. The user did not ask it to do those steps in that order — the model inferred the sequence from context and tool descriptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Event System
&lt;/h2&gt;

&lt;p&gt;One thing I found genuinely useful during development was the event subscription API. Every tool call emits a &lt;code&gt;ToolStartEvent&lt;/code&gt; and a &lt;code&gt;ToolEndEvent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent.events.subscribe&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolStartEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL START: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Args: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;indent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@agent.events.subscribe&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_tool_end&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolEndEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL END: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;execution_time_ms&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;ms)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TOOL FAILED: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tool_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; - &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you are building a hardware-in-the-loop agent where failures are physical (LED does not blink, board does not respond), having a structured log of every tool call with arguments and timing is essential. It is also how I caught that the model was calling &lt;code&gt;deploy_code&lt;/code&gt; before the port permissions were correctly set — the error was immediately visible in the log.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Real-time video AI has a different failure mode than text AI.&lt;/strong&gt; With text AI, wrong output is obvious — you read it and fix the prompt. With video AI, wrong output means the board does not respond and you are staring at a stationary LED trying to figure out if the model misidentified the pin, or the code is wrong, or the upload failed, or the LED is wired backwards. Good observability (the event system) is not optional.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tool descriptions are more important than I expected.&lt;/strong&gt; The model's behaviour changed significantly based on how I phrased the tool descriptions. "Detect connected Arduino boards" caused the model to call it inconsistently. "List all connected Arduino boards. ALWAYS call this first to find the port needed for upload." made it call the tool reliably every time, in the right order.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hardware-in-the-loop iteration is slow.&lt;/strong&gt; Software agents can iterate in milliseconds. Hardware agents have a four-second compile-upload cycle. This changes how you design the system — you want the model to be confident before it acts, not to try-and-retry. Good visual grounding (making sure the agent can clearly see the wiring before generating code) matters more than in pure software contexts.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Is Not (Yet)
&lt;/h2&gt;

&lt;p&gt;ArduinoVision is a hackathon prototype. Its scope right now is: AVR boards (Uno, Nano), basic GPIO (digital pins, LEDs, buttons), one board connected at a time. It does not handle I2C sensors, servo control, ESP32/ESP8266, or multi-board setups. These are natural extensions but they are not in this version.&lt;/p&gt;

&lt;p&gt;The interface also relies on the VisionAgents demo UI at demo.visionagents.ai rather than a custom frontend. For a prototype this is fine — building a custom WebRTC client is significant work that would have added nothing to the core idea.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;The thing that strikes me about this project is how little code it took to get something genuinely useful working. The Arduino tooling (list boards, write, compile, upload) is maybe 300 lines of Python. The agent setup is another 150. The entire relevant surface area is small.&lt;/p&gt;

&lt;p&gt;What VisionAgents provides is the hard part: real-time video transport, speech-to-speech latency that feels natural, and a clean function calling interface that the model uses reliably. Without that infrastructure being pre-built, this project would have been two weeks of WebRTC work before a single Arduino command got called.&lt;/p&gt;

&lt;p&gt;There is a real category of applications that becomes possible when AI agents can see physical environments and take actions based on what they observe. Hardware debugging is one. Lab automation is another. Physical quality control. Teaching environments where a student shows their circuit and gets immediate, accurate feedback.&lt;/p&gt;

&lt;p&gt;ArduinoVision is a small example of what that category looks like when the infrastructure is available.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;p&gt;The code is on GitHub: &lt;a href="https://github.com/mutaician/arduino-vision" rel="noopener noreferrer"&gt;github.com/mutaician/arduino-vision&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need a Stream account (free tier works), an OpenAI API key, Python 3.12, and arduino-cli. The README has full setup instructions. If you are on Windows, there are notes on forwarding the USB serial port to WSL.&lt;/p&gt;
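&lt;p&gt;As a rough guide (the README is the source of truth), a typical &lt;code&gt;arduino-cli&lt;/code&gt; setup for an AVR board like the Uno looks like this:&lt;/p&gt;

```shell
# Fetch the package index, install the AVR core (Uno/Nano),
# and confirm the board shows up on a serial port.
arduino-cli core update-index
arduino-cli core install arduino:avr
arduino-cli board list
```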




&lt;p&gt;&lt;em&gt;Built for the Vision Possible: Agent Protocol hackathon by WeMakeDevs. Powered by VisionAgents SDK by Stream.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>arduino</category>
      <category>visionagents</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
