<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tanush shah</title>
    <description>The latest articles on Forem by Tanush shah (@tanush_shah_e5ac47ddd561e).</description>
    <link>https://forem.com/tanush_shah_e5ac47ddd561e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3635708%2F24a92aca-9a3b-4fcc-a9cb-7e20d8288faa.png</url>
      <title>Forem: Tanush shah</title>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tanush_shah_e5ac47ddd561e"/>
    <language>en</language>
    <item>
      <title>How I Built Cursivis: A Cursor-Native Gemini UI Agent on Google Cloud</title>
      <dc:creator>Tanush shah</dc:creator>
      <pubDate>Mon, 16 Mar 2026 17:12:08 +0000</pubDate>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-cursivis-a-cursor-native-gemini-ui-agent-on-google-cloud-45kn</link>
      <guid>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-cursivis-a-cursor-native-gemini-ui-agent-on-google-cloud-45kn</guid>
<description>&lt;p&gt;I created this project as an entry for the Gemini Live Agent Challenge.&lt;/p&gt;


&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Most AI products still start with the same workflow: open a chatbot, describe the context, paste content, wait for an answer, then manually apply that answer somewhere else.&lt;/p&gt;

&lt;p&gt;I wanted to build something different.&lt;/p&gt;

&lt;p&gt;That idea became &lt;strong&gt;Cursivis&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Selection = Context, Trigger = Intent, Gemini = Intelligence&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of moving work into a prompt box, Cursivis brings AI directly to what the user is already looking at. The user selects text, an image, or a UI region, presses a trigger, and Gemini decides the most useful action based on context. Then Cursivis either returns a useful result or takes action directly in the browser UI.&lt;/p&gt;

&lt;h2&gt;What Cursivis Does&lt;/h2&gt;

&lt;p&gt;Cursivis is a &lt;strong&gt;cursor-native multimodal AI agent&lt;/strong&gt; designed for desktop workflows.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize long reports and articles&lt;/li&gt;
&lt;li&gt;explain or debug selected code&lt;/li&gt;
&lt;li&gt;rewrite rough text or emails&lt;/li&gt;
&lt;li&gt;draft responses to emails&lt;/li&gt;
&lt;li&gt;analyze selected images&lt;/li&gt;
&lt;li&gt;accept voice commands&lt;/li&gt;
&lt;li&gt;autofill forms&lt;/li&gt;
&lt;li&gt;reply in live browser tabs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to move beyond text-in/text-out AI and toward an interaction model where the AI becomes part of the interface itself.&lt;/p&gt;

&lt;h2&gt;Core Product Idea&lt;/h2&gt;

&lt;p&gt;The main interaction loop is very simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user selects something on screen&lt;/li&gt;
&lt;li&gt;The user presses a trigger&lt;/li&gt;
&lt;li&gt;Gemini reasons about the selection&lt;/li&gt;
&lt;li&gt;Cursivis returns the most useful result&lt;/li&gt;
&lt;li&gt;The user can optionally press &lt;strong&gt;Take Action&lt;/strong&gt; to execute it in the UI&lt;/li&gt;
&lt;/ol&gt;
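&lt;p&gt;The loop above can be sketched in a few lines of Node-style JavaScript. This is an illustration, not the actual Cursivis code; &lt;code&gt;askGemini&lt;/code&gt; stands in for the real Google GenAI SDK call:&lt;/p&gt;

```javascript
// Sketch of the select -> trigger -> reason -> act loop.
// Names are illustrative; askGemini stands in for the real SDK call.
async function askGemini(selection) {
  // Placeholder for ai.models.generateContent(...) in the real backend.
  if (selection.kind === "code") {
    return { action: "explain", result: "This function does..." };
  }
  return { action: "summarize", result: "Summary of the selection..." };
}

async function onTrigger(selection) {
  const suggestion = await askGemini(selection); // Gemini reasons about the selection
  return {
    result: suggestion.result,                   // shown to the user immediately
    takeAction: () => suggestion.action,         // executed only if the user opts in
  };
}
```

&lt;p&gt;The key design point is that the trigger never asks the user for a prompt; the selection itself is the prompt.&lt;/p&gt;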

&lt;p&gt;That means a selection is not just text. It is context.&lt;/p&gt;

&lt;p&gt;This made Cursivis a strong fit for the &lt;strong&gt;UI Navigator&lt;/strong&gt; category of the Gemini Live Agent Challenge, because it does not stop at answering. It interprets screen context and can output executable actions for the interface.&lt;/p&gt;

&lt;h2&gt;How I Built It&lt;/h2&gt;

&lt;p&gt;Cursivis is built as a multi-part system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;Windows companion app&lt;/strong&gt; in WPF and .NET 8&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Gemini backend&lt;/strong&gt; in Node.js using the &lt;strong&gt;Google GenAI SDK&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;voice pipeline&lt;/strong&gt; for hold-to-talk capture and transcription&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Chromium browser extension&lt;/strong&gt; for real current-tab actions&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;local browser bridge&lt;/strong&gt; for DOM-aware execution&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Google Cloud Run deployment&lt;/strong&gt; for the backend&lt;/li&gt;
&lt;li&gt;integration with the &lt;strong&gt;Logitech MX Creative Console&lt;/strong&gt; interaction model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contextual reasoning&lt;/li&gt;
&lt;li&gt;multimodal text and image understanding&lt;/li&gt;
&lt;li&gt;dynamic action suggestion&lt;/li&gt;
&lt;li&gt;voice transcription&lt;/li&gt;
&lt;li&gt;browser action planning&lt;/li&gt;
&lt;/ul&gt;
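&lt;p&gt;As a rough sketch of the multimodal path: the backend assembles text and image parts into the &lt;code&gt;contents&lt;/code&gt; shape that the Google GenAI SDK’s &lt;code&gt;generateContent&lt;/code&gt; accepts. The helper itself, and everything beyond the SDK part shapes, is my own illustration:&lt;/p&gt;

```javascript
// Assemble a multimodal request body from a selection.
// The { text } and { inlineData: { mimeType, data } } part shapes follow the
// Google GenAI SDK; the helper itself is an illustrative assumption.
function buildContents(selection) {
  const parts = [{ text: "Suggest the most useful action for this selection." }];
  if (selection.text) {
    parts.push({ text: selection.text });
  }
  if (selection.imageBase64) {
    parts.push({ inlineData: { mimeType: "image/png", data: selection.imageBase64 } });
  }
  return [{ role: "user", parts }];
}

// The real call then looks roughly like:
//   await ai.models.generateContent({ model: "...", contents: buildContents(sel) });
```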

&lt;p&gt;The companion app handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text selection capture&lt;/li&gt;
&lt;li&gt;lasso screenshot capture&lt;/li&gt;
&lt;li&gt;orb and result UI&lt;/li&gt;
&lt;li&gt;guided and smart modes&lt;/li&gt;
&lt;li&gt;action preview and follow-up flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For browser execution, I built a real-tab path through a Chromium extension so Cursivis can act in the browser session the user is already logged into, instead of depending only on a separate managed automation browser.&lt;/p&gt;
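&lt;p&gt;One way to keep real-tab execution safe is to have the extension validate the model-produced action plan against a whitelist before it touches the DOM. A minimal sketch, with assumed action names rather than the actual Cursivis protocol:&lt;/p&gt;

```javascript
// Validate a model-generated action plan before the extension executes it.
// The action names ("click", "fill", "scroll") are assumed for illustration.
const ALLOWED_ACTIONS = new Set(["click", "fill", "scroll"]);

function validatePlan(plan) {
  return (
    Array.isArray(plan) &&
    plan.every(
      (step) => ALLOWED_ACTIONS.has(step.type) && typeof step.selector === "string"
    )
  );
}
```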

&lt;h2&gt;Why Gemini Was Important&lt;/h2&gt;

&lt;p&gt;Gemini was central to the project because I did not want a rigid menu-driven assistant.&lt;/p&gt;

&lt;p&gt;The most important design goal was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the system should look at the selection&lt;/li&gt;
&lt;li&gt;understand what kind of content it is&lt;/li&gt;
&lt;li&gt;infer the likely user intent&lt;/li&gt;
&lt;li&gt;return the most useful result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the same trigger can behave differently depending on context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a report might be summarized&lt;/li&gt;
&lt;li&gt;foreign-language text might be translated&lt;/li&gt;
&lt;li&gt;broken code might be debugged&lt;/li&gt;
&lt;li&gt;correct code might be explained&lt;/li&gt;
&lt;li&gt;an email might be polished or replied to&lt;/li&gt;
&lt;/ul&gt;
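&lt;p&gt;In Cursivis the classification itself is done by Gemini, but a crude local heuristic illustrates the idea of routing one trigger to different behaviors:&lt;/p&gt;

```javascript
// Crude local heuristic for a selection's kind. In Cursivis the real
// classification is done by Gemini; this only illustrates the routing idea.
function guessKind(text) {
  if (/function |const |=>|;\s*$/m.test(text)) return "code";
  if (/^(hi|hello|dear|regards)\b/im.test(text)) return "email";
  return "prose";
}
```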

&lt;p&gt;This flexibility is what made the interaction feel agentic instead of scripted.&lt;/p&gt;

&lt;h2&gt;Google Cloud Deployment&lt;/h2&gt;

&lt;p&gt;To meet the challenge requirement and make the backend reproducible, I deployed the Gemini backend to &lt;strong&gt;Google Cloud Run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That deployment path includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;containerizing the backend&lt;/li&gt;
&lt;li&gt;building it with Cloud Build&lt;/li&gt;
&lt;li&gt;deploying it to Cloud Run&lt;/li&gt;
&lt;li&gt;verifying the live backend with a health endpoint&lt;/li&gt;
&lt;/ul&gt;
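&lt;p&gt;That path maps to a handful of gcloud commands. This is a generic sketch; the project ID, region, service name, and health path are placeholders, not the actual Cursivis configuration:&lt;/p&gt;

```shell
# Containerize with Cloud Build, deploy to Cloud Run, then hit the health check.
# PROJECT_ID, region, service name, and /health path are placeholders.
gcloud builds submit --tag gcr.io/PROJECT_ID/cursivis-backend
gcloud run deploy cursivis-backend \
  --image gcr.io/PROJECT_ID/cursivis-backend \
  --region us-central1 \
  --allow-unauthenticated
SERVICE_URL="$(gcloud run services describe cursivis-backend \
  --region us-central1 --format 'value(status.url)')"
curl "$SERVICE_URL/health"
```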

&lt;p&gt;I also added an automated deployment script so the cloud deployment process is visible in the codebase and reproducible by judges.&lt;/p&gt;

&lt;h2&gt;Challenges I Faced&lt;/h2&gt;

&lt;p&gt;The hardest part was not generating text; it was building a system that feels like a real UI agent.&lt;/p&gt;

&lt;p&gt;Some of the biggest challenges were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keeping Smart Mode useful without over-hardcoding behavior&lt;/li&gt;
&lt;li&gt;handling text, image, and voice in one coherent flow&lt;/li&gt;
&lt;li&gt;making browser actions work inside real logged-in tabs&lt;/li&gt;
&lt;li&gt;keeping the UI smooth and understandable&lt;/li&gt;
&lt;li&gt;balancing flexibility with safe execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Voice interaction and browser action reliability were especially challenging, because those are the places where a project stops being a demo and starts behaving like a real agent.&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;This project taught me a few important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multimodal AI becomes much more compelling when tied to a real interface&lt;/li&gt;
&lt;li&gt;good agent UX depends heavily on trust and clarity&lt;/li&gt;
&lt;li&gt;hardware triggers feel far more natural than opening a chatbot&lt;/li&gt;
&lt;li&gt;the most useful AI interaction is often not “ask a prompt” but simply “select and trigger”&lt;/li&gt;
&lt;li&gt;execution quality matters as much as model quality&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why Cursivis Matters&lt;/h2&gt;

&lt;p&gt;Cursivis is my attempt to explore a future where AI is no longer a separate destination.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;opening a chat app&lt;/li&gt;
&lt;li&gt;explaining context&lt;/li&gt;
&lt;li&gt;copying data in and out&lt;/li&gt;
&lt;li&gt;manually taking action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the user can simply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;select&lt;/li&gt;
&lt;li&gt;trigger&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;li&gt;act&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the experience I wanted to prototype: a multimodal AI layer that lives directly on top of everyday work.&lt;/p&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;Cursivis started from one simple idea:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the cursor itself became an AI agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By combining Gemini, Google Cloud, multimodal input, browser execution, and a hardware-triggered UX, I built a system that moves beyond the text box and turns ordinary on-screen context into something actionable.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>ai</category>
      <category>gemini</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>How I Built a Cinematic AI-Powered App Using Kiro for Kiroween 🎃</title>
      <dc:creator>Tanush shah</dc:creator>
      <pubDate>Sat, 29 Nov 2025 07:18:34 +0000</pubDate>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-a-cinematic-ai-powered-app-using-kiro-for-kiroween-1oe3</link>
      <guid>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-a-cinematic-ai-powered-app-using-kiro-for-kiroween-1oe3</guid>
<description>

&lt;h1&gt;👻 Building an AI-Enhanced Creative Engine with Kiro — My Kiroween Hackathon Journey 🎃&lt;/h1&gt;

&lt;p&gt;For Kiroween, I wanted to push myself into building something that felt &lt;em&gt;alive&lt;/em&gt; — an application that reacts, adapts, and evolves with the user. Instead of following a fixed template, I wanted an experience that feels cinematic, intelligent, and magical.&lt;/p&gt;

&lt;h2&gt;🧠 Inspiration&lt;/h2&gt;

&lt;p&gt;This project was inspired by an idea:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;What if a user’s creativity never had to hit a limit?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;During this hackathon, I challenged myself to build an interactive system where AI assists across design, user experience, and automation — all powered by Kiro. The spooky theme gave me the perfect excuse to lean into dramatic visuals, atmospheric UI elements, and intelligent workflows that feel… enchanted.&lt;/p&gt;

&lt;p&gt;I can’t explicitly reveal the final product yet 😉, but I &lt;em&gt;can&lt;/em&gt; say this: the goal was to let creativity flow instantly, without friction.&lt;/p&gt;




&lt;h2&gt;⚙️ What It Does&lt;/h2&gt;

&lt;p&gt;At a high level, my project combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI-powered generation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic UI/UX with a spooky cinematic theme&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated pipelines for processing, previewing, and rendering&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A multi-stage flow orchestrated with Kiro&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It creates a seamless journey where a user can go from &lt;em&gt;idea&lt;/em&gt; → &lt;em&gt;visual output&lt;/em&gt; → &lt;em&gt;functional interface&lt;/em&gt; in just a few steps.&lt;/p&gt;

&lt;p&gt;Every key component — prompts, generation, previewing, transformations, and error-proof workflows — was developed interactively with Kiro.&lt;/p&gt;




&lt;h2&gt;🪄 How I Built It (with Kiro)&lt;/h2&gt;

&lt;p&gt;Kiro wasn’t just a tool — it was essentially my &lt;em&gt;pair-engineer&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;🧩 1. Vibe Coding&lt;/h3&gt;

&lt;p&gt;I structured conversations with Kiro like I would with a senior engineer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I explained high-level intent&lt;/li&gt;
&lt;li&gt;Kiro generated modular components&lt;/li&gt;
&lt;li&gt;I refined and iterated with micro-prompts&lt;/li&gt;
&lt;li&gt;Together we shaped the core logic of the application&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most impressive generation was when Kiro produced an entire multi-step pipeline with validation, async handling, and UI state synchronization — all in one go.&lt;/p&gt;

&lt;h3&gt;⚙️ 2. Spec-Driven Development&lt;/h3&gt;

&lt;p&gt;To maintain structure in a fast-moving hackathon environment, I wrote a compact specification describing the expected behaviors, interactions, and data flow.&lt;/p&gt;

&lt;p&gt;Kiro then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Converted these specs into type-safe code&lt;/li&gt;
&lt;li&gt;Identified missing edge cases&lt;/li&gt;
&lt;li&gt;Ensured consistency across the whole project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This spec-driven workflow made the entire codebase “snap in” perfectly — extremely valuable for rapid iteration.&lt;/p&gt;
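&lt;p&gt;Since I can’t reveal the product, here is only the &lt;em&gt;shape&lt;/em&gt; of the kind of compact spec described above, with invented contents:&lt;/p&gt;

```markdown
# Spec: generation pipeline (contents invented for illustration)
- Input: a non-empty user prompt
- Stages: validate, generate, preview, render
- Every stage returns { status, data } and surfaces errors to the UI
- Async stages time out and report a retryable error
```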

&lt;h3&gt;🔁 3. Agent Hooks&lt;/h3&gt;

&lt;p&gt;I created automated workflows using Kiro hooks to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Format and lint code upon generation
&lt;/li&gt;
&lt;li&gt;Validate generated outputs
&lt;/li&gt;
&lt;li&gt;Auto-fix conflicting structures
&lt;/li&gt;
&lt;li&gt;Enforce naming conventions across files
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This removed repetitive work and let me focus entirely on building the creative core.&lt;/p&gt;
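&lt;p&gt;For reference, a hook of this kind lives as a small JSON file. The schema below is from memory of Kiro’s hook files and may not match the current version exactly:&lt;/p&gt;

```json
{
  "enabled": true,
  "name": "Format and lint on generation",
  "when": { "type": "fileEdited", "patterns": ["src/**/*.ts"] },
  "then": {
    "type": "askAgent",
    "prompt": "Format this file, fix lint issues, and keep naming conventions consistent with the rest of the project."
  }
}
```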

&lt;h3&gt;🧭 4. Steering Docs&lt;/h3&gt;

&lt;p&gt;Steering allowed me to “teach” Kiro my preferred architecture style:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modular components
&lt;/li&gt;
&lt;li&gt;Clean data flow
&lt;/li&gt;
&lt;li&gt;Reusable utilities
&lt;/li&gt;
&lt;li&gt;Error-resilient async code
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After applying steering, the quality of responses improved massively — Kiro adapted to my coding style like a personalized assistant.&lt;/p&gt;
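&lt;p&gt;A steering doc is just a markdown file (mine live under &lt;code&gt;.kiro/steering/&lt;/code&gt;; the exact location may vary by version). Something like:&lt;/p&gt;

```markdown
# Architecture preferences
- Small, modular components with a single responsibility
- One-directional data flow; no hidden shared state
- Repeated logic extracted into reusable utilities
- Async work wrapped with explicit error handling and typed errors
```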

&lt;h3&gt;🔌 5. MCP (Model Context Protocol)&lt;/h3&gt;

&lt;p&gt;Using MCP extensions allowed me to introduce specialized capabilities into Kiro’s workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated scaffolding
&lt;/li&gt;
&lt;li&gt;Batch-file generation
&lt;/li&gt;
&lt;li&gt;Resource fetching
&lt;/li&gt;
&lt;li&gt;Smart transformations
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These were tasks that would've taken hours manually — Kiro cut them down to minutes.&lt;/p&gt;
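&lt;p&gt;MCP servers are declared in a JSON settings file (&lt;code&gt;.kiro/settings/mcp.json&lt;/code&gt; when I built this; the path and schema may differ in newer versions). For example, wiring in the reference fetch server:&lt;/p&gt;

```json
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```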




&lt;h2&gt;🕸️ Challenges I Faced&lt;/h2&gt;

&lt;p&gt;Like any ambitious project, I faced some hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structuring intelligent workflows that feel natural
&lt;/li&gt;
&lt;li&gt;Maintaining performance while adding cinematic UI effects
&lt;/li&gt;
&lt;li&gt;Handling complex async interactions
&lt;/li&gt;
&lt;li&gt;Ensuring portability across environments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kiro helped me debug, refine, and stabilize the system fast enough to meet the hackathon deadline.&lt;/p&gt;




&lt;h2&gt;🏆 Accomplishments I'm Proud Of&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Integrating multiple Kiro features in one project
&lt;/li&gt;
&lt;li&gt;Designing a spooky UI that feels alive
&lt;/li&gt;
&lt;li&gt;Building a fully automated flow from concept → output
&lt;/li&gt;
&lt;li&gt;Creating a codebase that is scalable, clean, and production-ready
&lt;/li&gt;
&lt;li&gt;Completing the entire system within the hackathon timeframe
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project not only works — it &lt;em&gt;feels&lt;/em&gt; magical.&lt;/p&gt;




&lt;h2&gt;📚 What I Learned&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to use Kiro both as a “creative brainstormer” and as a structured engineering assistant
&lt;/li&gt;
&lt;li&gt;Writing better specs for fast iteration
&lt;/li&gt;
&lt;li&gt;Improving code quality with hooks and steering
&lt;/li&gt;
&lt;li&gt;Architecting async flows elegantly
&lt;/li&gt;
&lt;li&gt;Building cinematic UI elements with performance in mind
&lt;/li&gt;
&lt;li&gt;Leveraging multiple AI-driven systems in harmony
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;🔮 What’s Next&lt;/h2&gt;

&lt;p&gt;I plan to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expand UI capabilities
&lt;/li&gt;
&lt;li&gt;Add more intelligent behaviors
&lt;/li&gt;
&lt;li&gt;Introduce user-driven customizations
&lt;/li&gt;
&lt;li&gt;Polish the experience into a fully public product
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hackathon project is just the beginning — the foundation I built with Kiro opens doors to something much bigger.&lt;/p&gt;




&lt;p&gt;If you enjoyed this write-up, feel free to follow along — there’s a lot more on the way.&lt;br&gt;&lt;br&gt;
Happy Kiroween! 🎃👻✨&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
