<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Tanush shah</title>
    <description>The latest articles on Forem by Tanush shah (@tanush_shah_e5ac47ddd561e).</description>
    <link>https://forem.com/tanush_shah_e5ac47ddd561e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3635708%2F24a92aca-9a3b-4fcc-a9cb-7e20d8288faa.png</url>
      <title>Forem: Tanush shah</title>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tanush_shah_e5ac47ddd561e"/>
    <language>en</language>
    <item>
      <title>How I Built Cursivis: A Cursor-Native Gemini UI Agent on Google Cloud</title>
      <dc:creator>Tanush shah</dc:creator>
      <pubDate>Mon, 16 Mar 2026 17:12:08 +0000</pubDate>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-cursivis-a-cursor-native-gemini-ui-agent-on-google-cloud-45kn</link>
      <guid>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-cursivis-a-cursor-native-gemini-ui-agent-on-google-cloud-45kn</guid>
<description>&lt;p&gt;I created this project as an entry for the Gemini Live Agent Challenge.&lt;/p&gt;


&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Most AI products still start with the same workflow: open a chatbot, describe the context, paste content, wait for an answer, then manually apply that answer somewhere else.&lt;/p&gt;

&lt;p&gt;I wanted to build something different.&lt;/p&gt;

&lt;p&gt;That idea became &lt;strong&gt;Cursivis&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Selection = Context, Trigger = Intent, Gemini = Intelligence&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Instead of moving work into a prompt box, Cursivis brings AI directly to what the user is already looking at. The user selects text, an image, or a UI region, presses a trigger, and Gemini decides the most useful action based on context. Then Cursivis either returns a useful result or takes action directly in the browser UI.&lt;/p&gt;

&lt;h2&gt;What Cursivis Does&lt;/h2&gt;

&lt;p&gt;Cursivis is a &lt;strong&gt;cursor-native multimodal AI agent&lt;/strong&gt; designed for desktop workflows.&lt;/p&gt;

&lt;p&gt;It can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarize long reports and articles&lt;/li&gt;
&lt;li&gt;explain or debug selected code&lt;/li&gt;
&lt;li&gt;rewrite rough text or emails&lt;/li&gt;
&lt;li&gt;draft responses to emails&lt;/li&gt;
&lt;li&gt;analyze selected images&lt;/li&gt;
&lt;li&gt;accept voice commands&lt;/li&gt;
&lt;li&gt;autofill forms&lt;/li&gt;
&lt;li&gt;reply in live browser tabs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal is to move beyond text-in/text-out AI and toward an interaction model where the AI becomes part of the interface itself.&lt;/p&gt;

&lt;h2&gt;Core Product Idea&lt;/h2&gt;

&lt;p&gt;The main interaction loop is very simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user selects something on screen&lt;/li&gt;
&lt;li&gt;The user presses a trigger&lt;/li&gt;
&lt;li&gt;Gemini reasons about the selection&lt;/li&gt;
&lt;li&gt;Cursivis returns the most useful result&lt;/li&gt;
&lt;li&gt;The user can optionally press &lt;strong&gt;Take Action&lt;/strong&gt; to execute it in the UI&lt;/li&gt;
&lt;/ol&gt;
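&lt;p&gt;The loop above can be sketched in a few lines of Node-style JavaScript. This is an illustration, not the actual Cursivis code; &lt;code&gt;askGemini&lt;/code&gt; stands in for the real Google GenAI SDK call:&lt;/p&gt;

```javascript
// Sketch of the select -> trigger -> reason -> act loop.
// Names are illustrative; askGemini stands in for the real SDK call.
async function askGemini(selection) {
  // Placeholder for ai.models.generateContent(...) in the real backend.
  if (selection.kind === "code") {
    return { action: "explain", result: "This function does..." };
  }
  return { action: "summarize", result: "Summary of the selection..." };
}

async function onTrigger(selection) {
  const suggestion = await askGemini(selection); // Gemini reasons about the selection
  return {
    result: suggestion.result,                   // shown to the user immediately
    takeAction: () => suggestion.action,         // executed only if the user opts in
  };
}
```

&lt;p&gt;The key design point is that the trigger never asks the user for a prompt; the selection itself is the prompt.&lt;/p&gt;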

&lt;p&gt;That means a selection is not just text. It is context.&lt;/p&gt;

&lt;p&gt;This made Cursivis a strong fit for the &lt;strong&gt;UI Navigator&lt;/strong&gt; category of the Gemini Live Agent Challenge, because it does not stop at answering. It interprets screen context and can output executable actions for the interface.&lt;/p&gt;

&lt;h2&gt;How I Built It&lt;/h2&gt;

&lt;p&gt;Cursivis is built as a multi-part system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a &lt;strong&gt;Windows companion app&lt;/strong&gt; in WPF and .NET 8&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Gemini backend&lt;/strong&gt; in Node.js using the &lt;strong&gt;Google GenAI SDK&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;voice pipeline&lt;/strong&gt; for hold-to-talk capture and transcription&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Chromium browser extension&lt;/strong&gt; for real current-tab actions&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;local browser bridge&lt;/strong&gt; for DOM-aware execution&lt;/li&gt;
&lt;li&gt;a &lt;strong&gt;Google Cloud Run deployment&lt;/strong&gt; for the backend&lt;/li&gt;
&lt;li&gt;integration with the &lt;strong&gt;Logitech MX Creative Console&lt;/strong&gt; interaction model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The backend handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;contextual reasoning&lt;/li&gt;
&lt;li&gt;multimodal text and image understanding&lt;/li&gt;
&lt;li&gt;dynamic action suggestion&lt;/li&gt;
&lt;li&gt;voice transcription&lt;/li&gt;
&lt;li&gt;browser action planning&lt;/li&gt;
&lt;/ul&gt;
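&lt;p&gt;As a rough sketch of the multimodal path: the backend assembles text and image parts into the &lt;code&gt;contents&lt;/code&gt; shape that the Google GenAI SDK’s &lt;code&gt;generateContent&lt;/code&gt; accepts. The helper itself, and everything beyond the SDK part shapes, is my own illustration:&lt;/p&gt;

```javascript
// Assemble a multimodal request body from a selection.
// The { text } and { inlineData: { mimeType, data } } part shapes follow the
// Google GenAI SDK; the helper itself is an illustrative assumption.
function buildContents(selection) {
  const parts = [{ text: "Suggest the most useful action for this selection." }];
  if (selection.text) {
    parts.push({ text: selection.text });
  }
  if (selection.imageBase64) {
    parts.push({ inlineData: { mimeType: "image/png", data: selection.imageBase64 } });
  }
  return [{ role: "user", parts }];
}

// The real call then looks roughly like:
//   await ai.models.generateContent({ model: "...", contents: buildContents(sel) });
```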

&lt;p&gt;The companion app handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text selection capture&lt;/li&gt;
&lt;li&gt;lasso screenshot capture&lt;/li&gt;
&lt;li&gt;orb and result UI&lt;/li&gt;
&lt;li&gt;guided and smart modes&lt;/li&gt;
&lt;li&gt;action preview and follow-up flows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For browser execution, I built a real-tab path through a Chromium extension so Cursivis can act in the browser session the user is already logged into, instead of depending only on a separate managed automation browser.&lt;/p&gt;
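&lt;p&gt;One way to keep real-tab execution safe is to have the extension validate the model-produced action plan against a whitelist before it touches the DOM. A minimal sketch, with assumed action names rather than the actual Cursivis protocol:&lt;/p&gt;

```javascript
// Validate a model-generated action plan before the extension executes it.
// The action names ("click", "fill", "scroll") are assumed for illustration.
const ALLOWED_ACTIONS = new Set(["click", "fill", "scroll"]);

function validatePlan(plan) {
  return (
    Array.isArray(plan) &&
    plan.every(
      (step) => ALLOWED_ACTIONS.has(step.type) && typeof step.selector === "string"
    )
  );
}
```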

&lt;h2&gt;Why Gemini Was Important&lt;/h2&gt;

&lt;p&gt;Gemini was central to the project because I did not want a rigid menu-driven assistant.&lt;/p&gt;

&lt;p&gt;The most important design goal was:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the system should look at the selection&lt;/li&gt;
&lt;li&gt;understand what kind of content it is&lt;/li&gt;
&lt;li&gt;infer the likely user intent&lt;/li&gt;
&lt;li&gt;return the most useful result&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the same trigger can behave differently depending on context:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a report might be summarized&lt;/li&gt;
&lt;li&gt;foreign-language text might be translated&lt;/li&gt;
&lt;li&gt;broken code might be debugged&lt;/li&gt;
&lt;li&gt;correct code might be explained&lt;/li&gt;
&lt;li&gt;an email might be polished or replied to&lt;/li&gt;
&lt;/ul&gt;
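&lt;p&gt;In Cursivis the classification itself is done by Gemini, but a crude local heuristic illustrates the idea of routing one trigger to different behaviors:&lt;/p&gt;

```javascript
// Crude local heuristic for a selection's kind. In Cursivis the real
// classification is done by Gemini; this only illustrates the routing idea.
function guessKind(text) {
  if (/function |const |=>|;\s*$/m.test(text)) return "code";
  if (/^(hi|hello|dear|regards)\b/im.test(text)) return "email";
  return "prose";
}
```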

&lt;p&gt;This flexibility is what made the interaction feel agentic instead of scripted.&lt;/p&gt;

&lt;h2&gt;Google Cloud Deployment&lt;/h2&gt;

&lt;p&gt;To meet the challenge requirement and make the backend reproducible, I deployed the Gemini backend to &lt;strong&gt;Google Cloud Run&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That deployment path includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;containerizing the backend&lt;/li&gt;
&lt;li&gt;building it with Cloud Build&lt;/li&gt;
&lt;li&gt;deploying it to Cloud Run&lt;/li&gt;
&lt;li&gt;verifying the live backend with a health endpoint&lt;/li&gt;
&lt;/ul&gt;
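&lt;p&gt;That path maps to a handful of gcloud commands. This is a generic sketch; the project ID, region, service name, and health path are placeholders, not the actual Cursivis configuration:&lt;/p&gt;

```shell
# Containerize with Cloud Build, deploy to Cloud Run, then hit the health check.
# PROJECT_ID, region, service name, and /health path are placeholders.
gcloud builds submit --tag gcr.io/PROJECT_ID/cursivis-backend
gcloud run deploy cursivis-backend \
  --image gcr.io/PROJECT_ID/cursivis-backend \
  --region us-central1 \
  --allow-unauthenticated
SERVICE_URL="$(gcloud run services describe cursivis-backend \
  --region us-central1 --format 'value(status.url)')"
curl "$SERVICE_URL/health"
```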

&lt;p&gt;I also added an automated deployment script so the cloud deployment process is visible in the codebase and reproducible by judges.&lt;/p&gt;

&lt;h2&gt;Challenges I Faced&lt;/h2&gt;

&lt;p&gt;The hardest part was not generating text; it was building a system that feels like a real UI agent.&lt;/p&gt;

&lt;p&gt;Some of the biggest challenges were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keeping Smart Mode useful without over-hardcoding behavior&lt;/li&gt;
&lt;li&gt;handling text, image, and voice in one coherent flow&lt;/li&gt;
&lt;li&gt;making browser actions work inside real logged-in tabs&lt;/li&gt;
&lt;li&gt;keeping the UI smooth and understandable&lt;/li&gt;
&lt;li&gt;balancing flexibility with safe execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Voice interaction and browser action reliability were especially challenging, because those are the places where a project stops being a demo and starts behaving like a real agent.&lt;/p&gt;

&lt;h2&gt;What I Learned&lt;/h2&gt;

&lt;p&gt;This project taught me a few important things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;multimodal AI becomes much more compelling when tied to a real interface&lt;/li&gt;
&lt;li&gt;good agent UX depends heavily on trust and clarity&lt;/li&gt;
&lt;li&gt;hardware triggers feel far more natural than opening a chatbot&lt;/li&gt;
&lt;li&gt;the most useful AI interaction is often not “ask a prompt” but simply “select and trigger”&lt;/li&gt;
&lt;li&gt;execution quality matters as much as model quality&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Why Cursivis Matters&lt;/h2&gt;

&lt;p&gt;Cursivis is my attempt to explore a future where AI is no longer a separate destination.&lt;/p&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;opening a chat app&lt;/li&gt;
&lt;li&gt;explaining context&lt;/li&gt;
&lt;li&gt;copying data in and out&lt;/li&gt;
&lt;li&gt;manually taking action&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the user can simply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;select&lt;/li&gt;
&lt;li&gt;trigger&lt;/li&gt;
&lt;li&gt;review&lt;/li&gt;
&lt;li&gt;act&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the experience I wanted to prototype: a multimodal AI layer that lives directly on top of everyday work.&lt;/p&gt;

&lt;h2&gt;Closing&lt;/h2&gt;

&lt;p&gt;Cursivis started from one simple idea:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the cursor itself became an AI agent?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;By combining Gemini, Google Cloud, multimodal input, browser execution, and a hardware-triggered UX, I built a system that moves beyond the text box and turns ordinary on-screen context into something actionable.&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>ai</category>
      <category>gemini</category>
      <category>hackathon</category>
    </item>
    <item>
      <title>How I Built a Cinematic AI-Powered App Using Kiro for Kiroween 🎃</title>
      <dc:creator>Tanush shah</dc:creator>
      <pubDate>Sat, 29 Nov 2025 07:18:34 +0000</pubDate>
      <link>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-a-cinematic-ai-powered-app-using-kiro-for-kiroween-1oe3</link>
      <guid>https://forem.com/tanush_shah_e5ac47ddd561e/how-i-built-a-cinematic-ai-powered-app-using-kiro-for-kiroween-1oe3</guid>
<description>

&lt;h1&gt;👻 Building an AI-Enhanced Creative Engine with Kiro — My Kiroween Hackathon Journey 🎃&lt;/h1&gt;

&lt;p&gt;For Kiroween, I wanted to push myself into building something that felt &lt;em&gt;alive&lt;/em&gt; — an application that reacts, adapts, and evolves with the user. Instead of following a fixed template, I wanted an experience that feels cinematic, intelligent, and magical.&lt;/p&gt;

&lt;h2&gt;🧠 Inspiration&lt;/h2&gt;

&lt;p&gt;This project was inspired by an idea:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;What if a user’s creativity never had to hit a limit?&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;During this hackathon, I challenged myself to build an interactive system where AI assists across design, user experience, and automation — all powered by Kiro. The spooky theme gave me the perfect excuse to lean into dramatic visuals, atmospheric UI elements, and intelligent workflows that feel… enchanted.&lt;/p&gt;

&lt;p&gt;I can’t explicitly reveal the final product yet 😉, but I &lt;em&gt;can&lt;/em&gt; say this: the goal was to let creativity flow instantly, without friction.&lt;/p&gt;




&lt;h2&gt;⚙️ What It Does&lt;/h2&gt;

&lt;p&gt;At a high level, my project combines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AI-powered generation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Dynamic UI/UX with a spooky cinematic theme&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Automated pipelines for processing, previewing, and rendering&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A multi-stage flow orchestrated with Kiro&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It creates a seamless journey where a user can go from &lt;em&gt;idea&lt;/em&gt; → &lt;em&gt;visual output&lt;/em&gt; → &lt;em&gt;functional interface&lt;/em&gt; in just a few steps.&lt;/p&gt;

&lt;p&gt;Every key component — prompts, generation, previewing, transformations, and error-proof workflows — was developed interactively with Kiro.&lt;/p&gt;




&lt;h2&gt;🪄 How I Built It (with Kiro)&lt;/h2&gt;

&lt;p&gt;Kiro wasn’t just a tool — it was essentially my &lt;em&gt;pair-engineer&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;🧩 1. Vibe Coding&lt;/h3&gt;

&lt;p&gt;I structured conversations with Kiro like I would with a senior engineer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I explained high-level intent&lt;/li&gt;
&lt;li&gt;Kiro generated modular components&lt;/li&gt;
&lt;li&gt;I refined and iterated with micro-prompts&lt;/li&gt;
&lt;li&gt;Together we shaped the core logic of the application&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most impressive generation was when Kiro produced an entire multi-step pipeline with validation, async handling, and UI state synchronization — all in one go.&lt;/p&gt;

&lt;h3&gt;⚙️ 2. Spec-Driven Development&lt;/h3&gt;

&lt;p&gt;To maintain structure in a fast-moving hackathon environment, I wrote a compact specification describing the expected behaviors, interactions, and data flow.&lt;/p&gt;

&lt;p&gt;Kiro then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Converted these specs into type-safe code&lt;/li&gt;
&lt;li&gt;Identified missing edge cases&lt;/li&gt;
&lt;li&gt;Ensured consistency across the whole project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This spec-driven workflow made the entire codebase “snap in” perfectly — extremely valuable for rapid iteration.&lt;/p&gt;
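&lt;p&gt;Since I can’t reveal the product, here is only the &lt;em&gt;shape&lt;/em&gt; of the kind of compact spec described above, with invented contents:&lt;/p&gt;

```markdown
# Spec: generation pipeline (contents invented for illustration)
- Input: a non-empty user prompt
- Stages: validate, generate, preview, render
- Every stage returns { status, data } and surfaces errors to the UI
- Async stages time out and report a retryable error
```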

&lt;h3&gt;🔁 3. Agent Hooks&lt;/h3&gt;

&lt;p&gt;I created automated workflows using Kiro hooks to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Format and lint code upon generation
&lt;/li&gt;
&lt;li&gt;Validate generated outputs
&lt;/li&gt;
&lt;li&gt;Auto-fix conflicting structures
&lt;/li&gt;
&lt;li&gt;Enforce naming conventions across files
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This removed repetitive work and let me focus entirely on building the creative core.&lt;/p&gt;
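&lt;p&gt;For reference, a hook of this kind lives as a small JSON file. The schema below is from memory of Kiro’s hook files and may not match the current version exactly:&lt;/p&gt;

```json
{
  "enabled": true,
  "name": "Format and lint on generation",
  "when": { "type": "fileEdited", "patterns": ["src/**/*.ts"] },
  "then": {
    "type": "askAgent",
    "prompt": "Format this file, fix lint issues, and keep naming conventions consistent with the rest of the project."
  }
}
```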

&lt;h3&gt;🧭 4. Steering Docs&lt;/h3&gt;

&lt;p&gt;Steering allowed me to “teach” Kiro my preferred architecture style:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Modular components
&lt;/li&gt;
&lt;li&gt;Clean data flow
&lt;/li&gt;
&lt;li&gt;Reusable utilities
&lt;/li&gt;
&lt;li&gt;Error-resilient async code
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After applying steering, the quality of responses improved massively — Kiro adapted to my coding style like a personalized assistant.&lt;/p&gt;
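&lt;p&gt;A steering doc is just a markdown file (mine live under &lt;code&gt;.kiro/steering/&lt;/code&gt;; the exact location may vary by version). Something like:&lt;/p&gt;

```markdown
# Architecture preferences
- Small, modular components with a single responsibility
- One-directional data flow; no hidden shared state
- Repeated logic extracted into reusable utilities
- Async work wrapped with explicit error handling and typed errors
```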

&lt;h3&gt;🔌 5. MCP (Model Context Protocol)&lt;/h3&gt;

&lt;p&gt;Using MCP extensions allowed me to introduce specialized capabilities into Kiro’s workflow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated scaffolding
&lt;/li&gt;
&lt;li&gt;Batch-file generation
&lt;/li&gt;
&lt;li&gt;Resource fetching
&lt;/li&gt;
&lt;li&gt;Smart transformations
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These were tasks that would've taken hours manually — Kiro cut them down to minutes.&lt;/p&gt;
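&lt;p&gt;MCP servers are declared in a JSON settings file (&lt;code&gt;.kiro/settings/mcp.json&lt;/code&gt; when I built this; the path and schema may differ in newer versions). For example, wiring in the reference fetch server:&lt;/p&gt;

```json
{
  "mcpServers": {
    "fetch": {
      "command": "uvx",
      "args": ["mcp-server-fetch"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
```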




&lt;h2&gt;🕸️ Challenges I Faced&lt;/h2&gt;

&lt;p&gt;Like any ambitious project, I faced some hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structuring intelligent workflows that feel natural
&lt;/li&gt;
&lt;li&gt;Maintaining performance while adding cinematic UI effects
&lt;/li&gt;
&lt;li&gt;Handling complex async interactions
&lt;/li&gt;
&lt;li&gt;Ensuring portability across environments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kiro helped me debug, refine, and stabilize the system fast enough to meet the hackathon deadline.&lt;/p&gt;




&lt;h2&gt;🏆 Accomplishments I'm Proud Of&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Integrating multiple Kiro features in one project
&lt;/li&gt;
&lt;li&gt;Designing a spooky UI that feels alive
&lt;/li&gt;
&lt;li&gt;Building a fully automated flow from concept → output
&lt;/li&gt;
&lt;li&gt;Creating a codebase that is scalable, clean, and production-ready
&lt;/li&gt;
&lt;li&gt;Completing the entire system within the hackathon timeframe
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This project not only works — it &lt;em&gt;feels&lt;/em&gt; magical.&lt;/p&gt;




&lt;h2&gt;📚 What I Learned&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;How to use Kiro both as a “creative brainstormer” and as a structured engineering assistant
&lt;/li&gt;
&lt;li&gt;Writing better specs for fast iteration
&lt;/li&gt;
&lt;li&gt;Improving code quality with hooks and steering
&lt;/li&gt;
&lt;li&gt;Architecting async flows elegantly
&lt;/li&gt;
&lt;li&gt;Building cinematic UI elements with performance in mind
&lt;/li&gt;
&lt;li&gt;Leveraging multiple AI-driven systems in harmony
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;🔮 What’s Next&lt;/h2&gt;

&lt;p&gt;I plan to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Expand UI capabilities
&lt;/li&gt;
&lt;li&gt;Add more intelligent behaviors
&lt;/li&gt;
&lt;li&gt;Introduce user-driven customizations
&lt;/li&gt;
&lt;li&gt;Polish the experience into a fully public product
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This hackathon project is just the beginning — the foundation I built with Kiro opens doors to something much bigger.&lt;/p&gt;




&lt;p&gt;If you enjoyed this write-up, feel free to follow along — there’s a lot more on the way.&lt;br&gt;&lt;br&gt;
Happy Kiroween! 🎃👻✨&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>ai</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
