<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Usman Mehfooz</title>
    <description>The latest articles on Forem by Usman Mehfooz (@firevibe).</description>
    <link>https://forem.com/firevibe</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3493456%2Feea1ad35-3f6e-4ae8-95fe-7b652cc7749b.png</url>
      <title>Forem: Usman Mehfooz</title>
      <link>https://forem.com/firevibe</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/firevibe"/>
    <language>en</language>
    <item>
      <title>SelfSprite Maze Game: Take a Selfie and Become an Animated Playable Character in the Game (+ GIF Anything)</title>
      <dc:creator>Usman Mehfooz</dc:creator>
      <pubDate>Sun, 14 Sep 2025 19:03:32 +0000</pubDate>
      <link>https://forem.com/firevibe/selfsprite-maze-game-take-a-selfie-and-become-an-animated-playable-character-in-the-game-3h2h</link>
      <guid>https://forem.com/firevibe/selfsprite-maze-game-take-a-selfie-and-become-an-animated-playable-character-in-the-game-3h2h</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjy2eu9pvrmrtuwnuntd.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjy2eu9pvrmrtuwnuntd.gif" alt="Swat Kats"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo Video&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/HjY5rvxPi0s"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;A live demo of the applet is right here: &lt;a href="https://selfsprite-maze-110292485924.us-west1.run.app" rel="noopener noreferrer"&gt;Selfsprite Maze Demo&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Generic gaming avatars are dead. I built &lt;strong&gt;Selfsprite Maze&lt;/strong&gt; to fix the disconnect between player and character.&lt;/p&gt;

&lt;p&gt;It's a retro game that uses multimodal GenAI to rip your actual face from a selfie and mint a custom, animated 8-bit sprite. &lt;strong&gt;You are the hero.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The gameplay loop is brutally simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📸 &lt;strong&gt;Create Your Hero:&lt;/strong&gt; Snap a selfie, pick a class like 'Wizard' or 'Cyberpunk', and the AI spits out a personalized sprite sheet. Done in seconds.&lt;/li&gt;
&lt;li&gt;😈 &lt;strong&gt;Design Your Enemy:&lt;/strong&gt; Here's the twist. Run the process again, but this time you're creating the enemy guards. Now you can literally fight your friends, a celebrity, or a weird alternate-reality version of yourself.&lt;/li&gt;
&lt;li&gt;🏃 &lt;strong&gt;Escape the Maze:&lt;/strong&gt; You're dropped into a procedurally generated maze. The goal? Hit the exit. The problem? The guards you just made are running pathfinding algorithms to hunt you down. No pressure.&lt;/li&gt;
&lt;/ul&gt;
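The procedurally generated maze from the loop above can be sketched with a classic recursive backtracker. This is a minimal illustrative sketch, not the game's actual generator; the grid encoding (0 = floor, 1 = wall) and function name are assumptions.

```typescript
// Illustrative recursive-backtracker maze generator (not the game's actual code).
// Produces a grid of 0 = floor, 1 = wall; odd width/height give the cleanest mazes.
function generateMaze(width: number, height: number): number[][] {
  const grid: number[][] = Array.from({ length: height }, () => Array(width).fill(1));
  const carve = (r: number, c: number): void => {
    grid[r][c] = 0;
    // Visit neighbours two cells away in random order, knocking down the wall between.
    const dirs = [[-2, 0], [2, 0], [0, -2], [0, 2]].sort(() => Math.random() - 0.5);
    for (const [dr, dc] of dirs) {
      const nr = r + dr, nc = c + dc;
      if (nr > 0 && nr < height - 1 && nc > 0 && nc < width - 1 && grid[nr][nc] === 1) {
        grid[r + dr / 2][c + dc / 2] = 0; // open the wall between the two cells
        carve(nr, nc);
      }
    }
  };
  carve(1, 1);
  return grid;
}
```

Because each carve step only connects an unvisited cell, the result is a perfect maze: exactly one path between any two floor cells, so every run feels different.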

&lt;p&gt;The game itself is a creative engine.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Infinite Replayability:&lt;/strong&gt; AI-driven level-gen means you'll never play the same maze twice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart Enemies:&lt;/strong&gt; Guards use line-of-sight and A* pathfinding. They aren't stormtroopers; they will find you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Digital Swag:&lt;/strong&gt; Beat the level and you get to download your character as a high-quality GIF and PNG frames. Your new profile pic is waiting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline Mode:&lt;/strong&gt; Upload an existing sprite sheet to save API calls, or generate sprite sheets for free in Google AI Studio.&lt;/li&gt;
&lt;/ul&gt;
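The guard behaviour described above boils down to grid pathfinding. Here is a minimal A* sketch under the same assumptions (0 = floor, 1 = wall); the type and function names are hypothetical, not the game's source.

```typescript
// Illustrative A* pathfinding for the guards on a grid of 0 = floor, 1 = wall.
type Cell = { r: number; c: number };

function findPath(grid: number[][], start: Cell, goal: Cell): Cell[] | null {
  const rows = grid.length, cols = grid[0].length;
  const key = (p: Cell) => p.r * cols + p.c;
  const h = (p: Cell) => Math.abs(p.r - goal.r) + Math.abs(p.c - goal.c); // Manhattan heuristic
  const open: { p: Cell; g: number; f: number }[] = [{ p: start, g: 0, f: h(start) }];
  const cameFrom = new Map<number, Cell>();
  const gScore = new Map<number, number>([[key(start), 0]]);

  while (open.length > 0) {
    open.sort((a, b) => a.f - b.f); // a real game would use a binary heap here
    const { p, g } = open.shift()!;
    if (p.r === goal.r && p.c === goal.c) {
      const path: Cell[] = [p]; // walk the cameFrom links back to the start
      let k = key(p);
      while (cameFrom.has(k)) {
        const prev = cameFrom.get(k)!;
        path.unshift(prev);
        k = key(prev);
      }
      return path;
    }
    for (const [dr, dc] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
      const n = { r: p.r + dr, c: p.c + dc };
      if (n.r < 0 || n.r >= rows || n.c < 0 || n.c >= cols || grid[n.r][n.c] === 1) continue;
      const tentative = g + 1;
      const known = gScore.get(key(n));
      if (known === undefined || tentative < known) {
        gScore.set(key(n), tentative);
        cameFrom.set(key(n), p);
        open.push({ p: n, g: tentative, f: tentative + h(n) });
      }
    }
  }
  return null; // goal unreachable
}
```

A guard would call this each time it gains line-of-sight to the player and then follow the returned cells.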

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;The live demo below unfortunately runs without a paid API key. For testing, please use the upload option in the game with a sprite sheet generated for free directly in Google AI Studio; no API key is needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://selfsprite-maze-110292485924.us-west1.run.app" rel="noopener noreferrer"&gt;SelfSprite Maze Demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071s4s6hgdga0o79rqng.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F071s4s6hgdga0o79rqng.gif" alt="Boxer Trump"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ufgxcfqfr5cxroakl1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46ufgxcfqfr5cxroakl1.png" alt="Main Screen"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo8asilpne59unci8a6a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feo8asilpne59unci8a6a.png" alt="Generated Sprites"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hgc4l20f4rb96ik9jhz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9hgc4l20f4rb96ik9jhz.png" alt="Maze"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8ju8p42x1fq4u7ucsqe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr8ju8p42x1fq4u7ucsqe.png" alt="Download Gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo Video&lt;br&gt;
  &lt;iframe src="https://www.youtube.com/embed/HjY5rvxPi0s"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Intro:&lt;/strong&gt; Kicks off with a retro instruction screen. You'll know how to play in 10 seconds.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Character Creation:&lt;/strong&gt; Watch the full flow: snap a selfie, select the 'character' class, and fire it off to the AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Motion Creation:&lt;/strong&gt; The same flow for movement: select the 'motion' class and fire it off to the AI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Generation:&lt;/strong&gt; The AI-generated sprite sheet appears and gets sliced into animation frames on the fly. Pure visual feedback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Showcase:&lt;/strong&gt; A classic "VS" screen showcases your hero and the enemy you designed, building hype for the showdown.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Gameplay:&lt;/strong&gt; The real deal. My sprite navigating a maze, getting spotted, and a tense chase kicking off with the AI-driven guards.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Victory &amp;amp; Download:&lt;/strong&gt; Make it to the exit, get the "Level Complete" screen, and one-click download a ZIP file with your GIF and all the frames. Ship it.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;This entire project runs on Google AI. I used AI Studio for rapid-fire prompt engineering, and the final build uses the &lt;code&gt;@google/genai&lt;/code&gt; SDK exclusively. The magic is in how it orchestrates two different Gemini models.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; (Nano Banana) for Multimodal Generation:&lt;/strong&gt; This is the creative engine. It takes two inputs—an &lt;strong&gt;image prompt&lt;/strong&gt; (your selfie) and a &lt;strong&gt;text prompt&lt;/strong&gt; (my instructions for style, class, etc.)—and fuses them into a brand new sprite sheet. This image-plus-text-to-image pipeline is the core feature.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision-Based Analysis with &lt;code&gt;gemini-2.5-flash&lt;/code&gt;:&lt;/strong&gt; After an image is generated, I need its grid dimensions. Instead of guessing, I just show the image back to a vision model and ask, "How many columns and rows?" I use &lt;code&gt;responseSchema&lt;/code&gt; to force the output into a clean JSON object. The AI becomes a reliable, automated data-processing tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;Multimodal isn't just a feature here; it's the entire foundation of the app.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deep Personalization through Image-to-Image Transformation:&lt;/strong&gt; This isn't just &lt;code&gt;text-to-image&lt;/code&gt;; it's &lt;code&gt;image-plus-text-to-image&lt;/code&gt;. The user's photo is the actual foundational reference, not just a loose inspiration. Seeing an animated 8-bit version of &lt;em&gt;yourself&lt;/em&gt; being chased through a dungeon hits different than playing as a generic knight.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vision-Powered Automation:&lt;/strong&gt; I built a closed-loop pipeline. The AI generates a creative asset, and then another AI analyzes that asset to provide the technical data needed for the next step (Image -&amp;gt; JSON). It bridges the gap between the creative and the technical, making a complex process feel instant and overcoming hallucination limitations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creativity as a Gameplay Mechanic:&lt;/strong&gt; The AI is so fast that the creation process &lt;em&gt;is&lt;/em&gt; part of the gameplay. The user is both the hero designer and the monster designer. This dual role is a novel gameplay loop that's only possible with powerful and flexible multimodal AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhsj8e22rwlfyoenkhkx.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyhsj8e22rwlfyoenkhkx.gif" alt="Grumpy Cat "&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Architecting a Generative AI Pipeline for Automated Sprite Sheet Creation for Animation</title>
      <dc:creator>Usman Mehfooz</dc:creator>
      <pubDate>Thu, 11 Sep 2025 16:39:37 +0000</pubDate>
      <link>https://forem.com/firevibe/architecting-a-generative-ai-pipeline-for-automated-sprite-sheet-creation-3877</link>
      <guid>https://forem.com/firevibe/architecting-a-generative-ai-pipeline-for-automated-sprite-sheet-creation-3877</guid>
      <description>&lt;h3&gt;
  
  
  The Engineering Challenge of Creative Scale
&lt;/h3&gt;

&lt;p&gt;If you've ever delved into game development, you know the drill. Character sprites—those tiny, animated heroes and villains—are a massive investment of time and artistic skill. It's a classic creative bottleneck: a single walking animation can demand dozens of individual frames, each needing to be drawn with perfect consistency.&lt;/p&gt;

&lt;p&gt;I wanted to solve this problem by automating the most painful part of the process. This post isn't a conceptual overview; it's a detailed technical blueprint for building a generative AI pipeline that takes a single character image and programmatically generates a full &lt;strong&gt;16-frame animated sprite sheet&lt;/strong&gt;. &lt;br&gt;
With &lt;strong&gt;nano banana&lt;/strong&gt;, the latest vision model from Google AI, this is now quite doable in an automated pipeline.&lt;br&gt;
We'll cover the tech stack, the system architecture, and provide code-level insights into the backend logic that orchestrates this powerful multimodal workflow.&lt;/p&gt;


&lt;h3&gt;
  
  
  The Core Architecture: A System Overview
&lt;/h3&gt;

&lt;p&gt;To build a robust and scalable application, you have to decouple your concerns. My system is broken down into four primary components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Client:&lt;/strong&gt; A web UI (React/Next.js) for uploading the source image and displaying the final grid of generated sprites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend API Service:&lt;/strong&gt; The central orchestrator (Node.js/Cloud Run). This is the brain that manages the entire workflow, stores files, makes parallel calls to the AI model, and processes the results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloud Storage:&lt;/strong&gt; A scalable object storage service like Google Cloud Storage (GCS) to hold the source image and generated frames.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Model Service:&lt;/strong&gt; The external API for the generative model, which in this case is Google's Gemini via Vertex AI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data flow is orchestrated entirely by our backend:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[Frontend Client]&lt;/code&gt; --(Uploads Image)--&amp;gt; &lt;code&gt;[Backend API]&lt;/code&gt; --&amp;gt; &lt;code&gt;[Google Cloud Storage]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[Backend API]&lt;/code&gt; --(Triggers 16x API calls w/ GCS URI + Prompts)--&amp;gt; &lt;code&gt;[Vertex AI Gemini API]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[Vertex AI Gemini API]&lt;/code&gt; --(Returns 16x Generated Images)--&amp;gt; &lt;code&gt;[Backend API]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;[Backend API]&lt;/code&gt; --(Saves Images to GCS &amp;amp; Returns URLs)--&amp;gt; &lt;code&gt;[Frontend Client]&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This decoupled architecture ensures that each component can be scaled and maintained independently.&lt;/p&gt;


&lt;h3&gt;
  
  
  The Tech Stack in Detail
&lt;/h3&gt;

&lt;p&gt;Choosing the right tools is critical for a project like this. Here’s a recommended stack for the pipeline:&lt;/p&gt;
&lt;h4&gt;
  
  
  Frontend
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Framework:&lt;/strong&gt; &lt;strong&gt;Next.js 14&lt;/strong&gt;. Its integrated API routes provide a simple way to build the backend logic, making it a great choice for a full-stack application.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI/Styling:&lt;/strong&gt; &lt;strong&gt;Tailwind CSS&lt;/strong&gt; with a component library like &lt;strong&gt;Shadcn/ui&lt;/strong&gt; for building a clean UI quickly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Fetching:&lt;/strong&gt; &lt;strong&gt;React Query (TanStack Query)&lt;/strong&gt; is ideal for managing the asynchronous state of the generation process (loading, errors, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File Uploads:&lt;/strong&gt; &lt;strong&gt;React-Dropzone&lt;/strong&gt; for a clean, accessible drag-and-drop interface.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Backend &amp;amp; Deployment
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime &amp;amp; Language:&lt;/strong&gt; &lt;strong&gt;Node.js&lt;/strong&gt; with &lt;strong&gt;TypeScript&lt;/strong&gt;. Type safety is invaluable when dealing with API contracts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment Environment:&lt;/strong&gt; &lt;strong&gt;Google Cloud Run&lt;/strong&gt;. Deploying the Next.js app in a Docker container on Cloud Run provides exceptional scalability, including the ability to scale to zero when not in use.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Processing:&lt;/strong&gt; &lt;strong&gt;Sharp&lt;/strong&gt;. A high-performance Node.js library for stitching the final frames into a single sprite sheet on the backend.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Cloud Services &amp;amp; AI
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; &lt;strong&gt;Google Cloud Storage (GCS)&lt;/strong&gt;. Its tight integration with other Google Cloud services allows us to directly reference GCS objects in our Vertex AI calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI SDK:&lt;/strong&gt; &lt;strong&gt;Google's Vertex AI SDK for Node.js (&lt;code&gt;@google-cloud/vertexai&lt;/code&gt;)&lt;/strong&gt;. This is the official way to interact with Gemini models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Model:&lt;/strong&gt; The &lt;code&gt;gemini-2.5-flash-image-preview&lt;/code&gt; model, a new model specifically for image editing that Google has nicknamed "nano banana." Its multimodal capabilities, speed, and cost-effectiveness make it the perfect fit for this project.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Backend Logic: A Code-Level Deep Dive
&lt;/h3&gt;

&lt;p&gt;This is the heart of the system. Let's walk through the backend orchestration, which would live inside a Next.js API route (e.g., &lt;code&gt;src/app/api/generate-sprites/route.ts&lt;/code&gt;).&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 1: The API Endpoint and File Upload
&lt;/h4&gt;

&lt;p&gt;The endpoint must handle &lt;code&gt;multipart/form-data&lt;/code&gt;. The Next.js &lt;code&gt;req.formData()&lt;/code&gt; method makes this straightforward.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/app/api/generate-sprites/route.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;next/server&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Storage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@google-cloud/storage&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;VertexAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@google-cloud/aiplatform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NextRequest&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No file provided.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;buffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="c1"&gt;// ... rest of the logic follows&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Uploading the Source Image to GCS
&lt;/h4&gt;

&lt;p&gt;We must store the source image in GCS so the Gemini API can access it directly via its URI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ... inside the POST function&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Storage&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;projectId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-gcp-project-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-gcs-bucket-name&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fileName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`uploads/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gcsFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;gcsFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;buffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`gs://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;fileName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3: Orchestrating the 16 Generative Calls
&lt;/h4&gt;

&lt;p&gt;For maximum efficiency, we use &lt;code&gt;Promise.all&lt;/code&gt; to fire off all 16 requests to the Vertex AI API in parallel. The key is to define a suite of prompts, each describing a specific frame in the animation sequence.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Using the character from the image, generate a full-body sprite of them walking forward, towards the camera...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ... add all 15 other detailed prompts here&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;vertexAI&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;VertexAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;your-gcp-project-id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;us-central1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generativeModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vertexAI&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getGenerativeModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemini-2.5-flash-image-preview&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generationPromises&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                    &lt;span class="c1"&gt;// Reference the GCS file directly&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;fileData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;mimeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;fileUri&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;gcsUri&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;generativeModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generateContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;generationPromises&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 4: Processing Responses and Saving Generated Frames
&lt;/h4&gt;

&lt;p&gt;The responses will contain the generated image data as a base64 string. We decode this, convert it to a buffer, and upload it back to GCS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;generatedImageUrls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;frameCounter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;base64Data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;fileData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;Buffer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;base64Data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;base64&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputFileName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`generated/sprite-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()}&lt;/span&gt;&lt;span class="s2"&gt;-&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;frameCounter&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.png`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;outputFile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;outputFileName&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;outputFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;imageBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;contentType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image/png&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Create a signed URL for the frontend to access the image&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;publicUrl&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;outputFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSignedUrl&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; 
        &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;read&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="na"&gt;expires&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2026-09-12&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;generatedImageUrls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;publicUrl&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Finally, return the array of URLs to the client&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;NextResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;generatedImageUrls&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
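&lt;p&gt;One caveat with the snippet above: the hardcoded &lt;code&gt;expires: '2026-09-12'&lt;/code&gt; will silently stop working once that date passes. A sketch of a relative expiry instead, assuming the same &lt;code&gt;getSignedUrl&lt;/code&gt; call from &lt;code&gt;@google-cloud/storage&lt;/code&gt;, which also accepts a millisecond timestamp:&lt;/p&gt;

```typescript
// Build signed-URL options with an expiry relative to "now" rather than a fixed date.
// Seven days is an illustrative choice; V4 signed URLs cap out at seven days.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function signedUrlOptions(now: number = Date.now()) {
  return {
    version: 'v4' as const,
    action: 'read' as const,
    expires: now + SEVEN_DAYS_MS, // millisecond timestamp
  };
}

// Usage (sketch): const [publicUrl] = await outputFile.getSignedUrl(signedUrlOptions());
```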



&lt;h3&gt;
  
  
  The Frontend: Bringing It to Life
&lt;/h3&gt;

&lt;p&gt;On the frontend, the UI simply calls our API and displays the results in a grid. &lt;strong&gt;React Query&lt;/strong&gt; handles the asynchronous state and renders the images as they're generated. A final server-side step can then download all the images from their URLs, use the &lt;strong&gt;Sharp&lt;/strong&gt; library to composite them into a single 4x4 grid, and return the final sprite sheet for download.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// A simplified React component using TanStack Query&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;useMutation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@tanstack/react-query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;SpriteGenerator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;isPending&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;useMutation&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;mutationFn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;File&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;FormData&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="nx"&gt;formData&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;file&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;file&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/api/generate-sprites&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;formData&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Network response was not ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// ... file upload logic using react-dropzone that calls mutate(file)&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isPending&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nx"&gt;Generating&lt;/span&gt; &lt;span class="nx"&gt;your&lt;/span&gt; &lt;span class="nx"&gt;sprite&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&amp;gt;&lt;/span&gt;&lt;span class="err"&gt;;
&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;div&lt;/span&gt; &lt;span class="nx"&gt;className&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;grid grid-cols-4 gap-4&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;img&lt;/span&gt; &lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;alt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Generated sprite frame&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;/&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;
    &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;/div&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;
&lt;/span&gt;  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
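&lt;p&gt;The Sharp compositing step mentioned above boils down to placement math: frame &lt;code&gt;i&lt;/code&gt; of a 4x4 sheet lands in column &lt;code&gt;i % 4&lt;/code&gt; and row &lt;code&gt;floor(i / 4)&lt;/code&gt;. Here is a minimal sketch of the overlay list that Sharp's &lt;code&gt;composite()&lt;/code&gt; expects; the 64-pixel frame size is an assumption for illustration:&lt;/p&gt;

```typescript
// Build the { input, left, top } overlay list for sharp().composite(),
// laying out equally sized frames on a COLS-wide grid.
const COLS = 4;
const FRAME = 64; // assumed square frame size in pixels

function spritePlacements(frames: Uint8Array[]) {
  return frames.map((input, i) => ({
    input,
    left: (i % COLS) * FRAME,          // column offset
    top: Math.floor(i / COLS) * FRAME, // row offset
  }));
}

// Usage (sketch): sharp({ create: { width: COLS * FRAME, height: COLS * FRAME,
//   channels: 4, background: { r: 0, g: 0, b: 0, alpha: 0 } } })
//   .composite(spritePlacements(frames)).png().toBuffer();
```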



&lt;h3&gt;
  
  
  A New Era of Creative Tools
&lt;/h3&gt;

&lt;p&gt;The emergence of powerful multimodal AI models like Gemini marks a paradigm shift. We're moving from a world where creative professionals spend countless hours on repetitive, manual tasks to one where they can focus on high-level vision and ideation.&lt;/p&gt;

&lt;p&gt;Tools like the one I've outlined here pose a direct challenge to traditional creative software companies like &lt;strong&gt;Adobe&lt;/strong&gt; and others in the digital art space. Instead of a user having to master a complex suite of tools (Photoshop for editing, Animate for frame-by-frame animation, After Effects for motion), an entire workflow can now be encapsulated in a single API call. This doesn't eliminate the need for human creativity, but it shifts the focus dramatically: the engineer becomes a co-creator, building tools that accelerate the artist's workflow by automating the tedious parts. The future of creative software isn't just a new UI; it's generative intelligence embedded directly into the core of the tool itself.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>nanobanana</category>
      <category>spriteanimation</category>
    </item>
    <item>
      <title>Convert Any UI Images to Multi-Page HTML Website With UI Editor</title>
      <dc:creator>Usman Mehfooz</dc:creator>
      <pubDate>Thu, 11 Sep 2025 14:15:46 +0000</pubDate>
      <link>https://forem.com/firevibe/convert-any-ui-images-to-multi-page-html-website-with-google-ai-studio-1in8</link>
      <guid>https://forem.com/firevibe/convert-any-ui-images-to-multi-page-html-website-with-google-ai-studio-1in8</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73xfwsp3139npvmvs71y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73xfwsp3139npvmvs71y.png" alt="Sample HTML Generated"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/z3MebyOz1ME"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;ProtoHTML&lt;/strong&gt;, a web-based tool designed to bridge the gap between design and development. It transforms static website mockups (like screenshots or design files) into fully functional, multi-page HTML websites styled with Tailwind CSS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz31mvtb9dyz94p0nymr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz31mvtb9dyz94p0nymr6.png" alt="Main UI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The problem ProtoHTML solves is the tedious and time-consuming process of manually converting a visual design into code. For developers and designers, this "mockup-to-code" phase can be a major bottleneck. ProtoHTML automates this by using a powerful multimodal AI to analyze the images and write the code, turning a process that could take hours into one that takes just a few seconds.&lt;/p&gt;

&lt;p&gt;Key features include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Page Site Generation:&lt;/strong&gt; Upload multiple image mockups at once to generate a complete website structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Code Generation:&lt;/strong&gt; Leverages the &lt;strong&gt;gemini-2.5-flash-image-preview&lt;/strong&gt; model to produce clean, semantic HTML and Tailwind CSS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Editable Previews:&lt;/strong&gt; Instantly preview the generated pages and edit text content directly in the browser, with the underlying code updating in real-time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-Click Export:&lt;/strong&gt; Package the entire multi-page website into a single, downloadable &lt;code&gt;.zip&lt;/code&gt; file, ready for immediate deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Note: the demo runs on a free API key, so it may not always be working.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You can try the live application here: &lt;strong&gt;&lt;a href="https://ai-multi-page-architect-626025278302.us-west1.run.app/" rel="noopener noreferrer"&gt;https://ai-multi-page-architect-626025278302.us-west1.run.app/&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;The application has a simple UI for uploading mockups and editing the results; the AI outputs clean HTML and Tailwind CSS.&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;Google AI Studio was the complete development environment for building and iterating on ProtoHTML. The core of the application is powered by the &lt;strong&gt;gemini-2.5-flash-image-preview&lt;/strong&gt; model (affectionately known as 'nano banana'), which I used during the free trial period on Sept 6-7. Its fast, powerful multimodal capabilities made it the perfect choice.&lt;/p&gt;

&lt;p&gt;The key to getting high-quality, consistent output was &lt;strong&gt;prompt engineering&lt;/strong&gt;. I crafted a detailed &lt;code&gt;systemInstruction&lt;/code&gt; that sets the persona for the AI as an "expert senior frontend developer" and provides a strict set of rules it must follow. These rules dictate everything from the output format (raw HTML only) to technical requirements like including the Tailwind CSS CDN link, using semantic HTML5 tags, and implementing responsive design patterns.&lt;/p&gt;
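&lt;p&gt;To make this concrete, here is a trimmed sketch of what such a &lt;code&gt;systemInstruction&lt;/code&gt; can look like; the exact wording is illustrative, not the production prompt:&lt;/p&gt;

```typescript
// Illustrative systemInstruction: the rule list mirrors the constraints
// described above, but the wording is not the production prompt.
const systemInstruction = [
  'You are an expert senior frontend developer.',
  'Rules:',
  '1. Respond with a single raw HTML file only: no markdown fences, no commentary.',
  '2. Include the Tailwind CSS CDN script tag in the head.',
  '3. Use semantic HTML5 tags (header, nav, main, section, footer).',
  '4. Make every layout responsive using Tailwind breakpoint prefixes.',
].join('\n');
```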

&lt;p&gt;Each API call is a multimodal request, sending both the visual image data and a concise text prompt (e.g., &lt;code&gt;Based on the provided image, generate the complete HTML file for the "About Us" page now.&lt;/code&gt;) to the Gemini model. This combination allows the AI to understand both the visual layout from the image and the specific context for the page from the text.&lt;/p&gt;
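&lt;p&gt;Sketched as code, one such request might be assembled like this; the page name and base64 placeholder are illustrative, and the &lt;code&gt;contents&lt;/code&gt;/&lt;code&gt;parts&lt;/code&gt; shape follows the Gemini API's standard multimodal request format:&lt;/p&gt;

```typescript
// Assemble the image-plus-text parts for one Gemini page-generation call.
// base64Png and pageName are placeholders for illustration.
function buildParts(base64Png: string, pageName: string) {
  return [
    { inlineData: { mimeType: 'image/png', data: base64Png } },
    { text: 'Based on the provided image, generate the complete HTML file for the "' + pageName + '" page now.' },
  ];
}

// The full request then nests these parts:
// { contents: [{ role: 'user', parts: buildParts(imageData, 'About Us') }] }
```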

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The primary multimodal feature of ProtoHTML is &lt;strong&gt;Image-to-Code Generation&lt;/strong&gt;. The application takes a visual input (a webpage mockup) and translates it into a structured, textual output (a complete HTML file with Tailwind CSS classes).&lt;/p&gt;

&lt;p&gt;This functionality fundamentally enhances the user experience in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accelerates Prototyping:&lt;/strong&gt; It dramatically reduces the friction between a visual idea and a functional prototype. Users can go from a set of static images to an interactive, multi-page website in minutes, allowing for rapid iteration and feedback.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Empowers Non-Coders:&lt;/strong&gt; Designers or project managers can bring their visions to life without needing to write a single line of code, making web development more accessible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creates a Tangible Feedback Loop:&lt;/strong&gt; The most powerful part of the experience is the immediate connection between the visual input and the interactive output. Seeing your static mockup rendered as a live, editable webpage in the "Preview &amp;amp; Edit" tab is a powerful "wow" moment. It makes the AI's "understanding" of the image tangible and gives the user immediate control to refine the result.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
