<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: raphiki</title>
    <description>The latest articles on Forem by raphiki (@raphiki).</description>
    <link>https://forem.com/raphiki</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F982002%2Fb4188602-61e2-49e6-85be-d590a9b2e228.png</url>
      <title>Forem: raphiki</title>
      <link>https://forem.com/raphiki</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/raphiki"/>
    <language>en</language>
    <item>
      <title>Beyond the API: Integrating ComfyUI and Flowise via MCP</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Mon, 09 Feb 2026 14:07:19 +0000</pubDate>
      <link>https://forem.com/raphiki/beyond-the-api-integrating-comfyui-and-flowise-via-mcp-pc7</link>
      <guid>https://forem.com/raphiki/beyond-the-api-integrating-comfyui-and-flowise-via-mcp-pc7</guid>
      <description>&lt;p&gt;In the &lt;a href="https://dev.to/worldlinetech/automating-image-generation-with-n8n-and-comfyui-521p"&gt;previous article&lt;/a&gt; of our "Beyond the ComfyUI Canvas" series, we explored how to integrate ComfyUI with n8n. It was a powerful demonstration of workflow automation, but it highlighted a common friction point in system integration: the "glue code." We had to manually construct HTTP requests, hardcode API payloads, and rigidly define every parameter. If the ComfyUI workflow changed, the n8n node broke.&lt;/p&gt;

&lt;p&gt;Today, we are moving from the "Wild West" of brittle, custom API integrations to the new standard of AI connectivity: the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To demonstrate this, we are revisiting a tool I wrote about &lt;a href="https://dev.to/worldlinetech/enhance-your-website-with-ai-embed-a-gpt-chatbot-with-flowise-jd6"&gt;over two years ago&lt;/a&gt;: &lt;strong&gt;Flowise&lt;/strong&gt;. Back then, it was a promising open-source project; today, it is a robust, enterprise-ready platform that has recently embraced MCP as a core feature.&lt;/p&gt;

&lt;p&gt;Our goal? To build a Chat Interface where an AI agent can autonomously discover ComfyUI workflows, generate images, and even edit them—without us hardcoding a single API call in the frontend.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Setting the Scene: The Stack
&lt;/h2&gt;

&lt;p&gt;Before we dive into the details, let's look at the three pillars of this architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Standard: Model Context Protocol (MCP)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmigbvwu2j0wu2xqcslki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmigbvwu2j0wu2xqcslki.png" alt="MCP Logo"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If APIs are the individual cables we solder together, MCP is the &lt;strong&gt;USB-C port&lt;/strong&gt;. Developed by Anthropic, it is now an open standard that decouples AI models from their data sources and tools.&lt;/p&gt;

&lt;p&gt;Instead of writing a specific integration for every tool (Google Drive, Slack, ComfyUI), you build an &lt;strong&gt;MCP Server&lt;/strong&gt; once. Any MCP-compliant client (Claude Desktop, Cursor, or Flowise) can instantly "plug in" to that server and understand its capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Orchestrator: Flowise
&lt;/h3&gt;

&lt;p&gt;Flowise has evolved significantly since my first article. It is a low-code platform for building LLM apps. Crucially for us, Flowise recently added native support for MCP. This means we can drop an "MCP Tool" node into our canvas, and the LLM immediately gains access to whatever that server provides.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Engine: ComfyUI
&lt;/h3&gt;

&lt;p&gt;We are sticking with a local instance of ComfyUI. While Comfy Cloud is becoming a formidable platform, the raw power and zero-cost experimentation of running &lt;strong&gt;Flux 2&lt;/strong&gt; locally on your own GPU are unmatched. We’re using a standardized &lt;strong&gt;Flux 2 Klein&lt;/strong&gt; workflow—optimized for speed (4 steps)—so the chat experience feels responsive, not sluggish.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. The Middleware: Building the ComfyUI MCP Server
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vsiyntsff410fiw8fc5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4vsiyntsff410fiw8fc5.png" alt="System Context (C4 Level 1)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need a bridge. As we discovered previously, ComfyUI speaks WebSockets and HTTP; Flowise speaks MCP. We need a server in the middle to translate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why We Chose SSE over Stdio
&lt;/h3&gt;

&lt;p&gt;When we started this project, we initially looked at the &lt;strong&gt;Stdio&lt;/strong&gt; transport (where the client runs the server script directly). It’s the default for local tools like Claude Desktop.&lt;/p&gt;

&lt;p&gt;But as we designed the solution for Flowise, we hit a realization: In most real-world environments, Flowise often runs in a Docker container (as it does on my laptop), while ComfyUI might be running on a separate machine with a dedicated GPU. Stdio would require them to be on the same filesystem—too restrictive.&lt;/p&gt;

&lt;p&gt;We decided to support &lt;strong&gt;SSE (Server-Sent Events) by default&lt;/strong&gt;. This allows our MCP Server to run anywhere on the network, exposing an HTTP endpoint (e.g., &lt;code&gt;http://localhost:8000/sse&lt;/code&gt;) that Flowise can subscribe to. It makes the architecture cleaner, decoupled, and Docker-friendly.&lt;/p&gt;
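&lt;p&gt;To make the transport choice concrete, here is a minimal sketch of the SSE mechanics (the function name and framing are illustrative, not the server's actual code): each JSON-RPC message is serialized and wrapped in a plain-text frame that any subscribed client, Flowise included, can parse as an event.&lt;/p&gt;

```python
import json

def to_sse_frame(message: dict, event: str = "message") -> str:
    """Wrap a JSON-RPC payload in a single Server-Sent Events frame.

    An SSE frame is plain text: an optional 'event:' line, one or more
    'data:' lines, and a blank line that terminates the frame.
    """
    payload = json.dumps(message, separators=(",", ":"))
    return f"event: {event}\ndata: {payload}\n\n"

# One tools/list request as it would travel down the SSE stream:
frame = to_sse_frame({"jsonrpc": "2.0", "method": "tools/list", "id": 1})
```

&lt;p&gt;Because the frames are just text over HTTP, the server can live anywhere on the network; nothing about the format assumes a shared filesystem.&lt;/p&gt;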

&lt;h3&gt;
  
  
  Governance-Driven Development (GDD)
&lt;/h3&gt;

&lt;p&gt;For this implementation, I tried something different. Instead of just asking an AI coding assistant to "write a script," I used a methodology I call &lt;strong&gt;Governance-Driven Development (GDD)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This approach reverses the typical AI coding flow. Instead of code leading the process, &lt;strong&gt;specifications &amp;amp; governance rules&lt;/strong&gt; become the anchor. I started by feeding the AI CLI a strict &lt;strong&gt;"Governance Pack"&lt;/strong&gt;—a set of non-negotiable rules regarding SOLID principles, security, and documentation.&lt;/p&gt;

&lt;p&gt;Here is an extract of the actual &lt;strong&gt;Governance Pack&lt;/strong&gt; prompt I used to bootstrap the session:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;GOVERNANCE PACK v1.0 (Extract)&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. Code Quality &amp;amp; Standards:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Paradigm:&lt;/strong&gt; Adhere to SOLID principles. Prefer composition over inheritance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Typing:&lt;/strong&gt; Strict static typing (Python &lt;code&gt;typing&lt;/code&gt;) is mandatory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Handling:&lt;/strong&gt; Never swallow exceptions. Use custom error classes (e.g., &lt;code&gt;ComfyUIConnectionError&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Architecture (C4 Model):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visual Documentation:&lt;/strong&gt; Whenever a structural change is made (like adding the SSE endpoint), you must generate an updated Mermaid.js &lt;strong&gt;System Context&lt;/strong&gt; diagram.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Security Guardrails:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Validation:&lt;/strong&gt; Trust no input. All data entering from the MCP client (Prompt, Width, Height...) must be validated against the &lt;code&gt;metadata.json&lt;/code&gt; schema before reaching ComfyUI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secrets:&lt;/strong&gt; NEVER hardcode API keys or hostnames. Use &lt;code&gt;os.environ&lt;/code&gt; only.&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
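&lt;p&gt;As an illustration of the "trust no input" guardrail, a validator along these lines (a hypothetical sketch, using the parameter shape from the workflow metadata shown later in this article) rejects anything the schema does not explicitly allow:&lt;/p&gt;

```python
# Maps the metadata's declared types to Python runtime types.
# Note: bool is a subclass of int in Python; a stricter check could special-case it.
TYPE_MAP = {"string": str, "int": int, "float": float, "boolean": bool}

class ValidationError(ValueError):
    """Raised when client input violates the workflow's declared schema."""

def validate_args(parameters: list, args: dict) -> dict:
    """Check client-supplied arguments against a workflow's parameter list."""
    known = {p["name"]: p for p in parameters}
    for name in args:
        if name not in known:
            raise ValidationError(f"Unknown parameter: {name!r}")
    for p in parameters:
        if p.get("required") and p["name"] not in args:
            raise ValidationError(f"Missing required parameter: {p['name']!r}")
        if p["name"] in args and not isinstance(args[p["name"]], TYPE_MAP[p["type"]]):
            raise ValidationError(f"Expected {p['type']} for {p['name']!r}")
    return args
```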

&lt;p&gt;I then analyzed the ComfyUI workflow JSON manually to map the node IDs, then handed over a clean, structured specification to the AI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixg9utvi2yuks583ksx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ixg9utvi2yuks583ksx.png" alt="Container Architecture (C4 Level 2)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The Result:&lt;/em&gt; The experience was striking. The AI didn't just spit out a script; it acted as a Senior Engineer. At one point, when I asked for a quick hack to bypass validation, the "Governance" constraints forced the model to push back and suggest a cleaner interface instead. The result is a modular, type-safe Python server.&lt;/p&gt;

&lt;h3&gt;
  
  
  The "LAST" Hack (Technical Deep Dive)
&lt;/h3&gt;

&lt;p&gt;Even with good governance, we needed one pragmatic "hack" to handle state. When the LLM generates an image, how does it reference that image later to edit it?&lt;/p&gt;

&lt;p&gt;We implemented a &lt;strong&gt;"LAST" pointer&lt;/strong&gt; logic. The server tracks the URL of the most recently generated image in memory. But it does more than just point:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Download:&lt;/strong&gt; When the agent sends &lt;code&gt;"LAST"&lt;/code&gt;, the server downloads the image bytes from the previous URL.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Re-Upload:&lt;/strong&gt; It uploads those bytes back to ComfyUI's &lt;code&gt;/upload/image&lt;/code&gt; endpoint to generate a fresh filename.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Inject:&lt;/strong&gt; This new filename is injected into the &lt;code&gt;LoadImage&lt;/code&gt; node of the editing workflow.&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;User:&lt;/strong&gt; "Make it bluer."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent:&lt;/strong&gt; Calls &lt;code&gt;edit_image(input_image="LAST", prompt="bluer...")&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This mimics the "Save Image" behavior we are used to, keeping the interaction stateless and fluid for the user while handling the heavy lifting behind the scenes.&lt;/p&gt;
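&lt;p&gt;The injection step can be sketched as a pure function over the workflow JSON. This is an illustrative sketch, not the server's exact code; it assumes ComfyUI's API (prompt) format, where the workflow is a dict of node IDs to node definitions:&lt;/p&gt;

```python
import copy

def inject_input_image(workflow: dict, filename: str) -> dict:
    """Point the editing workflow's LoadImage node at a freshly uploaded file.

    `workflow` is assumed to be in ComfyUI's API (prompt) format:
    a dict of node-id to {"class_type": ..., "inputs": {...}}.
    """
    wf = copy.deepcopy(workflow)  # never mutate the on-disk template
    for node in wf.values():
        if node.get("class_type") == "LoadImage":
            node["inputs"]["image"] = filename
            return wf
    raise KeyError("Workflow has no LoadImage node to inject into")
```

&lt;p&gt;Keeping this a pure function means the same workflow template can be reused across requests without state leaking between them.&lt;/p&gt;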

&lt;h2&gt;
  
  
  3. The Engine Room: ComfyUI Workflows
&lt;/h2&gt;

&lt;p&gt;To make our MCP Server generic, we avoided hardcoding specific workflows inside the Python code. Instead, we used an &lt;strong&gt;Embedded Metadata&lt;/strong&gt; pattern.&lt;/p&gt;

&lt;p&gt;The configuration is not a separate file; it is a standard ComfyUI &lt;strong&gt;Note Node&lt;/strong&gt; (titled &lt;code&gt;MCP_Config&lt;/code&gt;) placed directly inside the &lt;code&gt;.json&lt;/code&gt; workflow. This metadata acts as the contract, telling the MCP server: "This workflow needs a Prompt (node named &lt;em&gt;MCP_Positive&lt;/em&gt;) and a Seed (node &lt;em&gt;MCP_Sampler&lt;/em&gt;)."&lt;/p&gt;

&lt;p&gt;This makes the workflow a single, self-contained, portable file. You can export it from ComfyUI, drop it into the &lt;code&gt;workflows&lt;/code&gt; folder, and it works immediately.&lt;/p&gt;
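&lt;p&gt;A hypothetical reader for that contract might look like this. It assumes ComfyUI's UI export format, where a Note node's text is stored as the first entry of its &lt;code&gt;widgets_values&lt;/code&gt; list; the function name and error handling are mine, not the server's actual code:&lt;/p&gt;

```python
import json

def extract_mcp_config(workflow: dict) -> dict:
    """Pull the embedded metadata out of a ComfyUI workflow export.

    Scans the exported node graph for a Note node titled 'MCP_Config'
    and parses its text body as the tool's metadata contract.
    """
    for node in workflow.get("nodes", []):
        if node.get("type") == "Note" and node.get("title") == "MCP_Config":
            return json.loads(node["widgets_values"][0])
    raise KeyError("No MCP_Config note found in this workflow")
```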

&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; Our server is strict about naming. It automatically sanitizes the tool name found in the JSON to &lt;code&gt;snake_case&lt;/code&gt; (e.g., "Flux Generator" becomes &lt;code&gt;flux_generator&lt;/code&gt;) to ensure full compliance with the MCP specification.&lt;/p&gt;
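&lt;p&gt;A sanitizer along these lines (illustrative; the server's exact rules may differ) is enough to cover the "Flux Generator" case:&lt;/p&gt;

```python
import re

def sanitize_tool_name(raw: str) -> str:
    """Normalize a workflow title to a snake_case, MCP-friendly tool name."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", raw.strip())  # runs of non-alphanumerics to _
    return name.strip("_").lower()

print(sanitize_tool_name("Flux Generator"))  # flux_generator
```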

&lt;p&gt;Here is the configuration we generated for the &lt;strong&gt;image_flux2_text_to_image&lt;/strong&gt; workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwmdkcl5pzib72q9lb9y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjwmdkcl5pzib72q9lb9y.png" alt="Workflow in ComfyUI"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image_flux2_text_to_image"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Generates high-quality images using the Flux model. Use this for general creative requests."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
 &lt;/span&gt;&lt;span class="nl"&gt;"parameters"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
   &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"prompt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"string"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The detailed description of the image to generate."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
     &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MCP_Positive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"seed"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"int"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Random seed. Set to -1 for random, or a specific number for reproducibility."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"target"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MCP_Sampler"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the description and type of each parameter are passed to the MCP Server, they become automatically available to the client. When the MCP Server starts, it scans these workflows and dynamically registers tools. If we want to switch from Flux to SDXL, or add a Video Generation workflow, we simply drop in the new file. The server updates, Flowise sees the new tools via SSE, and the agent learns the new skill instantly.&lt;/p&gt;
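&lt;p&gt;The translation from embedded metadata to an advertised tool is mostly mechanical. Here is a sketch (assuming the metadata shape above; the type mapping and function name are mine) of how the parameter list becomes the JSON Schema an MCP server publishes as a tool's input schema:&lt;/p&gt;

```python
# Workflow metadata types mapped to their JSON Schema equivalents.
JSON_TYPES = {"string": "string", "int": "integer", "float": "number", "boolean": "boolean"}

def build_input_schema(metadata: dict) -> dict:
    """Translate an MCP_Config metadata block into a tool input schema."""
    properties, required = {}, []
    for p in metadata["parameters"]:
        properties[p["name"]] = {
            "type": JSON_TYPES[p["type"]],
            "description": p.get("description", ""),
        }
        if p.get("required"):
            required.append(p["name"])
    return {"type": "object", "properties": properties, "required": required}
```

&lt;p&gt;Because this runs at startup for every file in the workflows folder, dropping in a new &lt;code&gt;.json&lt;/code&gt; is all it takes to advertise a new tool.&lt;/p&gt;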

&lt;h2&gt;
  
  
  4. Validation: The MCP Inspector
&lt;/h2&gt;

&lt;p&gt;Before connecting Flowise, we must verify our server. Since we are using SSE, we can use the &lt;a href="https://github.com/modelcontextprotocol/inspector" rel="noopener noreferrer"&gt;&lt;strong&gt;MCP Inspector&lt;/strong&gt;&lt;/a&gt; web interface to connect to our running server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9avmw97xa67x5oeqt0x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy9avmw97xa67x5oeqt0x.png" alt="MCP Inspector"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can manually trigger the &lt;code&gt;image_flux2_text_to_image&lt;/code&gt; tool, watch the server logs, and see the image appear. If it works here, it guarantees compliance with the protocol.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. The Integration: Flowise ChatFlow
&lt;/h2&gt;

&lt;p&gt;Now for the grand finale. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnq2waclxpm59co03y8e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frnq2waclxpm59co03y8e.png" alt="Flowise ChatFlow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We open Flowise and create a new &lt;strong&gt;ChatFlow&lt;/strong&gt; using a standard &lt;strong&gt;Tool Agent&lt;/strong&gt; connected to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chat Model:&lt;/strong&gt; &lt;code&gt;ChatMistralAI&lt;/code&gt; (Smart, fast, and cost-effective).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Buffer Memory:&lt;/strong&gt; Essential for the agent to remember context (e.g., "Change &lt;em&gt;that&lt;/em&gt; image to...").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom MCP:&lt;/strong&gt; We select the "SSE" transport and paste our server URL.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Auto-Discovery Magic
&lt;/h3&gt;

&lt;p&gt;Notice what is missing? We didn't have to define the tools in Flowise. We didn't have to map inputs. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxggglf4yec615rn03oio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxggglf4yec615rn03oio.png" alt="Auto-Discovery from Flowise"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Custom MCP node&lt;/strong&gt; queries the server via SSE, sees the metadata definitions, and &lt;em&gt;automatically&lt;/em&gt; provides the tools to the Mistral agent.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pro Tip:&lt;/em&gt; Our server supports &lt;strong&gt;Dual Discovery&lt;/strong&gt;. Whether a client asks for tools directly (Function Calling) or reads Resources (Environment Context), we expose the workflow list on both channels (&lt;code&gt;comfy://list&lt;/code&gt; and &lt;code&gt;list_available_workflows&lt;/code&gt;) to ensure compatibility with any agent type.&lt;/p&gt;

&lt;h3&gt;
  
  
  The System Prompt
&lt;/h3&gt;

&lt;p&gt;The final piece of the puzzle is the System Prompt. We need to teach the &lt;strong&gt;Tool Agent node&lt;/strong&gt; how to behave:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are the **ComfyUI Orchestrator**, an expert AI agent capable of generating and manipulating images by controlling a local ComfyUI instance via the Model Context Protocol (MCP).

### 1. Tool Discovery (Dynamic Workflows)
Your tools are not static; they represent the actual `.json` workflow files present on the server.
- **First Step:** If you do not see a specific tool you need in your context, IMMEDIATELY call the tool `list_available_workflows`.
- This will return a manifest of all valid workflows (e.g., `flux_2_text_to_image`, `img2img_upscale`) and their required parameters.
- **Never guess** tool names. If a tool isn't listed, it doesn't exist.

### 2. Image Chaining (The "LAST" Protocol)
You have a unique capability to perform conversational editing (e.g., "Now make it pop art").
- **State Memory:** The server remembers the last generated image.
- **Instruction:** When a user asks to modify, edit, or use the previous result, pass the string `"LAST"` into the image input parameter of the next tool.
- **Example:**
  User: "Generate a cat." -&amp;gt; You call: `generate_image(prompt="cat")`
  User: "Turn it into a statue." -&amp;gt; You call: `img2img_transform(image="LAST", prompt="statue")`

### 3. Parameter Rules
- **Strict Compliance:** You must strictly adhere to the parameter types (String, Int, Float, Boolean) defined in the tool signature.
- **Defaults:** If a parameter is Optional and the user didn't specify it, do not send it. The server will use the workflow's internal default.
- **Safety:** Do not invent parameters. If a workflow only accepts `prompt` and `seed`, do not try to send `width` or `style`.

### 4. Error Handling
- If a tool execution fails, the error message will often suggest valid alternatives or correct parameter names. Read it carefully and retry.
- If the user asks for a workflow you don't have, explain what *is* available based on your `list_available_workflows` knowledge.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Use Case in Action
&lt;/h3&gt;

&lt;p&gt;&lt;iframe src="https://www.youtube.com/embed/NBJYVD_QfQo"&gt;&lt;/iframe&gt;&lt;/p&gt;

&lt;p&gt;This video shows the complete use case involving the full stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Parametrization&lt;/strong&gt; of the workflow in ComfyUI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verification&lt;/strong&gt; with MCP Inspector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generation&lt;/strong&gt; of the first image from Flowise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual editing&lt;/strong&gt; of the generated image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjoupyc2bqwm9ro1go35.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdjoupyc2bqwm9ro1go35.png" alt="Use Case Summary"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Flowise ChatFlow is relatively basic, but we could easily add nodes to enhance the user prompt or even transform it into a JSON Style Guide prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnawe1rseq4p66jbvlaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmnawe1rseq4p66jbvlaf.png" alt="Flowise API &amp;amp; Embeds"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The video showcases the use of the integrated chatbox within the Flowise UI, but we could also leverage Flowise's deployment capabilities to consume the workflow through an API, embed the chat in an HTML page, or publish a standalone page served by Flowise itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By moving from custom API implementations (n8n) to the Model Context Protocol (in Flowise), we have achieved something powerful: &lt;strong&gt;Interoperability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The choice to go with &lt;strong&gt;SSE by default&lt;/strong&gt; proved crucial. It gave us the flexibility to run our ComfyUI "engine" on a heavy GPU server while keeping our Flowise "brain" lightweight and containerized. We also demonstrated that &lt;strong&gt;Governance-Driven Development&lt;/strong&gt; allows us to use AI coding assistants to build robust, standardized infrastructure rather than just one-off scripts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Future Improvements
&lt;/h3&gt;

&lt;p&gt;While the "LAST" image hack works perfectly for a local, single-user demo, a production deployment would require &lt;strong&gt;Session Isolation&lt;/strong&gt; (ensuring User A doesn't overwrite User B's "LAST" image) and &lt;strong&gt;TTL Cleanup&lt;/strong&gt; (automatically deleting generated images after a set time).&lt;/p&gt;

&lt;p&gt;Technically, this would be solved by leveraging &lt;strong&gt;Context Injection&lt;/strong&gt;—using the session ID provided by the MCP protocol to maintain a keyed dictionary of states, rather than a global variable. For multi-user production usage, adding an authentication mechanism would also be a relevant next step.&lt;/p&gt;
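&lt;p&gt;A minimal sketch of that keyed-state idea (illustrative; the class name, TTL policy, and the way the session key is obtained from the MCP transport are all assumptions):&lt;/p&gt;

```python
import time

class SessionImageStore:
    """Per-session 'LAST' pointers with TTL expiry, replacing a global variable."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._last: dict = {}  # session_id -> (image_url, stored_at)

    def set_last(self, session_id: str, image_url: str) -> None:
        self._last[session_id] = (image_url, time.monotonic())

    def get_last(self, session_id: str):
        """Return the session's last image URL, or None if absent or expired."""
        entry = self._last.get(session_id)
        if entry is None:
            return None
        url, stored_at = entry
        if time.monotonic() - stored_at >= self._ttl:
            del self._last[session_id]  # lazy TTL cleanup on read
            return None
        return url
```

&lt;p&gt;With the store keyed by session ID, User A's edits can never pick up User B's image, and stale pointers age out on their own.&lt;/p&gt;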

&lt;p&gt;&lt;em&gt;You can find the full code for the ComfyUI MCP Server and the Flowise template in my &lt;a href="https://github.com/raphiki/ComfyUI-MCP-Server" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>mcp</category>
      <category>flowise</category>
    </item>
    <item>
      <title>Vibe Coding One Slice at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Sat, 24 Jan 2026 18:33:51 +0000</pubDate>
      <link>https://forem.com/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p</link>
      <guid>https://forem.com/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p</guid>
      <description>&lt;p&gt;&lt;em&gt;How I built a Modular Monolith by treating Generative AI as a junior developer who needs a firm hand (and a Constitution).&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j"&gt;Part 1&lt;/a&gt;&lt;/strong&gt;, we vibed a Python script. It was linear, messy, and fun. It proved that you can solve immediate problems by just asking nicely.&lt;br&gt;
&lt;strong&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc"&gt;Part 2&lt;/a&gt;&lt;/strong&gt;, we vibed a UI. It was chaotic, visual, and surprisingly effective. We learned that "vibe" works for pixels if you iterate fast enough.&lt;/p&gt;

&lt;p&gt;But let’s be honest: those were skirmishes. The real "Boss Fight" in software engineering isn't writing a script or centering a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt;. It's building a &lt;strong&gt;System&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I’m talking about the kind of project that doesn’t fit in one file. The kind where "Vibing" usually leads to "Spaghetti Code," hallucinated imports, and a repo you want to burn down after three days because you have 15 circular dependencies and a database schema that makes no sense.&lt;/p&gt;

&lt;p&gt;So for Part 3, I put away the "Hacker" hoodie and put on the "Enterprise Architect" blazer. My goal? To build &lt;strong&gt;YogĀrkana Codex&lt;/strong&gt;—a full-stack, offline-first, polymorphic Yoga management platform—without writing a single line of code myself.&lt;/p&gt;

&lt;p&gt;My strategy was simple but radical: &lt;strong&gt;I design, the AI implements.&lt;/strong&gt; I am the Architect; Gemini Chat is my Consultant; Gemini CLI is my Dev Team.&lt;/p&gt;

&lt;p&gt;Here is how we vibed a Monolith into existence, one slice at a time.&lt;/p&gt;


&lt;h2&gt;
  
  
  1. The Mission: Complexity Check (The Boss Level)
&lt;/h2&gt;

&lt;p&gt;To understand why "just chatting" wouldn't work, you need to see the scope. This wasn't a To-Do list app. I wanted to build a "Yoga Operating System" with four distinct domains that usually don't play nice together. I've been an architect for years, and I know exactly where these things break.&lt;/p&gt;
&lt;h3&gt;
  
  
  The Four Domains of Pain
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxaid8fnfrk4xpdz7wo96.png" width="800" height="543"&gt;Screenshot of the final application (Grimoire View)
  


&lt;p&gt;&lt;strong&gt;The Business Analyst's Note&lt;/strong&gt;: Unlike the project in Part 2, this application is not internationalized—by design. As a result, the screenshots are in French. I have kept them raw to visually illustrate the functional depth and complexity of the system without the abstraction of translation keys.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Grimoire (Knowledge Base):&lt;/strong&gt; A searchable library of yoga cards. But here’s the kicker: it uses a &lt;strong&gt;Polymorphic Data Model&lt;/strong&gt;. An &lt;em&gt;Asana&lt;/em&gt; (posture) has biomechanical attributes like "spinal extension" and "anatomy targets," while a &lt;em&gt;Mantra&lt;/em&gt; has Sanskrit text, translations, and audio assets. They are chemically different data structures, but they need to live in the same database table to be searchable together.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Weaver (Sequencer):&lt;/strong&gt; A drag-and-drop studio to build classes. It’s not just a playlist; it has a &lt;strong&gt;Logical Engine&lt;/strong&gt; (Phase 4) that acts like a "Digital Yoga Teacher." It screams at you if you sequence a "Peak Pose" before a "Warm-up" or forget &lt;em&gt;Savasana&lt;/em&gt; at the end. That means heavy validation logic running on both the client and the server.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Atelier (Print Studio):&lt;/strong&gt; A client-side PDF engine. We needed to generate high-res, vector-quality handouts for teachers to print. We couldn't just "print screen"; we needed a real PDF renderer (&lt;code&gt;@react-pdf/renderer&lt;/code&gt;) running entirely in the browser.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Constraint (Offline First):&lt;/strong&gt; Yoga studios are notorious for having no signal (often intentionally). The app needed to persist the entire library and PDF engine in the browser cache (IndexedDB + Service Workers) so it works perfectly in "Airplane Mode".&lt;/li&gt;
&lt;/ol&gt;
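The polymorphic model the Grimoire needs can be sketched as a TypeScript discriminated union. The field and type names below are illustrative, not the project's actual schema; the point is that structurally different payloads share one searchable core:

```typescript
// Shared "SQL core" fields plus a kind-specific payload (the future JSONB blob).
type AsanaData = {
  kind: "asana";
  spinalAction: "flexion" | "extension" | "neutral";
  anatomyTargets: string[];
};

type MantraData = {
  kind: "mantra";
  sanskrit: string;
  translation: string;
  audioUrl?: string;
};

type Card = {
  id: string;
  element: string;              // indexed SQL column
  tags: string[];               // indexed SQL column
  data: AsanaData | MantraData; // polymorphic payload
};

// Search works across both kinds because it only touches the shared core.
function searchByTag(cards: Card[], tag: string): Card[] {
  return cards.filter((c) => c.tags.includes(tag));
}
```

Narrowing on `data.kind` then gives each module type-safe access to its own attributes.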

&lt;p&gt;&lt;strong&gt;The Architect's Note:&lt;/strong&gt; If I had just prompted &lt;em&gt;"Build me a yoga app,"&lt;/em&gt; the AI would have hallucinated a generic CRUD app. It would have made 5 different tables for the cards, making search impossible. It would have used a server-side PDF library that breaks offline. I needed a blueprint.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. The Blueprint: Architecture &amp;amp; Tech Stack
&lt;/h2&gt;

&lt;p&gt;Before letting the AI write a single line of code, I spent around two and a half hours just talking architecture with Gemini Chat and formalizing it. I treated the AI as a "Sparring Partner," debating the trade-offs of different stacks.&lt;/p&gt;

&lt;p&gt;We settled on a &lt;strong&gt;Modular Monolith&lt;/strong&gt; architecture. Why? Because Microservices are overkill for a team of one, but a messy Monolith is a nightmare. We defined strict boundaries: code in &lt;code&gt;modules/grimoire&lt;/code&gt; can never import from &lt;code&gt;modules/weaver&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Tech Stack (The "No-Regrets" List):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo:&lt;/strong&gt; &lt;code&gt;Turborepo&lt;/code&gt; managing &lt;code&gt;apps/api&lt;/code&gt; and &lt;code&gt;apps/web&lt;/code&gt;. This keeps the full stack in one context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; &lt;code&gt;NestJS&lt;/code&gt; (for rigid structure) + &lt;code&gt;Drizzle ORM&lt;/code&gt; (for type safety). NestJS forces you to organize code into Modules, which helps the AI stay organized.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend:&lt;/strong&gt; &lt;code&gt;React&lt;/code&gt; + &lt;code&gt;Vite&lt;/code&gt; + &lt;code&gt;Tailwind CSS&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State:&lt;/strong&gt; &lt;code&gt;TanStack Query&lt;/code&gt; (Server state) + &lt;code&gt;Zustand&lt;/code&gt; (UI state).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The "Secret Sauce": Hybrid Data Storage&lt;/strong&gt;&lt;br&gt;
This was our smartest move. We chose &lt;strong&gt;PostgreSQL&lt;/strong&gt; but used a &lt;code&gt;JSONB&lt;/code&gt; column for the card data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SQL Core:&lt;/strong&gt; Columns like &lt;code&gt;id&lt;/code&gt;, &lt;code&gt;element&lt;/code&gt;, and &lt;code&gt;tags&lt;/code&gt; are standard SQL for fast indexing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON Payload:&lt;/strong&gt; The specific attributes (biomechanics vs. sanskrit) live in a JSON blob.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why?&lt;/strong&gt; It gave us the flexibility of NoSQL (for the polymorphic cards) with the relational integrity of SQL (for users and sequences).&lt;/li&gt;
&lt;/ul&gt;
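To make the split concrete, here is a dependency-free sketch of how a card might map onto such a hybrid row: indexed fields as real columns, the kind-specific payload serialized into the `data` blob. (The project's actual schema is a Drizzle table; the shapes here are assumptions for illustration.)

```typescript
// Hypothetical row shape: SQL core columns + serialized JSONB payload.
type CardRow = {
  id: string;
  element: string;
  tags: string[];
  data: string; // the JSONB column, serialized here for illustration
};

type DomainCard = { id: string; element: string; tags: string[]; data: object };

// Indexed fields stay relational; everything polymorphic goes into the blob.
function toRow(card: DomainCard): CardRow {
  return { id: card.id, element: card.element, tags: card.tags, data: JSON.stringify(card.data) };
}

function fromRow(row: CardRow): DomainCard {
  return { id: row.id, element: row.element, tags: row.tags, data: JSON.parse(row.data) };
}
```

Queries filter on `element` and `tags` with normal indexes, while the payload round-trips untouched.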

&lt;p&gt;&lt;strong&gt;Rule #1 of Vibe Coding a System: If it’s not in the Spec, it doesn’t exist.&lt;/strong&gt;&lt;br&gt;
This brings us to the most critical tool in our arsenal: the &lt;strong&gt;ADR&lt;/strong&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  The "ADR": The Architect's Save Game
&lt;/h3&gt;

&lt;p&gt;ADR stands for &lt;strong&gt;Architecture Decision Record&lt;/strong&gt;. In a human team, it's a document you write to explain why you chose PostgreSQL over MongoDB so that 6 months later, nobody asks "Why did we do this?".&lt;/p&gt;

&lt;p&gt;In Vibe Coding, ADRs are not just documentation—they are &lt;strong&gt;legislation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When working with an AI, "Context Drift" is the enemy. The AI forgets why we made a decision 300 tokens ago. It acts like a teenager who wants to re-litigate every rule: &lt;em&gt;"Why can't I use Prisma? It's easier!"&lt;/em&gt; or &lt;em&gt;"Let's just use window.print() instead of a PDF engine!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To counter this, we established a &lt;strong&gt;Constitutional Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Law:&lt;/strong&gt; We wrote our decisions into immutable markdown files (e.g., &lt;code&gt;Docs/ADR/006-pwa-offline-strategy.md&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Enforcement:&lt;/strong&gt; We didn't just hope the AI would remember. We &lt;strong&gt;forced&lt;/strong&gt; the tracing of these decisions in two ways:
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Traceability:&lt;/strong&gt; In our "Bootstrap Prompt" (see Section 3), we explicitly forced the AI to read the relevant ADRs before writing code. It cannot code if it hasn't read the law.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output Traceability:&lt;/strong&gt; When the AI suggested a major pivot (like switching to client-side PDF generation), we forced it to &lt;em&gt;write a new ADR first&lt;/em&gt;. In Session 003, before touching the code, the AI generated &lt;code&gt;Docs/ADR/005-client-side-pdf-generation.md&lt;/code&gt; to justify the change from server-side to client-side.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This ensured that our architecture didn't "drift" based on the AI's mood, but evolved based on documented consensus.&lt;/p&gt;

&lt;p&gt;My final &lt;code&gt;Docs/ADR/&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── 001-hybrid-data-storage-strategy.md
├── 002-modular-monolith-and-vertical-slicing.md
├── 003-data-model-specification.md
├── 004-tech-stack-definition.md
├── 005-client-side-pdf-generation.md
├── 006-pwa-offline-strategy.md
├── 007-architecture-documentation-maintenance.md
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. The Methodology: Governance-Driven Development (GDD)
&lt;/h2&gt;

&lt;p&gt;I’ve coined a term for this workflow: &lt;strong&gt;Governance-Driven Development (GDD)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;We are used to TDD (Test-Driven Development) or DDD (Domain-Driven Design). GDD is the layer above that. In the age of AI, &lt;strong&gt;Governance is the new Syntax&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the dirty truth about AI Developers: &lt;strong&gt;They behave like talented teenagers.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;They are brilliant and fast. They can write a regex to validate an email in 2 seconds. But they also:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Rush to the cool part&lt;/strong&gt; (UI) and skip the boring part (Error Handling, Folder Structure).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want you to love them&lt;/strong&gt;, so they say "Yes" to everything—even bad ideas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Have the memory of a goldfish&lt;/strong&gt; (Context Drift). 10 minutes in, they forget you wanted &lt;code&gt;kebab-case&lt;/code&gt; filenames and start using &lt;code&gt;camelCase&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To enforce GDD, I created a Constitution: &lt;code&gt;Docs/RULES.md&lt;/code&gt;. I didn't just suggest these rules; I forced the Gemini CLI to read them before every session. I also sometimes pointed it to specific specification files stored in my &lt;code&gt;Docs/Features/&lt;/code&gt; folder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;├── 001-global-functional-overview.md
├── 002-global-implementation-plan.md
├── 003-card-classification-and-kosha-alignment.md
├── 004-user-features.md
├── 005-logical-engine-specification.md
├── 006-pdf-generation-and-print-studio.md
└── 007-pwa-and-offline-capabilities.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The "Bootstrap Prompt":&lt;/strong&gt;&lt;br&gt;
Here is the exact prompt I used to "upload" my Architect persona into the machine at the start of our 4th session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I am the Lead Architect. You are the Senior Developer.

Context Loading:
1. Read Docs/RULES.md (The Law).
2. Read Docs/TECH_CONTEXT.md (The Stack).
3. Read Docs/ADR/002-modular-monolith.md (The Blueprint).
4. Read Docs/Features/002-global-implementation-plan.md (The Plan).

Current State:
We are in Phase 4. Previous phases are frozen.

Task:
Implement the Logic Engine defined in Docs/Features/005-logical-engine-specification.md
Constraint:
Do not touch /apps/web yet. Focus on /packages/shared.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This changed everything. Instead of guessing my vibe, the AI had to follow the law. It stopped trying to use &lt;code&gt;Prisma&lt;/code&gt; because &lt;code&gt;TECH_CONTEXT.md&lt;/code&gt; clearly said &lt;code&gt;Drizzle&lt;/code&gt;. It stopped putting logic in components because &lt;code&gt;RULES.md&lt;/code&gt; said logic goes in &lt;code&gt;hooks&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  4. The Execution: A High-Level Overview
&lt;/h2&gt;

&lt;p&gt;We built the app using &lt;strong&gt;Vertical Slicing&lt;/strong&gt;. Instead of building the whole Database, then the whole API, we built &lt;em&gt;one feature&lt;/em&gt; top-to-bottom. Here is the play-by-play from the logs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52xjc2abp994ozigmr0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52xjc2abp994ozigmr0s.png" width="800" height="569"&gt;&lt;/a&gt;&lt;br&gt;Excerpt from the initial Design Phase with Gemini Chat
  &lt;/p&gt;

&lt;h3&gt;
  
  
  Slice 1: The "Polymorphic" Database
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56yxpr5ot5w68ca4i57g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F56yxpr5ot5w68ca4i57g.png" width="800" height="574"&gt;&lt;/a&gt;&lt;br&gt;Card creation/editing mixes relational and document data
  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Storing Asanas (Biomechanics) and Mantras (Text) in one table without creating 50 &lt;code&gt;NULL&lt;/code&gt; columns or separate tables that make search a nightmare.&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; "Let's create an &lt;code&gt;asanas&lt;/code&gt; table and a &lt;code&gt;mantras&lt;/code&gt; table." (The classic relational trap).&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "Read &lt;code&gt;Docs/ADR/001-hybrid-data-storage.md&lt;/code&gt;. We use a single &lt;code&gt;cards&lt;/code&gt; table with a &lt;code&gt;data&lt;/code&gt; JSONB column."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI implemented a Drizzle schema using PostgreSQL's &lt;code&gt;jsonb&lt;/code&gt; type. Crucially, it added Zod discriminators to validate the JSON shape before insertion.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Implemented Drizzle schema with &lt;code&gt;jsonb&lt;/code&gt; column 'data'. Added Zod discriminators for &lt;code&gt;asana&lt;/code&gt; vs &lt;code&gt;mantra&lt;/code&gt;. Migration successful."&lt;/p&gt;
&lt;/blockquote&gt;
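The log mentions Zod discriminators. As a rough, hand-rolled equivalent (the real project used Zod's schemas, e.g. its discriminated-union support), routing validation by a `kind` tag before anything reaches the JSONB column might look like this:

```typescript
// Minimal stand-in for a discriminated-union validator: pick the check
// by the "kind" tag, reject unknown kinds outright.
type Validator = (data: Record<string, unknown>) => string[];

const validators: Record<string, Validator> = {
  asana: (d) =>
    typeof d.spinalAction === "string" && Array.isArray(d.anatomyTargets)
      ? []
      : ["asana: expected spinalAction (string) and anatomyTargets (array)"],
  mantra: (d) =>
    typeof d.sanskrit === "string" && typeof d.translation === "string"
      ? []
      : ["mantra: expected sanskrit and translation strings"],
};

// Returns a list of validation errors; empty means the payload is safe to insert.
function validateCardData(data: Record<string, unknown>): string[] {
  const kind = data.kind;
  if (typeof kind !== "string" || !(kind in validators)) {
    return [`unknown card kind: ${String(kind)}`];
  }
  return validators[kind](data);
}
```

The win is that the database never stores a blob the application can't later parse.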
&lt;h3&gt;
  
  
  Slice 2: The "Hybrid Brain"
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxm3bf43hndd6zn5ihhf.png" width="800" height="433"&gt;Sequences are validated by a powerful, hybrid, and extensible Rule Engine
  



  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs24ek0zio34lli2b0s01.png" width="800" height="653"&gt;Admin users can craft new JSON-logic rules
  


&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; The Logic Engine needed to validate sequences (e.g., "Must end with Savasana"). This logic had to run on the &lt;strong&gt;Backend&lt;/strong&gt; (before saving) AND the &lt;strong&gt;Frontend&lt;/strong&gt; (to give real-time red borders).&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; Duplicate the code. Write a TypeScript function in React and a Service in NestJS.&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "No. Create a &lt;code&gt;packages/shared&lt;/code&gt; workspace. Put the &lt;code&gt;validateSequence&lt;/code&gt; function there. Import it in both apps."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI created the shared package, configured the &lt;code&gt;tsconfig.json&lt;/code&gt; paths, and wired it up. It even built a &lt;code&gt;HealthBar&lt;/code&gt; component that consumes this shared logic to show a live "Health Score" for the sequence.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Refactored &lt;code&gt;ValidationConfig&lt;/code&gt; to &lt;code&gt;packages/shared&lt;/code&gt;. Updated &lt;code&gt;useSequenceStore&lt;/code&gt; (Frontend) and &lt;code&gt;SequenceService&lt;/code&gt; (Backend) to consume the same Zod schema."&lt;/p&gt;
&lt;/blockquote&gt;
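The "write once, validate everywhere" idea can be sketched in plain TypeScript. The rule names and scoring below are invented for illustration (the real rule engine is configurable via JSON-logic), but the shape is the same: one pure function that both the NestJS service and the React store import:

```typescript
// Shared validation logic: lives in packages/shared, consumed by both apps.
type Pose = { name: string; role: "warmup" | "peak" | "closing" };

type Violation = { rule: string; message: string };

function validateSequence(poses: Pose[]): { violations: Violation[]; healthScore: number } {
  const violations: Violation[] = [];

  // Rule: a peak pose must be preceded by at least one warm-up.
  const firstPeak = poses.findIndex((p) => p.role === "peak");
  const firstWarmup = poses.findIndex((p) => p.role === "warmup");
  if (firstPeak !== -1 && (firstWarmup === -1 || firstPeak < firstWarmup)) {
    violations.push({ rule: "warmup-before-peak", message: "Peak pose sequenced before any warm-up" });
  }

  // Rule: the sequence must end with Savasana.
  const last = poses[poses.length - 1];
  if (!last || last.name !== "Savasana") {
    violations.push({ rule: "end-with-savasana", message: "Sequence must end with Savasana" });
  }

  // Naive "Health Score": each violation costs 25 points.
  const healthScore = Math.max(0, 100 - violations.length * 25);
  return { violations, healthScore };
}
```

Because both sides call the same function, the red border in the UI and the rejection on save can never disagree.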
&lt;h3&gt;
  
  
  Slice 3: The "Offline Printer"
&lt;/h3&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fit9saw28u6t9za1nxu59.png" width="800" height="512"&gt;Summary or complete printed handout
  


&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Users need to print PDF handouts in a yoga studio with no Wi-Fi.&lt;br&gt;
&lt;strong&gt;The AI's First Impulse:&lt;/strong&gt; "Use a server-side PDF library like PDFKit." (Standard web dev practice).&lt;br&gt;
&lt;strong&gt;The Architect's Intervention:&lt;/strong&gt; "Read &lt;code&gt;Docs/ADR/006-pwa-offline-strategy.md&lt;/code&gt;. We must generate PDFs client-side using &lt;code&gt;@react-pdf/renderer&lt;/code&gt;."&lt;br&gt;
&lt;strong&gt;The Result:&lt;/strong&gt; The AI implemented a beautiful client-side renderer. It handled the tricky part of loading fonts (Noto Sans) into the browser's virtual file system so the PDF engine could "see" them without a network request.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Verbatim Log:&lt;/em&gt; "Implemented &lt;code&gt;SequencePdf&lt;/code&gt; component. Configured &lt;code&gt;vite-plugin-pwa&lt;/code&gt; to cache &lt;code&gt;NotoSans&lt;/code&gt; fonts. PDF generation now works without network."&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  5. The Architect's Flex: Automated C4 Verification
&lt;/h2&gt;

&lt;p&gt;How do you know the AI actually respected the Modular Monolith architecture? Did it secretly import the &lt;code&gt;Weaver&lt;/code&gt; module into the &lt;code&gt;Grimoire&lt;/code&gt; when I wasn't looking?&lt;/p&gt;

&lt;p&gt;I didn't want to audit 50 files manually. And I definitely didn't want to draw diagrams by hand.&lt;/p&gt;

&lt;p&gt;So, I added a rule to my Constitution (ADR 007): &lt;strong&gt;"The Code is the Source of Truth for Documentation."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At the end of a session, I force the Gemini CLI to &lt;strong&gt;reverse-engineer its own work&lt;/strong&gt;. I gave it this prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Update the RULES.md file to enforce the (re)generation of C4 diagrams when finishing an implementation session
[...] 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also created a specific ADR (007: Architecture Documentation Maintenance Protocol) establishing Mermaid.js as the standard and defining the maintenance lifecycle.&lt;/p&gt;

&lt;p&gt;The result wasn't a hallucination. It was a perfect map of the code it had just written.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhnwz9mkgwq8u6rr03oo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhnwz9mkgwq8u6rr03oo.png" alt="C4 Models" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the ultimate "Trust but Verify." If the generated diagram looks like spaghetti, the code is spaghetti. If the diagram is clean, the architecture holds.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. The AIOps Protocol: Monitoring the Machine
&lt;/h2&gt;

&lt;p&gt;Now, here is the secret weapon: &lt;strong&gt;The Session Log.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of my strictest rules in &lt;code&gt;RULES.md&lt;/code&gt; was that the AI had to "punch out" at the end of every session. I forced it to append a line to &lt;code&gt;docs/ai_session_log.csv&lt;/code&gt; with the Date, Tool (Chat or CLI), Goal, and &lt;strong&gt;Token Usage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For me this isn't about money ("FinOps"). It's about &lt;strong&gt;AIOps&lt;/strong&gt;, monitoring the operational health of your intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why we log everything (Chat &amp;amp; CLI):&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Context Monitoring:&lt;/strong&gt; As a session drags on, the "Tokens In" (Context Window) grows exponentially. The AI starts reading 30,000 tokens of history just to write one line of code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Sawtooth" Pattern:&lt;/strong&gt; By visualizing the log, I discovered a crucial pattern. Efficiency drops as context grows. The solution? &lt;strong&gt;The Hard Reset.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;
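The "punch out" line itself is trivial to generate. A hedged sketch, with the column set assumed from the description above (Date, Tool, Goal, Token Usage) and CSV quoting handled so free-text goals don't break the file:

```typescript
// Illustrative formatter for one ai_session_log.csv line.
// Column names are assumptions based on the article's description.
function formatSessionLogLine(entry: {
  date: string;
  tool: "Chat" | "CLI";
  goal: string;
  tokensIn: number;
  tokensOut: number;
}): string {
  // Quote the free-text goal and escape embedded quotes, per CSV convention.
  const goal = `"${entry.goal.replace(/"/g, '""')}"`;
  return [entry.date, entry.tool, goal, entry.tokensIn, entry.tokensOut].join(",");
}
```

Appending one such line per session is enough raw data to plot the "sawtooth" chart below.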

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36kzvkxpp8nvvea3lx29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36kzvkxpp8nvvea3lx29.png" alt="AI Usage Monitoring" width="800" height="518"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This chart visualizes the high-level "Vibe Coding Lifecycle." You see the context bloat as we iterate on implementing phases 3 and 4. Then, you see the sharp drop when we switch back to the Architect (Chat) or reset the CLI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Lesson:&lt;/strong&gt; A "Tired" AI (high context) makes mistakes. A "Fresh" AI (reset context + Snapshot) is precise.&lt;/p&gt;




&lt;h2&gt;
  
  
  7. The "Oh S**t" Moment: The Hallucination Trap
&lt;/h2&gt;

&lt;p&gt;This brings us to the specific incident that proved &lt;em&gt;why&lt;/em&gt; that Reset is mandatory.&lt;/p&gt;

&lt;p&gt;Halfway through Phase 3, the CLI started getting slow (too much history). I ran a &lt;code&gt;/reset&lt;/code&gt; command to clear its memory. &lt;strong&gt;Disaster.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It suddenly forgot we were building a "Yoga" app. It tried to invent a new database column &lt;code&gt;duration_minutes&lt;/code&gt; for the cards. But my Spec (ADR 003) explicitly said that &lt;code&gt;duration&lt;/code&gt; lives inside the JSONB payload and is measured in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Hallucination:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;UPDATE cards SET duration_minutes = 60;&lt;/code&gt; &lt;em&gt;(AI guessing)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Correction (Me):&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;"Read Docs/003-data-model.md. 'Duration' is a JSONB field inside the 'metadata' column, and it's in seconds."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix:&lt;/strong&gt;&lt;br&gt;
&lt;code&gt;UPDATE cards SET data = jsonb_set(data, '{duration}', '3600');&lt;/code&gt; &lt;em&gt;(AI complying)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To prevent this in the future, we implemented a &lt;strong&gt;"Session Handover"&lt;/strong&gt; protocol. Before resetting, I now force the AI to write a &lt;code&gt;TECH_STATE_SNAPSHOT.md&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Where are we?" (Phases 1-3 Complete)&lt;/li&gt;
&lt;li&gt;"What is the active stack?" (NestJS, React, PostgreSQL)&lt;/li&gt;
&lt;li&gt;"What is the next step?"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When I start a new session, I feed this snapshot back in. It’s like a save game for your developer.&lt;/p&gt;
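Generating the snapshot can itself be automated. A minimal sketch, assuming the three questions above as its sections (the filename is the article's; the field shapes are my assumption — in practice the AI writes this file itself):

```typescript
// Hypothetical builder for TECH_STATE_SNAPSHOT.md, the "save game"
// handed to a freshly reset session.
function buildSnapshot(state: {
  completedPhases: string[];
  stack: string[];
  nextStep: string;
}): string {
  return [
    "# TECH_STATE_SNAPSHOT",
    "",
    "## Where are we?",
    `Completed: ${state.completedPhases.join(", ")}`,
    "",
    "## What is the active stack?",
    state.stack.map((s) => `- ${s}`).join("\n"),
    "",
    "## What is the next step?",
    state.nextStep,
  ].join("\n");
}
```

Feeding this one small file back in costs a few hundred tokens instead of replaying 30,000 tokens of history.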




&lt;h2&gt;
  
  
  Conclusion: The Architect's Verdict
&lt;/h2&gt;

&lt;p&gt;So, can you Vibe Code a complex system?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maybe.&lt;/strong&gt; I mean, it depends on how complex the system is (in this example we didn't build an enterprise-wide distributed system). But for sure you can't just "Vibe" it. You have to &lt;strong&gt;Architect&lt;/strong&gt; it.&lt;/p&gt;

&lt;p&gt;If I had touched the code, I would have been bogged down in syntax errors and import paths. By staying in the Architect role, I focused on &lt;em&gt;Data Models&lt;/em&gt;, &lt;em&gt;User Flows&lt;/em&gt;, and &lt;em&gt;Business Logic&lt;/em&gt;. The AI handled the implementation, but I provided the &lt;strong&gt;Guardrails&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I learned:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Docs are Prompts:&lt;/strong&gt; The &lt;code&gt;RULES.md&lt;/code&gt; file and the &lt;code&gt;Docs/Features/&lt;/code&gt; and &lt;code&gt;Docs/ADR/&lt;/code&gt; folders (or your own equivalents) are the most important files in your repo. They are the AI's long-term memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Constraint is Clarity:&lt;/strong&gt; The more rules you give the AI (versions, naming, structure), the better code it writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review Everything:&lt;/strong&gt; The AI is a junior dev. It &lt;em&gt;will&lt;/em&gt; introduce security holes or n+1 query problems if you don't catch them in the spec.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Vibe Coding didn't replace the Architect. It just gave the Architect a team of infinite interns. And honestly? They’re pretty good once you give them a Constitution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkujzia1p39hjwzqxxp4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkujzia1p39hjwzqxxp4.png" width="800" height="187"&gt;&lt;/a&gt;&lt;br&gt;Last message from Gemini CLI
  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Next up: The application could do with AI features... Or maybe I'll now explore other aspects of Vibe Coding. Stay tuned.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>architecture</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Vibe Coding One Pixel at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 Jan 2026 22:21:39 +0000</pubDate>
      <link>https://forem.com/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc</link>
      <guid>https://forem.com/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc</guid>
      <description>&lt;p&gt;&lt;em&gt;Editing "stick figure" Yoga poses&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-page-at-a-time-265j"&gt;Part 1&lt;/a&gt;, we dipped our toes into "Vibe Coding" by building a Python script. It was linear, logical, and frankly, a bit safe. Text in, text out.&lt;/p&gt;

&lt;p&gt;But let’s be real: backend scripts are the "easy mode" of LLM-assisted coding. The logic is contained. The state is ephemeral.&lt;/p&gt;

&lt;p&gt;The real boss fight is the &lt;strong&gt;Frontend&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Can you "vibe" a UI? Can you talk a chaotic mess of DOM elements, event listeners, and CSS pixels into a functional application without losing your mind (or the AI losing the context)?&lt;/p&gt;

&lt;p&gt;I decided to find out. My goal: Build &lt;strong&gt;Yoga Pose Builder&lt;/strong&gt;, a browser-based tool to edit "stick figure" yoga poses, drag limbs around, and export vector SVGs.&lt;/p&gt;

&lt;p&gt;I had no design, no stack picked out, and—crucially—I had never used a Canvas library in my life.&lt;/p&gt;

&lt;p&gt;Here is how we vibed it into existence.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Context is King (The &lt;code&gt;.md&lt;/code&gt; Anchors)
&lt;/h2&gt;

&lt;p&gt;The biggest enemy of Vibe Coding is the LLM’s "Goldfish Memory." You’re 40 turns into a chat, you ask for a button change, and suddenly the AI forgets you’re building a yoga app and tries to sell you a subscription to a SaaS platform.&lt;/p&gt;

&lt;p&gt;In Part 1, we just chatted. For a full UI application, that doesn't fly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Strategy: Documentation as Prompt Anchoring.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before I let the AI write a single line of JavaScript, I made it write Markdown.&lt;br&gt;
We created a &lt;code&gt;Docs/&lt;/code&gt; folder with two files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;spec.md&lt;/code&gt;: The high-level architecture.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;features.md&lt;/code&gt;: A checklist of what we wanted to do.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I didn't write these because I love administrative work. I wrote them so that when the AI inevitably got confused, I didn't have to re-explain the project. I just said: &lt;em&gt;"Read &lt;code&gt;Docs/spec.md&lt;/code&gt; and try again."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vibe Tip:&lt;/strong&gt; Think of your documentation not as a manual for humans, but as "Long-Term Memory" for your AI pair programmer.&lt;/p&gt;
&lt;h2&gt;
  
  
  2. The Architecture: Letting the AI be CTO
&lt;/h2&gt;

&lt;p&gt;I knew I needed a canvas where I could drag "joints" (knees, elbows) and have "bones" (lines) follow them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "I want to do this in the browser. Should I use React? Raw Canvas API?"&lt;br&gt;
&lt;strong&gt;AI:&lt;/strong&gt; "React might be overkill. Raw Canvas is painful. Use &lt;strong&gt;Fabric.js&lt;/strong&gt;."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Me:&lt;/strong&gt; "Never heard of it. Let's do it."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sv3xujye0amao2fznp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4sv3xujye0amao2fznp.png" alt="Fabric.js Logo" width="300" height="90"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is the beauty of Vibe Coding. I didn't spend 3 hours reading "Top 10 JS Canvas Libraries 2025" Medium articles. I trusted the vibe.&lt;/p&gt;

&lt;p&gt;We settled on a &lt;strong&gt;Build-less Architecture&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Backend:&lt;/strong&gt; Node.js + Express (just to serve files and save JSON).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frontend:&lt;/strong&gt; Vanilla JS + Fabric.js (loaded via CDN).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Build Tool:&lt;/strong&gt; None. No Webpack, no Vite, no &lt;code&gt;npm run eject&lt;/code&gt; nightmares.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why? Because Vibe Coding thrives on speed. I wanted to change a line of code, hit F5, and see the result.&lt;/p&gt;

&lt;p&gt;Application folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
├── Docs
│   ├── features.md
│   └── spec.md
├── package.json
├── public
│   ├── index.html
│   └── poses
└── server.js
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. The "Rig": Math is for Machines
&lt;/h2&gt;

&lt;p&gt;Here is where I expected to get stuck. Creating a "rig" where moving a hand automatically updates the angle of the arm involves trigonometry and vector math.&lt;/p&gt;

&lt;p&gt;Usually, this is where I’d open 15 StackOverflow tabs and copy-paste code I don't understand.&lt;/p&gt;

&lt;p&gt;Instead, I just described the &lt;em&gt;behavior&lt;/em&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Create a &lt;code&gt;Mannequin&lt;/code&gt; class. It has Nodes (circles) and Links (lines). When a Node moves, the Links connected to it should update their coordinates."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI wrote the entire class. It hooked into Fabric.js’s &lt;code&gt;object:moving&lt;/code&gt; event and handled the coordinate updates. It worked on the first try.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7uciqdosv50962n8z3t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7uciqdosv50962n8z3t.png" alt="Pose Builder Mannequin" width="250" height="299"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I still barely know how &lt;code&gt;fabric.Line&lt;/code&gt; works under the hood. And I don't care. It works.&lt;/p&gt;
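Stripped of Fabric.js, the behavior I described reduces to a small amount of bookkeeping. A framework-free sketch (the real class wires this same logic into Fabric's `object:moving` event; names are as I prompted them, internals are illustrative):

```typescript
// Nodes are joints, Links are bones. Moving a node implicitly moves
// every bone attached to it, because bones are derived from joint positions.
type Point = { x: number; y: number };

class Mannequin {
  private nodes = new Map<string, Point>();
  private links: Array<[string, string]> = [];

  addNode(id: string, p: Point): void { this.nodes.set(id, { ...p }); }
  addLink(a: string, b: string): void { this.links.push([a, b]); }

  // In the browser this would run on every drag event.
  moveNode(id: string, p: Point): void { this.nodes.set(id, { ...p }); }

  // Recompute bone endpoints from the current joint positions.
  boneSegments(): Array<{ from: Point; to: Point }> {
    return this.links.map(([a, b]) => ({
      from: { ...this.nodes.get(a)! },
      to: { ...this.nodes.get(b)! },
    }));
  }
}
```

Deriving the bones from the joints (rather than storing both) is what made the rig work on the first try: there is no second copy of the geometry to drift out of sync.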

&lt;h2&gt;
  
  
  4. Iteration: The "Yes, And..." Technique
&lt;/h2&gt;

&lt;p&gt;UI Vibe Coding isn't about getting it right instantly; it's about sculpting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Ugly Phase:&lt;/strong&gt;&lt;br&gt;
The first version looked like a programmer made it (because a programmer &lt;em&gt;did&lt;/em&gt; make it). The stick figure looked like a dead bug. The background was gray.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The "Vibe" Phase:&lt;/strong&gt;&lt;br&gt;
Me: &lt;em&gt;"This looks depressing. Make it 'Zen'. Use soft colors, rounded buttons, and a clean layout."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI generated the CSS variables (&lt;code&gt;--highlight-color: #88b04b&lt;/code&gt;), added a "Save As" modal, and cleaned up the toolbar.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwiaddnxyt14usfea2l1p.png" width="800" height="495"&gt;Yoga Pose Builder GUI
  


&lt;p&gt;&lt;strong&gt;The "Feature Creep" Phase:&lt;/strong&gt;&lt;br&gt;
Me: &lt;em&gt;"I want to save my poses."&lt;/em&gt;&lt;br&gt;
AI: &lt;em&gt;"We have no database."&lt;/em&gt;&lt;br&gt;
Me: &lt;em&gt;"Just write JSON files to a folder on the server."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In 5 minutes, we had a fully working persistence layer. No database migrations, just &lt;code&gt;fs.writeFile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is an example of such a pose JSON file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"meta"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"nameFR"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Demi-Pont"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"nameSK"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Setu Bandhasana"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"joints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"head"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"neck"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"chest"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span 
class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"hips"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lShoulder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lElbow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lHand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span 
class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rShoulder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rElbow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rHand"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span 
class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lHip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lKnee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"lFoot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rHip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rKnee"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rFoot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"x"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"y"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. The Pivot: Language as a Feature
&lt;/h2&gt;

&lt;p&gt;At the end of the session, I realized a problem: the app was vibing in French (my native tongue), but I wanted screenshots in English for this article. &lt;/p&gt;

&lt;p&gt;Instead of manually editing labels, I asked the AI to "make the whole app i18n." In one single refactor, we added a translation dictionary, a language switcher, and logic to dynamically swap every label, tooltip, and even the pose names in the library. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vpg0r2qlr1xkx2qsfna.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vpg0r2qlr1xkx2qsfna.png" width="800" height="495"&gt;&lt;/a&gt;&lt;br&gt;GUI (and data) in French
  &lt;/p&gt;

&lt;p&gt;This turned a linguistic hurdle into a core feature, proving that with Vibe Coding, "changing your mind" is just a prompt away.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. The "Traceability" Hack
&lt;/h2&gt;

&lt;p&gt;We spent about 90 minutes building this. We added features, fixed bugs, and refactored code. By the end, the chat context was massive and messy.&lt;/p&gt;

&lt;p&gt;If I came back to this project in a week, I’d be lost.&lt;/p&gt;

&lt;p&gt;So, I ran one final "Meta-Prompt":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Read all the code we wrote and the docs in &lt;code&gt;Docs/&lt;/code&gt;, and generate a &lt;code&gt;Docs/session_summary.md&lt;/code&gt;. Explain what we built, why we made these choices, and the current state of the app."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The AI analyzed &lt;em&gt;its own work&lt;/em&gt; and wrote a summary file. This is my "Save Game" point. When I want to work on this again, I’ll feed that summary to the AI to restore its context instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We went from a blank folder to a functional, vector-based SVG editor with a backend in one session.&lt;/p&gt;

&lt;p&gt;Vibe Coding a UI is possible, but you have to change your approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Anchor the Context:&lt;/strong&gt; Write specs so the AI has a "North Star."&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Delegate the Heavy Lifting:&lt;/strong&gt; Let the AI choose the libraries and do the math.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Iterate Visually:&lt;/strong&gt; Don't try to prompt the perfect UI. Prompt the &lt;em&gt;skeleton&lt;/em&gt;, then prompt the &lt;em&gt;paint&lt;/em&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Next we'll try to &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-slice-at-a-time-4n3p"&gt;Vibe Code a real full stack app&lt;/a&gt;. Or a game. Who knows? The prompt is the limit.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgj2vy5ypv410dvds2d8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqgj2vy5ypv410dvds2d8.png" width="450" height="423"&gt;&lt;/a&gt;&lt;br&gt;SVG exported by Yoga Pose Builder (opened in Inkscape)
  &lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>uidesign</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Vibe Coding One Page at a Time</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 Jan 2026 14:45:20 +0000</pubDate>
      <link>https://forem.com/worldlinetech/vibe-coding-one-page-at-a-time-265j</link>
      <guid>https://forem.com/worldlinetech/vibe-coding-one-page-at-a-time-265j</guid>
      <description>&lt;p&gt;&lt;em&gt;Building a Smart Magazine Archiver&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m starting a new series called &lt;strong&gt;"Vibe Coding One Step at a Time."&lt;/strong&gt; The goal? To document the raw, messy, and surprisingly efficient process of building software in the age of AI. We’re not here to write perfect specs or obsess over UML diagrams (well, not yet). We’re here to vibe with the code, iterating on pure intent until the machine does exactly what we want.&lt;/p&gt;

&lt;p&gt;In this first edition, I’m sharing how I used the &lt;strong&gt;Gemini CLI&lt;/strong&gt; to build a tool I actually needed, learning some pretty cool image processing tricks along the way.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is "Vibe Coding"?
&lt;/h2&gt;

&lt;p&gt;I’m going to claim this term right here: &lt;strong&gt;Vibe Coding&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It’s not "lazy coding." It’s &lt;strong&gt;intent-driven development&lt;/strong&gt;. In the old days, if you wanted to build a script, you had to know the syntax, the libraries, and the edge cases before you even opened your editor. You had to &lt;em&gt;think in code&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Vibe Coding flips that. You &lt;em&gt;think in outcomes&lt;/em&gt;. You describe the behavior, the "vibe" of the feature, and the AI handles the implementation details. You act less like a bricklayer and more like a conductor. The feedback loop isn't "Write -&amp;gt; Compile -&amp;gt; Error," it's "Ask -&amp;gt; Observe -&amp;gt; Tweak."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use Case: "I Just Want to Read Offline"
&lt;/h2&gt;

&lt;p&gt;Here’s the situation: I subscribe to a fantastic niche magazine (which shall remain nameless to protect the innocent). It’s great, but their "digital reader" is a nightmare. It’s one of those web-based page-turners that requires an active internet connection.&lt;/p&gt;

&lt;p&gt;I wanted to read it on my tablet, offline, on a plane, without waiting for high-res JPEGs to buffer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; There was no "Download PDF" button.&lt;br&gt;
&lt;strong&gt;The Clue:&lt;/strong&gt; Inspecting the network traffic revealed that the magazine was just serving a sequence of high-quality images, one URL per page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Mission:&lt;/strong&gt; Write a script to fetch these pages and stitch them into a single, high-quality, searchable PDF.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Process: Galloping Toward Complexity
&lt;/h2&gt;

&lt;p&gt;We didn't sit down and architect a solution. We started small and let the script evolve.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: The Naive Loop
&lt;/h3&gt;

&lt;p&gt;We started with a simple hypothesis: "The URLs probably just have a page number in them."&lt;br&gt;
I asked Gemini to write a script using &lt;code&gt;requests&lt;/code&gt; to hit the URL for page 1, then page 2.&lt;br&gt;
&lt;em&gt;Boom.&lt;/em&gt; It worked. We had a directory full of 100 separate JPGs.&lt;/p&gt;
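&lt;p&gt;That naive loop boils down to very few lines. Here is a minimal sketch of the idea, assuming a &lt;em&gt;hypothetical&lt;/em&gt; URL pattern (the real magazine's URLs obviously differ, and the base URL below is invented):&lt;/p&gt;

```python
import os

import requests

# Hypothetical URL pattern -- the real magazine's URLs are different.
BASE = "https://example.com/reader/issue-42/page-{n}.jpg"


def page_url(n):
    """Build the URL for a given page number."""
    return BASE.format(n=n)


def download_pages(count, dest="pages"):
    """Fetch pages 1..count and save them as zero-padded JPG files."""
    os.makedirs(dest, exist_ok=True)
    for n in range(1, count + 1):
        resp = requests.get(page_url(n), timeout=30)
        resp.raise_for_status()  # stop on 404: we ran past the last page
        with open(os.path.join(dest, f"{n:03d}.jpg"), "wb") as f:
            f.write(resp.content)
```

&lt;p&gt;The zero-padded filenames (&lt;code&gt;001.jpg&lt;/code&gt;, &lt;code&gt;002.jpg&lt;/code&gt;…) matter later: they keep a plain alphabetical sort in page order.&lt;/p&gt;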

&lt;h3&gt;
  
  
  Step 2: The Picture Book
&lt;/h3&gt;

&lt;p&gt;Having 100 files is annoying. I wanted a book.&lt;br&gt;
We asked Gemini to "glue these together." It pulled in the &lt;code&gt;PIL&lt;/code&gt; (&lt;a href="https://pillow.readthedocs.io" rel="noopener noreferrer"&gt;Pillow&lt;/a&gt;) library.&lt;br&gt;
&lt;strong&gt;Result:&lt;/strong&gt; A massive PDF. It looked great, but it was dumb. It was just a container of pictures. You couldn't highlight text, search for keywords, or copy-paste quotes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: The Search for Meaning (OCR)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknav94wd05wdhh5nojvv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fknav94wd05wdhh5nojvv.png" alt="Tesseract OCR" width="330" height="146"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is where the "vibe" got technical. I realized a "picture book" wasn't enough. I needed &lt;strong&gt;Optical Character Recognition (OCR)&lt;/strong&gt;.&lt;br&gt;
We decided to use &lt;a href="https://github.com/tesseract-ocr" rel="noopener noreferrer"&gt;Tesseract&lt;/a&gt;. But here’s the catch we discovered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Human Eyes&lt;/strong&gt; like soft colors and smooth anti-aliasing.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OCR Engines&lt;/strong&gt; like harsh contrast, jagged edges, and black-and-white binary inputs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If we optimized the images for the machine, the magazine looked ugly. If we kept them pretty, the machine couldn't read the text.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Technical Deep Dive: The "PDF Sandwich"
&lt;/h2&gt;

&lt;p&gt;This is where the magic happened. We ended up building a &lt;strong&gt;PDF Sandwich&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fw2zzt7ebjs57pp9qpa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fw2zzt7ebjs57pp9qpa.png" alt="Me asking Gemini CLI for a sandwich" width="800" height="129"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of choosing between beauty and brains, we chose both.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;The Visual Layer:&lt;/strong&gt; We keep the original high-res color JPEGs. This is what you see.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Data Layer:&lt;/strong&gt; Behind the scenes, we create a "Frankenstein" version of the page—converted to grayscale, contrast cranked up to 2.0, and upscaled 2x using &lt;code&gt;LANCZOS&lt;/code&gt; resampling (a fancy &lt;a href="https://en.wikipedia.org/wiki/Lanczos_resampling" rel="noopener noreferrer"&gt;algorithm&lt;/a&gt; that keeps edges sharp).&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The Merge:&lt;/strong&gt; We feed the Frankenstein images to Tesseract to generate an invisible text layer, then use &lt;code&gt;pypdf&lt;/code&gt; to overlay that text exactly on top of the pretty images.&lt;/li&gt;
&lt;/ol&gt;
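&lt;p&gt;The Data Layer preprocessing can be sketched in a few lines of Pillow, using the same values mentioned above (grayscale, contrast 2.0, 2x &lt;code&gt;LANCZOS&lt;/code&gt; upscale); the function name is illustrative:&lt;/p&gt;

```python
from PIL import Image, ImageEnhance


def ocr_friendly(img):
    """Build the 'Frankenstein' page Tesseract prefers: gray, harsh, big."""
    g = img.convert("L")                         # grayscale
    g = ImageEnhance.Contrast(g).enhance(2.0)    # crank contrast to 2.0
    w, h = g.size
    # LANCZOS keeps edges sharp when upscaling 2x for small fonts.
    return g.resize((w * 2, h * 2), Image.LANCZOS)
```

&lt;p&gt;The resulting image never reaches the reader's eyes; it exists only to be fed to Tesseract for the invisible text layer.&lt;/p&gt;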

&lt;p&gt;The trickiest part? &lt;strong&gt;Math.&lt;/strong&gt;&lt;br&gt;
Because we upscaled the OCR images by 2x to help Tesseract read small fonts, the invisible text layer was twice as big as the visual page. We had to calculate scale factors to shrink the text back down so that when you highlight a sentence, the highlight actually lines up with the words.&lt;/p&gt;
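&lt;p&gt;Written down, the math is a single ratio (the page widths below are illustrative, not the magazine's real dimensions):&lt;/p&gt;

```python
def text_layer_scale(visual_width, ocr_width):
    """Factor by which the OCR text layer must shrink to fit the visual page."""
    return visual_width / ocr_width


# An A4-ish page rendered at 595 pts, OCRed from a 2x upscale (1190 pts):
scale = text_layer_scale(595, 1190)  # 0.5 -- shrink the text layer by half
```

&lt;p&gt;In &lt;code&gt;pypdf&lt;/code&gt;, a factor like this would typically feed a &lt;code&gt;Transformation().scale(...)&lt;/code&gt; applied to the text layer before &lt;code&gt;merge_page&lt;/code&gt;, so the highlights land on the words.&lt;/p&gt;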

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Vibe coding this script taught me more in an hour than I’d usually learn in a weekend of reading docs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Image Optimization:&lt;/strong&gt; OCR is picky. Simply resizing an image isn't enough; the &lt;em&gt;method&lt;/em&gt; of resizing (resampling filter) matters.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Library Specialization:&lt;/strong&gt; &lt;code&gt;PIL&lt;/code&gt; is for pixels; &lt;code&gt;pypdf&lt;/code&gt; is for structure. Trying to do everything in one library is a trap.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Power of the CLI:&lt;/strong&gt; Using the Gemini CLI meant I didn't have to context-switch. I stayed in my terminal, describing what I wanted, and the code appeared.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2dgv0fe30dhuu4zf12u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy2dgv0fe30dhuu4zf12u.png" alt="Use of the script (for 2 pages)" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;We ended up with a ~100-line Python script that solves a genuine daily frustration. I didn't have to memorize the &lt;code&gt;pypdf&lt;/code&gt; documentation or look up the Tesseract CLI flags. I just focused on the goal: "Make it searchable, make it pretty."&lt;/p&gt;

&lt;p&gt;That’s Vibe Coding. You bring the vision, the AI brings the syntax, and together you build something cool. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;We'll discover in the &lt;a href="https://dev.to/worldlinetech/vibe-coding-one-pixel-at-a-time-22pc"&gt;next episode&lt;/a&gt; if this is still true with a more complex use case and a GUI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>gemini</category>
      <category>pdf</category>
      <category>ocr</category>
    </item>
    <item>
      <title>The Ultimate LLM Inference Battle: vLLM vs. Ollama vs. ZML</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Mon, 29 Dec 2025 09:12:46 +0000</pubDate>
      <link>https://forem.com/worldlinetech/the-ultimate-llm-inference-battle-vllm-vs-ollama-vs-zml-m97</link>
      <guid>https://forem.com/worldlinetech/the-ultimate-llm-inference-battle-vllm-vs-ollama-vs-zml-m97</guid>
      <description>&lt;p&gt;&lt;em&gt;A structured, data-driven comparison of today's leading open-source engines for serving AI models.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The "Runtime Wars"
&lt;/h3&gt;

&lt;p&gt;The open-source AI community has achieved an incredible milestone: models like Meta's Llama 3 and Mistral AI's Mixtral now rival proprietary giants like GPT-4. But having the weights is only half the battle. To actually &lt;em&gt;use&lt;/em&gt; these models—to build a chatbot, an agent, or an API—you need an inference engine.&lt;/p&gt;

&lt;p&gt;The landscape of inference servers is exploding. A year ago, options were scarce. Today, developers are faced with a paralyzing array of choices. Should you use the industry darling &lt;strong&gt;vLLM&lt;/strong&gt;? The local developer's favorite, &lt;strong&gt;Ollama&lt;/strong&gt;? Or perhaps a radical newcomer like &lt;strong&gt;ZML&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Choosing the wrong engine can lead to massive infrastructure bills, slow user experiences, or vendor lock-in.&lt;/p&gt;

&lt;p&gt;To cut through the hype, we are applying the &lt;strong&gt;QSOS (Qualification and Selection of Open Source software)&lt;/strong&gt; method. This isn't a casual review; it's a structured evaluation comparing these three contenders against the state-of-the-art features required for modern AI production.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Methodology: Why QSOS?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkge5m9dy66je4atphit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftkge5m9dy66je4atphit.png" alt="QSOS Logo" width="257" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.qsos.org" rel="noopener noreferrer"&gt;QSOS&lt;/a&gt; is a standardized methodology designed to reduce the risks associated with adopting open-source technologies. Unlike ad-hoc selection processes based on Medium articles or GitHub stars, QSOS treats open-source evaluation with the same rigor used for proprietary software.&lt;/p&gt;

&lt;p&gt;The core philosophy of QSOS is separating &lt;strong&gt;Evaluation&lt;/strong&gt; (the intrinsic, objective quality of the software) from &lt;strong&gt;Qualification&lt;/strong&gt; (how well it fits your specific business needs).&lt;/p&gt;

&lt;p&gt;For this comparison, we used a "Best of Breed" evaluation grid, scoring features on a simple 0-to-2 scale:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0:&lt;/strong&gt; Not covered / Non-existent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1:&lt;/strong&gt; Partially covered / Complex implementation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;2:&lt;/strong&gt; Fully covered / Best-in-class standard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We assessed four key axes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Maturity &amp;amp; Community:&lt;/strong&gt; Is the project stable and likely to survive?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Functional Features:&lt;/strong&gt; Does it support modern requirements like LoRA adapters and quantization?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance &amp;amp; Scale:&lt;/strong&gt; Can it handle high throughput and utilize hardware efficiently?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations (Day 2):&lt;/strong&gt; How easy is it to deploy, monitor, and maintain?&lt;/li&gt;
&lt;/ol&gt;
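&lt;p&gt;To make the scoring mechanics concrete, here is a toy sketch of how per-axis 0-to-2 scores combine into a weighted figure during Qualification. The weights and scores below are invented for illustration; they are not the values from our grid:&lt;/p&gt;

```python
# Hypothetical axis weights, purely illustrative of the QSOS math.
WEIGHTS = {"maturity": 0.3, "features": 0.3, "performance": 0.2, "operations": 0.2}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-axis 0-to-2 QSOS scores into a single weighted figure."""
    return sum(scores[axis] * weights[axis] for axis in weights)


# e.g. an engine that is strong everywhere except Day-2 operations:
example = {"maturity": 2, "features": 2, "performance": 2, "operations": 1}
score = weighted_score(example)  # 0.3*2 + 0.3*2 + 0.2*2 + 0.2*1
```

&lt;p&gt;The point of Qualification is exactly this re-weighting: the same raw scores yield different winners depending on whether your business weights throughput or operational ease.&lt;/p&gt;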

&lt;h3&gt;
  
  
  The Contenders
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. vLLM: The Data Center Standard
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8pv4ovqry11gr3xumeg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8pv4ovqry11gr3xumeg.png" alt="vLLM Logo" width="239" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://vllm.ai" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;&lt;/strong&gt; burst onto the scene in 2023 from UC Berkeley, solving a critical bottleneck in serving LLMs: memory fragmentation. Its core innovation, &lt;strong&gt;PagedAttention&lt;/strong&gt;, allows it to manage GPU memory like an operating system manages virtual memory, dramatically increasing batch sizes and throughput.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; High-throughput production serving in the data center.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; vLLM is currently the &lt;strong&gt;De Facto Standard&lt;/strong&gt; for enterprise deployment. It excels on server-grade hardware (NVIDIA H100s/A100s) and offers the richest feature set for scaling.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Ollama: The Developer's Best Friend
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8i7p1zgrhls5qacwqki.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc8i7p1zgrhls5qacwqki.png" alt="Ollama Logo" width="344" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://ollama.com" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;&lt;/strong&gt; took a different approach. It focused entirely on removing friction. By wrapping the powerful &lt;code&gt;llama.cpp&lt;/code&gt; engine in a sleek, Docker-style Go binary, it made running a 70B parameter model on a MacBook as easy as typing &lt;code&gt;ollama run llama3&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; Local development, edge devices, and consumer hardware (Mac/PC).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; Ollama is the king of &lt;strong&gt;usability&lt;/strong&gt;. It is unbeaten for local testing and running models on consumer hardware, but it lacks the advanced scheduling required for high-traffic enterprise production.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. ZML (Zig Machine Learning): The Radical Challenger
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ddw41ql12g8k4ekellb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ddw41ql12g8k4ekellb.png" alt="ZML Logo" width="200" height="197"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://zml.ai" rel="noopener noreferrer"&gt;ZML&lt;/a&gt;&lt;/strong&gt; is the new kid on the block. It is less of a "server" product and more of a compiler stack aimed at engineers. Written in Zig, it utilizes OpenXLA/MLIR to compile model graphs directly into standalone binaries, aiming to eliminate the heavy Python/PyTorch dependency chain entirely.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Focus:&lt;/strong&gt; High-performance, cross-platform runtime (TPUs, AMD, NVIDIA) without dependencies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Positioning:&lt;/strong&gt; ZML is an &lt;strong&gt;Alpha-stage visionary&lt;/strong&gt;. It offers incredible potential for hardware portability and efficiency but is currently a complex "build-your-own-stack" tool rather than a drop-in product.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Visualizing the Results
&lt;/h3&gt;

&lt;p&gt;To understand how these tools differ, we visualize our QSOS scores using two different schemas.&lt;/p&gt;

&lt;h4&gt;
  
  
  The Radar Chart: Feature Balance
&lt;/h4&gt;

&lt;p&gt;This chart shows the balance of strengths across the four evaluation axes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlvb6g592d0kydxj0try.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlvb6g592d0kydxj0try.png" alt="QSOS Radar" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Caption: The QSOS Radar Chart highlights the distinct profiles of the three engines. vLLM shows the broadest coverage across features and performance. Ollama spikes toward Operational Ease. ZML shows potential in features but lacks maturity.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM (Blue):&lt;/strong&gt; The largest, most balanced area, indicating strength across maturity, features, and performance, with moderate operational complexity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama (Green):&lt;/strong&gt; A massive spike toward "Operational Ease," reflecting its zero-friction user experience, but pulling back on raw performance metrics like continuous batching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZML (Red):&lt;/strong&gt; A smaller footprint overall, reflecting its early stage (low maturity), but showing strong potential in functional features due to its compiler-based architecture.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The QSOS Quadrant: Market Position
&lt;/h4&gt;

&lt;p&gt;This schema maps the tools based on their market adoption versus their raw production capabilities.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm90qnkq9c5tf3hld553.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frm90qnkq9c5tf3hld553.png" alt="QSOS Quadrant" width="800" height="640"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Caption: The QSOS Quadrant positions the tools based on Market Maturity vs. Production Power.&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vLLM (The Leader):&lt;/strong&gt; High Maturity, High Power. The safe, scalable choice for the enterprise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama (The Specialist):&lt;/strong&gt; High Maturity, Lower Production Power. The standard for a specific niche (local/consumer hardware), prioritizing usability over scale.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZML (The Visionary):&lt;/strong&gt; Low Maturity, High Potential Power. An innovative approach that hasn't yet proven itself in the broad market.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Consolidated Score Sheet
&lt;/h3&gt;

&lt;p&gt;Below is the detailed breakdown of the evaluation scores that feed the charts above.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Section / Criteria&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;vLLM&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Ollama&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;ZML (Zig ML)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;A. MATURITY&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;History &amp;amp; Age&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Very New)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Activity&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Hyper-Active)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Viral)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (High Velocity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ecosystem&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Dominant)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Ubiquitous)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Niche)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Governance&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Community)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Company Led)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Small Team)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;B. FEATURES&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model Support&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Universal)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Curated Lib)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Compiler based)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quantization&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Server: AWQ/FP8)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Edge: GGUF)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Implicit XLA)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA Adapters&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Dynamic Multi-LoRA)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Static Modelfile)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Not standard)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Compat.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (OpenAI Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (OpenAI Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Runtime only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;C. PERFORMANCE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cont. Batching&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Gold Standard)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (FIFO)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Arch. support)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Maximum SOTA)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Low/Single User)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (High Potential)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parallelism&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Tensor &amp;amp; Pipeline)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Single Node)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Compiler Config)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hardware Agnosticism&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (NVIDIA Centric)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Apple/Consumer)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Any: TPU/AMD)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;D. OPERATIONS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ease of Setup&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Python/Docker)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Magic 1-Click)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Hard: Bazel)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dependencies&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Heavy Torch)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Zero: Go Binary)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Zero: Zig Binary)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;2&lt;/strong&gt; (Prometheus Native)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0&lt;/strong&gt; (Logs only)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;1&lt;/strong&gt; (Manual metrics)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
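&lt;p&gt;As a quick sanity check, the per-axis values behind the radar chart can be reproduced from this score sheet. The sketch below simply averages each section's criteria into a single 0-2 axis value; note that plain averaging is a simplifying assumption on my part, since a full QSOS evaluation applies explicit weights per criterion:&lt;/p&gt;

```python
# Scores transcribed from the QSOS score sheet above (0-2 per criterion,
# in table order: the four A/B/C criteria, then the three D criteria).
scores = {
    "vLLM":   {"Maturity": [2, 2, 2, 2], "Features": [2, 2, 2, 2],
               "Performance": [2, 2, 2, 1], "Operations": [1, 1, 2]},
    "Ollama": {"Maturity": [2, 2, 2, 1], "Features": [2, 2, 1, 2],
               "Performance": [0, 1, 0, 2], "Operations": [2, 2, 0]},
    "ZML":    {"Maturity": [0, 2, 0, 1], "Features": [2, 1, 0, 0],
               "Performance": [1, 1, 1, 2], "Operations": [0, 2, 1]},
}

def axis_scores(tool):
    """Average each section's criteria into a single 0-2 radar axis value."""
    return {section: round(sum(vals) / len(vals), 2)
            for section, vals in scores[tool].items()}

for tool in scores:
    print(tool, axis_scores(tool))
```

&lt;p&gt;For vLLM this yields 2.0 on both Maturity and Features, which matches the broad, balanced radar profile described above, while Ollama's low Performance average reflects its FIFO scheduling and single-node design.&lt;/p&gt;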

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;There is no single "best" inference engine. The right choice depends entirely on your specific context (the Qualification phase of QSOS).&lt;/p&gt;

&lt;h4&gt;
  
  
  Choose vLLM if:
&lt;/h4&gt;

&lt;p&gt;You are building a production application that needs to serve many concurrent users. You have access to server-grade GPUs (NVIDIA A10G, A100, H100) and need features like dynamic LoRA adapters for multi-tenancy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you are deploying to Kubernetes to serve customers, start here.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Choose Ollama if:
&lt;/h4&gt;

&lt;p&gt;You are a developer building locally on a Mac or Windows PC. You need a zero-friction way to test models, or you are deploying to edge devices where resources are constrained, and concurrency is low.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you just want to run Llama 3 on your laptop right now, download Ollama.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Choose ZML if:
&lt;/h4&gt;

&lt;p&gt;You are an ML systems engineer building a specialized hardware appliance (e.g., using TPUs or AMD chips) and need a runtime with absolutely zero Python dependencies and a tiny footprint. You are willing to build the server infrastructure around it yourself.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;If you are frustrated by PyTorch bloat and want a "build your own" adventure, look at ZML.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Note on Methodology
&lt;/h3&gt;

&lt;p&gt;For the purpose of this article, we utilized a &lt;strong&gt;simplified QSOS evaluation grid&lt;/strong&gt;. We intentionally zoomed in on the "Best of Breed" criteria, the critical differentiators driving the current "Inference Wars", to keep the comparison readable and actionable.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;full-fledged QSOS evaluation&lt;/strong&gt; is significantly more exhaustive. It is structured as a hierarchical &lt;strong&gt;tree of criteria&lt;/strong&gt; containing more data points, covering deep operational details such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Generic Attributes:&lt;/strong&gt; Intellectual property management, roadmap visibility, bug tracking efficiency, and internationalization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific Sub-sections:&lt;/strong&gt; Detailed granularity on security compliance (SOC2/GDPR), exact memory footprints, and specific driver version compatibility.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While this article provides a strategic overview, a complete QSOS audit would involve drilling down from high-level "Sections" into specific "Leaves" to calculate a precise, weighted score for every possible business constraint.&lt;/p&gt;
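&lt;p&gt;To make the tree-of-criteria idea concrete, here is a minimal sketch of how a full QSOS audit might roll weighted leaf scores up from the "Leaves" into section and global scores. The criteria names, weights, and scores below are purely illustrative, not taken from an actual audit:&lt;/p&gt;

```python
# A toy QSOS tree: sections hold weighted children; leaves hold a 0-2 score.
# All names, weights, and scores are illustrative placeholders.
tree = {
    "children": {
        "Maturity": {
            "weight": 0.4,
            "children": {
                "History": {"weight": 0.5, "score": 2},
                "Governance": {"weight": 0.5, "score": 1},
            },
        },
        "Operations": {
            "weight": 0.6,
            "children": {
                "Ease of Setup": {"weight": 0.7, "score": 2},
                "Observability": {"weight": 0.3, "score": 0},
            },
        },
    },
}

def weighted_score(node):
    """Recursively aggregate leaf scores into a weighted 0-2 score."""
    if "score" in node:
        return node["score"]
    total_weight = sum(c["weight"] for c in node["children"].values())
    return sum(c["weight"] * weighted_score(c)
               for c in node["children"].values()) / total_weight
```

&lt;p&gt;Because weights can be tuned per business constraint, the same score sheet can rank the three engines differently for, say, an edge deployment versus a multi-tenant SaaS platform.&lt;/p&gt;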

</description>
      <category>qsos</category>
      <category>zml</category>
      <category>ollama</category>
      <category>vllm</category>
    </item>
    <item>
      <title>Automating Image Generation with n8n and ComfyUI</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Sun, 07 Sep 2025 15:51:34 +0000</pubDate>
      <link>https://forem.com/worldlinetech/automating-image-generation-with-n8n-and-comfyui-521p</link>
      <guid>https://forem.com/worldlinetech/automating-image-generation-with-n8n-and-comfyui-521p</guid>
      <description>&lt;p&gt;This is the third article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.&lt;/p&gt;

&lt;p&gt;You'll learn &lt;strong&gt;how to use ComfyUI's API to build custom applications&lt;/strong&gt; and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://n8n.io" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n&lt;/strong&gt;&lt;/a&gt; is a workflow automation tool that connects applications, APIs, and services without requiring deep technical expertise. It allows users to create &lt;strong&gt;complex, multi-step workflows using a visual, node-based editor&lt;/strong&gt;. With n8n, you can automate tasks across thousands of integrations, from CRMs and databases to messaging apps and cloud services.&lt;/p&gt;

&lt;p&gt;It's a &lt;a href="https://docs.n8n.io/sustainable-use-license/" rel="noopener noreferrer"&gt;&lt;strong&gt;fair-code&lt;/strong&gt;&lt;/a&gt; and &lt;strong&gt;open-core&lt;/strong&gt; solution. You can self-host and modify the software freely, but SaaS providers must contribute back to the project if they offer n8n as a service. Furthermore, some advanced features like global variables, multiple environments (dev, staging, prod, etc.), version control using Git, or controlling n8n via API are not available in the community and open-source version of the product.&lt;/p&gt;

&lt;p&gt;In this article, we'll explore how to call ComfyUI from an n8n &lt;strong&gt;agent-based workflow with human interaction and LLM use&lt;/strong&gt;. The agent is instructed to transform a simple prompt from the user into a super-charged JSON Prompt Guide, which is then injected into ComfyUI. For more context, you can read my previous article on &lt;a href="https://dev.to/worldlinetech/json-style-guides-for-controlled-image-generation-with-gpt-4o-and-gpt-image-1-36p"&gt;&lt;strong&gt;JSON Prompt Style Guides&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;n8n is a Vue/TypeScript web application that's simple to install, whether you prefer to run it directly with Node.js or inside a Docker container.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js&lt;/strong&gt;: &lt;code&gt;npx n8n&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt;: &lt;code&gt;docker volume create n8n_data&lt;/code&gt; and then &lt;code&gt;docker run -it --rm --name n8n -p 5678:5678 -v n8n_data:/home/node/.n8n docker.n8n.io/n8nio/n8n&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After all dependencies are installed, the n8n Editor web UI is accessible at &lt;code&gt;http://localhost:5678&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Text-to-Image Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Workflow design is done in the Editor web UI, and it's a highly visual process that doesn't require any coding knowledge, as long as you use predefined nodes for a standard use case. That's our approach here, as we'll create a very simple 3-step workflow with 4 nodes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcurios5i266b3abbhamr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcurios5i266b3abbhamr.png" alt="T2I Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Chat Trigger&lt;/strong&gt; node to start the workflow with a message from the user to capture their initial prompt for the images to be generated by ComfyUI.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI Agent&lt;/strong&gt; node to call an OpenAI model (though it could be other SaaS solutions like Mistral, Anthropic, or Google Gemini, or local models provided through Ollama or directly by Hugging Face). The agent has instructions on how to expand the initial prompt from the previous node into a &lt;strong&gt;JSON Prompt Style Guide&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;OpenAI Chat Model&lt;/strong&gt; node to connect to OpenAI's GPT.&lt;/li&gt;
&lt;li&gt; &lt;a href="https://github.com/mason276752/n8n-nodes-comfyui" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt;&lt;/a&gt; community node to connect to a running ComfyUI instance. To install it, go to the "&lt;em&gt;Settings / Community nodes&lt;/em&gt;" menu.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b6d063cqdad5pg6ldnd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b6d063cqdad5pg6ldnd.png" alt="n8n-nodes-comfyui installation"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're making a simple use of this standard &lt;strong&gt;AI Agent&lt;/strong&gt; node and don't require memory or external tools.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nwqgr4ljxe28ioaagy4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nwqgr4ljxe28ioaagy4.png" alt="AI Agent node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most important parameter is the &lt;strong&gt;system message&lt;/strong&gt; given to the LLM to expand the initial user prompt. The &lt;strong&gt;OpenAI Chat Model&lt;/strong&gt; node handles the credentials to connect to OpenAI and allows us to select the GPT 4.1 mini model.&lt;/p&gt;

&lt;p&gt;The LLM response is then sent to the final node, which is interconnected with ComfyUI.&lt;/p&gt;

&lt;h3&gt;
  
  
  ComfyUI Community Node
&lt;/h3&gt;

&lt;p&gt;Once installed, this community node is quite straightforward to use.&lt;/p&gt;

&lt;p&gt;First, we configure the credentials to connect to ComfyUI.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API URL&lt;/strong&gt;: In this example, it's &lt;code&gt;http://127.0.0.1:8188&lt;/code&gt;, but it could also be a remote instance of ComfyUI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Key&lt;/strong&gt;: This is used if you have configured one on the ComfyUI side.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71s9f9ybqx03fuf6hro9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71s9f9ybqx03fuf6hro9.png" alt="ComfyUI node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we specify the output format (&lt;strong&gt;PNG&lt;/strong&gt; or &lt;strong&gt;JPEG&lt;/strong&gt;) and the timeout for communication with ComfyUI. In the &lt;strong&gt;Workflow JSON&lt;/strong&gt; textarea, we copy the content of the workflow exported from ComfyUI (by using the "&lt;em&gt;File / Export (API)&lt;/em&gt;" menu).&lt;/p&gt;

&lt;p&gt;This means that n8n will send the workflow to be executed to the ComfyUI API in JSON format. We need to modify the ComfyUI workflow by using an expression containing the &lt;em&gt;$node["AI Agent"].data&lt;/em&gt; variable. Its value is dynamically set to the prompt provided by the previous node during n8n execution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ebjyjsljgxmnzux3zz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl5ebjyjsljgxmnzux3zz.png" alt="Prompt insertion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The exact location to inject the prompt depends on the JSON workflow exported from ComfyUI. Here, it's inside the &lt;strong&gt;"39.6"&lt;/strong&gt; node of type &lt;strong&gt;CLIP Text Encode&lt;/strong&gt;, but it might have a different name in your own workflows.&lt;/p&gt;
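&lt;p&gt;To illustrate what the n8n expression does under the hood, here is a small Python sketch that patches a prompt into an exported API-format workflow before it is sent to ComfyUI. The two-node workflow dictionary is a hypothetical stand-in (real exports contain many more nodes, and the node IDs and input names depend on your own workflow):&lt;/p&gt;

```python
import json

# Hypothetical stand-in for a workflow exported via ComfyUI's
# "File / Export (API)" menu; real exports are much larger.
workflow = {
    "39": {"class_type": "CLIPTextEncode",
           "inputs": {"text": "placeholder", "clip": ["38", 0]}},
    "40": {"class_type": "KSampler", "inputs": {"seed": 42}},
}

def inject_prompt(wf, prompt):
    """Set the text input of every CLIPTextEncode node, mimicking what
    the n8n expression does when it substitutes the AI Agent's output."""
    for node in wf.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["text"] = prompt
    return wf

patched = inject_prompt(workflow, "An ancient library at night")
# The kind of JSON body that then gets posted to ComfyUI's /prompt endpoint.
payload = json.dumps({"prompt": patched})
```

&lt;p&gt;The community node handles this substitution for you; the sketch only shows why the expression's placement inside the exported JSON matters.&lt;/p&gt;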

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;p&gt;We're all set! After checking that ComfyUI is running, we launch the workflow from the n8n UI by entering a prompt in the chat box.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwti8h4he4btgoug62kp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuwti8h4he4btgoug62kp.png" alt="User Chat"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's a short video of the workflow execution. n8n displays real-time progress, and the generated images can be visualized inside the ComfyUI node.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/sBpbzYwr8Y4"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here are two images generated from this prompt: "&lt;em&gt;A dramatic, cinematic shot of an ancient library at night, where the books are alive and their pages flutter like birds, forming constellations in the air.&lt;/em&gt;"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxy3p49bdwwkbsrk3jui.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpxy3p49bdwwkbsrk3jui.png" alt="1st image generated"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm10o7wju67n4zhfq5gzg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm10o7wju67n4zhfq5gzg.png" alt="2nd image generated"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, this 3-step workflow is very simple. The true power of coupling n8n and ComfyUI will become apparent with more complex use cases, leveraging n8n's extensive integration capabilities with many other components and solutions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Image-to-Image Workflow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Use Case
&lt;/h3&gt;

&lt;p&gt;Let's now create another workflow to transform an existing image based on user instructions. We'll intentionally keep this example super simple for clarity, but your use case might include a more complex workflow leveraging n8n's power. &lt;/p&gt;

&lt;p&gt;Here, we'll use only three nodes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv9t72y5kmgoxtvdphr9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdv9t72y5kmgoxtvdphr9.png" alt="I2I workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;n8n Form / n8n Form trigger&lt;/strong&gt; node to start the workflow by displaying an HTML form for the user to upload the image to modify and specify what changes to apply.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;ComfyUI Image Transformer&lt;/strong&gt; community node to connect to a running ComfyUI instance. To install it, go to the "&lt;strong&gt;Settings / Community nodes&lt;/strong&gt;" menu and search for &lt;a href="https://www.npmjs.com/package/n8n-nodes-comfyui-image-to-image" rel="noopener noreferrer"&gt;&lt;strong&gt;n8n-nodes-comfyui-image-to-image&lt;/strong&gt;&lt;/a&gt;. The example workflow exported from ComfyUI uses the Kontext Edit model to modify an existing image.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;n8n Form / Form Ending&lt;/strong&gt; node to notify the user when the image is generated and offer it for download.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  ComfyUI Image Transformer Node
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7dx2rdhqs96an3r5xyw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp7dx2rdhqs96an3r5xyw.png" alt="ComfyUI Image Transformer Node"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This node is quite similar to the &lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt; node we used before, with the insertion of the &lt;em&gt;$json.Promt&lt;/em&gt; expression into the exported ComfyUI JSON workflow to inject instructions from the user.&lt;/p&gt;

&lt;p&gt;The main difference concerns how the input image to be modified is handled:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Input Type&lt;/strong&gt; defines how the image is obtained from the previous form node; we'll choose &lt;strong&gt;Binary&lt;/strong&gt; instead of &lt;strong&gt;URL&lt;/strong&gt; or &lt;strong&gt;Base64&lt;/strong&gt; text.&lt;/li&gt;
&lt;li&gt;The property containing the binary file must be specified, which is the &lt;strong&gt;data&lt;/strong&gt; field here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image Node ID&lt;/strong&gt; identifies, within the exported ComfyUI JSON workflow, the node in charge of loading the input image (it must be of type &lt;strong&gt;LoadImage&lt;/strong&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We've added the last node to finalize the form management started with the first node, retrieve the modified image, return it in binary format, and offer the user the option to save it locally.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0am0bq9jsxflv1i5hz92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0am0bq9jsxflv1i5hz92.png" alt="Form Ending"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Execution
&lt;/h3&gt;

&lt;p&gt;Let's execute the workflow. n8n displays a form for us to enter both the image and the associated instructions for its modification.&lt;/p&gt;

&lt;p&gt;Here is a short video of the workflow execution.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/Os7Fp7jop7w"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Initial Image&lt;/strong&gt;:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ih2bnwq5v1klxyyxb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh1ih2bnwq5v1klxyyxb8.png" alt="Initial Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Modified Image&lt;/strong&gt; with the prompt "&lt;em&gt;Make the scene at night with full moon and moonlight&lt;/em&gt;":&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2icl18zbcsfe1n3i3t8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2icl18zbcsfe1n3i3t8.png" alt="Modified Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This second example workflow is so simple that we could do the exact same thing directly using the ComfyUI UI. It's here simply to illustrate how integration with n8n can be achieved. A more value-added workflow might, for instance, include a loop that allows the user to keep modifying the image outputs until they are satisfied.&lt;/p&gt;

&lt;p&gt;Also, note that the &lt;strong&gt;n8n-nodes-comfyui&lt;/strong&gt; package offers other custom nodes for integration into your workflows, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dual Image Transformer&lt;/li&gt;
&lt;li&gt;Single Image to Video&lt;/li&gt;
&lt;li&gt;Dual Image Video Generator&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's also worth noting that even though n8n offers Form nodes, it's primarily intended to be driven in the backend through API calls; that API access, however, is limited to Enterprise licensees.&lt;/p&gt;




&lt;p&gt;With these two workflows, we've demonstrated how n8n can serve as a powerful orchestrator for ComfyUI. By leveraging its visual editor and extensive library of integrations, we transformed a simple user prompt into a rich, structured guide for image generation and created a seamless image-to-image transformation process.&lt;/p&gt;

&lt;p&gt;While our examples were simple to illustrate the concepts, the true value of n8n lies in its ability to connect ComfyUI with a vast ecosystem of tools, from databases and CRMs to messaging services and other AI models. This opens up new possibilities for building sophisticated, end-to-end applications that go far beyond what a standalone ComfyUI interface can offer.&lt;/p&gt;

&lt;p&gt;In the next article of this series, we'll explore another paradigm for connecting ComfyUI with agent-based solutions. We will delve into the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt;, designed to streamline and standardize the way AI models communicate and share contextual information. This will offer a new, more efficient method for agents to interact with and control ComfyUI.&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>n8n</category>
      <category>genai</category>
      <category>agents</category>
    </item>
    <item>
      <title>WebSockets &amp; ComfyUI: Building Interactive AI Applications</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 05 Sep 2025 09:17:07 +0000</pubDate>
      <link>https://forem.com/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g</link>
      <guid>https://forem.com/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g</guid>
      <description>&lt;p&gt;This is the second article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect ComfyUI from code and no-code solutions, using API calls or MCP Servers.&lt;/p&gt;

&lt;p&gt;You'll learn &lt;strong&gt;how to use ComfyUI's API to build custom applications&lt;/strong&gt; and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;In the &lt;a href="https://dev.to/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi"&gt;previous article&lt;/a&gt; of the &lt;em&gt;Beyond the ComfyUI Canvas&lt;/em&gt; series, we demonstrated how to connect ComfyUI with Jupyter Notebook using basic HTTP API calls. While functional, this approach had a significant limitation: it relied on a &lt;code&gt;time.sleep()&lt;/code&gt; call to wait for workflow completion, requiring manual adjustments based on the complexity of each workflow. That is far from an ideal solution.&lt;/p&gt;

&lt;p&gt;To overcome this inefficiency, we’ll &lt;strong&gt;leverage ComfyUI’s WebSocket API&lt;/strong&gt; (the &lt;code&gt;/ws&lt;/code&gt; endpoint), which enables real-time, bidirectional communication between Jupyter and ComfyUI. This upgrade unlocks a seamless experience by providing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instant execution progress updates to track workflow status,&lt;/li&gt;
&lt;li&gt;Live node execution feedback for monitoring each step,&lt;/li&gt;
&lt;li&gt;Immediate error messages and debugging insights for troubleshooting,&lt;/li&gt;
&lt;li&gt;Dynamic queue status updates to respond to changes on the fly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By adopting WebSockets, we eliminate guesswork and create a responsive, interactive workflow.&lt;/p&gt;
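
&lt;p&gt;To make these message types concrete before we look at the full implementation, here is a minimal, server-free sketch of how such frames can be classified. The &lt;code&gt;classify_ws_message&lt;/code&gt; helper and the sample frames are illustrative only, but the &lt;code&gt;executing&lt;/code&gt; message carrying a &lt;code&gt;null&lt;/code&gt; node is the completion signal the code later in this article relies on:&lt;/p&gt;

```python
import json

def classify_ws_message(raw, prompt_id):
    """Turn a ComfyUI WebSocket text frame into a short status string.

    A frame of type "executing" whose data carries node=None and our
    prompt_id signals that the whole workflow has finished.
    """
    message = json.loads(raw)
    data = message.get("data", {})
    if message.get("type") == "progress":
        return f"progress {data.get('value')}/{data.get('max')}"
    if message.get("type") == "executing":
        if data.get("node") is None and data.get("prompt_id") == prompt_id:
            return "done"
        return f"executing node {data.get('node')}"
    return message.get("type", "unknown")

# Hypothetical frames, shaped like the messages handled later in this article
print(classify_ws_message('{"type": "progress", "data": {"value": 4, "max": 20}}', "abc"))   # progress 4/20
print(classify_ws_message('{"type": "executing", "data": {"node": null, "prompt_id": "abc"}}', "abc"))  # done
```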

&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;Let's simplify our previous use case by dropping the OpenAI Assistant and focusing on how to eliminate manual polling and delays. The process is designed to be both intuitive and efficient:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Workflow Setup&lt;/strong&gt;: A pre-defined ComfyUI workflow (loaded from a JSON file) serves as the foundation for image generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Customization&lt;/strong&gt;: The user provides a text prompt which is dynamically inserted into the workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Execution&lt;/strong&gt;: Using ComfyUI’s WebSocket API, the notebook sends the workflow to the server and monitors its progress in real time—receiving live updates on execution status, node activity, and completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Result Retrieval&lt;/strong&gt;: Once generation finishes, the resulting images are automatically fetched and displayed directly in the notebook, creating a seamless end-to-end experience.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s dive into the implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get prompt from user
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please enter your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please enter your prompt
A penguin in a tuxedo, DJing at a club for dancing jellyfish
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Trigger the Workflow from Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;Below, you’ll find a detailed breakdown of the code designed for use in a Jupyter Notebook, complete with helpful comments to guide you through each step and explain its functionality.&lt;/p&gt;
&lt;h3&gt;
  
  
  Imports and main functions
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;  &lt;span class="c1"&gt;# For WebSocket communication
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;       &lt;span class="c1"&gt;# For generating unique client IDs
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;       &lt;span class="c1"&gt;# For JSON data handling
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;   &lt;span class="c1"&gt;# For HTTP requests (replaces urllib)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;  &lt;span class="c1"&gt;# For image processing
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;io&lt;/span&gt;         &lt;span class="c1"&gt;# For handling binary data streams
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IPython.display&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;  &lt;span class="c1"&gt;# For displaying images in Jupyter
&lt;/span&gt;
&lt;span class="c1"&gt;# Server configuration
&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;127.0.0.1:8188&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Local server address and port
&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;      &lt;span class="c1"&gt;# Unique client ID for this session
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;queue_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Send a prompt to the server for execution.

    Args:
        prompt (dict): The workflow/prompt to execute.
        prompt_id (str): Unique ID for tracking the prompt.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;subfolder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;folder_type&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Fetch an image from the server.

    Args:
        filename (str): Name of the image file.
        subfolder (str): Subfolder where the image is stored.
        folder_type (str): Type of folder (e.g., &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;output&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;).

    Returns:
        bytes: Binary image data.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;subfolder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;subfolder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;folder_type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/view&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Retrieve the execution history for a given prompt ID.

    Args:
        prompt_id (str): ID of the prompt whose history is requested.

    Returns:
        dict: History data for the prompt.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/history/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Execute a prompt and collect the resulting images.

    Args:
        ws (websocket.WebSocket): Active WebSocket connection.
        prompt (dict): The workflow/prompt to execute.

    Returns:
        dict: Dictionary of node IDs and their output images.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;prompt_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="nf"&gt;queue_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;output_images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="c1"&gt;# Listen for WebSocket messages until execution is complete
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;executing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;node&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Execution is done
&lt;/span&gt;        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Binary previews are ignored here
&lt;/span&gt;            &lt;span class="k"&gt;continue&lt;/span&gt;

    &lt;span class="c1"&gt;# Retrieve and organize output images
&lt;/span&gt;    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;node_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;images_output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;node_output&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;subfolder&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="n"&gt;images_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;output_images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;images_output&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;output_images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Load the workflow and inject the user prompt
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t2i-krea.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Update the prompt text in the workflow
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;39:6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_prompt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Communication with ComfyUI through WebSockets
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Establish WebSocket connection
&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ws://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;server_address&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/ws?clientId=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Execute the workflow and collect images
&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_images&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Display the output images in Jupyter
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;node_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image_data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;node_id&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;io&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;BytesIO&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_data&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="c1"&gt;# Display each image in the notebook
&lt;/span&gt;        &lt;span class="n"&gt;display&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggv6lxyx0blcavqo285.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcggv6lxyx0blcavqo285.jpg" alt="1st Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z5yh0qq85an74w72q24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4z5yh0qq85an74w72q24.png" alt="2nd Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;This article demonstrated the power of using &lt;strong&gt;WebSockets&lt;/strong&gt; for real-time, bidirectional communication with ComfyUI. By moving beyond &lt;strong&gt;simple HTTP requests&lt;/strong&gt;, we eliminated the need for manual time delays and created a truly dynamic, responsive workflow. This allowed us to monitor the execution of our AI pipeline in real time, ensuring a more reliable and efficient integration. The result is a seamless experience where we can send a prompt and watch the generated images appear automatically in our notebook.&lt;/p&gt;

&lt;p&gt;Having now explored two different ways to integrate ComfyUI with Python code executed in Jupyter, we've laid a strong foundation for building custom, high-level generative AI applications. But what if you're not a developer, or you simply prefer a visual, no-code approach to orchestration? In the next article of the series, we'll shift our focus from code to a &lt;strong&gt;no-code solution like n8n&lt;/strong&gt; to show you how to build powerful ComfyUI workflows without writing a single line of code. &lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>jupyter</category>
      <category>websockets</category>
      <category>genai</category>
    </item>
    <item>
      <title>Unlocking ComfyUI's Power: A Guide to the HTTP API in Jupyter</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Thu, 04 Sep 2025 15:28:06 +0000</pubDate>
      <link>https://forem.com/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi</link>
      <guid>https://forem.com/worldlinetech/unlocking-comfyuis-power-a-guide-to-the-http-api-in-jupyter-1mpi</guid>
<description>&lt;p&gt;This is the first article of a series about how to integrate ComfyUI with other tools to build more complex workflows. We'll move beyond the familiar node-based interface to explore how to connect to ComfyUI from code and no-code solutions, using API calls or MCP Servers.&lt;/p&gt;

&lt;p&gt;You'll learn how to use ComfyUI's API to build custom applications and automate tasks, creating powerful and automated systems for generative AI.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://github.com/comfyanonymous/ComfyUI" rel="noopener noreferrer"&gt;ComfyUI&lt;/a&gt; is a powerful, modular interface for generative models, allowing users to create complex AI image, video and sound generation workflows with a node-based editor. &lt;a href="https://jupyter.org/" rel="noopener noreferrer"&gt;Jupyter Notebook&lt;/a&gt;, on the other hand, is a popular interactive environment for data analysis, visualization, and prototyping.&lt;/p&gt;

&lt;p&gt;By integrating ComfyUI with Jupyter Notebook, you can leverage the flexibility of ComfyUI’s workflows directly within your Python scripts or data science pipelines. This first article focuses on a simple approach using basic HTTP API calls.&lt;/p&gt;

&lt;p&gt;Most of this article is exported from an actual Jupyter Notebook: the text, the Python code, and the execution results are all shown as they appeared there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Use Case
&lt;/h2&gt;

&lt;p&gt;Our goal is to build a high-level generative AI workflow that combines the power of an intelligent agent with the robust image generation capabilities of ComfyUI. The process unfolds in a few simple steps, all orchestrated within a Jupyter Notebook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;User Input:&lt;/strong&gt; The workflow begins with a simple, high-level prompt entered directly into the notebook.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Agent-Powered Expansion:&lt;/strong&gt; An &lt;strong&gt;OpenAI Assistant&lt;/strong&gt; then takes this basic prompt and transforms it into a detailed, structured &lt;strong&gt;JSON Prompt Style Guide&lt;/strong&gt;. This process enriches the initial idea with specific creative instructions, such as style, composition, and lighting.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Initiating Generation:&lt;/strong&gt; This expanded JSON guide is automatically injected into a pre-defined ComfyUI workflow. A single API call to the &lt;strong&gt;ComfyUI server&lt;/strong&gt; starts the image generation process.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Displaying the Result:&lt;/strong&gt; Once the generation is complete, we make a second API call to fetch the resulting images. The images are then displayed directly within the Jupyter Notebook, completing our automated pipeline. &lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Prepare a ComfyUI Workflow
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Create or load a workflow in ComfyUI.&lt;/li&gt;
&lt;li&gt;Save the workflow as a .json file from the "&lt;em&gt;File / Export (API)&lt;/em&gt;" menu (e.g., &lt;code&gt;t2i-krea.json&lt;/code&gt;).&lt;/li&gt;
&lt;/ul&gt;
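
&lt;p&gt;An API-format export is a flat JSON object mapping node ids to their class and inputs, and it is this shape that makes prompt injection possible. The miniature workflow and the &lt;code&gt;set_prompt_text&lt;/code&gt; helper below are a hypothetical sketch of that structure, not the contents of &lt;code&gt;t2i-krea.json&lt;/code&gt;; in practice you would target the specific node id of your positive prompt node rather than every &lt;code&gt;CLIPTextEncode&lt;/code&gt; node:&lt;/p&gt;

```python
import json

# Miniature stand-in for an exported API workflow; real exports such as
# t2i-krea.json use the same node-id to {"class_type", "inputs"} mapping
workflow = json.loads("""
{
  "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder", "clip": ["4", 1]}},
  "3": {"class_type": "KSampler", "inputs": {"seed": 42}}
}
""")

def set_prompt_text(workflow, text):
    """Write `text` into every CLIPTextEncode node of an API-format workflow."""
    for node in workflow.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["text"] = text
    return workflow

set_prompt_text(workflow, "Hanuman flying over a modern city at night")
print(workflow["6"]["inputs"]["text"])  # Hanuman flying over a modern city at night
```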

&lt;h2&gt;
  
  
  Get initial prompt from user
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please enter your prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;user_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please enter your prompt
Hanuman flying over a modern city at night
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Generate JSON Prompt Style Guide with an Assistant
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Create a thread
&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Send a message
&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_prompt&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Run the assistant
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;assistant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;asst_Uj0Qr0rG0bz8NVk1LWiS9UKv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Wait for completion and retrieve the response
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the response
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;threads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;json_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
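&lt;p&gt;Note that the polling loop above only checks for &lt;code&gt;completed&lt;/code&gt;, so it will spin forever if the run ends in &lt;code&gt;failed&lt;/code&gt; or &lt;code&gt;cancelled&lt;/code&gt;. A more defensive variant (a sketch, not part of the original notebook) stops on any terminal status and enforces a timeout:&lt;/p&gt;

```python
import time

# Terminal run states in the Assistants API; anything else means
# the run is still in progress.
TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def wait_for_run(client, thread_id, run_id, timeout=120):
    """Poll a run until it reaches a terminal state or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        run = client.beta.threads.runs.retrieve(thread_id=thread_id, run_id=run_id)
        if run.status in TERMINAL_STATES:
            return run
        time.sleep(1)
    raise TimeoutError("Assistant run did not finish in time")
```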





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Urban Deus Ex Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"inspiration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Modern Urban Aesthetics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Hindu Mythology"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Superhero Comics"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Cyberpunk Lighting"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman, the Hindu god, flying over a bustling modern city radiating bright lights under the cloak of night sky"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Strong, muscular figure with a monkey face, holding a gada(mace)."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"midground"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flying with one hand extended"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"large"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"determined"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"interaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flying over the city"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"city"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"modern urban skyline with skyscrapers, neon billboards, and busy traffic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"expansive"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"comic-realistic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"color_palette"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"primary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#202020"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"secondary"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#505050"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"highlight"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#ff6a00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shadow"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"#0d0d0d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"background_gradient"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"#0d0d0d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="s2"&gt;"#303030"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Glistening city lights with diffused neon glow and soft moonlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"powerful and captivating"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scenery"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Modern urban cityscape with skyscrapers, roads, traffic and massive billboards with neon signs"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Slightly off-center focus with Hanuman taking up prominent space"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"camera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"angle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"distance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium shot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wide-angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"focus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharp subject, blurred background"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"medium"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Digital Painting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"textures"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"smooth skin of Hanuman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"rough concrete of buildings"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"glossy glass of skyscrapers"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4K"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"clothing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hanuman is dressed in traditional golden and red garment"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"weather"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Night with clear sky and a soft moonlight"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"effects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Bokeh effect for city lights"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Glow effect for neon lights"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"themes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Divinity"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Strength"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Modernization"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Contrast"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Juxtaposition of Tradition with Modernity"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage_notes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The style is effective in creating a surprising juxtaposition of traditional divinity with modern landscapes. Use this style for high impact illustrations where contrasts need to be highlighted."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Trigger the Workflow from Jupyter Notebook
&lt;/h2&gt;

&lt;p&gt;Use the &lt;code&gt;requests&lt;/code&gt; library to send a POST request to the ComfyUI &lt;code&gt;/prompt&lt;/code&gt; endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# ComfyUI server URL
&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8188&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;prompt_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Load your workflow JSON
&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t2i-krea.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Replace the prompt
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;39:6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json_prompt&lt;/span&gt;

&lt;span class="c1"&gt;# Define the payload
&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;client_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jupyter_notebook&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Send the request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get the prompt_id
&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;prompt_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
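&lt;p&gt;The node id &lt;code&gt;39:6&lt;/code&gt; is specific to this exported workflow; if you edit and re-export the graph, the id can change. A hypothetical helper (not from the original notebook) that locates the text-encoding node by its class instead:&lt;/p&gt;

```python
def set_prompt(workflow, text):
    """Write `text` into the first CLIPTextEncode node found.

    Workflows with separate positive and negative prompts would need a
    finer test, e.g. on the node's "_meta" title.
    """
    for node in workflow.values():
        if node.get("class_type") == "CLIPTextEncode":
            node["inputs"]["text"] = text
            return True
    return False

# Example with a minimal made-up workflow:
wf = {"39:6": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}}}
set_prompt(wf, "Hanuman flying over a modern city at night")
print(wf["39:6"]["inputs"]["text"])
```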



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;c1a2ced4-772c-4aeb-ac45-bfa183d03a88
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Retrieve the Generated Images
&lt;/h2&gt;

&lt;p&gt;ComfyUI processes the workflow asynchronously. &lt;/p&gt;

&lt;p&gt;To fetch the result, poll the &lt;code&gt;/history&lt;/code&gt; endpoint:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;IPython.display&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;display&lt;/span&gt;    

&lt;span class="c1"&gt;# Wait for the workflow to complete
&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Adjust based on workflow complexity
&lt;/span&gt;
&lt;span class="c1"&gt;# Fetch the latest result for our prompt
&lt;/span&gt;&lt;span class="n"&gt;history_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/history/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history_url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Navigate to the list of image outputs and display them
&lt;/span&gt;&lt;span class="n"&gt;image_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;prompt_id&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;outputs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;images&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;image_outputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;filename&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filename&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;comfy_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/view?filename=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;filename&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nf"&gt;display&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;width&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
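&lt;p&gt;The fixed &lt;code&gt;time.sleep(25)&lt;/code&gt; works for this workflow but is fragile: a heavier graph will not be done yet, and a lighter one wastes time. A sketch of an alternative (assuming the &lt;code&gt;comfy_url&lt;/code&gt; and &lt;code&gt;prompt_id&lt;/code&gt; variables from the cells above) is to poll &lt;code&gt;/history&lt;/code&gt; until the entry appears with outputs:&lt;/p&gt;

```python
import time
import requests

def wait_for_outputs(comfy_url, prompt_id, timeout=300, interval=2):
    """Poll /history/{prompt_id} until ComfyUI reports outputs for it."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{comfy_url}/history/{prompt_id}").json()
        entry = history.get(prompt_id)
        if entry and entry.get("outputs"):
            return entry["outputs"]
        time.sleep(interval)
    raise TimeoutError("Generation did not finish in time")
```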



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4g9e4aq1u1ilozs2xj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz4g9e4aq1u1ilozs2xj0.png" alt="First Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0l7hxignffemzrvzy6k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0l7hxignffemzrvzy6k.png" alt="Second Generated Image" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;In this article, we've seen how to leverage the power of ComfyUI directly from a Jupyter Notebook. By making simple API calls, we were able to transform a user's basic text prompt into a rich, detailed JSON guide using an OpenAI Assistant, and then feed that guide into a ComfyUI workflow to generate images. This approach demonstrates how you can move beyond the graphical interface to build automated, intelligent systems for creative tasks. The combination of Python's flexibility and ComfyUI's robust backend opens up a world of possibilities for custom, high-level generative AI workflows.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/worldlinetech/websockets-comfyui-building-interactive-ai-applications-1j1g"&gt;next article&lt;/a&gt;, we'll take our integration a step further by exploring how to use &lt;strong&gt;WebSockets&lt;/strong&gt; for Real-Time Interaction with ComfyUI.&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>api</category>
      <category>jupyter</category>
      <category>genai</category>
    </item>
    <item>
      <title>Enhancing QR Codes in the Age of GenAI</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 23 May 2025 09:46:49 +0000</pubDate>
      <link>https://forem.com/worldlinetech/enhancing-qr-codes-in-the-age-of-genai-4fa6</link>
      <guid>https://forem.com/worldlinetech/enhancing-qr-codes-in-the-age-of-genai-4fa6</guid>
      <description>&lt;h2&gt;
  
  
  Traditional QR Codes
&lt;/h2&gt;

&lt;p&gt;Quick Response (QR) codes were developed in 1994 by Masahiro Hara and are now recognized as an ISO/IEC standard. They represent an evolution of 2D barcodes, capable of encoding numeric, alphanumeric, binary, or Kanji data in the form of a pattern of black squares on a white background. These codes are available in various sizes (or versions), ranging from version 1 (21 x 21 squares) to version 40 (177 x 177 squares).&lt;/p&gt;

&lt;p&gt;Numerous libraries and tools exist for generating QR codes. My preferred open-source library is &lt;a href="https://nayuki.io/page/qr-code-generator-library" rel="noopener noreferrer"&gt;QR Code Generator&lt;/a&gt;, which supports all standard features and is available in Java, TypeScript/JavaScript, Python, Rust, C++, and C. Additionally, my favorite all-in-one open-source tool is &lt;a href="https://qrcode.antfu.me" rel="noopener noreferrer"&gt;QR Toolkit&lt;/a&gt;, a Vue/Nuxt application offering marker and module customization, along with verification and comparison mechanisms, an invaluable resource when tweaking QR codes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0gic8761tmz58m94fum.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0gic8761tmz58m94fum.png" alt="QR Toolkit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;QR codes comprise several critical components to ensure readability by scanners, including three positional markers, alignment and timing patterns, and a masking system. I will not delve into these details here; instead, I will focus on the built-in error correction mechanism. It employs Reed-Solomon codes (also used in storage media such as CDs/DVDs and RAID 6, and in network technologies such as DSL and satellite links), adding extra codewords to the QR grid so that damaged data can be recovered. The standard defines four levels of error correction, each associated with a different tolerance percentage:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;Approximate Error Tolerance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;~7%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;~15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Quartile&lt;/td&gt;
&lt;td&gt;~25%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;~30%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This means a QR code with High error correction can still be scanned even if up to 30% of the image becomes unreadable. This feature is often exploited to embed images within QR codes: the scanner simply treats the modules covered by the embedded image as errors and corrects them.&lt;/p&gt;
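&lt;p&gt;To put concrete numbers on this, here is a back-of-the-envelope sketch in plain Python, using the approximate tolerance percentages from the table above and the standard version-to-size formula:&lt;/p&gt;

```python
# Approximate fraction of modules that may be damaged, per error-correction level
TOLERANCE = {"L": 0.07, "M": 0.15, "Q": 0.25, "H": 0.30}

def module_count(version):
    # A version-v QR code is (17 + 4*v) modules per side: v1 is 21x21, v40 is 177x177
    side = 17 + 4 * version
    return side * side

def max_damaged_modules(version, level):
    # Rough upper bound on how many modules can be obscured while staying scannable
    return int(module_count(version) * TOLERANCE[level])

print(max_damaged_modules(1, "H"))   # 441 modules total, ~132 may be lost
print(max_damaged_modules(40, "L"))  # 31329 modules total, ~2193 may be lost
```

&lt;p&gt;This is why an embedded logo, or the generative redecoration shown below, has to stay within roughly a third of the grid at level High.&lt;/p&gt;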

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg2g8xovbspfxv5wha6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmg2g8xovbspfxv5wha6y.png" alt="Image embedded in QR Code"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For years, this technique has been used for personalizing QR codes. This article explores an innovative approach to customizing QR codes by leveraging Generative AI instead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Harnessing Generative AI
&lt;/h2&gt;

&lt;p&gt;My proposal involves using a Stable Diffusion model integrated within the ComfyUI graphical interface to design and execute local generation workflows on a GPU-equipped PC. For detailed guidance on these components, refer to this &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-1-1gan"&gt;article&lt;/a&gt; or this &lt;a href="https://www.youtube.com/watch?v=kXraePyAT-c" rel="noopener noreferrer"&gt;video&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To modify and refine existing QR codes while maintaining their scannability, we will use a specialized ControlNet called &lt;a href="https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster" rel="noopener noreferrer"&gt;QR Code Monster&lt;/a&gt;. ControlNets are auxiliary neural network models that inject targeted guidance into the generation process by focusing on specific features of an input image. Each ControlNet emphasizes particular aspects, such as structure (pose, edges, segmentation, depth), texture, content layout (bounding boxes, masks), or style (color maps, textures). In our scenario, we’ll focus on maintaining or modifying QR code contrast features.&lt;/p&gt;

&lt;p&gt;Let’s proceed to create a workflow in ComfyUI, employing Stable Diffusion 1.5, the QR Code Monster ControlNet, and a QR code generated via QR Toolkit.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/uAvAZFG9sWY"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;By adjusting parameters such as the ControlNet’s strength and start/end positions, along with the sampling process (e.g., 50 steps), I obtained a result that remains scannable and aligns with my input prompt: &lt;em&gt;“A beautiful landscape, blue sky, grass, flowers.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dkcvkdk6ubustifznx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9dkcvkdk6ubustifznx0.png" alt="A beautiful landscape, blue sky, grass, flowers"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This demonstrates how Stable Diffusion combined with ControlNet preserved the original pattern while injecting desired visual elements. Using QR Toolkit’s comparison feature, we can assess the QR code’s readability by examining the difference markers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u5qapa6debvb97b2shf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0u5qapa6debvb97b2shf.png" alt="QR Toolkit - Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we can modify the prompt to produce multiple variants of our QR code. For example:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1fl294wsircszdkaur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flt1fl294wsircszdkaur.png" alt="Different Prompts"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While changing the overall style is straightforward (first example), embedding specific content within the QR code remains more challenging than with traditional tools (second example). To explore this further, we'll examine two axes separately: Style and Content, before combining them.&lt;/p&gt;
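&lt;p&gt;Producing such variants doesn’t have to be manual. Once a workflow is exported via ComfyUI’s “Save (API Format)” option, a few lines of Python can prepare one queued run per style prompt. This is only a sketch: the node id &lt;code&gt;"6"&lt;/code&gt; and the &lt;code&gt;text&lt;/code&gt; input name are assumptions, so check your own export to locate the CLIPTextEncode node holding the positive prompt.&lt;/p&gt;

```python
import copy
import json

# A workflow exported in ComfyUI's API format is a dict of nodes.
# Node id "6" and the "text" input below are placeholders for illustration.
workflow = json.loads("""{
  "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder"}}
}""")

styles = [
    "A beautiful landscape, blue sky, grass, flowers",
    "A pattern forged from molten lava, glowing orange and red",
    "An elegant, glowing elven door with silver runes",
]

payloads = []
for style in styles:
    variant = copy.deepcopy(workflow)
    variant["6"]["inputs"]["text"] = style
    # Each payload can then be POSTed to http://127.0.0.1:8188/prompt
    # as JSON to queue one generation per style.
    payloads.append({"prompt": variant})

print(len(payloads))  # 3 payloads, one per style prompt
```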

&lt;h2&gt;
  
  
  Customizing Style
&lt;/h2&gt;

&lt;p&gt;Enhancing the prompt allows for more precise control over the QR code’s aesthetic. For instance, you can leverage a large language model (LLM) to generate a detailed prompt such as:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“A pattern forged from molten lava, glowing with an intense fiery orange and red hue. Cracks in the surface reveal volcanic heat, with small embers rising around it.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhm5r4e6g34iujqg1fqt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhm5r4e6g34iujqg1fqt.png" alt="A pattern forged from molten lava"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Similarly, for a more intricate and mystical style:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“An elegant, glowing elven door adorned with intricate, nature-inspired patterns and shimmering silver runes. Delicate vines and luminescent flowers intertwine with the carvings, pulsating with soft emerald and sapphire light. The archway, crafted from ethereal white stone, radiates a mystical aura, with faint golden mist swirling at its base, hinting at an ancient portal to a hidden realm.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yau7ofwv4o3l7ndn48v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3yau7ofwv4o3l7ndn48v.png" alt="An elegant, glowing elven door"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Predefined styles can also be injected into prompts using the &lt;a href="https://github.com/MohammadAboulEla/ComfyUI-iTools" rel="noopener noreferrer"&gt;iTools Prompt Styler Extra&lt;/a&gt; node in ComfyUI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tanzmn396uo87bmm10j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9tanzmn396uo87bmm10j.png" alt="iTools"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This node offers reusable prompts categorized by various artistic styles: 3D, Art, Craft, Design, Drawing, Illustration, Painting, Sculpture, Vector, and more. Incorporating it into our workflow makes testing different styles effortless without altering other parameters.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweute0t5tv1pv5cs9qwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweute0t5tv1pv5cs9qwb.png" alt="iTools Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Below are examples of QR codes generated with different styles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfx1jkmc3aace0acwq81.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhfx1jkmc3aace0acwq81.png" alt="iTools Examples"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Additionally, combining styles with custom prompts allows for highly personalized designs, enabling limitless customization of your QR codes’ appearance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Embedding Content
&lt;/h2&gt;

&lt;p&gt;Having mastered style adjustments, the next step is to embed specific generated content into QR codes. For example, I wish to insert an image of a yoga pose. If you’ve read my previous articles on AI image generation, you’ll be familiar with how poses are transferred through workflows. Details are available &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-2-42c"&gt;here&lt;/a&gt; for further reference.&lt;/p&gt;

&lt;p&gt;We’ll start with an abstract image of the target pose, add Depth and Canny Edge ControlNets to our workflow, and specify in the prompt: &lt;em&gt;“man, mixed race, short curly hair, black hair, 40 years old, white T-shirt, black yoga pants, short sleeves, smiling, viewing glasses, white background, barefoot.”&lt;/em&gt; Essentially, I aim to generate an image resembling myself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gab0lxpdd4znj8n0tpl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3gab0lxpdd4znj8n0tpl.png" alt="Pose Transfer Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To ensure a realistic likeness, additional steps include incorporating the FaceID IP Adapter and the FaceDetailer post-processing model into the workflow. Refer to this &lt;a href="https://dev.to/worldlinetech/the-yoga-of-image-generation-part-3-5517"&gt;article&lt;/a&gt; for comprehensive guidance on implementing face transfer. The outcome preserves scannability and creates a QR code embedding the desired pose and identity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom9y6qblxw6npixsrl4i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fom9y6qblxw6npixsrl4i.png" alt="Pose and Face Transfer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Using QR Toolkit again, the comparison shows about 26 mismatched modules, primarily around the facial features and body.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg9a0ozszeet7h0huzee.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frg9a0ozszeet7h0huzee.png" alt="QR Toolkit Comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating Style and Content
&lt;/h2&gt;

&lt;p&gt;All previous steps can be combined by adding the iTools node to the final workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xi18981onenilcvxojs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3xi18981onenilcvxojs.png" alt="Combined Example Outputs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Making the QR Code Animate
&lt;/h2&gt;

&lt;p&gt;Given that I can embed a face into the QR code, I can also animate facial expressions using specialized nodes. The &lt;a href="https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait" rel="noopener noreferrer"&gt;Advanced Live Portrait&lt;/a&gt; tool is designed for editing, inserting, and animating facial expressions in images. By inputting our generated QR code, we can animate my face to produce a smiling expression or nodding motion.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft384rduzqeufd8f0lnc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fft384rduzqeufd8f0lnc.png" alt="Advanced Live Portrait"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resulting animation can be exported as an animated GIF or video:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpes98vmdcx08j3w5uts.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpes98vmdcx08j3w5uts.gif" alt="Animated GIF"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This short tutorial has demonstrated how to significantly enhance both the stylistic and content-related aspects of a QR code. You are now equipped to craft engaging, customized QR codes that align with your personal or branding style. &lt;/p&gt;

&lt;p&gt;The only limits are your patience and imagination, so have fun experimenting!&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/STYLfK_xeEo"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>qrcode</category>
      <category>genai</category>
    </item>
    <item>
      <title>The Yoga of Image Generation – Part 3</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Mon, 19 May 2025 14:16:11 +0000</pubDate>
      <link>https://forem.com/worldlinetech/the-yoga-of-image-generation-part-3-5517</link>
      <guid>https://forem.com/worldlinetech/the-yoga-of-image-generation-part-3-5517</guid>
      <description>&lt;p&gt;In the first two parts of this series, we explored Stable Diffusion, ComfyUI, and how to build Text-to-Image and Image-to-Image workflows to generate images of Yoga poses. With the help of ControlNets, we learned how to transfer a pose from an abstract reference image to our final generated image.&lt;/p&gt;

&lt;p&gt;A Yoga sequence consists of several connected poses, which means we need visual consistency across all generated images in the sequence. This consistency must cover not only the &lt;em&gt;style&lt;/em&gt;, which we addressed in the previous part of the series, but also the &lt;em&gt;facial features&lt;/em&gt; of the person depicted.&lt;/p&gt;

&lt;h2&gt;
  
  
  LoRAs (Low-Rank Adapters)
&lt;/h2&gt;

&lt;p&gt;Let’s now introduce a new component into our workflow to tackle this challenge: Low-Rank Adapters (LoRAs). LoRAs make slight adaptations to the base model they are trained on by modifying only a small subset of neural network parameters. This is a highly efficient technique, as it enables faster training, smaller file sizes, and lower memory usage. You can think of a LoRA as a patch applied at runtime to the base model. Multiple LoRAs can be chained together.&lt;/p&gt;
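&lt;p&gt;The “small subset of parameters” claim is easy to quantify. A LoRA replaces the update of a full d x d weight matrix W with two thin matrices B (d x r) and A (r x d), applied at runtime as W' = W + alpha * B @ A. The sketch below uses illustrative sizes (one 4096-wide attention projection, rank 8) to show the savings:&lt;/p&gt;

```python
# Parameter arithmetic behind a LoRA: train two thin matrices instead of
# updating the full weight matrix, then patch them in at runtime.

def full_update_params(d):
    # Fine-tuning a d x d matrix directly touches every entry
    return d * d

def lora_params(d, r):
    # B is d x r and A is r x d, so the update is only 2*d*r values
    return 2 * d * r

d, r = 4096, 8  # illustrative: one attention projection, rank 8
print(full_update_params(d))  # 16777216 trainable values
print(lora_params(d, r))      # 65536 -- about 0.4% of the full update
```

&lt;p&gt;That three-orders-of-magnitude gap is what makes LoRA files small enough to share and cheap enough to chain.&lt;/p&gt;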

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r13kwq1q4qn21y64fwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r13kwq1q4qn21y64fwg.png" alt="LoRA nodes in ComfyUI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LoRAs are typically used to specialize an existing model with certain image features such as style, poses, concepts, or characters. They are triggered in prompts using specific keywords defined by the LoRA creator during training. The community offers numerous LoRAs available for download from sites like civitai.com, which can be integrated into your local ComfyUI workflows.&lt;/p&gt;

&lt;p&gt;Here are two examples of images generated using a "Pencil drawing" LoRA, with two different keywords and all other parameters unchanged:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg4x7on42hv6w2zk1lkl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjg4x7on42hv6w2zk1lkl.png" alt="LoRA for Style"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The community also offers countless LoRAs for generating images resembling celebrities. Let’s try using some of these to achieve facial consistency. We’ll start by testing Celebrity LoRAs with very light pose transfer (ControlNet strength set to 10%) to see how closely the generated faces match.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc8k8p6y5izmqphabpz7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc8k8p6y5izmqphabpz7.png" alt="Testing a Celebrity LoRA"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Promising results! Note that the poses aren’t identical across images; this is due to the low ControlNet strength we used.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrh55yd5mcbidrr6958g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyrh55yd5mcbidrr6958g.png" alt="LoRA for Celebrities"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let’s incorporate these LoRAs into our previous pose generation workflow. I stacked two LoRAs: one for facial identity and another for a graphite drawing style. I also kept the two ControlNets we introduced earlier for pose transfer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd425jemmxfdrqlq8j3n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxd425jemmxfdrqlq8j3n.png" alt="Workflow with LoRAs and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this setup, we can generate sequences that are consistent in both style and facial identity.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8g6tei2gi419rx4a80v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo8g6tei2gi419rx4a80v.png" alt="Sequence with LoRAs and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, we can change the celebrity reference or even chain multiple LoRAs together, adjusting their strengths to blend features of different identities. However, using public figures still feels a bit uncomfortable, potentially raising ethical concerns around deepfakes.&lt;/p&gt;

&lt;p&gt;A better approach is to create your own LoRA, avoiding such issues. So I decided to train a LoRA using images of my wife. I first experimented with the DreamBooth method, using a &lt;a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/SDXL_DreamBooth_LoRA_.ipynb" rel="noopener noreferrer"&gt;Colab Notebook&lt;/a&gt; and Google GPUs. I trained the model on 28 images of her, using an SDXL base model, over 2 epochs, taking around 1.5 hours.&lt;/p&gt;

&lt;p&gt;The results were... promising 😉&lt;br&gt;
Here are some of the best images generated with my first custom LoRA:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ee92h7esq473587ncz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ee92h7esq473587ncz.png" alt="Using my very first LoRA (Dreambooth)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The resemblance is there, but not quite enough, and the image quality was lacking. So I tried again, this time training the LoRA locally on my PC using the &lt;a href="https://github.com/bmaltais/kohya_ss" rel="noopener noreferrer"&gt;Kohya_ss&lt;/a&gt; open source tool. I selected the PowerPuffMix model (a fine-tune of SDXL), trained on just 15 images but for 20 epochs. The process took about 3.5 hours and yielded better results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nbc304ijk0nnduhxpnm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2nbc304ijk0nnduhxpnm.png" alt="Using my second LoRA (Kohya_ss)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This time, both image quality and facial identity were strong enough to integrate into our generation workflow.&lt;/p&gt;

&lt;p&gt;Here are some outputs using the new LoRA. While the face doesn’t perfectly resemble my wife (likely due to the influence of the ControlNets), the identity consistency we needed is clearly present.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls40mbheeq3l4pxbhwww.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls40mbheeq3l4pxbhwww.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The lighting is still a bit unstable, and overall image quality remains imperfect. I could improve this by training on more images and increasing the number of epochs. However, the final LoRA is still fundamentally linked to the base model and can't be applied to another one.&lt;/p&gt;
&lt;h2&gt;
  
  
  Image Prompt Adapters (IP Adapters)
&lt;/h2&gt;

&lt;p&gt;Let’s now try another technique: Image Prompt Adaptation, which is more decoupled from the base model. Like a ControlNet, an IP Adapter conditions generation on a reference image, but it injects the image features directly into the model’s attention layers. Think of an IP Adapter as a one-image LoRA.&lt;/p&gt;

&lt;p&gt;The FaceID IP Adapter, specialized in facial recognition and feature extraction, is a perfect fit for our needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrafai3wpql1n74jnqw9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrafai3wpql1n74jnqw9.png" alt="FaceDetailer"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While exploring facial enhancement tools, I also discovered FaceDetailer, which improves facial features (eyes, nose, lips, expression) after image generation. I decided to integrate both of these components into our workflow. FaceDetailer’s enhancements are based on the FaceID input, so they remain faithful to the original facial reference.&lt;/p&gt;

&lt;p&gt;Here is the complete workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtwwnt7mbvvs0ycnm0no.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxtwwnt7mbvvs0ycnm0no.png" alt="Final workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We now finally achieve our desired outcome:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Control over &lt;em&gt;style&lt;/em&gt; via prompts and embeddings&lt;/li&gt;
&lt;li&gt;Control over &lt;em&gt;pose&lt;/em&gt; via ControlNets&lt;/li&gt;
&lt;li&gt;Control over &lt;em&gt;identity&lt;/em&gt; via the FaceID IP Adapter and FaceDetailer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup allows us to generate precise and coherent Yoga sequences.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon7rt9aqx4fy76u9jzt2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fon7rt9aqx4fy76u9jzt2.png" alt="Sequence with FaceID, FaceDetailer and ControlNets"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Another advantage of this workflow is how easily we can switch the base model. For instance, here’s an example using the &lt;a href="https://civitai.com/models/198051/cheyenne" rel="noopener noreferrer"&gt;Cheyenne&lt;/a&gt; model, which specializes in cartoon and graphic novel styles:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdzm232i2ij47gr9ezm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczdzm232i2ij47gr9ezm.png" alt="Changing the Base Model"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s also incredibly easy to change the subject’s identity. Since FaceID only requires a single image and no training phase, here are examples generated with the exact same workflow, using my own face as input for facial identity:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbs82a97l89pvqkkvsjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbs82a97l89pvqkkvsjh.png" alt="Changing the Persona"&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;This concludes our three-part series. My initial goal — generating accurate yoga poses and full sequences using only a local machine — has been achieved. &lt;/p&gt;

&lt;p&gt;In Part 1, we introduced Stable Diffusion and ComfyUI to build simple Text-to-Image workflows using prompts and embeddings. In Part 2, we explored pose transfer using Image-to-Image workflows and ControlNets. In this final installment, we addressed facial consistency, first with LoRAs, then with the FaceID IP Adapter and the post-processing FaceDetailer.&lt;/p&gt;

&lt;p&gt;You’re now ready to create custom workflows tailored to your specific visual goals. Enjoy experimenting with generative AI to express your creativity with precision!&lt;/p&gt;

&lt;p&gt;Stay tuned for more image generation tutorials, and in the meantime feel free to explore my YouTube channel.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/9QRz5cKQCUg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>lora</category>
      <category>ipadapter</category>
    </item>
    <item>
      <title>🎨JSON Style Guides for Controlled Image Generation with GPT-4o and GPT-Image-1</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Thu, 08 May 2025 20:33:19 +0000</pubDate>
      <link>https://forem.com/worldlinetech/json-style-guides-for-controlled-image-generation-with-gpt-4o-and-gpt-image-1-36p</link>
      <guid>https://forem.com/worldlinetech/json-style-guides-for-controlled-image-generation-with-gpt-4o-and-gpt-image-1-36p</guid>
      <description>&lt;p&gt;Image generation with GPT-4o and GPT-Image-1 can yield visually stunning results, but without clear instructions, quality and consistency vary from run to run. Using &lt;strong&gt;JSON style guides&lt;/strong&gt; is a powerful way to bring &lt;strong&gt;clarity, structure, and repeatability&lt;/strong&gt; to your prompts. This tutorial will walk you through why JSON style guides matter, how to use them effectively, and provide a complete reference to all parameters you can define.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Why Use a JSON Style Guide?
&lt;/h2&gt;

&lt;p&gt;Natural language is powerful but often ambiguous. By organizing your image prompts using JSON:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ You &lt;strong&gt;eliminate ambiguity&lt;/strong&gt; with structured fields.&lt;/li&gt;
&lt;li&gt;✅ You &lt;strong&gt;ensure consistency&lt;/strong&gt; across multiple generations.&lt;/li&gt;
&lt;li&gt;✅ You can &lt;strong&gt;automate or scale&lt;/strong&gt; prompt creation for batch processing.&lt;/li&gt;
&lt;li&gt;✅ You &lt;strong&gt;separate content from style&lt;/strong&gt;, making iterations easier.&lt;/li&gt;
&lt;li&gt;✅ Developers and designers can work together using shared, machine-readable formats.&lt;/li&gt;
&lt;/ul&gt;
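&lt;p&gt;Because the guide is plain JSON, the "automate or scale" point is just a loop. Here’s a minimal Python sketch (the field values are illustrative, not prescribed) that clones a base guide and varies a single field to produce a batch of otherwise identical prompts:&lt;/p&gt;

```python
import json
from copy import deepcopy

# Base style guide shared by every generation (illustrative values).
base_guide = {
    "scene": "a magical forest clearing",
    "style": "storybook illustration",
    "mood": "whimsical and cozy",
}

# Varying a single field yields a batch of otherwise identical prompts.
moods = ["whimsical and cozy", "dark and mysterious", "bright and cheerful"]

prompts = []
for mood in moods:
    guide = deepcopy(base_guide)
    guide["mood"] = mood
    # The serialized guide is the text you would send as the image prompt.
    prompts.append(json.dumps(guide, indent=2))

print(len(prompts))  # → 3
```

&lt;p&gt;Everything except &lt;code&gt;mood&lt;/code&gt; stays byte-for-byte identical across the batch, which is exactly what makes side-by-side comparisons of a single variable meaningful.&lt;/p&gt;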

&lt;h2&gt;
  
  
  🛠️ How to Use a JSON Style Guide
&lt;/h2&gt;

&lt;p&gt;A JSON prompt is simply a structured document specifying everything you want the model to include. Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a magical forest clearing"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fox"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wearing a wizard hat, sitting on a tree stump"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"center"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"storybook illustration"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"color_palette"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"forest green"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gold"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"midnight blue"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"soft dappled sunlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whimsical and cozy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glowing mushrooms and tall trees"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eye-level view, centered subject"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structure gives the model explicit, interpretable instructions for what to render and how.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F397u1qvhz9oktlwljiau.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F397u1qvhz9oktlwljiau.png" alt="Fox in Magical forest" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;
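&lt;p&gt;In practice, a simple way to use such a guide is to embed the serialized JSON in a short natural-language instruction and pass the result as the prompt. A minimal Python sketch (the wrapper sentence is just one possible phrasing, not a required format):&lt;/p&gt;

```python
import json

def guide_to_prompt(guide: dict) -> str:
    """Embed a JSON style guide in a short natural-language instruction."""
    return (
        "Generate an image that follows this JSON style guide exactly:\n"
        + json.dumps(guide, indent=2)
    )

guide = {
    "scene": "a magical forest clearing",
    "style": "storybook illustration",
}
prompt = guide_to_prompt(guide)
# Pass `prompt` as the prompt argument of your image-generation call
# (e.g. the OpenAI Images API); the call itself is omitted here to keep
# the sketch self-contained.
```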




&lt;h2&gt;
  
  
  📚 Parameter Reference
&lt;/h2&gt;

&lt;p&gt;Here’s a breakdown of possible fields you can use in a JSON style guide.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;scene&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;A short overview of the entire setting or environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Example: &lt;code&gt;"a futuristic city at sunset"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;subjects&lt;/code&gt; &lt;em&gt;(array of objects)&lt;/em&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Describes each key subject in the image. Each subject can include:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"robot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"silver body with glowing blue eyes"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"foreground"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"standing upright"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"size"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"large"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"neutral"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"interaction"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"looking at a floating screen"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. &lt;code&gt;style&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The artistic or visual rendering style.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"photorealistic"&lt;/code&gt;, &lt;code&gt;"watercolor"&lt;/code&gt;, &lt;code&gt;"pixel art"&lt;/code&gt;, &lt;code&gt;"cyberpunk"&lt;/code&gt;, &lt;code&gt;"anime"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;color_palette&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;An array of dominant and accent colors.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Example: &lt;code&gt;["emerald green", "burnt orange", "charcoal"]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. &lt;code&gt;lighting&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;How the image is lit.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"sunset backlight"&lt;/code&gt;, &lt;code&gt;"soft studio lighting"&lt;/code&gt;, &lt;code&gt;"glow from below"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. &lt;code&gt;mood&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The emotional tone or atmosphere.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"peaceful"&lt;/code&gt;, &lt;code&gt;"dramatic"&lt;/code&gt;, &lt;code&gt;"eerie"&lt;/code&gt;, &lt;code&gt;"playful"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  7. &lt;code&gt;background&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;The scenery or backdrop.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"mountain landscape"&lt;/code&gt;, &lt;code&gt;"white cyclorama"&lt;/code&gt;, &lt;code&gt;"dreamy nebula sky"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8. &lt;code&gt;composition&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Overall layout and positioning.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"symmetrical"&lt;/code&gt;, &lt;code&gt;"rule of thirds"&lt;/code&gt;, &lt;code&gt;"top-down shot"&lt;/code&gt;, &lt;code&gt;"portrait orientation"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  9. &lt;code&gt;camera&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Virtual photography settings.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"angle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eye-level"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"distance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium shot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"wide-angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"focus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharp subject, blurred background"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  10. &lt;code&gt;medium&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Simulated medium or format.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"oil painting"&lt;/code&gt;, &lt;code&gt;"3D render"&lt;/code&gt;, &lt;code&gt;"ink drawing"&lt;/code&gt;, &lt;code&gt;"chalkboard sketch"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  11. &lt;code&gt;textures&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Surface qualities and tactile impressions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"soft velvet"&lt;/code&gt;, &lt;code&gt;"rusty metal"&lt;/code&gt;, &lt;code&gt;"wet pavement"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  12. &lt;code&gt;resolution&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Intended resolution or output size.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"4K"&lt;/code&gt;, &lt;code&gt;"web banner"&lt;/code&gt;, &lt;code&gt;"Instagram square"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  13. &lt;code&gt;details&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Extra fine-tuned attributes.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"clothing"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flowing red cape"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"weather"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"light snowfall"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"facial_features"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"freckles and sharp jawline"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"material"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glass and brass"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ornaments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"glasses, ring"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  14. &lt;code&gt;effects&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Special effects or visual treatments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"lens flare"&lt;/code&gt;, &lt;code&gt;"bokeh blur"&lt;/code&gt;, &lt;code&gt;"double exposure"&lt;/code&gt;, &lt;code&gt;"film grain"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  15. &lt;code&gt;inspirations&lt;/code&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Known references to guide visual style.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Examples: &lt;code&gt;"inspired by Studio Ghibli"&lt;/code&gt;, &lt;code&gt;"in the style of Van Gogh"&lt;/code&gt;, &lt;code&gt;"similar to Blade Runner"&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
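&lt;p&gt;Since the field set above is fixed, you can catch a mistyped key before spending a generation on it. Here’s a small Python sketch of such a check (the allowed-field list simply mirrors the reference above):&lt;/p&gt;

```python
# Top-level fields documented in the parameter reference above.
ALLOWED_FIELDS = {
    "scene", "subjects", "style", "color_palette", "lighting", "mood",
    "background", "composition", "camera", "medium", "textures",
    "resolution", "details", "effects", "inspirations",
}

def validate_guide(guide: dict) -> list:
    """Return the unknown top-level fields, sorted (empty list means valid)."""
    return sorted(set(guide) - ALLOWED_FIELDS)

guide = {"scene": "mountaintop at sunrise", "stile": "digital painting"}
print(validate_guide(guide))  # → ['stile'], catching the mistyped field name
```

&lt;p&gt;The model will usually ignore a key it doesn’t recognize rather than error out, so a check like this is the only place a typo becomes visible.&lt;/p&gt;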




&lt;h2&gt;
  
  
  🧪 Example Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fantasy Character Concept Art
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mountaintop at sunrise"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"warrior elf"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"leather armor, long silver hair"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"standing with sword raised"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"foreground"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"digital painting"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"color_palette"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"misty gray"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"light gold"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"teal"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sunrise backlight"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"heroic and calm"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"foggy mountains"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rule of thirds"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"camera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"angle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low angle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"distance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium shot"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"focus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharp on character"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqkhb6curyugo1k7f0hg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbqkhb6curyugo1k7f0hg.png" alt="Fantasy Character" width="800" height="1200"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Product Mockup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"minimalist white studio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"smartwatch"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"silver frame with red strap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"center"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lying flat"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"photorealistic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"diffused light from above"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"clean and sleek"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"white gradient"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"centered product with top view"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4K"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoa22j5kdvl1vetlrby.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozoa22j5kdvl1vetlrby.png" alt="Smartwatch" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Realistic Scene with Two Characters
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scene"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"urban café terrace in Paris during golden hour"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"subjects"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"young woman"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30s, Black hair in a bun, wearing a white blouse and tan trench coat, holding a coffee cup"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sitting at a café table, leaning forward slightly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"left foreground"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"engaged, smiling softly"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"young man"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"30s, light brown curly hair, wearing a navy blue jacket and scarf, gesturing with one hand"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"pose"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sitting across from the woman, mid-conversation"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"right foreground"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"expression"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"animated, talking"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"style"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"hyper-realistic photography"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lighting"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"natural golden hour light with soft shadows and sun flare"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mood"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"warm and intimate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"background"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"elements"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"street with bicycles"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"café signage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"distant pedestrians"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"depth_of_field"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"shallow, blurred background"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"composition"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"framed using the rule of thirds, both characters centered with table between them"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"camera"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"angle"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eye level"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"distance"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"medium close-up"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"focus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sharp on characters' faces"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"color_palette"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"warm gold"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"beige"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"navy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"soft rose"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"espresso brown"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"props"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ceramic coffee cups"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"croissants on a small plate"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"notebook and pen on table"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resolution"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"4K"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizn9jplwlqa4vyzt62qa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fizn9jplwlqa4vyzt62qa.png" alt="Coffee Break in Paris" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Using JSON style guides gives you a consistent, modular, and precise way to control image generation. Whether you're creating a portfolio of characters, designing branded assets, or prototyping environments, structured prompts give you the power to &lt;strong&gt;communicate with clarity&lt;/strong&gt; and &lt;strong&gt;scale with confidence&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And don’t hesitate to use ChatGPT to refine or co-create your JSON Style Guides! It can turn vague ideas into structured, generation-ready prompts in seconds.&lt;/p&gt;
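&lt;p&gt;As a minimal sketch of how such a guide can be consumed programmatically, here is a small Python helper that folds a JSON style guide into a single comma-separated prompt string. The function name and the flattening rules are my own illustrative convention, not part of any tool:&lt;/p&gt;

```python
import json

def style_guide_to_prompt(guide: dict) -> str:
    """Flatten a nested JSON style guide into a comma-separated prompt.

    Lists become 'key: a, b'; nested dicts are flattened recursively
    under their parent key. Illustrative convention, not an official format.
    """
    parts = []
    for key, value in guide.items():
        label = key.replace("_", " ")
        if isinstance(value, dict):
            parts.append(label + ": " + style_guide_to_prompt(value))
        elif isinstance(value, list):
            parts.append(label + ": " + ", ".join(value))
        else:
            parts.append(label + ": " + str(value))
    return ", ".join(parts)

# A trimmed version of the style guide shown above
guide = json.loads("""
{
  "mood": "warm and intimate",
  "camera": {"angle": "eye level", "distance": "medium close-up"},
  "color_palette": ["warm gold", "beige", "navy"]
}
""")
print(style_guide_to_prompt(guide))
```

&lt;p&gt;Keeping the guide as structured data and flattening it at the last moment is what makes the approach modular: you can swap a single sub-object (say, the camera block) without touching the rest of the prompt.&lt;/p&gt;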

</description>
      <category>genai</category>
      <category>image</category>
      <category>json</category>
      <category>gpt</category>
    </item>
    <item>
      <title>The Yoga of Image Generation – Part 2</title>
      <dc:creator>raphiki</dc:creator>
      <pubDate>Fri, 02 May 2025 07:57:39 +0000</pubDate>
      <link>https://forem.com/worldlinetech/the-yoga-of-image-generation-part-2-42c</link>
      <guid>https://forem.com/worldlinetech/the-yoga-of-image-generation-part-2-42c</guid>
      <description>&lt;p&gt;In the first part of this series on image generation, we explored how to set up a simple Text-to-Image workflow using Stable Diffusion and ComfyUI, running it locally. We also introduced embeddings to enhance prompts and adjust image styles. However, we found that embeddings alone were not sufficient for our specific use case: generating accurate yoga poses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Simple Image-to-Image Workflow
&lt;/h2&gt;

&lt;p&gt;Let’s now take it a step further and provide an image alongside the text prompts to serve as a base for the generation process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr8ct3lbqiedrryol1w4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzr8ct3lbqiedrryol1w4.png" alt="Input Image"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the workflow in ComfyUI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cz2wl7xe2cjvv6l2lwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2cz2wl7xe2cjvv6l2lwc.png" alt="Simple Image-to-Image Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I use an image of myself and combine it with a prompt specifying that the output image should depict a woman wearing blue yoga pants instead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5f1sb4apql6y7esiw7i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx5f1sb4apql6y7esiw7i.png" alt="I2I Prompt"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This image is converted into the latent space and used in the generation process instead of starting from a fully noisy latent image. I apply only 55% denoising.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh5kdmin4144ofq2fqh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbh5kdmin4144ofq2fqh6.png" alt="Denoising 55%"&gt;&lt;/a&gt;&lt;/p&gt;
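&lt;p&gt;If you drive ComfyUI through its API-format workflow JSON instead of the canvas, this setting corresponds to the &lt;code&gt;denoise&lt;/code&gt; input of the &lt;code&gt;KSampler&lt;/code&gt; node. A minimal sketch of just that node, where the node IDs, seed, and step counts are illustrative placeholders rather than values from my exact workflow:&lt;/p&gt;

```python
# Fragment of a ComfyUI API-format workflow: only the sampler node is shown.
# The linked nodes ("4", "6", "7", "10") are assumed to exist elsewhere
# in the graph (checkpoint loader, prompt encoders, VAE-encoded input image).
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,                 # change this to vary the generation
            "steps": 20,
            "cfg": 7.0,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 0.55,            # 55% denoising keeps the input pose
            "model": ["4", 0],          # link to the checkpoint loader
            "positive": ["6", 0],       # link to the positive prompt
            "negative": ["7", 0],       # link to the negative prompt
            "latent_image": ["10", 0],  # VAE-encoded input image, not an empty latent
        },
    }
}
```

&lt;p&gt;The key difference from the Text-to-Image workflow is the &lt;code&gt;latent_image&lt;/code&gt; link: it points to the encoded input photo rather than an empty latent, and &lt;code&gt;denoise&lt;/code&gt; below 1.0 controls how much of that input survives.&lt;/p&gt;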

&lt;p&gt;We can see that the output image resembles the input image. The pose is identical and the subject is now a woman, but the surroundings are still close to the original, and she is not wearing blue pants.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tqdi85qqxs4hh4l2x9z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1tqdi85qqxs4hh4l2x9z.png" alt="Output Image with 55% denoising"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, I can tweak the prompt and change the generation seed. I can also adjust the denoising percentage. Here is the result with a 70% value:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m6o8dix4qxusxoq7fri.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m6o8dix4qxusxoq7fri.png" alt="Output Image with 70% denoising"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The image quality is better, and the pants are more blue, but the pose has changed slightly: her head is tilted down, and her left hand is not in the same position. There’s a trade-off between pose accuracy and the creative freedom given to the model.&lt;/p&gt;

&lt;h2&gt;
  
  
  ControlNets
&lt;/h2&gt;

&lt;p&gt;Rather than injecting the entire input image into the generation process, it’s more efficient to transfer only specific characteristics. That’s where Control Networks (or ControlNets) come in.&lt;/p&gt;

&lt;p&gt;ControlNets are additional neural networks that extract features from an input image and inject them into the generation process as extra conditioning, guiding the denoising steps in latent space.&lt;/p&gt;

&lt;p&gt;Control methods specialize in detecting different types of image features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structural&lt;/strong&gt;: pose, edge detection, segmentation, depth
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Texture &amp;amp; Detail&lt;/strong&gt;: scribbles/sketches, stylization from edges
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content &amp;amp; Layout&lt;/strong&gt;: bounding boxes, inpainting masks
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Abstract &amp;amp; Style&lt;/strong&gt;: color maps, textural fields
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most ControlNets work with preprocessors that extract specific features from input images. Here are some examples:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3o95amkymmcw8lie9kb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq3o95amkymmcw8lie9kb.png" alt="Preprocessors"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s our workflow updated to include a Depth ControlNet:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F538sow1k5nh0p29i3gay.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F538sow1k5nh0p29i3gay.png" alt="ControlNet Workflow"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve reverted to an empty latent image so we can focus only on the depth features detected by the preprocessor and injected into the latent space by the ControlNet.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeejcmpkbaqgtu72xwfo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdeejcmpkbaqgtu72xwfo.png" alt="ControlNet nodes in ComfyUI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The main parameters to tune are the strength of the ControlNet (here, 50%) and when it is applied during the generation (here, throughout the entire process). By tweaking these settings, you can adjust how much the ControlNet influences the final image and, once again, find the best balance between control and creativity.&lt;/p&gt;
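&lt;p&gt;In API form, those two knobs map to the &lt;code&gt;strength&lt;/code&gt; and start/end inputs of the ControlNet apply node. A sketch of the node as configured above, with illustrative node IDs (the links to the prompt encoders, ControlNet loader, and depth preprocessor are assumed to exist elsewhere in the graph):&lt;/p&gt;

```python
# ComfyUI ControlNetApplyAdvanced node, configured as in the workflow above.
controlnet_apply = {
    "12": {
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "strength": 0.5,           # 50% influence on the generation
            "start_percent": 0.0,      # apply from the very first step...
            "end_percent": 1.0,        # ...through the entire process
            "positive": ["6", 0],      # positive prompt conditioning in
            "negative": ["7", 0],      # negative prompt conditioning in
            "control_net": ["11", 0],  # loaded depth ControlNet model
            "image": ["13", 0],        # depth map from the preprocessor
        },
    }
}
```

&lt;p&gt;Lowering &lt;code&gt;end_percent&lt;/code&gt; (for example to 0.6) releases the model for the final steps, which often improves texture detail at the cost of some pose fidelity.&lt;/p&gt;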

&lt;p&gt;I can still apply an embedding to achieve a specific style—for example, a comic style:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4n2j5zt5j5hg0av1m2b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu4n2j5zt5j5hg0av1m2b.png" alt="Output with ControlNet &amp;amp; Embedding"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There is even an OpenPose ControlNet, specifically trained to detect and apply human poses, but unfortunately, it is not accurate enough for yoga poses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advanced Image-to-Image Workflow
&lt;/h2&gt;

&lt;p&gt;Now that we’re extracting only certain features, we can use more abstract images as inputs—focusing on the pose and letting Stable Diffusion handle the rest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4w1hemtlg1kcjxienw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4w1hemtlg1kcjxienw3.png" alt="Input Image for the pose"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After multiple tests, I decided to combine two ControlNets: one for Edge Detection (Canny Edge, 40% strength) and one for Depth (30% strength).&lt;/p&gt;

&lt;p&gt;Here’s the resulting workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk6e03cd1pxx6gn5px52.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuk6e03cd1pxx6gn5px52.png" alt="Advanced Image-to-Image Workflow"&gt;&lt;/a&gt;&lt;/p&gt;
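&lt;p&gt;Stacking ControlNets is simply chaining: the conditioning outputs of the first apply node feed the conditioning inputs of the second. A sketch under the same assumptions (node IDs and upstream links are illustrative), using the 40% Canny and 30% Depth strengths from my tests:&lt;/p&gt;

```python
# Two chained ControlNetApplyAdvanced nodes in ComfyUI API format.
stacked_controlnets = {
    "20": {  # Canny Edge ControlNet, 40% strength
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "strength": 0.40,
            "start_percent": 0.0,
            "end_percent": 1.0,
            "positive": ["6", 0],   # prompt conditioning enters here...
            "negative": ["7", 0],
            "control_net": ["18", 0],
            "image": ["16", 0],     # Canny preprocessor output
        },
    },
    "21": {  # Depth ControlNet, 30% strength, chained after the Canny one
        "class_type": "ControlNetApplyAdvanced",
        "inputs": {
            "strength": 0.30,
            "start_percent": 0.0,
            "end_percent": 1.0,
            "positive": ["20", 0],  # ...and flows out of node 20 into node 21
            "negative": ["20", 1],
            "control_net": ["19", 0],
            "image": ["17", 0],     # depth preprocessor output
        },
    },
}
```

&lt;p&gt;The KSampler then takes its positive and negative conditioning from node 21, so both feature sets influence every denoising step.&lt;/p&gt;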

&lt;p&gt;Watch this video to see the process in action with two fine-tuned SDXL models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://civitai.com/models/133005/juggernaut-xl" rel="noopener noreferrer"&gt;Juggernaut XL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://civitai.com/models/198051/cheyenne" rel="noopener noreferrer"&gt;Cheyenne&lt;/a&gt;, specialized in comic and graphic novel styles&lt;/li&gt;
&lt;/ul&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z4e91fuhdt4xzcqjdrh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8z4e91fuhdt4xzcqjdrh.png" alt="Output Image by Cheyenne"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Neat! I can now control the pose using ControlNets and influence the rest of the image with prompts and embeddings.&lt;/p&gt;

&lt;p&gt;I just need to change the input image in the workflow to generate an entire series. Here are a few examples using image-compare mode:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxfsloqpdv22gek5hnpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foxfsloqpdv22gek5hnpd.png" alt="Output for Tree pose"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ydxcfud6mhmsm6osvla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ydxcfud6mhmsm6osvla.png" alt="Output for Chair pose"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is super convenient since my use case involves generating sequences—or even full yoga classes. But how can I ensure that the woman in each pose remains the same? How do I maintain visual identity and consistency across the sequence of images?&lt;/p&gt;

&lt;p&gt;We’ll cover that in the final part of this series. So stay tuned—and check out my YouTube tutorials as well.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/9QRz5cKQCUg"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>comfyui</category>
      <category>stablediffusion</category>
      <category>imagetoimage</category>
      <category>controlnet</category>
    </item>
  </channel>
</rss>
