<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sienna</title>
    <description>The latest articles on Forem by Sienna (@sienna).</description>
    <link>https://forem.com/sienna</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3028016%2F234bba86-87e4-4652-97bd-b9584b8eca56.jpg</url>
      <title>Forem: Sienna</title>
      <link>https://forem.com/sienna</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/sienna"/>
    <language>en</language>
    <item>
      <title>ValRequest - Turn Feelings Into Words</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Fri, 06 Feb 2026 02:44:38 +0000</pubDate>
      <link>https://forem.com/sienna/valrequest-turn-feelings-into-words-2mnn</link>
      <guid>https://forem.com/sienna/valrequest-turn-feelings-into-words-2mnn</guid>
      <description>&lt;p&gt;&lt;strong&gt;Love is a feeling; expressing it is an art.&lt;/strong&gt; Use &lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; to craft personalized, heartfelt messages that capture your unique story. Make every word count.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is ValRequest?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; is an AI-powered tool that creates short, personalized romantic messages. You choose the recipient, style, and keywords; &lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; gives you three unique options in seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Use ValRequest
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; turns your feelings into words in three simple steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Choose Recipient &amp;amp; Style
&lt;/h3&gt;

&lt;p&gt;Select who the message is for (partner, crush, or friend) and pick a tone:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;💕 &lt;strong&gt;Heartfelt&lt;/strong&gt; - Sincere and emotional&lt;/li&gt;
&lt;li&gt;😄 &lt;strong&gt;Humorous&lt;/strong&gt; - Light and funny
&lt;/li&gt;
&lt;li&gt;🎭 &lt;strong&gt;Shakespeare&lt;/strong&gt; - Poetic and classical&lt;/li&gt;
&lt;li&gt;🥰 &lt;strong&gt;Cute&lt;/strong&gt; - Sweet and playful&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Add Your Keywords
&lt;/h3&gt;

&lt;p&gt;Type a few words that describe your relationship or what you want to say—&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; uses them to personalize your greeting.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Generate &amp;amp; Copy
&lt;/h3&gt;

&lt;p&gt;Click Generate. &lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; will create three options. Copy your favorite or save it as an image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Message Styles &amp;amp; Examples
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; offers various romantic styles:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Classic Romance (Pride &amp;amp; Prejudice style)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"In a world of fleeting moments, you are my forever. My heart knew you before my eyes ever did."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Epic Love (The Notebook style)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I would cross every ocean, climb every mountain, just to see you smile. You are worth every journey."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Poetic Soul (Shakespeare style)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You are the poem I never knew how to write, the song my heart always wanted to sing."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Sweet &amp;amp; Playful (Rom-com style)&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You're my favorite notification, my best plot twist, the reason I smile at my phone like an idiot."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Who Is ValRequest For?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; is perfect for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Romantics&lt;/strong&gt; - Anyone who wants to say "I love you" in a way that feels uniquely theirs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Last-minute senders&lt;/strong&gt; - Need a sincere message fast? &lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; gives you three options in seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Friends &amp;amp; crushes&lt;/strong&gt; - Perfect for Valentine's notes to friends or that special someone you're still getting to know&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing &amp;amp; Credits
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; uses a credit system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Earn credits by signing up&lt;/li&gt;
&lt;li&gt;Generating messages uses credits per request&lt;/li&gt;
&lt;li&gt;Additional credits available for purchase&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Privacy &amp;amp; Security
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; protects your privacy by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only using your inputs (recipient type, style, keywords) to generate messages&lt;/li&gt;
&lt;li&gt;Not storing or sharing your generated text for advertising&lt;/li&gt;
&lt;li&gt;Using secure API technology for message generation&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Try &lt;a href="https://valrequest.net/" rel="noopener noreferrer"&gt;ValRequest&lt;/a&gt; now&lt;/strong&gt; and turn your feelings into words with AI-powered personalized Valentine's messages that sound uniquely like you.&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Qwen3-Coder-Next: The Complete 2026 Guide to Running Powerful AI Coding Agents Locally</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:35:35 +0000</pubDate>
      <link>https://forem.com/sienna/qwen3-coder-next-the-complete-2026-guide-to-running-powerful-ai-coding-agents-locally-1k95</link>
      <guid>https://forem.com/sienna/qwen3-coder-next-the-complete-2026-guide-to-running-powerful-ai-coding-agents-locally-1k95</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Core Highlights (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Revolutionary Efficiency&lt;/strong&gt;: Qwen3-Coder-Next achieves Sonnet 4.5-level coding performance with only 3B activated parameters (80B total with MoE architecture)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-First Design&lt;/strong&gt;: Runs on consumer hardware (64GB MacBook, RTX 5090, or AMD Radeon 7900 XTX) with 256K context length&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Weights&lt;/strong&gt;: Fully open-source model designed specifically for coding agents and local development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-World Performance&lt;/strong&gt;: Scores 44.3% on SWE-Bench Pro, competing with models 10-20x larger in active parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Effective&lt;/strong&gt;: Eliminates expensive API costs while maintaining competitive coding capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What is Qwen3-Coder-Next?&lt;/li&gt;
&lt;li&gt;Key Features and Architecture&lt;/li&gt;
&lt;li&gt;Performance Benchmarks&lt;/li&gt;
&lt;li&gt;Hardware Requirements and Setup&lt;/li&gt;
&lt;li&gt;How to Install and Run Qwen3-Coder-Next&lt;/li&gt;
&lt;li&gt;Integration with Coding Tools&lt;/li&gt;
&lt;li&gt;Quantization Options Explained&lt;/li&gt;
&lt;li&gt;Real-World Use Cases and Performance&lt;/li&gt;
&lt;li&gt;Comparison: Qwen3-Coder-Next vs Claude vs GPT&lt;/li&gt;
&lt;li&gt;Common Issues and Solutions&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;li&gt;Conclusion and Next Steps&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What is Qwen3-Coder-Next?
&lt;/h2&gt;

&lt;p&gt;Qwen3-Coder-Next is an open-weight language model released by Alibaba's Qwen team in February 2026, specifically designed for &lt;strong&gt;coding agents&lt;/strong&gt; and &lt;strong&gt;local development environments&lt;/strong&gt;. Unlike traditional large language models that require massive computational resources, Qwen3-Coder-Next uses a sophisticated Mixture-of-Experts (MoE) architecture that activates only 3 billion parameters at a time while maintaining a total parameter count of 80 billion.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why It Matters
&lt;/h3&gt;

&lt;p&gt;The model represents a significant breakthrough in making powerful AI coding assistants accessible to individual developers without relying on expensive cloud APIs or subscriptions. With the recent controversies around Anthropic's Claude Code restrictions and OpenAI's pricing models, Qwen3-Coder-Next offers a compelling alternative for developers who want:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Privacy&lt;/strong&gt;: Your code never leaves your machine&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Control&lt;/strong&gt;: No per-token pricing or monthly subscription limits&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Freedom&lt;/strong&gt;: Use any coding agent or IDE integration you prefer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Offline Capability&lt;/strong&gt;: Work without internet connectivity&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key Innovation&lt;/strong&gt;&lt;br&gt;
The model achieves performance comparable to Claude Sonnet 4.5 on coding benchmarks while using only 3B activated parameters, making it feasible to run on high-end consumer hardware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Key Features and Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Technical Specifications
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Parameters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;80B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Activated Parameters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3B (per inference)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Length&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256K tokens (native support)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid: Gated DeltaNet + MoE + Gated Attention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Number of Experts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;512 total, 10 activated per token&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training Method&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale executable task synthesis + RL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Causal Language Model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open weights&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Architecture Breakdown
&lt;/h3&gt;

&lt;p&gt;The model uses a unique &lt;strong&gt;hybrid attention mechanism&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;12 × [3 × (Gated DeltaNet → MoE) → 1 × (Gated Attention → MoE)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What makes this special:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gated DeltaNet&lt;/strong&gt;: Efficient linear attention for long-range dependencies&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mixture of Experts (MoE)&lt;/strong&gt;: Only activates 10 out of 512 experts per token, dramatically reducing computational cost&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gated Attention&lt;/strong&gt;: Traditional attention mechanism for critical reasoning tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Experts&lt;/strong&gt;: 1 expert always active for core capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important Note&lt;/strong&gt;&lt;br&gt;
This model does NOT support thinking mode (&lt;code&gt;&amp;lt;think&amp;gt;&amp;lt;/think&amp;gt;&lt;/code&gt; blocks). It generates responses directly without visible reasoning steps.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Training Methodology
&lt;/h3&gt;

&lt;p&gt;Qwen3-Coder-Next was trained using:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Executable Task Synthesis&lt;/strong&gt;: Large-scale generation of verifiable programming tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment Interaction&lt;/strong&gt;: Direct learning from execution feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reinforcement Learning&lt;/strong&gt;: Optimization based on task success rates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-Specific Training&lt;/strong&gt;: Focused on long-horizon reasoning and tool usage&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Performance Benchmarks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  SWE-Bench Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;SWE-Bench Verified&lt;/th&gt;
&lt;th&gt;SWE-Bench Pro&lt;/th&gt;
&lt;th&gt;Avg Agent Turns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qwen3-Coder-Next&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;42.8%&lt;/td&gt;
&lt;td&gt;44.3%&lt;/td&gt;
&lt;td&gt;~150&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
&lt;td&gt;45.2%&lt;/td&gt;
&lt;td&gt;46.1%&lt;/td&gt;
&lt;td&gt;~120&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kimi K2.5&lt;/td&gt;
&lt;td&gt;40.1%&lt;/td&gt;
&lt;td&gt;39.7%&lt;/td&gt;
&lt;td&gt;~50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPT-5.2-Codex&lt;/td&gt;
&lt;td&gt;43.5%&lt;/td&gt;
&lt;td&gt;42.8%&lt;/td&gt;
&lt;td&gt;~130&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-V3&lt;/td&gt;
&lt;td&gt;38.9%&lt;/td&gt;
&lt;td&gt;37.2%&lt;/td&gt;
&lt;td&gt;~110&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Other Coding Benchmarks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TerminalBench 2.0&lt;/strong&gt;: Competitive performance with frontier models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aider Benchmark&lt;/strong&gt;: Strong tool-calling and file editing capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual Support&lt;/strong&gt;: Excellent performance across Python, JavaScript, Java, C++, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📊 &lt;strong&gt;Interpretation&lt;/strong&gt;&lt;br&gt;
While Qwen3-Coder-Next takes more agent turns on average (~150 vs ~120 for Sonnet 4.5), it achieves comparable success rates. This suggests it may require more iterations but ultimately solves similar numbers of problems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Real-World Performance Reports
&lt;/h3&gt;

&lt;p&gt;From community testing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed&lt;/strong&gt;: 20-40 tokens/sec on consumer hardware (varies by quantization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Handling&lt;/strong&gt;: Successfully manages 64K-128K context windows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Calling&lt;/strong&gt;: Reliable function calling with JSON format&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code Quality&lt;/strong&gt;: Generates production-ready code for most common tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Hardware Requirements and Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Minimum Requirements by Quantization Level
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;VRAM/RAM Needed&lt;/th&gt;
&lt;th&gt;Hardware Examples&lt;/th&gt;
&lt;th&gt;Speed (tok/s)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q2_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~26-30GB&lt;/td&gt;
&lt;td&gt;32GB Mac Mini M4&lt;/td&gt;
&lt;td&gt;15-25&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_XL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~35-40GB&lt;/td&gt;
&lt;td&gt;64GB MacBook Pro, RTX 5090 32GB&lt;/td&gt;
&lt;td&gt;25-40&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q6_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~50-55GB&lt;/td&gt;
&lt;td&gt;96GB Workstation, Mac Studio&lt;/td&gt;
&lt;td&gt;30-45&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~65-70GB&lt;/td&gt;
&lt;td&gt;128GB Workstation, Dual GPUs&lt;/td&gt;
&lt;td&gt;35-50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~90-110GB&lt;/td&gt;
&lt;td&gt;H100, A100, Multi-GPU setup&lt;/td&gt;
&lt;td&gt;40-60&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Recommended Configurations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Budget Setup (~$2,000-3,000)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mac Mini M4 with 64GB unified memory&lt;/li&gt;
&lt;li&gt;Quantization: Q4_K_XL or Q4_K_M&lt;/li&gt;
&lt;li&gt;Expected speed: 20-30 tok/s&lt;/li&gt;
&lt;li&gt;Context: Up to 100K tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Enthusiast Setup (~$5,000-8,000)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RTX 5090 (32GB) + 128GB DDR5 RAM&lt;/li&gt;
&lt;li&gt;Quantization: Q6_K or Q8_0&lt;/li&gt;
&lt;li&gt;Expected speed: 30-40 tok/s&lt;/li&gt;
&lt;li&gt;Context: Full 256K tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Professional Setup (~$10,000-15,000)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mac Studio M3 Ultra (256GB) OR&lt;/li&gt;
&lt;li&gt;Dual RTX 4090/5090 setup OR&lt;/li&gt;
&lt;li&gt;AMD Radeon 7900 XTX + 256GB RAM&lt;/li&gt;
&lt;li&gt;Quantization: Q8_0 or FP8&lt;/li&gt;
&lt;li&gt;Expected speed: 40-60 tok/s&lt;/li&gt;
&lt;li&gt;Context: Full 256K tokens&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip&lt;/strong&gt;&lt;br&gt;
MoE models like Qwen3-Coder-Next can efficiently split between GPU (dense layers) and CPU RAM (sparse experts), allowing you to run larger quantizations than your VRAM alone would suggest.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to Install and Run Qwen3-Coder-Next
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Method 1: Using llama.cpp (Recommended for Most Users)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Install llama.cpp&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# macOS with Homebrew&lt;/span&gt;
brew &lt;span class="nb"&gt;install &lt;/span&gt;llama.cpp

&lt;span class="c"&gt;# Or build from source&lt;/span&gt;
git clone https://github.com/ggerganov/llama.cpp
&lt;span class="nb"&gt;cd &lt;/span&gt;llama.cpp
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Download the Model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Using Hugging Face CLI (recommended)&lt;/span&gt;
llama-cli &lt;span class="nt"&gt;-hf&lt;/span&gt; unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL

&lt;span class="c"&gt;# Or download manually from:&lt;/span&gt;
&lt;span class="c"&gt;# https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Run the Server&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;llama-server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-hf&lt;/span&gt; unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_XL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--fit&lt;/span&gt; on &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--seed&lt;/span&gt; 3407 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--temp&lt;/span&gt; 1.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--top-p&lt;/span&gt; 0.95 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--min-p&lt;/span&gt; 0.01 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--top-k&lt;/span&gt; 40 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--jinja&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates an OpenAI-compatible API endpoint at &lt;code&gt;http://localhost:8080&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Method 2: Using Ollama (Easiest for Beginners)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Ollama&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://ollama.com/install.sh | sh

&lt;span class="c"&gt;# Pull and run the model&lt;/span&gt;
ollama pull qwen3-coder-next
ollama run qwen3-coder-next
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 3: Using vLLM (Best for Production)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install vLLM&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'vllm&amp;gt;=0.15.0'&lt;/span&gt;

&lt;span class="c"&gt;# Start server&lt;/span&gt;
vllm serve Qwen/Qwen3-Coder-Next &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tensor-parallel-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Method 4: Using SGLang (Fastest Inference)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install SGLang&lt;/span&gt;
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s1"&gt;'sglang[all]&amp;gt;=v0.5.8'&lt;/span&gt;

&lt;span class="c"&gt;# Launch server&lt;/span&gt;
python &lt;span class="nt"&gt;-m&lt;/span&gt; sglang.launch_server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt; Qwen/Qwen3-Coder-Next &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 30000 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tp-size&lt;/span&gt; 2 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; qwen3_coder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Context Length Warning&lt;/strong&gt;&lt;br&gt;
The default 256K context may cause OOM errors on systems with limited memory. Start with &lt;code&gt;--ctx-size 32768&lt;/code&gt; and increase gradually.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Integration with Coding Tools
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OpenCode (Recommended)
&lt;/h3&gt;

&lt;p&gt;OpenCode is an open-source coding agent that works excellently with Qwen3-Coder-Next:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install OpenCode&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @opencode/cli

&lt;span class="c"&gt;# Configure for local model&lt;/span&gt;
opencode config &lt;span class="nb"&gt;set &lt;/span&gt;model http://localhost:8080/v1
opencode config &lt;span class="nb"&gt;set &lt;/span&gt;api-key &lt;span class="s2"&gt;"not-needed"&lt;/span&gt;

&lt;span class="c"&gt;# Start coding&lt;/span&gt;
opencode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cursor Integration
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Open Cursor Settings&lt;/li&gt;
&lt;li&gt;Navigate to "Models" → "Add Custom Model"&lt;/li&gt;
&lt;li&gt;Enter endpoint: &lt;code&gt;http://localhost:8080/v1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Model name: &lt;code&gt;qwen3-coder-next&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Continue.dev Integration
&lt;/h3&gt;

&lt;p&gt;Edit &lt;code&gt;~/.continue/config.json&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"models"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Qwen3-Coder-Next"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openai"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3-coder-next"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiBase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://localhost:8080/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"apiKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"not-needed"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Aider Integration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--model&lt;/span&gt; openai/qwen3-coder-next &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--openai-api-base&lt;/span&gt; http://localhost:8080/v1 &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--openai-api-key&lt;/span&gt; not-needed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;
Use recommended sampling parameters for optimal results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temperature: 1.0&lt;/li&gt;
&lt;li&gt;Top-p: 0.95&lt;/li&gt;
&lt;li&gt;Top-k: 40&lt;/li&gt;
&lt;li&gt;Min-p: 0.01&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Quantization Options Explained
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Understanding Quantization Levels
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quant Type&lt;/th&gt;
&lt;th&gt;Bits&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Quality&lt;/th&gt;
&lt;th&gt;Speed&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q2_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-bit&lt;/td&gt;
&lt;td&gt;~26GB&lt;/td&gt;
&lt;td&gt;Fair&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;Testing, limited hardware&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_M&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~38GB&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Balanced performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q4_K_XL&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit+&lt;/td&gt;
&lt;td&gt;~40GB&lt;/td&gt;
&lt;td&gt;Very Good&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Recommended default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q6_K&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;6-bit&lt;/td&gt;
&lt;td&gt;~52GB&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High quality needs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Q8_0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~68GB&lt;/td&gt;
&lt;td&gt;Near-perfect&lt;/td&gt;
&lt;td&gt;Slower&lt;/td&gt;
&lt;td&gt;Maximum quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MXFP4_MOE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;4-bit&lt;/td&gt;
&lt;td&gt;~35GB&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;NVIDIA GPUs only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FP8&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8-bit&lt;/td&gt;
&lt;td&gt;~95GB&lt;/td&gt;
&lt;td&gt;Perfect&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Production use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Unsloth Dynamic (UD) Quantization
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;UD-&lt;/strong&gt; prefix indicates Unsloth's dynamic quantization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically upcasts important layers to higher precision&lt;/li&gt;
&lt;li&gt;Maintains model quality while reducing size&lt;/li&gt;
&lt;li&gt;Uses calibration datasets for optimal layer selection&lt;/li&gt;
&lt;li&gt;Typically provides better quality than standard quants at same size&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommended choices:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;General use&lt;/strong&gt;: UD-Q4_K_XL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA GPUs&lt;/strong&gt;: MXFP4_MOE&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maximum quality&lt;/strong&gt;: Q8_0 or FP8&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real-World Use Cases and Performance
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Community Testing Results
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Test 1: Simple HTML Game (Flappy Bird)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Q8_0 on RTX 6000&lt;/li&gt;
&lt;li&gt;Result: ✅ One-shot success&lt;/li&gt;
&lt;li&gt;Speed: 60+ tok/s&lt;/li&gt;
&lt;li&gt;Code quality: Production-ready&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test 2: Complex React Application&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Q4_K_XL on Mac Studio&lt;/li&gt;
&lt;li&gt;Result: ⚠️ Required 2-3 iterations&lt;/li&gt;
&lt;li&gt;Speed: 32 tok/s&lt;/li&gt;
&lt;li&gt;Code quality: Good with minor fixes needed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test 3: Rust Code Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Q4_K_XL on AMD 7900 XTX&lt;/li&gt;
&lt;li&gt;Result: ✅ Excellent analysis and suggestions&lt;/li&gt;
&lt;li&gt;Speed: 35-39 tok/s&lt;/li&gt;
&lt;li&gt;Context: 64K tokens handled well&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Test 4: Tower Defense Game (Complex Prompt)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: Various quantizations&lt;/li&gt;
&lt;li&gt;Result: ⚠️ Mixed - better than most local models but not perfect&lt;/li&gt;
&lt;li&gt;Common issues: Game balance, visual effects complexity&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Performance vs Claude Code
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Qwen3-Coder-Next (Local)&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20-40 tok/s&lt;/td&gt;
&lt;td&gt;50-80 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;First-time success&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;60-70%&lt;/td&gt;
&lt;td&gt;75-85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context handling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Excellent (256K)&lt;/td&gt;
&lt;td&gt;Excellent (200K)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool calling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reliable&lt;/td&gt;
&lt;td&gt;Very reliable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0 after hardware&lt;/td&gt;
&lt;td&gt;$100/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complete&lt;/td&gt;
&lt;td&gt;Cloud-based&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;📊 &lt;strong&gt;Reality Check&lt;/strong&gt;&lt;br&gt;
While Qwen3-Coder-Next is impressive, it's not quite at Claude Opus 4.5 level in practice. Think of it as comparable to Claude Sonnet 4.0 or GPT-4 Turbo - very capable but may need more guidance on complex tasks.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Comparison: Qwen3-Coder-Next vs Claude vs GPT
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Feature Comparison Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Qwen3-Coder-Next&lt;/th&gt;
&lt;th&gt;Claude Opus 4.5&lt;/th&gt;
&lt;th&gt;GPT-5.2-Codex&lt;/th&gt;
&lt;th&gt;DeepSeek-V3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local/Self-hosted&lt;/td&gt;
&lt;td&gt;Cloud only&lt;/td&gt;
&lt;td&gt;Cloud only&lt;/td&gt;
&lt;td&gt;Cloud/Local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hardware only&lt;/td&gt;
&lt;td&gt;$100/mo&lt;/td&gt;
&lt;td&gt;$200/mo&lt;/td&gt;
&lt;td&gt;$0.14/M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed (local)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20-40 tok/s&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;15-30 tok/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;256K&lt;/td&gt;
&lt;td&gt;200K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;td&gt;128K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool calling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very Good&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Complete&lt;/td&gt;
&lt;td&gt;❌ Cloud&lt;/td&gt;
&lt;td&gt;❌ Cloud&lt;/td&gt;
&lt;td&gt;⚠️ Depends&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Offline&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;⚠️ If local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open weights&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Choose Each Model
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Choose Qwen3-Coder-Next when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You have sensitive code/IP concerns&lt;/li&gt;
&lt;li&gt;You want zero marginal costs&lt;/li&gt;
&lt;li&gt;You need offline capability&lt;/li&gt;
&lt;li&gt;You have suitable hardware ($2K-10K budget)&lt;/li&gt;
&lt;li&gt;You're comfortable with 90-95% of frontier model capability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude Opus 4.5 when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need the absolute best coding quality&lt;/li&gt;
&lt;li&gt;Speed is critical (faster inference)&lt;/li&gt;
&lt;li&gt;You prefer zero setup hassle&lt;/li&gt;
&lt;li&gt;Budget allows $100-200/month&lt;/li&gt;
&lt;li&gt;You work on very complex, novel problems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Choose GPT-5.2-Codex when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want strong reasoning capabilities&lt;/li&gt;
&lt;li&gt;You need excellent documentation generation&lt;/li&gt;
&lt;li&gt;You prefer OpenAI's ecosystem&lt;/li&gt;
&lt;li&gt;You have enterprise ChatGPT access&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Issues and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Issue 1: Out of Memory (OOM) Errors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Model crashes during loading or inference&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Reduce context size&lt;/span&gt;
&lt;span class="nt"&gt;--ctx-size&lt;/span&gt; 32768  &lt;span class="c"&gt;# Instead of default 256K&lt;/span&gt;

&lt;span class="c"&gt;# Use smaller quantization&lt;/span&gt;
&lt;span class="c"&gt;# Try Q4_K_M instead of Q6_K&lt;/span&gt;

&lt;span class="c"&gt;# Enable CPU offloading&lt;/span&gt;
&lt;span class="nt"&gt;--n-gpu-layers&lt;/span&gt; 30  &lt;span class="c"&gt;# Adjust based on your VRAM&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue 2: Slow Inference Speed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: &amp;lt; 10 tokens/second&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use MXFP4_MOE on NVIDIA GPUs&lt;/li&gt;
&lt;li&gt;Enable &lt;code&gt;--no-mmap&lt;/code&gt; and &lt;code&gt;--fa on&lt;/code&gt; flags&lt;/li&gt;
&lt;li&gt;Reduce context window&lt;/li&gt;
&lt;li&gt;Check if model is fully loaded to GPU&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Issue 3: Model Gets Stuck in Loops
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Repeats same actions or text continuously&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Adjust sampling parameters&lt;/span&gt;
&lt;span class="nt"&gt;--temp&lt;/span&gt; 1.0        &lt;span class="c"&gt;# Default temperature&lt;/span&gt;
&lt;span class="nt"&gt;--top-p&lt;/span&gt; 0.95      &lt;span class="c"&gt;# Nucleus sampling&lt;/span&gt;
&lt;span class="nt"&gt;--top-k&lt;/span&gt; 40        &lt;span class="c"&gt;# Top-k sampling&lt;/span&gt;
&lt;span class="nt"&gt;--repeat-penalty&lt;/span&gt; 1.1  &lt;span class="c"&gt;# Penalize repetition&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Issue 4: Poor Tool Calling with OpenCode/Cline
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Model doesn't follow tool schemas correctly&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure you're using &lt;code&gt;--tool-call-parser qwen3_coder&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Update to latest llama.cpp/vLLM version&lt;/li&gt;
&lt;li&gt;Try Q6_K or higher quantization&lt;/li&gt;
&lt;li&gt;Use recommended sampling parameters&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Issue 5: MLX Performance Issues on Mac
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Symptoms&lt;/strong&gt;: Slow prompt processing, frequent re-processing&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use llama.cpp instead of MLX for better KV cache handling&lt;/li&gt;
&lt;li&gt;Try LM Studio which has optimized MLX implementation&lt;/li&gt;
&lt;li&gt;Reduce branching in conversations (avoid regenerating responses)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Known Limitation&lt;/strong&gt;&lt;br&gt;
MLX currently has issues with KV cache consistency during conversation branching. Use llama.cpp for better experience on Mac.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Can I run Qwen3-Coder-Next on a MacBook with 32GB RAM?
&lt;/h3&gt;

&lt;p&gt;A: Yes, but you'll need to use aggressive quantization (Q2_K or Q4_K_M) and limit context to 64K-100K tokens. Performance will be around 15-25 tok/s, which is usable but not ideal for intensive coding sessions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is Qwen3-Coder-Next better than Claude Code?
&lt;/h3&gt;

&lt;p&gt;A: Not quite. In practice, it performs closer to Claude Sonnet 4.0 level. It's excellent for most coding tasks but may struggle with very complex, novel problems that Opus 4.5 handles easily. The trade-off is complete privacy and zero ongoing costs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use this with VS Code Copilot?
&lt;/h3&gt;

&lt;p&gt;A: Not directly as a Copilot replacement, but you can use it with VS Code extensions like Continue.dev, Cline, or Twinny that support custom model endpoints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does quantization affect code quality?
&lt;/h3&gt;

&lt;p&gt;A: Q4 and above maintain very good quality. Q2 shows noticeable degradation. For production use, Q6 or Q8 is recommended. The UD (Unsloth Dynamic) variants provide better quality at the same bit level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Will this work with my AMD GPU?
&lt;/h3&gt;

&lt;p&gt;A: Yes! llama.cpp supports AMD GPUs via ROCm or Vulkan. Users report good results with Radeon 7900 XTX. MXFP4 quantization is NVIDIA-only, but other quants work fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I fine-tune this model on my own code?
&lt;/h3&gt;

&lt;p&gt;A: Yes, the model supports fine-tuning. Use Unsloth or Axolotl for efficient fine-tuning. However, with 80B parameters, you'll need significant compute (multi-GPU setup recommended).&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does this compare to DeepSeek-V3?
&lt;/h3&gt;

&lt;p&gt;A: Qwen3-Coder-Next generally performs better on coding agent tasks and has better tool-calling capabilities. DeepSeek-V3 is more general-purpose and may be better for non-coding tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Is there a smaller version for lower-end hardware?
&lt;/h3&gt;

&lt;p&gt;A: Consider Qwen2.5-Coder-32B or GLM-4.7-Flash for more modest hardware. They're less capable but run well on 16-32GB systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use this commercially?
&lt;/h3&gt;

&lt;p&gt;A: Yes, Qwen3-Coder-Next is released with open weights under a permissive license allowing commercial use. Always check the latest license terms on Hugging Face.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Why does it take so many agent turns compared to other models?
&lt;/h3&gt;

&lt;p&gt;A: The model is optimized for reliability over speed. It takes more exploratory steps but maintains consistency. This is actually beneficial for complex tasks where rushing leads to errors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;Qwen3-Coder-Next represents a significant milestone in making powerful AI coding assistants accessible to individual developers. While it may not match the absolute peak performance of Claude Opus 4.5 or GPT-5.2-Codex, it offers a compelling combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strong performance&lt;/strong&gt; (90-95% of frontier models)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complete privacy&lt;/strong&gt; (runs entirely on your hardware)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero marginal costs&lt;/strong&gt; (no per-token pricing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool freedom&lt;/strong&gt; (use any coding agent you prefer)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Recommended Action Plan
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Week 1: Testing Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install llama.cpp or Ollama&lt;/li&gt;
&lt;li&gt;Download Q4_K_XL quantization&lt;/li&gt;
&lt;li&gt;Test with simple coding tasks&lt;/li&gt;
&lt;li&gt;Measure speed and quality on your hardware&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Week 2: Integration Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Choose your preferred coding agent (OpenCode, Aider, Continue.dev)&lt;/li&gt;
&lt;li&gt;Configure optimal sampling parameters&lt;/li&gt;
&lt;li&gt;Test with real projects&lt;/li&gt;
&lt;li&gt;Compare with your current workflow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Week 3: Optimization Phase&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Experiment with different quantizations&lt;/li&gt;
&lt;li&gt;Optimize context window size&lt;/li&gt;
&lt;li&gt;Fine-tune for your specific use cases (optional)&lt;/li&gt;
&lt;li&gt;Set up automated workflows&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Future Outlook
&lt;/h3&gt;

&lt;p&gt;The gap between open-weight and closed models continues to narrow. With releases like Qwen3-Coder-Next, GLM-4.7-Flash, and upcoming models from DeepSeek and others, we're approaching a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most developers can run SOTA-level models locally&lt;/li&gt;
&lt;li&gt;Privacy and cost concerns are eliminated&lt;/li&gt;
&lt;li&gt;Innovation happens in open ecosystems&lt;/li&gt;
&lt;li&gt;Tool diversity flourishes without vendor lock-in&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Additional Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Official Documentation&lt;/strong&gt;: &lt;a href="https://qwen.readthedocs.io/" rel="noopener noreferrer"&gt;Qwen Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Repository&lt;/strong&gt;: &lt;a href="https://huggingface.co/Qwen/Qwen3-Coder-Next" rel="noopener noreferrer"&gt;Hugging Face - Qwen/Qwen3-Coder-Next&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GGUF Quantizations&lt;/strong&gt;: &lt;a href="https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF" rel="noopener noreferrer"&gt;Unsloth GGUF Repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Report&lt;/strong&gt;: &lt;a href="https://github.com/QwenLM/Qwen3-Coder/blob/main/qwen3_coder_next_tech_report.pdf" rel="noopener noreferrer"&gt;Qwen3-Coder-Next Technical Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community Discussion&lt;/strong&gt;: &lt;a href="https://www.reddit.com/r/LocalLLaMA/" rel="noopener noreferrer"&gt;r/LocalLLaMA&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Last Updated&lt;/strong&gt;: February 2026 | &lt;strong&gt;Model Version&lt;/strong&gt;: Qwen3-Coder-Next (80B-A3B) | &lt;strong&gt;Guide Version&lt;/strong&gt;: 1.0&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Stay Updated&lt;/strong&gt;&lt;br&gt;
The AI landscape evolves rapidly. Follow Qwen's blog and GitHub repository for updates, and join the LocalLLaMA community for real-world usage tips and optimization techniques.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Related Posts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://a2aprotocol.ai/blog/2026-glm-ocr-complete-guide" rel="noopener noreferrer"&gt;2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding&lt;/a&gt; — 0.9B-parameter multimodal OCR model for complex document understanding&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://a2aprotocol.ai/blog/2026-moltworker-complete-guide" rel="noopener noreferrer"&gt;The Complete 2026 Guide: Moltworker — Running Personal AI Agents on Cloudflare Without Hardware&lt;/a&gt; — Deploy AI agents on Cloudflare with no infrastructure costs&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://a2aprotocol.ai/blog/2026-universal-commerce-protocol" rel="noopener noreferrer"&gt;Universal Commerce Protocol (UCP): The Complete 2026 Guide to Agentic Commerce Standards&lt;/a&gt; — Open standard for AI-powered commerce and payment processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://a2aprotocol.ai/blog/2026-qwen3-coder-next-complete-guide" rel="noopener noreferrer"&gt;Qwen3-Coder-Next Complete 2026 Guide - Running AI Coding Agents Locally&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>2026 Complete Guide: How to Use GLM-OCR for Next-Gen Document Understanding</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Wed, 04 Feb 2026 12:24:12 +0000</pubDate>
      <link>https://forem.com/sienna/the-complete-2026-guide-building-interactive-dashboards-with-a2ui-rizzcharts-538j</link>
      <guid>https://forem.com/sienna/the-complete-2026-guide-building-interactive-dashboards-with-a2ui-rizzcharts-538j</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Core Takeaways (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GLM-OCR&lt;/strong&gt; is a &lt;strong&gt;0.9B-parameter&lt;/strong&gt; multimodal OCR model built on the GLM-V architecture, designed for &lt;strong&gt;complex document understanding&lt;/strong&gt;, not just text extraction.[1][2]
&lt;/li&gt;
&lt;li&gt;It delivers &lt;strong&gt;structure-first outputs&lt;/strong&gt; (semantic Markdown, JSON, LaTeX), accurately reconstructing &lt;strong&gt;tables, formulas, layout, and even handwriting&lt;/strong&gt; across 100+ languages.[1]
&lt;/li&gt;
&lt;li&gt;GLM-OCR achieves &lt;strong&gt;state-of-the-art performance on OmniDocBench V1.5 (94.62)&lt;/strong&gt; while remaining lightweight and fast, with &lt;strong&gt;~1.86 PDF pages/second&lt;/strong&gt;, making it suitable for research, finance, legal, and developer workflows with &lt;strong&gt;open Apache-2.0 weights&lt;/strong&gt;.[1][2][3]
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What Is GLM-OCR?&lt;/li&gt;
&lt;li&gt;How Does GLM-OCR Work Architecturally?&lt;/li&gt;
&lt;li&gt;What Are the Key Features and Technical Specs?&lt;/li&gt;
&lt;li&gt;How Well Does GLM-OCR Perform? (Benchmarks &amp;amp; Precision)&lt;/li&gt;
&lt;li&gt;Where Can You Use GLM-OCR? Real-World Use Cases&lt;/li&gt;
&lt;li&gt;GLM-OCR vs Other OCR Models (PaddleOCR, DeepSeekOCR, VLMs)&lt;/li&gt;
&lt;li&gt;How to Deploy and Use GLM-OCR in Practice&lt;/li&gt;
&lt;li&gt;Step-by-Step Workflow: From PDF/Image to Structured Data&lt;/li&gt;
&lt;li&gt;Best Practices, Tips, and Caveats&lt;/li&gt;
&lt;li&gt;🤔 Frequently Asked Questions (FAQ)&lt;/li&gt;
&lt;li&gt;Conclusion and Recommended Next Steps&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What Is GLM-OCR?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GLM-OCR&lt;/strong&gt; is a &lt;strong&gt;multimodal OCR model for complex document understanding&lt;/strong&gt;, derived from the &lt;strong&gt;GLM-4V / GLM-V&lt;/strong&gt; vision-language architecture.[1][2] Unlike classic OCR systems that only output raw text, GLM-OCR focuses on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Understanding layouts&lt;/strong&gt; (headings, sections, footnotes, tables, figures)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Preserving structure&lt;/strong&gt; in semantic formats (Markdown, JSON, LaTeX)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning about content&lt;/strong&gt;, not just recognizing characters
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Key characteristics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight&lt;/strong&gt;: ~&lt;strong&gt;0.9B parameters&lt;/strong&gt;, dramatically smaller than many VLM-based OCR models while keeping SOTA accuracy.[2][3]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal&lt;/strong&gt;: Consumes PDFs and images (JPG/PNG), outputs rich structured representations.[1][2]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-weight &amp;amp; Apache-2.0 licensed&lt;/strong&gt;: Suitable for commercial use and on-prem deployments.[1][3]
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best for&lt;/strong&gt;: Teams that need &lt;strong&gt;high-accuracy OCR plus document structure&lt;/strong&gt; (tables, formulas, headings) at &lt;strong&gt;reasonable compute cost&lt;/strong&gt; and want &lt;strong&gt;open-source licensing&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How Does GLM-OCR Work Architecturally?
&lt;/h2&gt;

&lt;p&gt;GLM-OCR uses a &lt;strong&gt;three-stage pipeline&lt;/strong&gt; that combines computer vision and language modeling.[1][2]&lt;/p&gt;

&lt;h3&gt;
  
  
  Architecture Overview
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Component / Tech&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Visual Ingestion&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;CogViT visual encoder&lt;/strong&gt;[2]&lt;/td&gt;
&lt;td&gt;Captures pixel-level and layout information from pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Multimodal Reasoning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GLM-V-based vision-language fusion[1]&lt;/td&gt;
&lt;td&gt;Aligns visual features with language understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Structured Generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Decoder with &lt;strong&gt;Multi-Token Prediction (MTP)&lt;/strong&gt;[1]&lt;/td&gt;
&lt;td&gt;Generates structured Markdown/JSON/LaTeX, correcting errors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key design ideas:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CogViT encoder&lt;/strong&gt;: A specialized vision backbone optimized for documents, not generic images.[2]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM-V multimodal reasoning&lt;/strong&gt;: Allows the model to interpret relationships between text blocks, tables, and figures.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Token Prediction (MTP)&lt;/strong&gt;: Predicts multiple tokens per step and uses context to fix errors on the fly—this behaves more like &lt;strong&gt;semantic proofreading&lt;/strong&gt; than naive character recognition.[1]
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
MTP is especially valuable on noisy scans or handwriting: GLM-OCR can use surrounding context to infer the correct token sequence instead of rigidly copying visual artifacts.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  What Are the Key Features and Technical Specs?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Document Understanding Features
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layout semantics awareness&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Detects and preserves headings, subheadings, section hierarchies, footnotes, captions, and other structural elements.[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Tables → Markdown&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Converts complex tables into &lt;strong&gt;Markdown&lt;/strong&gt; (and can be further transformed into CSV/Excel).[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Formulas → LaTeX&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Reconstructs complex mathematical expressions into valid &lt;strong&gt;LaTeX&lt;/strong&gt;.[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Handwriting interpretation&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Handles handwritten notes and annotations using contextual reasoning.[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Contextual perception&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Fixes mis-detections as it generates, using language modeling to ensure globally coherent output.[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Language &amp;amp; Format Support
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Input formats&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PDF (up to ~50 MB, up to 100 pages per document)[2]
&lt;/li&gt;
&lt;li&gt;Images: JPG, PNG (up to ~10 MB per image)[2]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Output formats&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Markdown&lt;/strong&gt; (with headings, tables, lists, code blocks)[1][2]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JSON&lt;/strong&gt; (structure-first; ideal for downstream pipelines)[1]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LaTeX&lt;/strong&gt; for mathematical content and formulas[1]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Languages&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Supports &lt;strong&gt;100+ languages&lt;/strong&gt;, with strong performance in &lt;strong&gt;English, Chinese (中文), Japanese (日本語)&lt;/strong&gt; and major European languages.[1][2]
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Core Technical Specs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Value / Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model size&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.9B parameters&lt;/strong&gt;[2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;GLM-V-based multimodal + CogViT visual encoder[1][2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input modalities&lt;/td&gt;
&lt;td&gt;PDF, images (JPG, PNG)[1][2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max pages per PDF&lt;/td&gt;
&lt;td&gt;~100 pages[2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output formats&lt;/td&gt;
&lt;td&gt;Markdown, LaTeX, JSON[1][2]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Apache-2.0 (open-weight)&lt;/strong&gt;[1][3]&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment frameworks&lt;/td&gt;
&lt;td&gt;VLLM, SGLang, API, local runners[2][3]&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For automation and integration, prefer &lt;strong&gt;JSON output&lt;/strong&gt;; for human-readable exports and documentation, use &lt;strong&gt;semantic Markdown + LaTeX&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How Well Does GLM-OCR Perform? (Benchmarks &amp;amp; Precision)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  OmniDocBench Performance
&lt;/h3&gt;

&lt;p&gt;GLM-OCR is reported as &lt;strong&gt;state-of-the-art&lt;/strong&gt; on &lt;strong&gt;OmniDocBench V1.5&lt;/strong&gt;, a leading benchmark for document understanding.[2][3][4]&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Score&lt;/strong&gt;: ~&lt;strong&gt;94.62&lt;/strong&gt; on OmniDocBench V1.5[2][3][4]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Position&lt;/strong&gt;: #1 on that benchmark among document parsing models in its class.[2][3]
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These scores are especially impressive given its &lt;strong&gt;only 0.9B parameters&lt;/strong&gt;, which is much smaller than many competing VLM-based OCR models.[2][3]&lt;/p&gt;

&lt;h3&gt;
  
  
  Throughput &amp;amp; Speed
&lt;/h3&gt;

&lt;p&gt;From official documentation:[2]&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PDF throughput&lt;/strong&gt;: ~&lt;strong&gt;1.86 pages/second&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image throughput&lt;/strong&gt;: ~&lt;strong&gt;0.67 images/second&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes GLM-OCR viable for &lt;strong&gt;bulk-processing pipelines&lt;/strong&gt; (e.g., nightly jobs over large archives) even on modest hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Precision Modes
&lt;/h3&gt;

&lt;p&gt;The official site highlights a &lt;strong&gt;PRECISION_MODE_ON&lt;/strong&gt;, claiming up to &lt;strong&gt;99.9% precision&lt;/strong&gt; in that mode.[1] While exact metric definitions are not fully spelled out, the key takeaway is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NORMAL mode&lt;/strong&gt; – better for speed, good default.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PRECISION mode&lt;/strong&gt; – slower but &lt;strong&gt;very high character-level and structure-level precision&lt;/strong&gt;; ideal for legal and financial workloads.
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Exact accuracy numbers for every domain (e.g., receipts vs. scientific PDFs) are not fully broken down publicly, so you should &lt;strong&gt;run your own evaluation&lt;/strong&gt; on representative samples before committing to production.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Where Can You Use GLM-OCR? Real-World Use Cases
&lt;/h2&gt;

&lt;p&gt;The official site and surrounding ecosystem emphasize several primary verticals.[1][5][6]&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Academic Research &amp;amp; Scientific Documents
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scan of old papers, lecture notes, research articles with formulas, footnotes, and references.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What GLM-OCR does well:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Captures &lt;strong&gt;complex citations&lt;/strong&gt;, references and section structures.
&lt;/li&gt;
&lt;li&gt;Converts equations into &lt;strong&gt;LaTeX&lt;/strong&gt;, compatible with LaTeX editors and scientific workflows.[1]
&lt;/li&gt;
&lt;li&gt;Outputs to &lt;strong&gt;semantic Markdown&lt;/strong&gt;, enabling easy ingestion into note-taking tools, static sites, or knowledge bases.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Use GLM-OCR’s LaTeX + Markdown output to feed directly into &lt;strong&gt;Markdown-based scientific writing setups&lt;/strong&gt; (e.g., Obsidian + Pandoc, MkDocs, or Jupyter notebooks).&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  2. Financial Analysis &amp;amp; Reporting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Financial statements, regulatory filings, multi-page reports with nested tables and complex footnotes.[1][5][6]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Precisely parses &lt;strong&gt;multi-level tables&lt;/strong&gt; (e.g., consolidated statements, multi-year comparisons).[1]
&lt;/li&gt;
&lt;li&gt;Extracts &lt;strong&gt;hierarchical headings&lt;/strong&gt; and explanatory notes in a structured format.
&lt;/li&gt;
&lt;li&gt;Makes it easier to transform scanned PDFs into &lt;strong&gt;Excel-ready or database-ready&lt;/strong&gt; representations via JSON/Markdown tables.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples of workflows include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ETL pipelines that convert scanned PDFs → JSON → data warehouse.
&lt;/li&gt;
&lt;li&gt;Risk analysis teams ingesting disparate PDF reports into analytics systems.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Legal Documentation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Contracts, NDAs, case files, court filings with complex clause structures and cross-references.[1][5][6]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What GLM-OCR enables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detects and preserves &lt;strong&gt;clause numbering&lt;/strong&gt;, section/subsection hierarchies.
&lt;/li&gt;
&lt;li&gt;Helps identify &lt;strong&gt;critical sections&lt;/strong&gt; (Termination, Liability, Governing Law, etc.) for downstream review models.
&lt;/li&gt;
&lt;li&gt;Structure-first output makes it easier for LLMs to &lt;strong&gt;run contract analysis&lt;/strong&gt; (e.g., deviation detection, obligation extraction).&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Always run GLM-OCR &lt;strong&gt;locally or via a private deployment&lt;/strong&gt; for sensitive legal material to maintain confidentiality.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  4. Developer &amp;amp; Product Integrations
&lt;/h3&gt;

&lt;p&gt;GLM-OCR is built to be embedded into &lt;strong&gt;applications, platforms, and AI agents&lt;/strong&gt;.[1][2][3]&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;APIs and SDKs&lt;/strong&gt;: Developer documentation describes API-based usage patterns suited for SaaS tools.[1][2]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VLLM / SGLang support&lt;/strong&gt;: Enables &lt;strong&gt;batched, high-throughput inference&lt;/strong&gt; in production.[2][3]
&lt;/li&gt;
&lt;li&gt;Can serve as the &lt;strong&gt;document parsing front-end&lt;/strong&gt; for AI agents, RAG systems, and analytics platforms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical integration scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OCR microservice inside a larger AI workflow.
&lt;/li&gt;
&lt;li&gt;First step in an &lt;strong&gt;LLM-powered document QA or summarization pipeline&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Replacement for brittle regex-based PDF parsers.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  GLM-OCR vs Other OCR Models (PaddleOCR, DeepSeekOCR, VLMs)
&lt;/h2&gt;

&lt;p&gt;While we do not have a single, fully standardized cross-benchmark table that includes GLM-OCR, PaddleOCR, DeepSeekOCR, and proprietary APIs, available information allows some &lt;strong&gt;high-level comparison&lt;/strong&gt;.[2][3][4][7]&lt;/p&gt;

&lt;h3&gt;
  
  
  Conceptual Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;GLM-OCR&lt;/th&gt;
&lt;th&gt;PaddleOCR / PaddleOCR-VL&lt;/th&gt;
&lt;th&gt;DeepSeekOCR&lt;/th&gt;
&lt;th&gt;Large VLMs (e.g., GPT-4 Vision)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Model Size&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~0.9B&lt;/strong&gt;[2][3]&lt;/td&gt;
&lt;td&gt;Typically 3B–9B for VLM variants[7]&lt;/td&gt;
&lt;td&gt;~2B–6B (varies by config)&lt;/td&gt;
&lt;td&gt;70B+ parameters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;License&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Apache-2.0 open weights&lt;/strong&gt;[1][3]&lt;/td&gt;
&lt;td&gt;Largely open-source&lt;/td&gt;
&lt;td&gt;Partly open / commercial&lt;/td&gt;
&lt;td&gt;Closed-source, API-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Focus&lt;/td&gt;
&lt;td&gt;Complex document OCR &amp;amp; structure&lt;/td&gt;
&lt;td&gt;General OCR + layout&lt;/td&gt;
&lt;td&gt;Advanced OCR &amp;amp; layout&lt;/td&gt;
&lt;td&gt;General-purpose vision-language&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Format&lt;/td&gt;
&lt;td&gt;Markdown, LaTeX, &lt;strong&gt;JSON&lt;/strong&gt;[1][2]&lt;/td&gt;
&lt;td&gt;Text, some layout info&lt;/td&gt;
&lt;td&gt;Text + layout&lt;/td&gt;
&lt;td&gt;Text, limited structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark (OmniDocBench)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~94.6 (V1.5)&lt;/strong&gt;[2][3][4]&lt;/td&gt;
&lt;td&gt;Lower scores reported in threads&lt;/td&gt;
&lt;td&gt;Competitive but below GLM-OCR[4][7]&lt;/td&gt;
&lt;td&gt;Strong overall but proprietary&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput&lt;/td&gt;
&lt;td&gt;~1.86 pages/s (PDF)[2]&lt;/td&gt;
&lt;td&gt;Generally slower (larger models)&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;Typically slower and more expensive&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ease of Private Deploy&lt;/td&gt;
&lt;td&gt;High (VLLM, SGLang, Docker)[2][3]&lt;/td&gt;
&lt;td&gt;Medium (framework-specific)&lt;/td&gt;
&lt;td&gt;Varies&lt;/td&gt;
&lt;td&gt;Low (API-only)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The exact numeric comparisons (e.g., speed vs. PaddleOCR/DeepSeekOCR) are sparse in authoritative public benchmarks. Treat relative claims (like “faster than X”) as &lt;strong&gt;directional&lt;/strong&gt;, and always run &lt;strong&gt;your own benchmarks&lt;/strong&gt; on your hardware and documents.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How to Deploy and Use GLM-OCR in Practice
&lt;/h2&gt;

&lt;p&gt;From the gathered docs and ecosystem resources, GLM-OCR supports several typical deployment patterns.[1][2][3]&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Local / On-Prem Deployment
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Recommended when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You process &lt;strong&gt;sensitive documents&lt;/strong&gt; (legal, medical, financial).
&lt;/li&gt;
&lt;li&gt;You want &lt;strong&gt;full control&lt;/strong&gt; over hardware and latency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VLLM backend&lt;/strong&gt;: For batched high-throughput inference.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SGLang integration&lt;/strong&gt;: Fine-grained orchestration of multimodal calls.[2][3]
&lt;/li&gt;
&lt;li&gt;Docker containers for packaged deployment.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  2. Cloud or Hosted API
&lt;/h3&gt;

&lt;p&gt;Some sites (e.g., glmocr.com) expose GLM-OCR via a &lt;strong&gt;hosted API&lt;/strong&gt;, often with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free tiers (e.g., a limited number of pages/month).[1]
&lt;/li&gt;
&lt;li&gt;Simple file upload endpoints returning structured Markdown/JSON.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is best when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You want to &lt;strong&gt;prototype quickly&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;You don’t yet have GPU infrastructure.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Hybrid Workflows
&lt;/h3&gt;

&lt;p&gt;A common pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Prototype&lt;/strong&gt; using a public/hosted API.
&lt;/li&gt;
&lt;li&gt;Once satisfied, &lt;strong&gt;migrate&lt;/strong&gt; to self-hosted GLM-OCR (via VLLM/SGLang/Docker) for cost and privacy control.
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Step-by-Step Workflow: From PDF/Image to Structured Data
&lt;/h2&gt;

&lt;p&gt;Below is an implementation-oriented view of how GLM-OCR fits into a typical pipeline, abstracting away specific SDK details:&lt;/p&gt;

&lt;h3&gt;
  
  
  📊 Conceptual Flow (Mermaid-style)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[Upload PDF/Image] --&amp;gt; B[Visual Ingestion (CogViT Encoder)]
    B --&amp;gt; C[Multimodal Reasoning (GLM-V)]
    C --&amp;gt; D[Structured Generation (Markdown / JSON / LaTeX)]
    D --&amp;gt; E[Post-Processing (Parsing, ETL, Analytics)]
    E --&amp;gt; F[Downstream Apps (Search, RAG, Dashboards)]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Typical Implementation Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Input acquisition&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept PDF or image upload from UI, CLI, or batch directory.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Call GLM-OCR&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Send document to GLM-OCR via:

&lt;ul&gt;
&lt;li&gt;Local inference server (VLLM/SGLang)
&lt;/li&gt;
&lt;li&gt;Hosted API endpoint
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Choose output format&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;markdown&lt;/code&gt; for human-readable exports
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;json&lt;/code&gt; for extraction-focused workflows
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;latex&lt;/code&gt; for math-heavy documents
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Post-process structured output&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse JSON or Markdown to extract:

&lt;ul&gt;
&lt;li&gt;Tables → CSV/SQL/Excel
&lt;/li&gt;
&lt;li&gt;Sections → knowledge base chunks
&lt;/li&gt;
&lt;li&gt;Formulas → rendered math or symbolic processing
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Integrate with downstream systems&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search indices, analytics pipelines, RAG systems, or compliance checks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Always &lt;strong&gt;store the raw GLM-OCR output&lt;/strong&gt; (Markdown/JSON) alongside your processed data for future reprocessing as your downstream logic evolves.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Best Practices, Tips, and Caveats
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Professional Tip – Pick the Right Output for the Job&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;JSON&lt;/strong&gt; for automation and AI agent pipelines.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Markdown + LaTeX&lt;/strong&gt; for human review, documentation, and publishing.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;✅ &lt;strong&gt;Best Practice – Use Precision Mode for High-Stakes Documents&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Turn on &lt;strong&gt;PRECISION_MODE_ON&lt;/strong&gt; for:

&lt;ul&gt;
&lt;li&gt;Legal contracts
&lt;/li&gt;
&lt;li&gt;Financial statements
&lt;/li&gt;
&lt;li&gt;Regulatory filings
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Accept the extra latency in exchange for &lt;strong&gt;maximum accuracy&lt;/strong&gt;.[1]&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Caution – Preprocess Low-Quality Scans&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For low-DPI or heavily skewed scans, apply:

&lt;ul&gt;
&lt;li&gt;Binarization
&lt;/li&gt;
&lt;li&gt;De-skewing
&lt;/li&gt;
&lt;li&gt;Noise reduction
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;This helps the visual encoder and improves downstream structure detection.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💡 &lt;strong&gt;Pro Tip – Combine with LLMs for End-to-End Automation&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use GLM-OCR for &lt;strong&gt;reliable structure extraction&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Then feed its Markdown/JSON output into a general-purpose LLM for:

&lt;ul&gt;
&lt;li&gt;Summaries
&lt;/li&gt;
&lt;li&gt;Risk flags
&lt;/li&gt;
&lt;li&gt;Q&amp;amp;A and report generation.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤔 Frequently Asked Questions (FAQ)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q1: What makes GLM-OCR different from traditional OCR engines?
&lt;/h3&gt;

&lt;p&gt;GLM-OCR is built as a &lt;strong&gt;multimodal vision-language model&lt;/strong&gt; instead of a pure character recognizer. It doesn’t just read characters; it &lt;strong&gt;understands document structure and context&lt;/strong&gt;, and generates &lt;strong&gt;semantic outputs&lt;/strong&gt; (Markdown, JSON, LaTeX) that are far easier to use in modern AI and data pipelines.[1][2]&lt;/p&gt;




&lt;h3&gt;
  
  
  Q2: Can GLM-OCR handle handwriting and messy scans?
&lt;/h3&gt;

&lt;p&gt;Yes, to a significant extent. GLM-OCR uses &lt;strong&gt;contextual perception&lt;/strong&gt; and &lt;strong&gt;multi-token prediction&lt;/strong&gt; to interpret handwriting and noisy images by looking at surrounding text and document structure.[1] While extreme cases may still require manual correction, it outperforms many traditional OCR tools in &lt;strong&gt;handwritten annotations and marginalia&lt;/strong&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Q3: Is GLM-OCR suitable for on-prem or air-gapped deployments?
&lt;/h3&gt;

&lt;p&gt;Yes. The model is released as &lt;strong&gt;open weights under the Apache-2.0 license&lt;/strong&gt;, and documentation highlights support for VLLM/SGLang and local inference, making it suitable for &lt;strong&gt;on-prem, air-gapped, and highly regulated environments&lt;/strong&gt;.[1][2][3]&lt;/p&gt;




&lt;h3&gt;
  
  
  Q4: How does GLM-OCR scale to large volumes of documents?
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;0.9B parameter size&lt;/strong&gt; is relatively small for a multimodal model, which helps keep inference efficient.[2][3] Official docs report throughput around &lt;strong&gt;1.86 pages/second for PDFs&lt;/strong&gt; and &lt;strong&gt;0.67 images/second&lt;/strong&gt; on capable hardware.[2] For large-scale workloads, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run multiple instances behind a load balancer.
&lt;/li&gt;
&lt;li&gt;Use VLLM/SGLang for batched inference.
&lt;/li&gt;
&lt;li&gt;Schedule batch jobs for nightly or off-peak processing.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Q5: When should I choose GLM-OCR over proprietary cloud OCR (Google, Azure, etc.)?
&lt;/h3&gt;

&lt;p&gt;Choose GLM-OCR when you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full control&lt;/strong&gt; over data (on-prem, private cloud).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source licensing&lt;/strong&gt; and freedom from per-page vendor lock-in.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rich structure&lt;/strong&gt; (Markdown/JSON/LaTeX) rather than just text.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Proprietary clouds may still be preferable if you rely heavily on adjacent proprietary services (e.g., integrated form detection, doc AI suites), but GLM-OCR offers a strong balance of &lt;strong&gt;accuracy, openness, and cost control&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion and Recommended Next Steps
&lt;/h2&gt;

&lt;p&gt;GLM-OCR is a &lt;strong&gt;modern, lightweight, and open&lt;/strong&gt; solution to one of the toughest problems in AI: &lt;strong&gt;turning messy, real-world documents into structured, actionable data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it stands out:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SOTA accuracy on OmniDocBench V1.5 (~94.62)&lt;/strong&gt; with only &lt;strong&gt;0.9B parameters&lt;/strong&gt;.[2][3][4]
&lt;/li&gt;
&lt;li&gt;Focus on &lt;strong&gt;structure-first outputs&lt;/strong&gt; (Markdown, JSON, LaTeX), ideal for LLMs and data pipelines.[1]
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Apache-2.0 license&lt;/strong&gt; and &lt;strong&gt;open weights&lt;/strong&gt;, making it deployable almost anywhere.[1][3]
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Actionable Next Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Evaluate GLM-OCR on your own documents&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gather a representative sample of PDFs/images from your domain.
&lt;/li&gt;
&lt;li&gt;Run them through GLM-OCR (hosted API or local deployment) and compare with your current OCR.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Prototype a minimal pipeline&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input → GLM-OCR → JSON/Markdown → simple downstream script (e.g., CSV export or LLM summary).
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Plan deployment strategy&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For sensitive data: choose &lt;strong&gt;on-prem VLLM/SGLang&lt;/strong&gt; or Docker-based deployment.
&lt;/li&gt;
&lt;li&gt;For quick start: use a &lt;strong&gt;hosted API&lt;/strong&gt; if available.
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Iterate on post-processing&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refine how you parse tables, formulas, and headings from GLM-OCR’s structured output.
&lt;/li&gt;
&lt;li&gt;Add QA checks and confidence thresholds for high-stakes use cases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Integrate with your AI stack&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Feed GLM-OCR output into RAG pipelines, contract analyzers, financial models, or data warehouses.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By deliberately combining &lt;strong&gt;GLM-OCR’s structured OCR&lt;/strong&gt; with your existing analytics and LLM stack, you can turn unstructured archives—research, contracts, reports—into a &lt;strong&gt;searchable, analyzable, AI-ready knowledge layer&lt;/strong&gt; with far less engineering effort than traditional OCR pipelines.&lt;/p&gt;




&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;p&gt;[1] GLM-OCR Official Site. &lt;a href="https://glmocr.com/" rel="noopener noreferrer"&gt;https://glmocr.com/&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[2] GLM-OCR – Z.AI Developer Document. &lt;a href="https://docs.z.ai/guides/vlm/glm-ocr" rel="noopener noreferrer"&gt;https://docs.z.ai/guides/vlm/glm-ocr&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[3] zai-org/GLM-OCR (Hugging Face). &lt;a href="https://huggingface.co/zai-org/GLM-OCR" rel="noopener noreferrer"&gt;https://huggingface.co/zai-org/GLM-OCR&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[4] GLM-OCR Benchmark Mentions – X / News Articles. &lt;a href="https://news.aibase.com/news/25178" rel="noopener noreferrer"&gt;https://news.aibase.com/news/25178&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[5] GLM-OCR Use Cases – Official Site Sections. &lt;a href="https://glmocr.com/" rel="noopener noreferrer"&gt;https://glmocr.com/&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[6] GLM OCR | AI Model (Use Case Overview). &lt;a href="https://story321.com/ru/models/zhipu/glm-ocr" rel="noopener noreferrer"&gt;https://story321.com/ru/models/zhipu/glm-ocr&lt;/a&gt;&lt;br&gt;&lt;br&gt;
[7] PaddleOCR-VL and DeepSeekOCR Benchmark Discussions. &lt;a href="https://huggingface.co/PaddlePaddle/PaddleOCR-VL" rel="noopener noreferrer"&gt;https://huggingface.co/PaddlePaddle/PaddleOCR-VL&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;a href="https://a2aprotocol.ai/blog/2026-glm-ocr-complete-guide" rel="noopener noreferrer"&gt;GLM-OCR for Next-Gen Document Understanding&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Moltworker Complete Guide 2026: Running Personal AI Agents on Cloudflare Without Hardware</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Fri, 30 Jan 2026 04:47:04 +0000</pubDate>
      <link>https://forem.com/sienna/moltworker-complete-guide-2026-running-personal-ai-agents-on-cloudflare-without-hardware-4a99</link>
      <guid>https://forem.com/sienna/moltworker-complete-guide-2026-running-personal-ai-agents-on-cloudflare-without-hardware-4a99</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Core Takeaways (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Hardware Required&lt;/strong&gt;: Moltworker enables running Moltbot AI agents on Cloudflare's infrastructure, eliminating the need for dedicated Mac minis or VPS servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise-Grade Security&lt;/strong&gt;: Built-in Cloudflare Access authentication, device pairing, and sandbox isolation protect your data and APIs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effective&lt;/strong&gt;: Starting at $5/month for Workers Paid plan, with generous free tiers for AI Gateway, R2 storage, and Browser Rendering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Feature Parity&lt;/strong&gt;: Supports all major Moltbot integrations including Telegram, Discord, Slack, and browser automation capabilities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production-Ready&lt;/strong&gt;: Leverages Cloudflare's global network with automatic scaling, persistent storage via R2, and comprehensive observability&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What is Moltworker?&lt;/li&gt;
&lt;li&gt;Why Moltworker Matters: The Hardware Problem&lt;/li&gt;
&lt;li&gt;Moltworker Architecture Deep Dive&lt;/li&gt;
&lt;li&gt;How to Deploy Moltworker: Step-by-Step Guide&lt;/li&gt;
&lt;li&gt;Security Considerations and Best Practices&lt;/li&gt;
&lt;li&gt;Moltworker vs Traditional Moltbot Deployment&lt;/li&gt;
&lt;li&gt;Community Feedback and Concerns&lt;/li&gt;
&lt;li&gt;Real-World Use Cases&lt;/li&gt;
&lt;li&gt;Troubleshooting Common Issues&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What is Moltworker? {#what-is-moltworker}
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Moltworker&lt;/strong&gt; is an open-source middleware solution developed by Cloudflare that enables running Moltbot (formerly Clawdbot) personal AI agents on Cloudflare's Developer Platform instead of dedicated hardware. Released in January 2026, moltworker represents a paradigm shift in how developers can deploy and manage AI agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Moltbot Foundation
&lt;/h3&gt;

&lt;p&gt;Before understanding moltworker, it's essential to know what Moltbot is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Personal AI Assistant&lt;/strong&gt;: Moltbot is an open-source AI agent designed to act as a personal assistant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Platform Integration&lt;/strong&gt;: Supports Telegram, Discord, Slack, and web-based control interfaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible Architecture&lt;/strong&gt;: Features a gateway architecture with persistent conversations and agent runtime&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Self-Hosted by Design&lt;/strong&gt;: Originally required users to run it on their own hardware (Mac minis, Linux servers, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How Moltworker Transforms Deployment
&lt;/h3&gt;

&lt;p&gt;Moltworker adapts Moltbot to run entirely on Cloudflare's infrastructure through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Moltworker entrypoint example&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getSandbox&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@cloudflare/sandbox&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getSandbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sandbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user-123&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Moltbot runs inside this isolated sandbox&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;moltbot-gateway start&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;running&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key Insight&lt;/strong&gt;&lt;br&gt;
Moltworker is not a fork of Moltbot—it's a compatibility layer that allows the standard Moltbot runtime to operate in Cloudflare's serverless environment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Moltworker Matters: The Hardware Problem {#why-moltworker-matters}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Mac Mini Rush of 2026
&lt;/h3&gt;

&lt;p&gt;When Moltbot gained viral attention in January 2026, a peculiar phenomenon occurred: developers rushed to purchase Mac minis specifically to run their personal AI agents. This created several problems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Challenge&lt;/th&gt;
&lt;th&gt;Traditional Approach&lt;/th&gt;
&lt;th&gt;Moltworker Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Initial Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$599+ for Mac mini&lt;/td&gt;
&lt;td&gt;$5/month Workers plan&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual updates, monitoring&lt;/td&gt;
&lt;td&gt;Automatic platform updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uptime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dependent on home internet&lt;/td&gt;
&lt;td&gt;99.9%+ SLA on global network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DIY firewall, VPN setup&lt;/td&gt;
&lt;td&gt;Built-in Access, Zero Trust&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Buy more hardware&lt;/td&gt;
&lt;td&gt;Automatic resource allocation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Power Consumption&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24/7 electricity costs&lt;/td&gt;
&lt;td&gt;Pay-per-use serverless model&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  The Cloudflare Advantage
&lt;/h3&gt;

&lt;p&gt;Moltworker leverages Cloudflare's Developer Platform, which has evolved to support complex applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Node.js Compatibility&lt;/strong&gt;: 98.5% of top 1,000 NPM packages now work natively in Workers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox SDK&lt;/strong&gt;: Secure isolated environments for running untrusted code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Global Network&lt;/strong&gt;: 300+ data centers ensure low-latency access worldwide&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integrated Services&lt;/strong&gt;: AI Gateway, R2 storage, Browser Rendering, and Zero Trust Access work seamlessly together&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important Note&lt;/strong&gt;&lt;br&gt;
Moltworker is currently a proof-of-concept, not an official Cloudflare product. It's maintained as an open-source project to showcase platform capabilities.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Moltworker Architecture Deep Dive {#architecture-deep-dive}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  High-Level System Design
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TB
    A[User Request] --&amp;gt; B[Cloudflare Worker]
    B --&amp;gt; C[Cloudflare Access Auth]
    C --&amp;gt; D[Admin UI / API Router]
    D --&amp;gt; E[Sandbox Container]
    E --&amp;gt; F[Moltbot Gateway Runtime]
    F --&amp;gt; G[AI Gateway]
    F --&amp;gt; H[Browser Rendering]
    F --&amp;gt; I[R2 Storage]
    G --&amp;gt; J[Anthropic Claude]
    H --&amp;gt; K[Headless Chrome]
    I --&amp;gt; L[Persistent Data]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components Explained
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Entrypoint Worker (API Router &amp;amp; Proxy)
&lt;/h4&gt;

&lt;p&gt;The moltworker Worker serves multiple roles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified moltworker routing logic&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Admin UI routes&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/_admin/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleAdminUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// CDP proxy for browser automation&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cdp/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleCDPProxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// WebSocket connection to Moltbot&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/ws&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleWebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Control UI&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleControlUI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Responsibilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Route HTTP/WebSocket requests to appropriate handlers&lt;/li&gt;
&lt;li&gt;Proxy Chrome DevTools Protocol (CDP) commands to Browser Rendering&lt;/li&gt;
&lt;li&gt;Serve the administrative interface&lt;/li&gt;
&lt;li&gt;Validate authentication tokens and Access JWTs&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  2. Cloudflare Sandbox Container
&lt;/h4&gt;

&lt;p&gt;The Sandbox SDK provides the isolated environment where Moltbot actually runs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Creating and managing the sandbox&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getSandbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Sandbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Install Moltbot in the container&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npm install -g moltbot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Mount R2 bucket for persistence&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mountBucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;R2_BUCKET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/root/.moltbot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Start the gateway process&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startProcess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;moltbot-gateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;GATEWAY_TOKEN&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MOLTBOT_GATEWAY_TOKEN&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Sandbox Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Isolation&lt;/strong&gt;: Each user gets their own secure container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem Access&lt;/strong&gt;: Full read/write capabilities within the container&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process Management&lt;/strong&gt;: Run background services like the Moltbot gateway&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Network Access&lt;/strong&gt;: Controlled outbound connections to AI providers and chat platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. AI Gateway Integration
&lt;/h4&gt;

&lt;p&gt;Moltworker routes all AI model requests through Cloudflare AI Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configuration for AI Gateway&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic"&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AI_GATEWAY_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-anthropic-key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits of AI Gateway:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost Tracking&lt;/strong&gt;: Monitor spending across all AI providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Request Analytics&lt;/strong&gt;: Detailed logs of model usage patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching&lt;/strong&gt;: Reduce redundant API calls with intelligent caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallbacks&lt;/strong&gt;: Automatic failover to alternative models/providers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unified Billing&lt;/strong&gt;: Use Cloudflare credits instead of managing multiple provider accounts&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4. R2 Persistent Storage
&lt;/h4&gt;

&lt;p&gt;Moltworker implements a backup/restore pattern for data persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Backup process (runs every 5 minutes via cron)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;backupToR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r2Bucket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Tar the Moltbot config directory&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tar -czf /tmp/backup.tar.gz /root/.moltbot&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Upload to R2&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;backupData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tmp/backup.tar.gz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r2Bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;moltbot-backup.tar.gz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;backupData&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Backup completed at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Restore on container startup&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;restoreFromR2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;r2Bucket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;backup&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;r2Bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;moltbot-backup.tar.gz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;backup&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;writeFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/tmp/restore.tar.gz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;backup&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arrayBuffer&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sandbox&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tar -xzf /tmp/restore.tar.gz -C /&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Restored from R2 backup&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What Gets Persisted:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Paired device configurations&lt;/li&gt;
&lt;li&gt;Conversation history and context&lt;/li&gt;
&lt;li&gt;Custom skills and tools created by the agent&lt;/li&gt;
&lt;li&gt;User preferences and settings&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Browser Rendering via CDP Proxy
&lt;/h4&gt;

&lt;p&gt;One of moltworker's most innovative features is the CDP (Chrome DevTools Protocol) proxy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CDP proxy implementation&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleCDPProxy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cdp/json/version&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Return browser version info from Browser Rendering&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;puppeteer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BROWSER&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;browserVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;browser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;version&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/cdp/devtools/browser/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Upgrade to WebSocket and proxy CDP commands&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;browserId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleCDPWebSocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;browserId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Moltbot inside the sandbox connects to &lt;code&gt;localhost:9222&lt;/code&gt; (standard CDP port)&lt;/li&gt;
&lt;li&gt;Moltworker intercepts these connections and proxies them to the Worker&lt;/li&gt;
&lt;li&gt;The Worker forwards CDP commands to Cloudflare Browser Rendering&lt;/li&gt;
&lt;li&gt;Browser Rendering executes commands on a real Chromium instance&lt;/li&gt;
&lt;li&gt;Responses flow back through the proxy to Moltbot&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This architecture allows Moltbot to perform browser automation without running Chromium inside the container, saving resources and improving security.&lt;/p&gt;

&lt;h4&gt;
  
  
  6. Zero Trust Access Authentication
&lt;/h4&gt;

&lt;p&gt;Moltworker uses Cloudflare Access to protect sensitive routes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Access JWT validation&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;validateAccessToken&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Cf-Access-Jwt-Assertion&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No Access token&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Verify JWT signature and audience&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;verifyJWT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;audience&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CF_ACCESS_AUD&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;issuer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`https://&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CF_ACCESS_TEAM_DOMAIN&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/cdn-cgi/access/certs`&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;valid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;email&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Protected Routes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/_admin/*&lt;/code&gt; - Device management interface&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/api/*&lt;/code&gt; - Administrative API endpoints&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/debug/*&lt;/code&gt; - Diagnostic and logging endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Flow Example: User Message to AI Response
&lt;/h3&gt;

&lt;p&gt;Let's trace a complete request through moltworker:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;User sends message&lt;/strong&gt; via Telegram to their paired bot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Telegram webhook&lt;/strong&gt; hits the moltworker Worker endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker validates&lt;/strong&gt; the gateway token and device pairing status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Message forwarded&lt;/strong&gt; to Moltbot gateway running in the Sandbox&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moltbot processes&lt;/strong&gt; the message and determines it needs AI assistance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI request sent&lt;/strong&gt; through AI Gateway to Anthropic Claude&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude response&lt;/strong&gt; flows back through AI Gateway (logged for analytics)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moltbot formats&lt;/strong&gt; the response and sends it back to the Worker&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker delivers&lt;/strong&gt; the response to Telegram's API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User receives&lt;/strong&gt; the AI-generated message in their chat&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Throughout this flow:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All AI requests are logged in AI Gateway for cost tracking&lt;/li&gt;
&lt;li&gt;Conversation context is stored in R2 for persistence&lt;/li&gt;
&lt;li&gt;Access logs are recorded in Zero Trust for security auditing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Deploy Moltworker: Step-by-Step Guide {#deployment-guide}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before deploying moltworker, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Cloudflare account with Workers Paid plan ($5/month)&lt;/li&gt;
&lt;li&gt;✅ Anthropic API key (or plan to use AI Gateway Unified Billing)&lt;/li&gt;
&lt;li&gt;✅ Node.js 18+ and npm installed locally&lt;/li&gt;
&lt;li&gt;✅ Wrangler CLI installed (&lt;code&gt;npm install -g wrangler&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 1: Clone and Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Clone the moltworker repository&lt;/span&gt;
git clone https://github.com/cloudflare/moltworker.git
&lt;span class="nb"&gt;cd &lt;/span&gt;moltworker

&lt;span class="c"&gt;# Install dependencies&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Authenticate with Cloudflare&lt;/span&gt;
wrangler login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Configure AI Provider
&lt;/h3&gt;

&lt;p&gt;Choose between direct Anthropic access or AI Gateway:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option A: Direct Anthropic Access&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set your Anthropic API key&lt;/span&gt;
npx wrangler secret put ANTHROPIC_API_KEY
&lt;span class="c"&gt;# Paste your key when prompted&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Option B: AI Gateway (Recommended)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create an AI Gateway in the Cloudflare dashboard&lt;/span&gt;
&lt;span class="c"&gt;# Then configure the secrets:&lt;/span&gt;

npx wrangler secret put AI_GATEWAY_API_KEY
&lt;span class="c"&gt;# Enter your Anthropic key&lt;/span&gt;

npx wrangler secret put AI_GATEWAY_BASE_URL
&lt;span class="c"&gt;# Enter: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/anthropic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Pro Tip&lt;/strong&gt;&lt;br&gt;
Using AI Gateway provides better observability and cost control. You can switch between providers without redeploying moltworker.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 3: Generate Gateway Token
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate a secure random token&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;MOLTBOT_GATEWAY_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32 | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'=+/'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; 32&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Display and save this token - you'll need it to access the UI&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Your gateway token: &lt;/span&gt;&lt;span class="nv"&gt;$MOLTBOT_GATEWAY_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Set it as a secret&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$MOLTBOT_GATEWAY_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | npx wrangler secret put MOLTBOT_GATEWAY_TOKEN
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;⚠️ &lt;strong&gt;Critical&lt;/strong&gt;: Save this token securely. You'll need it to access the Control UI at &lt;code&gt;https://your-worker.workers.dev/?token=YOUR_GATEWAY_TOKEN&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Deploy Moltworker
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Deploy to Cloudflare Workers&lt;/span&gt;
npm run deploy

&lt;span class="c"&gt;# Output will show your worker URL:&lt;/span&gt;
&lt;span class="c"&gt;# Published moltbot-sandbox (X.XX sec)&lt;/span&gt;
&lt;span class="c"&gt;#   https://moltbot-sandbox.your-subdomain.workers.dev&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Configure Cloudflare Access
&lt;/h3&gt;

&lt;p&gt;To use the admin UI, you must set up authentication:&lt;/p&gt;

&lt;h4&gt;
  
  
  Enable Access on workers.dev
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages" rel="noopener noreferrer"&gt;Workers &amp;amp; Pages dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Select your deployed Worker (e.g., &lt;code&gt;moltbot-sandbox&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Settings&lt;/strong&gt; → &lt;strong&gt;Domains &amp;amp; Routes&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;In the &lt;code&gt;workers.dev&lt;/code&gt; row, click the menu (&lt;code&gt;...&lt;/code&gt;) → &lt;strong&gt;Enable Cloudflare Access&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Manage Cloudflare Access&lt;/strong&gt; to configure authentication:

&lt;ul&gt;
&lt;li&gt;Add your email to the allow list&lt;/li&gt;
&lt;li&gt;Or configure identity providers (Google, GitHub, etc.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Copy the &lt;strong&gt;Application Audience (AUD)&lt;/strong&gt; tag&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Set Access Secrets
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Your Cloudflare Access team domain&lt;/span&gt;
npx wrangler secret put CF_ACCESS_TEAM_DOMAIN
&lt;span class="c"&gt;# Enter: myteam.cloudflareaccess.com&lt;/span&gt;

&lt;span class="c"&gt;# Application Audience tag from Access settings&lt;/span&gt;
npx wrangler secret put CF_ACCESS_AUD
&lt;span class="c"&gt;# Paste the AUD value you copied&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Redeploy
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 6: Enable R2 Persistent Storage (Recommended)
&lt;/h3&gt;

&lt;p&gt;Without R2, your data is lost when the container restarts.&lt;/p&gt;

&lt;h4&gt;
  
  
  Create R2 API Token
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;R2&lt;/strong&gt; → &lt;strong&gt;Overview&lt;/strong&gt; in Cloudflare dashboard&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Manage R2 API Tokens&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Create token with &lt;strong&gt;Object Read &amp;amp; Write&lt;/strong&gt; permissions&lt;/li&gt;
&lt;li&gt;Select the &lt;code&gt;moltbot-data&lt;/code&gt; bucket (auto-created on first deploy)&lt;/li&gt;
&lt;li&gt;Copy &lt;strong&gt;Access Key ID&lt;/strong&gt; and &lt;strong&gt;Secret Access Key&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h4&gt;
  
  
  Configure R2 Secrets
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# R2 credentials&lt;/span&gt;
npx wrangler secret put R2_ACCESS_KEY_ID
npx wrangler secret put R2_SECRET_ACCESS_KEY

&lt;span class="c"&gt;# Your Cloudflare Account ID&lt;/span&gt;
npx wrangler secret put CF_ACCOUNT_ID
&lt;span class="c"&gt;# Find this in dashboard: Click account menu → Copy Account ID&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Redeploy
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 7: Pair Your First Device
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Visit the admin UI: &lt;code&gt;https://your-worker.workers.dev/_admin/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Authenticate via Cloudflare Access&lt;/li&gt;
&lt;li&gt;Open the Control UI in a new tab: &lt;code&gt;https://your-worker.workers.dev/?token=YOUR_GATEWAY_TOKEN&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The Control UI will show "Waiting for pairing approval..."&lt;/li&gt;
&lt;li&gt;Return to the admin UI and approve the pending device&lt;/li&gt;
&lt;li&gt;Your Control UI is now connected!&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;⏱️ &lt;strong&gt;Note&lt;/strong&gt;: The first request may take 1-2 minutes while the container starts. Subsequent requests are much faster.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 8: Optional Integrations
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Add Telegram Bot
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a bot via @BotFather on Telegram&lt;/span&gt;
&lt;span class="c"&gt;# Copy the bot token and set it:&lt;/span&gt;
npx wrangler secret put TELEGRAM_BOT_TOKEN
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Add Discord Bot
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a bot in Discord Developer Portal&lt;/span&gt;
&lt;span class="c"&gt;# Copy the bot token and set it:&lt;/span&gt;
npx wrangler secret put DISCORD_BOT_TOKEN
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Add Slack Bot
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a Slack app with bot capabilities&lt;/span&gt;
&lt;span class="c"&gt;# Copy both tokens and set them:&lt;/span&gt;
npx wrangler secret put SLACK_BOT_TOKEN
npx wrangler secret put SLACK_APP_TOKEN
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Enable Browser Automation
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate a secure secret for CDP authentication&lt;/span&gt;
npx wrangler secret put CDP_SECRET
&lt;span class="c"&gt;# Enter a random string&lt;/span&gt;

&lt;span class="c"&gt;# Set your worker's public URL&lt;/span&gt;
npx wrangler secret put WORKER_URL
&lt;span class="c"&gt;# Enter: https://your-worker.workers.dev&lt;/span&gt;

npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deployment Checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Workers Paid plan active ($5/month)&lt;/li&gt;
&lt;li&gt;[ ] AI provider configured (Anthropic or AI Gateway)&lt;/li&gt;
&lt;li&gt;[ ] Gateway token generated and saved&lt;/li&gt;
&lt;li&gt;[ ] Cloudflare Access enabled and configured&lt;/li&gt;
&lt;li&gt;[ ] R2 storage configured (optional but recommended)&lt;/li&gt;
&lt;li&gt;[ ] First device paired via admin UI&lt;/li&gt;
&lt;li&gt;[ ] Chat integrations configured (optional)&lt;/li&gt;
&lt;li&gt;[ ] Browser automation enabled (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Security Considerations and Best Practices {#security-best-practices}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Multi-Layer Authentication Architecture
&lt;/h3&gt;

&lt;p&gt;Moltworker implements defense-in-depth with three authentication layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Protects Against&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gateway Token&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Access to Control UI&lt;/td&gt;
&lt;td&gt;Unauthorized UI access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Device Pairing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Per-device authorization&lt;/td&gt;
&lt;td&gt;Rogue clients, stolen tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloudflare Access&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Admin UI protection&lt;/td&gt;
&lt;td&gt;Unauthorized administration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  How They Work Together
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[User Request] --&amp;gt; B{Has Gateway Token?}
    B --&amp;gt;|No| C[Reject: 401 Unauthorized]
    B --&amp;gt;|Yes| D{Device Paired?}
    D --&amp;gt;|No| E[Show Pairing Pending]
    D --&amp;gt;|Yes| F{Admin Route?}
    F --&amp;gt;|No| G[Allow: Normal Operation]
    F --&amp;gt;|Yes| H{Valid Access JWT?}
    H --&amp;gt;|No| I[Redirect to Access Login]
    H --&amp;gt;|Yes| J[Allow: Admin Access]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Critical Security Warnings
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Prompt Injection Vulnerability&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Moltbot is susceptible to prompt injection attacks via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email content (if email integration is enabled)&lt;/li&gt;
&lt;li&gt;Web pages visited by the browser automation&lt;/li&gt;
&lt;li&gt;Chat messages from untrusted sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;: Only enable integrations you trust. Never connect moltworker to public email addresses or allow it to browse untrusted websites.&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Supply Chain Risk&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As discussed on Hacker News, moltbot's dependency chain and rapid development pose supply chain attack risks. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pin specific versions in your deployment&lt;/li&gt;
&lt;li&gt;Review the closed PRs and issues before updating&lt;/li&gt;
&lt;li&gt;Consider forking for production use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Data Privacy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While moltworker runs in isolated sandboxes, Cloudflare can technically access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All data passing through Workers&lt;/li&gt;
&lt;li&gt;Content stored in R2 buckets&lt;/li&gt;
&lt;li&gt;Logs and analytics data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation&lt;/strong&gt;: Don't use moltworker for sensitive data if you require zero-knowledge architecture. For maximum privacy, self-host Moltbot on your own hardware.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Best Practices for Secure Deployment
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Token Management
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate cryptographically secure tokens&lt;/span&gt;
openssl rand &lt;span class="nt"&gt;-base64&lt;/span&gt; 32 | &lt;span class="nb"&gt;tr&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'=+/'&lt;/span&gt; | &lt;span class="nb"&gt;head&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; 32

&lt;span class="c"&gt;# Rotate tokens regularly (every 90 days)&lt;/span&gt;
npx wrangler secret put MOLTBOT_GATEWAY_TOKEN
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  2. Access Policy Configuration
&lt;/h4&gt;

&lt;p&gt;Configure strict Access policies in the Zero Trust dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example Access policy&lt;/span&gt;
&lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Moltworker Admin Access&lt;/span&gt;
&lt;span class="na"&gt;Application Domain&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;moltbot-sandbox.your-subdomain.workers.dev&lt;/span&gt;
&lt;span class="na"&gt;Paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/_admin/*&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/api/*&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/debug/*&lt;/span&gt;
&lt;span class="na"&gt;Policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your-email@example.com&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Exclude&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Everyone&lt;/span&gt;
&lt;span class="na"&gt;Session Duration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;24 hours&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  3. Container Lifecycle Management
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# For production: Keep container always alive&lt;/span&gt;
&lt;span class="c"&gt;# (Default behavior, no action needed)&lt;/span&gt;

&lt;span class="c"&gt;# For development/testing: Allow container to sleep&lt;/span&gt;
npx wrangler secret put SANDBOX_SLEEP_AFTER
&lt;span class="c"&gt;# Enter: 1h (container sleeps after 1 hour of inactivity)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  4. Monitoring and Alerting
&lt;/h4&gt;

&lt;p&gt;Enable comprehensive logging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable debug routes (only in development)&lt;/span&gt;
npx wrangler secret put DEBUG_ROUTES
&lt;span class="c"&gt;# Enter: true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access debug endpoints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;GET /debug/processes&lt;/code&gt; - List container processes&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /debug/logs?id=&amp;lt;process_id&amp;gt;&lt;/code&gt; - View process logs&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;GET /debug/version&lt;/code&gt; - Check moltbot and container versions&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  5. Network Isolation
&lt;/h4&gt;

&lt;p&gt;Moltworker automatically isolates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each user's sandbox from others&lt;/li&gt;
&lt;li&gt;Outbound connections to only approved destinations&lt;/li&gt;
&lt;li&gt;Inbound connections through the Worker proxy only&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  6. Secrets Rotation Schedule
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret&lt;/th&gt;
&lt;th&gt;Rotation Frequency&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gateway Token&lt;/td&gt;
&lt;td&gt;Every 90 days&lt;/td&gt;
&lt;td&gt;Requires re-pairing all devices&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Provider Keys&lt;/td&gt;
&lt;td&gt;Every 180 days&lt;/td&gt;
&lt;td&gt;Transparent to users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;R2 Access Keys&lt;/td&gt;
&lt;td&gt;Every 180 days&lt;/td&gt;
&lt;td&gt;Requires redeployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CDP Secret&lt;/td&gt;
&lt;td&gt;Every 90 days&lt;/td&gt;
&lt;td&gt;Breaks browser automation until updated&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Security Audit Checklist
&lt;/h3&gt;

&lt;p&gt;Before deploying moltworker to production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[ ] All secrets are set via &lt;code&gt;wrangler secret put&lt;/code&gt; (never in code)&lt;/li&gt;
&lt;li&gt;[ ] Cloudflare Access is enabled on all admin routes&lt;/li&gt;
&lt;li&gt;[ ] Gateway token is cryptographically random (32+ characters)&lt;/li&gt;
&lt;li&gt;[ ] Device pairing is enabled (not using &lt;code&gt;DEV_MODE=true&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[ ] R2 bucket has restricted access (not public)&lt;/li&gt;
&lt;li&gt;[ ] Only necessary chat integrations are enabled&lt;/li&gt;
&lt;li&gt;[ ] Email integration is disabled (high prompt injection risk)&lt;/li&gt;
&lt;li&gt;[ ] Debug routes are disabled in production&lt;/li&gt;
&lt;li&gt;[ ] Access logs are being monitored&lt;/li&gt;
&lt;li&gt;[ ] Backup strategy is in place for R2 data&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Moltworker vs Traditional Moltbot Deployment {#comparison}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Comprehensive Comparison Matrix
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Self-Hosted Moltbot&lt;/th&gt;
&lt;th&gt;Moltworker on Cloudflare&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Initial Setup Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$599+ (Mac mini) or $5-20/month (VPS)&lt;/td&gt;
&lt;td&gt;$5/month (Workers Paid)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ongoing Costs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Electricity + internet + maintenance&lt;/td&gt;
&lt;td&gt;Usage-based (typically $5-15/month)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (Docker, networking, security)&lt;/td&gt;
&lt;td&gt;Medium (mostly configuration)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to Deploy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2-4 hours&lt;/td&gt;
&lt;td&gt;15-30 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance Burden&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual updates, monitoring, backups&lt;/td&gt;
&lt;td&gt;Automatic platform updates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uptime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on home internet/VPS provider&lt;/td&gt;
&lt;td&gt;99.9%+ on global network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Geographic Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single location&lt;/td&gt;
&lt;td&gt;300+ edge locations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Buy more hardware&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DIY (firewall, VPN, patches)&lt;/td&gt;
&lt;td&gt;Built-in (Access, sandboxing)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full control (zero-knowledge possible)&lt;/td&gt;
&lt;td&gt;Cloudflare can access data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual or scripted&lt;/td&gt;
&lt;td&gt;Automatic R2 sync every 5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Self-configured (Grafana, etc.)&lt;/td&gt;
&lt;td&gt;Built-in (AI Gateway, Access logs)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Browser Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local Chromium (resource-heavy)&lt;/td&gt;
&lt;td&gt;Browser Rendering API (offloaded)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Local Integrations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full access (smart home, local files)&lt;/td&gt;
&lt;td&gt;Limited (cloud-accessible only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Customization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Unlimited (full system access)&lt;/td&gt;
&lt;td&gt;Limited to container environment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vendor Lock-in&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Cloudflare-specific&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  When to Choose Self-Hosted Moltbot
&lt;/h3&gt;

&lt;p&gt;Choose traditional self-hosting if you:&lt;/p&gt;

&lt;p&gt;✅ Require absolute data privacy (zero-knowledge architecture)&lt;br&gt;&lt;br&gt;
✅ Need local network integrations (smart home devices, NAS, etc.)&lt;br&gt;&lt;br&gt;
✅ Want unlimited customization and system-level access&lt;br&gt;&lt;br&gt;
✅ Have reliable infrastructure and technical expertise&lt;br&gt;&lt;br&gt;
✅ Prefer one-time hardware costs over recurring subscriptions&lt;br&gt;&lt;br&gt;
✅ Need to comply with data residency regulations  &lt;/p&gt;
&lt;h3&gt;
  
  
  When to Choose Moltworker
&lt;/h3&gt;

&lt;p&gt;Choose moltworker if you:&lt;/p&gt;

&lt;p&gt;✅ Want minimal setup and maintenance overhead&lt;br&gt;&lt;br&gt;
✅ Need high availability and global low-latency access&lt;br&gt;&lt;br&gt;
✅ Prefer usage-based pricing over hardware investment&lt;br&gt;&lt;br&gt;
✅ Value integrated observability and security features&lt;br&gt;&lt;br&gt;
✅ Don't require local network integrations&lt;br&gt;&lt;br&gt;
✅ Want automatic scaling and platform updates&lt;br&gt;&lt;br&gt;
✅ Are comfortable with Cloudflare accessing your data  &lt;/p&gt;
&lt;h3&gt;
  
  
  Performance Benchmarks
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Self-Hosted (Mac Mini M2)&lt;/th&gt;
&lt;th&gt;Moltworker (Cloudflare)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cold Start&lt;/td&gt;
&lt;td&gt;N/A (always running)&lt;/td&gt;
&lt;td&gt;60-120 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Warm Request&lt;/td&gt;
&lt;td&gt;50-200ms&lt;/td&gt;
&lt;td&gt;100-300ms (global avg)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Response Time&lt;/td&gt;
&lt;td&gt;Depends on internet&lt;/td&gt;
&lt;td&gt;Optimized via AI Gateway&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Browser Automation&lt;/td&gt;
&lt;td&gt;2-5 seconds&lt;/td&gt;
&lt;td&gt;3-6 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage I/O&lt;/td&gt;
&lt;td&gt;Local SSD (very fast)&lt;/td&gt;
&lt;td&gt;R2 (network-dependent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Concurrent Users&lt;/td&gt;
&lt;td&gt;Limited by hardware&lt;/td&gt;
&lt;td&gt;Unlimited (auto-scaling)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h3&gt;
  
  
  Cost Analysis: 12-Month TCO
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Self-Hosted (Mac Mini M2)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hardware: $599 (one-time)&lt;/li&gt;
&lt;li&gt;Electricity: $50/year (assuming 10W average)&lt;/li&gt;
&lt;li&gt;Internet: $0 (assuming existing connection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Year 1&lt;/strong&gt;: $649&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total Year 2+&lt;/strong&gt;: $50/year&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Self-Hosted (VPS - Hetzner CCX13)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VPS: $15/month × 12 = $180/year&lt;/li&gt;
&lt;li&gt;Backups: $5/month × 12 = $60/year&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total per year&lt;/strong&gt;: $240/year&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker (Cloudflare)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Workers Paid: $5/month × 12 = $60/year&lt;/li&gt;
&lt;li&gt;AI Gateway: $0 (free tier)&lt;/li&gt;
&lt;li&gt;R2 Storage: $0.75/month × 12 = $9/year (assuming 50GB)&lt;/li&gt;
&lt;li&gt;Browser Rendering: $5/month × 12 = $60/year (1M requests)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total per year&lt;/strong&gt;: $129/year&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Cost Winner&lt;/strong&gt;: Moltworker is most cost-effective for years 1-2. Self-hosted Mac mini becomes cheaper after 2-3 years if you already have reliable internet.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Community Feedback and Concerns {#community-feedback}
&lt;/h2&gt;

&lt;p&gt;The Hacker News discussion revealed significant community concerns about both Moltbot and moltworker:&lt;/p&gt;
&lt;h3&gt;
  
  
  Positive Reception
&lt;/h3&gt;

&lt;p&gt;✅ &lt;strong&gt;Cloudflare's Node.js Progress&lt;/strong&gt;: Developers praised the 98.5% NPM package compatibility&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Sandbox SDK Utility&lt;/strong&gt;: Many saw value in the Sandbox SDK beyond just moltworker&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Deployment Simplicity&lt;/strong&gt;: Appreciated the reduction in setup complexity vs self-hosting&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Built-in Observability&lt;/strong&gt;: AI Gateway analytics and Access logs were highlighted as valuable  &lt;/p&gt;
&lt;h3&gt;
  
  
  Major Concerns
&lt;/h3&gt;
&lt;h4&gt;
  
  
  1. Astroturfing and Hype Cycle
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"There is so much branding and 'look at our success' marketing that this project comes off as heavily astro-turfed."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Community members noted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Excessive social media promotion from non-technical accounts&lt;/li&gt;
&lt;li&gt;Comparison to crypto-era hype cycles&lt;/li&gt;
&lt;li&gt;Concerns about an eventual startup pivot or acquisition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cloudflare's Response&lt;/strong&gt;: Moltworker is explicitly labeled as a proof-of-concept, not a product. The goal is showcasing platform capabilities, not monetizing moltbot.&lt;/p&gt;
&lt;h4&gt;
  
  
  2. Security Vulnerabilities
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"Clawdbot/Moltbot looks to be a supply-chain attack waiting to happen."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Specific security issues raised:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Injection&lt;/strong&gt;: No protection against malicious prompts in emails/websites&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supply Chain&lt;/strong&gt;: Rapid development with low technical oversight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insecure Deployments&lt;/strong&gt;: Many users exposing dashboards without authentication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Email Integration Risk&lt;/strong&gt;: Connecting to email creates attack vector&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation Strategies&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Moltworker enforces device pairing by default&lt;/li&gt;
&lt;li&gt;Cloudflare Access protects admin routes&lt;/li&gt;
&lt;li&gt;Sandbox isolation limits blast radius&lt;/li&gt;
&lt;li&gt;Documentation warns against email integration&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  3. Overhyped Capabilities
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"Ultimately its a convenience wrapper that makes it easy to wire up Claude or ChatGPT to a chat platform like discord, but its claiming to be far more revolutionary."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Critics argued:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Core functionality is just API wrappers&lt;/li&gt;
&lt;li&gt;Similar tools exist (e.g., &lt;a href="https://github.com/clharman/afk-code" rel="noopener noreferrer"&gt;afk-code&lt;/a&gt; - 2-minute setup)&lt;/li&gt;
&lt;li&gt;The "agent" label is misleading marketing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Counterpoint&lt;/strong&gt;: While the core is API integration, the value lies in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent memory and context&lt;/li&gt;
&lt;li&gt;Self-modifying capabilities (agents creating their own tools)&lt;/li&gt;
&lt;li&gt;Multi-platform gateway architecture&lt;/li&gt;
&lt;li&gt;Production-ready deployment on global infrastructure&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  4. Data Privacy Concerns
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"All home/local integrations are gone. Data needs to be stored in the cloud. No thanks."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Valid concerns about moltworker specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cloudflare can access all data passing through Workers&lt;/li&gt;
&lt;li&gt;R2 storage is not zero-knowledge&lt;/li&gt;
&lt;li&gt;Loss of local network integrations (smart home, NAS, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When This Matters&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling sensitive personal data&lt;/li&gt;
&lt;li&gt;Compliance requirements (HIPAA, GDPR with strict data residency)&lt;/li&gt;
&lt;li&gt;Desire for complete control over infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;When It Doesn't&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Using public AI providers anyway (Anthropic already sees your data)&lt;/li&gt;
&lt;li&gt;Trusting Cloudflare's security practices&lt;/li&gt;
&lt;li&gt;Prioritizing convenience over absolute privacy&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  5. Technical Skepticism
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"Just look at the closed PRs of their project. General technical knowledge is so low it's insane."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Community members reviewed moltbot's GitHub and found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-quality contributions&lt;/li&gt;
&lt;li&gt;Security vulnerabilities in closed PRs&lt;/li&gt;
&lt;li&gt;Lack of code review rigor&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important Context&lt;/strong&gt;: Moltbot is a rapidly evolving open-source project. Moltworker adapts it but doesn't control its development. Users should:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review moltbot's security posture before deploying&lt;/li&gt;
&lt;li&gt;Consider forking for production use&lt;/li&gt;
&lt;li&gt;Monitor the project's issue tracker&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Balanced Perspective
&lt;/h3&gt;

&lt;p&gt;The community consensus seems to be:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;Moltworker as a Platform Demo&lt;/strong&gt;: Excellent showcase of Cloudflare's capabilities&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Sandbox SDK Value&lt;/strong&gt;: Useful beyond just moltbot&lt;br&gt;&lt;br&gt;
⚠️ &lt;strong&gt;Moltbot Maturity&lt;/strong&gt;: Treat as experimental, not production-ready&lt;br&gt;&lt;br&gt;
⚠️ &lt;strong&gt;Security Posture&lt;/strong&gt;: Requires careful configuration and ongoing monitoring&lt;br&gt;&lt;br&gt;
❌ &lt;strong&gt;Hype vs Reality&lt;/strong&gt;: Marketing outpaces technical substance  &lt;/p&gt;
&lt;h2&gt;
  
  
  Real-World Use Cases {#use-cases}
&lt;/h2&gt;

&lt;p&gt;Despite concerns, moltworker enables legitimate use cases when deployed responsibly:&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Personal Productivity Assistant
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: A developer wants an AI assistant that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manage calendar and reminders&lt;/li&gt;
&lt;li&gt;Answer questions about their codebase&lt;/li&gt;
&lt;li&gt;Summarize daily news and research papers&lt;/li&gt;
&lt;li&gt;Interact via Slack during work hours&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Setup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable Slack integration&lt;/span&gt;
npx wrangler secret put SLACK_BOT_TOKEN
npx wrangler secret put SLACK_APP_TOKEN

&lt;span class="c"&gt;# Configure with Anthropic Claude&lt;/span&gt;
npx wrangler secret put ANTHROPIC_API_KEY

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always available (99.9% uptime)&lt;/li&gt;
&lt;li&gt;Responds quickly from nearest edge location&lt;/li&gt;
&lt;li&gt;AI Gateway tracks usage costs&lt;/li&gt;
&lt;li&gt;Conversation history persists in R2&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Automated Web Research
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: A researcher needs to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor specific websites for updates&lt;/li&gt;
&lt;li&gt;Extract data from multiple sources&lt;/li&gt;
&lt;li&gt;Take screenshots of web pages&lt;/li&gt;
&lt;li&gt;Compile findings into reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Setup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable browser automation&lt;/span&gt;
npx wrangler secret put CDP_SECRET
npx wrangler secret put WORKER_URL

&lt;span class="c"&gt;# Deploy with browser skill&lt;/span&gt;
npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Example Interaction&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Check the top 5 posts on Hacker News and summarize them"

Moltbot: 
1. Opening news.ycombinator.com...
2. Taking screenshot...
3. Extracting post titles and links...
4. Summarizing each post...

Here are today's top stories:
1. [Title] - [Summary]
2. [Title] - [Summary]
...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Multi-Platform Customer Support Bot
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: A small business wants to provide AI-powered support across:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Telegram for quick customer queries&lt;/li&gt;
&lt;li&gt;Discord for community support&lt;/li&gt;
&lt;li&gt;Web chat for website visitors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Setup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable all chat platforms&lt;/span&gt;
npx wrangler secret put TELEGRAM_BOT_TOKEN
npx wrangler secret put DISCORD_BOT_TOKEN

&lt;span class="c"&gt;# Use AI Gateway for cost control&lt;/span&gt;
npx wrangler secret put AI_GATEWAY_API_KEY
npx wrangler secret put AI_GATEWAY_BASE_URL

npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Benefits&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single AI agent serves all platforms&lt;/li&gt;
&lt;li&gt;Unified conversation history&lt;/li&gt;
&lt;li&gt;Cost tracking via AI Gateway&lt;/li&gt;
&lt;li&gt;Scales automatically with user growth&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Development Team Assistant
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: A software team wants an AI that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Answer questions about internal documentation&lt;/li&gt;
&lt;li&gt;Generate code snippets&lt;/li&gt;
&lt;li&gt;Create diagrams and visualizations&lt;/li&gt;
&lt;li&gt;Interact via Discord during standups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Setup&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Configure Discord integration&lt;/span&gt;
npx wrangler secret put DISCORD_BOT_TOKEN

&lt;span class="c"&gt;# Use AI Gateway with fallbacks&lt;/span&gt;
&lt;span class="c"&gt;# Primary: Claude Sonnet, Fallback: GPT-4&lt;/span&gt;
npx wrangler secret put AI_GATEWAY_API_KEY
npx wrangler secret put AI_GATEWAY_BASE_URL

npm run deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Security Considerations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use Cloudflare Access to restrict admin UI to team members&lt;/li&gt;
&lt;li&gt;Device pairing ensures only approved team members can interact&lt;/li&gt;
&lt;li&gt;AI Gateway logs all requests for audit trails&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Personal Finance Tracker (⚠️ High Risk)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario&lt;/strong&gt;: A user wants an AI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor bank account balances&lt;/li&gt;
&lt;li&gt;Categorize transactions&lt;/li&gt;
&lt;li&gt;Provide spending insights&lt;/li&gt;
&lt;li&gt;Send alerts for unusual activity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;⚠️ WARNING&lt;/strong&gt;: This use case is &lt;strong&gt;NOT RECOMMENDED&lt;/strong&gt; due to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt injection risk (malicious emails could trigger unauthorized actions)&lt;/li&gt;
&lt;li&gt;Data privacy concerns (financial data in Cloudflare infrastructure)&lt;/li&gt;
&lt;li&gt;Lack of audit trail for financial decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;If You Must&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never connect to email&lt;/li&gt;
&lt;li&gt;Use read-only API access to financial institutions&lt;/li&gt;
&lt;li&gt;Enable all security layers (Access, device pairing, gateway token)&lt;/li&gt;
&lt;li&gt;Regularly review AI Gateway logs for suspicious activity&lt;/li&gt;
&lt;li&gt;Consider self-hosting instead of moltworker&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Troubleshooting Common Issues {#troubleshooting}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Container Startup Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Gateway fails to start" or "Container timeout"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Check that Containers are enabled&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://dash.cloudflare.com/?to=/:account/workers/containers&lt;/span&gt;

&lt;span class="c"&gt;# 2. Verify all required secrets are set&lt;/span&gt;
npx wrangler secret list

&lt;span class="c"&gt;# Required secrets:&lt;/span&gt;
&lt;span class="c"&gt;# - MOLTBOT_GATEWAY_TOKEN&lt;/span&gt;
&lt;span class="c"&gt;# - ANTHROPIC_API_KEY (or AI_GATEWAY_API_KEY + AI_GATEWAY_BASE_URL)&lt;/span&gt;

&lt;span class="c"&gt;# 3. Check deployment logs&lt;/span&gt;
npx wrangler &lt;span class="nb"&gt;tail&lt;/span&gt;

&lt;span class="c"&gt;# 4. Increase timeout (if needed)&lt;/span&gt;
&lt;span class="c"&gt;# Edit wrangler.toml:&lt;/span&gt;
&lt;span class="c"&gt;# [env.production]&lt;/span&gt;
&lt;span class="c"&gt;# compatibility_date = "2025-01-01"&lt;/span&gt;
&lt;span class="c"&gt;# [env.production.sandbox]&lt;/span&gt;
&lt;span class="c"&gt;# timeout = 300  # 5 minutes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  R2 Storage Not Working
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Data lost after container restart" or "R2 not mounting"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Verify all three R2 secrets are set&lt;/span&gt;
npx wrangler secret list | &lt;span class="nb"&gt;grep &lt;/span&gt;R2

&lt;span class="c"&gt;# Should show:&lt;/span&gt;
&lt;span class="c"&gt;# - R2_ACCESS_KEY_ID&lt;/span&gt;
&lt;span class="c"&gt;# - R2_SECRET_ACCESS_KEY&lt;/span&gt;
&lt;span class="c"&gt;# - CF_ACCOUNT_ID&lt;/span&gt;

&lt;span class="c"&gt;# 2. Check R2 bucket exists&lt;/span&gt;
npx wrangler r2 bucket list

&lt;span class="c"&gt;# Should show: moltbot-data&lt;/span&gt;

&lt;span class="c"&gt;# 3. Test R2 access manually&lt;/span&gt;
npx wrangler r2 object get moltbot-data/moltbot-backup.tar.gz &lt;span class="nt"&gt;--file&lt;/span&gt; test.tar.gz

&lt;span class="c"&gt;# 4. Trigger manual backup from admin UI&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://your-worker.workers.dev/_admin/&lt;/span&gt;
&lt;span class="c"&gt;# Click "Backup Now" button&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: R2 mounting only works in production, not with &lt;code&gt;wrangler dev&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloudflare Access Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Access denied on admin routes" or "Infinite redirect loop"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Verify Access secrets are set&lt;/span&gt;
npx wrangler secret list | &lt;span class="nb"&gt;grep &lt;/span&gt;CF_ACCESS

&lt;span class="c"&gt;# Should show:&lt;/span&gt;
&lt;span class="c"&gt;# - CF_ACCESS_TEAM_DOMAIN&lt;/span&gt;
&lt;span class="c"&gt;# - CF_ACCESS_AUD&lt;/span&gt;

&lt;span class="c"&gt;# 2. Check Access application configuration&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://one.dash.cloudflare.com/&lt;/span&gt;
&lt;span class="c"&gt;# Navigate to: Access &amp;gt; Applications&lt;/span&gt;
&lt;span class="c"&gt;# Verify your worker URL is listed&lt;/span&gt;

&lt;span class="c"&gt;# 3. Ensure your email is in the allow list&lt;/span&gt;
&lt;span class="c"&gt;# In Access application settings:&lt;/span&gt;
&lt;span class="c"&gt;# Policies &amp;gt; Include &amp;gt; Emails &amp;gt; [your-email@example.com]&lt;/span&gt;

&lt;span class="c"&gt;# 4. Clear browser cookies and try again&lt;/span&gt;
&lt;span class="c"&gt;# Access uses cookies for authentication&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Device Pairing Problems
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Devices not appearing in admin UI" or "Pairing request stuck"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Wait 10-15 seconds and refresh&lt;/span&gt;
&lt;span class="c"&gt;# Device list commands have WebSocket overhead&lt;/span&gt;

&lt;span class="c"&gt;# 2. Check gateway token is correct&lt;/span&gt;
&lt;span class="c"&gt;# In Control UI URL: ?token=YOUR_GATEWAY_TOKEN&lt;/span&gt;
&lt;span class="c"&gt;# Must match the secret you set&lt;/span&gt;

&lt;span class="c"&gt;# 3. Verify device pairing is enabled&lt;/span&gt;
npx wrangler secret list | &lt;span class="nb"&gt;grep &lt;/span&gt;DEV_MODE

&lt;span class="c"&gt;# Should NOT show DEV_MODE=true in production&lt;/span&gt;

&lt;span class="c"&gt;# 4. Check moltbot gateway logs&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://your-worker.workers.dev/debug/processes&lt;/span&gt;
&lt;span class="c"&gt;# Find the gateway process ID&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://your-worker.workers.dev/debug/logs?id=&amp;lt;process_id&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  WebSocket Connection Failures
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "WebSocket connection failed" or "Control UI disconnects"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Check if using wrangler dev (known limitation)&lt;/span&gt;
&lt;span class="c"&gt;# WebSocket proxying through sandbox has issues in local dev&lt;/span&gt;
&lt;span class="c"&gt;# Deploy to Cloudflare for full functionality&lt;/span&gt;

&lt;span class="c"&gt;# 2. Verify gateway token in WebSocket URL&lt;/span&gt;
&lt;span class="c"&gt;# Should be: wss://your-worker.workers.dev/ws?token=YOUR_GATEWAY_TOKEN&lt;/span&gt;

&lt;span class="c"&gt;# 3. Check browser console for errors&lt;/span&gt;
&lt;span class="c"&gt;# Look for CORS or authentication issues&lt;/span&gt;

&lt;span class="c"&gt;# 4. Test WebSocket endpoint directly&lt;/span&gt;
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; wscat
wscat &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"wss://your-worker.workers.dev/ws?token=YOUR_GATEWAY_TOKEN"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Browser Automation Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "CDP connection failed" or "Browser skill not working"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Verify CDP secrets are set&lt;/span&gt;
npx wrangler secret list | &lt;span class="nb"&gt;grep &lt;/span&gt;CDP

&lt;span class="c"&gt;# Should show:&lt;/span&gt;
&lt;span class="c"&gt;# - CDP_SECRET&lt;/span&gt;
&lt;span class="c"&gt;# - WORKER_URL&lt;/span&gt;

&lt;span class="c"&gt;# 2. Test CDP endpoint directly&lt;/span&gt;
curl https://your-worker.workers.dev/cdp/json/version &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"CDP_SECRET: your-secret"&lt;/span&gt;

&lt;span class="c"&gt;# Should return browser version info&lt;/span&gt;

&lt;span class="c"&gt;# 3. Check Browser Rendering is enabled&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://dash.cloudflare.com/?to=/:account/workers/browser-rendering&lt;/span&gt;

&lt;span class="c"&gt;# 4. Verify browser skill is installed in container&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://your-worker.workers.dev/debug/processes&lt;/span&gt;
&lt;span class="c"&gt;# Look for browser-related processes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Issues
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Slow responses" or "Timeouts"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Check AI Gateway analytics&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://dash.cloudflare.com/?to=/:account/ai/ai-gateway&lt;/span&gt;
&lt;span class="c"&gt;# Look for slow provider responses&lt;/span&gt;

&lt;span class="c"&gt;# 2. Configure container to never sleep&lt;/span&gt;
npx wrangler secret put SANDBOX_SLEEP_AFTER
&lt;span class="c"&gt;# Enter: never&lt;/span&gt;

&lt;span class="c"&gt;# 3. Enable caching in AI Gateway&lt;/span&gt;
&lt;span class="c"&gt;# In AI Gateway settings:&lt;/span&gt;
&lt;span class="c"&gt;# Enable "Cache responses" with appropriate TTL&lt;/span&gt;

&lt;span class="c"&gt;# 4. Monitor cold start times&lt;/span&gt;
npx wrangler &lt;span class="nb"&gt;tail&lt;/span&gt; &lt;span class="nt"&gt;--format&lt;/span&gt; pretty

&lt;span class="c"&gt;# Look for "Container starting" messages&lt;/span&gt;
&lt;span class="c"&gt;# First request after sleep takes 60-120 seconds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configuration Not Applying
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt;: "Config changes not working" or "Old settings persist"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solutions&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Bust the Docker build cache&lt;/span&gt;
&lt;span class="c"&gt;# Edit Dockerfile and change the cache bust comment:&lt;/span&gt;
&lt;span class="c"&gt;# Build cache bust: 2026-01-30-v2&lt;/span&gt;

&lt;span class="c"&gt;# 2. Force rebuild and redeploy&lt;/span&gt;
npm run deploy

&lt;span class="c"&gt;# 3. Restart the gateway process&lt;/span&gt;
&lt;span class="c"&gt;# Visit: https://your-worker.workers.dev/_admin/&lt;/span&gt;
&lt;span class="c"&gt;# Click "Restart Gateway" button&lt;/span&gt;

&lt;span class="c"&gt;# 4. Clear R2 backup and start fresh&lt;/span&gt;
npx wrangler r2 object delete moltbot-data/moltbot-backup.tar.gz
&lt;span class="c"&gt;# Then restart the container&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  FAQ {#faq}
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: Is moltworker production-ready?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: No. Moltworker is explicitly labeled as a proof-of-concept by Cloudflare. It demonstrates platform capabilities but is not an officially supported product. For production use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thoroughly review the security considerations&lt;/li&gt;
&lt;li&gt;Fork the repository and maintain your own version&lt;/li&gt;
&lt;li&gt;Implement additional monitoring and alerting&lt;/li&gt;
&lt;li&gt;Consider self-hosting Moltbot if you need guaranteed support&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q: How much does moltworker cost to run?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Typical monthly costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workers Paid Plan&lt;/strong&gt;: $5/month (required)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;R2 Storage&lt;/strong&gt;: $0.75/month (50GB storage + operations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser Rendering&lt;/strong&gt;: $5/month (1M requests, $5/million after)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Gateway&lt;/strong&gt;: Free (no additional cost)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Access&lt;/strong&gt;: Free (up to 50 users)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Total&lt;/strong&gt;: $10-15/month for typical usage&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use moltworker with OpenAI instead of Anthropic?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Yes. Moltbot supports multiple AI providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set OpenAI API key&lt;/span&gt;
npx wrangler secret put OPENAI_API_KEY

&lt;span class="c"&gt;# Or use AI Gateway with OpenAI provider&lt;/span&gt;
npx wrangler secret put AI_GATEWAY_API_KEY
npx wrangler secret put AI_GATEWAY_BASE_URL
&lt;span class="c"&gt;# Enter: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/openai&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI Gateway makes it easy to switch between providers without code changes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How does moltworker handle data privacy?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Data privacy considerations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Access&lt;/strong&gt;: Cloudflare can access all data passing through Workers and stored in R2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Providers&lt;/strong&gt;: Your conversations are sent to Anthropic/OpenAI (per their privacy policies)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption&lt;/strong&gt;: Data in transit is encrypted (TLS), but Cloudflare can decrypt it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-Knowledge&lt;/strong&gt;: Not possible with moltworker architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For maximum privacy, self-host Moltbot on your own hardware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I run moltworker locally for development?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Yes, with limitations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create .dev.vars file&lt;/span&gt;
&lt;span class="nb"&gt;cat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; .dev.vars &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
DEV_MODE=true
DEBUG_ROUTES=true
ANTHROPIC_API_KEY=your-key
&lt;/span&gt;&lt;span class="no"&gt;EOF

&lt;/span&gt;&lt;span class="c"&gt;# Run locally&lt;/span&gt;
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Limitations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WebSocket connections may not work reliably&lt;/li&gt;
&lt;li&gt;R2 mounting is not available in local dev&lt;/li&gt;
&lt;li&gt;Sandbox behavior differs from production&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For full functionality, deploy to Cloudflare.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What happens if Cloudflare has an outage?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: During a Cloudflare outage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your moltworker instance will be unavailable&lt;/li&gt;
&lt;li&gt;No data is lost (R2 backups persist)&lt;/li&gt;
&lt;li&gt;Once service resumes, your agent automatically recovers&lt;/li&gt;
&lt;li&gt;Conversation history and paired devices remain intact&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloudflare's uptime is typically 99.9%+, but for critical applications, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-cloud deployment (self-hosted backup)&lt;/li&gt;
&lt;li&gt;Monitoring and alerting for outages&lt;/li&gt;
&lt;li&gt;Documented recovery procedures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q: Can I customize the Moltbot runtime in moltworker?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Yes, but with constraints:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Skills&lt;/strong&gt;: Add skills to the container via Dockerfile modifications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Environment Variables&lt;/strong&gt;: Configure via wrangler secrets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Packages&lt;/strong&gt;: Install via Dockerfile (apt-get, npm, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limitations&lt;/strong&gt;: Cannot modify the underlying Workers Runtime or Sandbox SDK&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example Dockerfile customization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add custom skill&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; my-custom-skill /root/clawd/skills/my-custom-skill&lt;/span&gt;

&lt;span class="c"&gt;# Install additional packages&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    python3-pip &lt;span class="se"&gt;\
&lt;/span&gt;    ffmpeg

&lt;span class="c"&gt;# Install Python dependencies&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip3 &lt;span class="nb"&gt;install &lt;/span&gt;requests beautifulsoup4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Q: How do I migrate from self-hosted Moltbot to moltworker?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Migration steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Export existing data&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# On your self-hosted machine&lt;/span&gt;
   &lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-czf&lt;/span&gt; moltbot-backup.tar.gz ~/.moltbot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Deploy moltworker&lt;/strong&gt; (follow deployment guide above)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload backup to R2&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   npx wrangler r2 object put moltbot-data/moltbot-backup.tar.gz &lt;span class="se"&gt;\&lt;/span&gt;
     &lt;span class="nt"&gt;--file&lt;/span&gt; moltbot-backup.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Restart container&lt;/strong&gt; to trigger restore:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   &lt;span class="c"&gt;# Visit admin UI and click "Restart Gateway"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Re-pair devices&lt;/strong&gt; if device IDs changed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Test thoroughly&lt;/strong&gt; before decommissioning self-hosted instance&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Q: Is moltworker vulnerable to prompt injection attacks?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Yes. Both Moltbot and moltworker are susceptible to prompt injection via:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Email content (if email integration enabled)&lt;/li&gt;
&lt;li&gt;Web pages visited by browser automation&lt;/li&gt;
&lt;li&gt;Chat messages from untrusted sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Mitigation strategies&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Never enable email integration&lt;/li&gt;
&lt;li&gt;Only browse trusted websites&lt;/li&gt;
&lt;li&gt;Use device pairing to restrict access&lt;/li&gt;
&lt;li&gt;Monitor AI Gateway logs for suspicious activity&lt;/li&gt;
&lt;li&gt;Consider implementing prompt filtering (custom middleware)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is currently no foolproof defense against prompt injection in LLM-based agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use moltworker for commercial purposes?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Check the licenses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Moltworker&lt;/strong&gt;: Cloudflare's repository license (likely permissive, verify on GitHub)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moltbot&lt;/strong&gt;: Check moltbot's license on their GitHub repository&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Services&lt;/strong&gt;: Review Cloudflare's Terms of Service for commercial use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For commercial deployments:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consult with legal counsel&lt;/li&gt;
&lt;li&gt;Consider forking and maintaining your own version&lt;/li&gt;
&lt;li&gt;Implement proper monitoring and SLAs&lt;/li&gt;
&lt;li&gt;Ensure compliance with data protection regulations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q: How do I contribute to moltworker?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Moltworker is open-source:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fork the repository&lt;/strong&gt;: &lt;a href="https://github.com/cloudflare/moltworker" rel="noopener noreferrer"&gt;https://github.com/cloudflare/moltworker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Create a feature branch&lt;/strong&gt;: &lt;code&gt;git checkout -b feature/my-improvement&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make your changes&lt;/strong&gt; and test thoroughly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Submit a pull request&lt;/strong&gt; with detailed description&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engage with maintainers&lt;/strong&gt; on GitHub issues&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Cloudflare has indicated they'll monitor the repository for a while but it's not an officially supported product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What's the future of moltworker?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: As of January 2026, moltworker is a proof-of-concept. Possible futures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Community Maintenance&lt;/strong&gt;: Becomes a community-driven project&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Official Product&lt;/strong&gt;: Cloudflare productizes it (unlikely based on current statements)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream Contribution&lt;/strong&gt;: Features merged into official Moltbot&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deprecation&lt;/strong&gt;: Project becomes unmaintained as Moltbot evolves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For production use, plan for the possibility of maintaining your own fork.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Moltworker represents an innovative approach to deploying AI agents, leveraging Cloudflare's Developer Platform to eliminate hardware requirements while providing enterprise-grade security and observability. However, it's essential to approach it with realistic expectations:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Excels At&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Demonstrating Cloudflare's platform capabilities&lt;/li&gt;
&lt;li&gt;Reducing deployment complexity vs self-hosting&lt;/li&gt;
&lt;li&gt;Providing integrated observability and security&lt;/li&gt;
&lt;li&gt;Enabling rapid experimentation with AI agents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Moltworker Falls Short On&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Production-readiness and official support&lt;/li&gt;
&lt;li&gt;Absolute data privacy (Cloudflare can access data)&lt;/li&gt;
&lt;li&gt;Local network integrations (smart home, NAS, etc.)&lt;/li&gt;
&lt;li&gt;Protection against prompt injection attacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway&lt;/strong&gt;: Moltworker is an excellent proof-of-concept and learning tool, but requires careful security configuration and realistic expectations about its maturity. For sensitive data or mission-critical applications, traditional self-hosting may be more appropriate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Next Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Experiment&lt;/strong&gt;: Deploy moltworker in a test environment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate&lt;/strong&gt;: Assess if it meets your security and privacy requirements&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor&lt;/strong&gt;: Watch the GitHub repository for updates and security advisories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contribute&lt;/strong&gt;: Help improve moltworker through code contributions or feedback&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decide&lt;/strong&gt;: Choose between moltworker, self-hosting, or a hybrid approach based on your needs&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Additional Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Moltworker GitHub&lt;/strong&gt;: &lt;a href="https://github.com/cloudflare/moltworker" rel="noopener noreferrer"&gt;https://github.com/cloudflare/moltworker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Moltbot Official Site&lt;/strong&gt;: &lt;a href="https://molt.bot/" rel="noopener noreferrer"&gt;https://molt.bot/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Sandbox Docs&lt;/strong&gt;: &lt;a href="https://developers.cloudflare.com/sandbox/" rel="noopener noreferrer"&gt;https://developers.cloudflare.com/sandbox/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare AI Gateway&lt;/strong&gt;: &lt;a href="https://developers.cloudflare.com/ai-gateway/" rel="noopener noreferrer"&gt;https://developers.cloudflare.com/ai-gateway/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare Access&lt;/strong&gt;: &lt;a href="https://developers.cloudflare.com/cloudflare-one/policies/access/" rel="noopener noreferrer"&gt;https://developers.cloudflare.com/cloudflare-one/policies/access/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Related Posts
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/2026-universal-commerce-protocol"&gt;Universal Commerce Protocol (UCP): The Complete 2026 Guide to Agentic Commerce Standards&lt;/a&gt; — Open standard for agentic commerce and payments&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/2025-the-complete-guide-to-a2ui-protocol"&gt;The Complete Guide to A2UI Protocol: Building Agent-Driven UIs with Google A2UI (2025)&lt;/a&gt; — Declarative UI protocol for AI agents&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/2025-full-guide-a2a-protocol"&gt;2025 Full Guide: Agent2Agent (A2A) Protocol&lt;/a&gt; — AI agent coordination and protocol fundamentals&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Last Updated: January 30, 2026&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Moltworker Version: Proof-of-Concept (2026-01)&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;Author: Based on official Cloudflare documentation and community feedback&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://a2aprotocol.ai/blog/2026-moltworker-complete-guide" rel="noopener noreferrer"&gt;Moltworker Complete Guide 2026&lt;/a&gt;&lt;/p&gt;

</description>
      <category>moltworker</category>
    </item>
    <item>
      <title>2025 Complete Guide: Doubao-Seed-Code Model - In-Depth Analysis of ByteDance's AI Programming Assistant</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Wed, 12 Nov 2025 01:10:02 +0000</pubDate>
      <link>https://forem.com/sienna/2025-complete-guide-doubao-seed-code-model-in-depth-analysis-of-bytedances-ai-programming-2bo2</link>
      <guid>https://forem.com/sienna/2025-complete-guide-doubao-seed-code-model-in-depth-analysis-of-bytedances-ai-programming-2bo2</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Key Takeaways (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model Positioning&lt;/strong&gt;: Doubao-Seed-Code is ByteDance's professional code generation AI, supporting 200+ programming languages&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Capabilities&lt;/strong&gt;: Comprehensive programming assistance including code generation, completion, explanation, debugging, and unit test generation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration Method&lt;/strong&gt;: Quick integration via Volcano Engine API, supporting both streaming and non-streaming calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Cases&lt;/strong&gt;: IDE plugin development, code review tools, intelligent programming assistants, developer education platforms&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What is Doubao-Seed-Code Model&lt;/li&gt;
&lt;li&gt;Core Features and Capabilities&lt;/li&gt;
&lt;li&gt;How to Integrate and Use&lt;/li&gt;
&lt;li&gt;API Call Details&lt;/li&gt;
&lt;li&gt;Best Practices and Application Scenarios&lt;/li&gt;
&lt;li&gt;Frequently Asked Questions&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What is Doubao-Seed-Code Model
&lt;/h2&gt;

&lt;p&gt;Doubao-Seed-Code is a vertical domain model developed by ByteDance based on the Doubao large language model technology stack, specifically &lt;strong&gt;optimized for code scenarios&lt;/strong&gt;. The model is trained on massive code corpora and possesses deep programming language understanding and generation capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Technical Features
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature Dimension&lt;/th&gt;
&lt;th&gt;Capability Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language Coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports 200+ programming languages (Python, Java, JavaScript, C++, Go, etc.)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Context Length&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supports long context understanding, suitable for large codebase analysis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Optimized inference performance, supports real-time code completion scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trained on real development scenarios with high code executability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Professional Tip&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Doubao-Seed-Code not only generates code but also understands code intent, identifies potential bugs, and provides optimization suggestions - it's a true "AI pair programming" assistant.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Core Features and Capabilities
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1️⃣ Code Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Feature Description&lt;/strong&gt;: Generate complete executable code based on natural language descriptions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Typical Scenarios&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate function implementations from requirement documents&lt;/li&gt;
&lt;li&gt;Quickly scaffold project structures&lt;/li&gt;
&lt;li&gt;Generate algorithm solutions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Input&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Implement a quicksort algorithm in Python with detailed comments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2️⃣ Code Completion
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Feature Description&lt;/strong&gt;: Intelligently predict the next line of code or complete current code snippets&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Advantages&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Context-aware: Understands current file and project structure&lt;/li&gt;
&lt;li&gt;✅ Multi-line completion: Not just single lines, but complete code blocks&lt;/li&gt;
&lt;li&gt;✅ Style adaptation: Learns user coding style&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3️⃣ Code Explanation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Feature Description&lt;/strong&gt;: Convert complex code into easy-to-understand natural language descriptions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Application Value&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Help beginners understand open-source projects&lt;/li&gt;
&lt;li&gt;Quickly grasp legacy code logic&lt;/li&gt;
&lt;li&gt;Generate code documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4️⃣ Code Debugging and Optimization
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bug Detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Identify potential errors and security vulnerabilities&lt;/td&gt;
&lt;td&gt;Improve code quality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Performance Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provide algorithm complexity optimization suggestions&lt;/td&gt;
&lt;td&gt;Enhance runtime efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Code Refactoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Suggest more elegant implementation approaches&lt;/td&gt;
&lt;td&gt;Improve maintainability&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  5️⃣ Unit Test Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Feature Description&lt;/strong&gt;: Automatically generate test cases for functions&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Generated Content&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Normal scenario tests&lt;/li&gt;
&lt;li&gt;Boundary condition tests&lt;/li&gt;
&lt;li&gt;Exception handling tests&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Auto-generated test cases require manual review to ensure coverage of all business logic branches.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How to Integrate and Use
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📋 Prerequisites
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Register Volcano Engine Account&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visit: &lt;a href="https://www.volcengine.com" rel="noopener noreferrer"&gt;https://www.volcengine.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Complete real-name verification&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Activate Model Service&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Go to "Model Marketplace"&lt;/li&gt;
&lt;li&gt;Find "Doubao-Seed-Code Model"&lt;/li&gt;
&lt;li&gt;Click "Use Now"&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Obtain API Keys&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create API Key in console&lt;/li&gt;
&lt;li&gt;Securely store Access Key and Secret Key&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  API Call Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Basic Call Example (Python)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="c1"&gt;# API Configuration
&lt;/span&gt;&lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://ark.cn-beijing.volces.com/api/v3/chat/completions&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_api_key_here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Request Headers
&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Content-Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bearer &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;API_KEY&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Request Payload
&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doubao-seed-code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Model ID
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a professional programming assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement binary search algorithm in Python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Control creativity (0-1)
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;   &lt;span class="c1"&gt;# Maximum output length
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Send Request
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Extract Code
&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Parameter Descriptions
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Recommended Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;model&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;string&lt;/td&gt;
&lt;td&gt;Model identifier&lt;/td&gt;
&lt;td&gt;&lt;code&gt;doubao-seed-code&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;temperature&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;float&lt;/td&gt;
&lt;td&gt;Randomness control (0-1)&lt;/td&gt;
&lt;td&gt;Code generation: 0.2-0.5&lt;br&gt;Creative programming: 0.7-0.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;max_tokens&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;int&lt;/td&gt;
&lt;td&gt;Maximum output tokens&lt;/td&gt;
&lt;td&gt;1000-4000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stream&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;bool&lt;/td&gt;
&lt;td&gt;Whether to stream response&lt;/td&gt;
&lt;td&gt;Real-time scenarios: true&lt;br&gt;Batch processing: false&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Streaming Call Example
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;stream_code_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;doubao-seed-code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stream&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Enable streaming
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;API_ENDPOINT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_lines&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;data: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;delta&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;choices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;delta&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;delta&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Usage Example
&lt;/span&gt;&lt;span class="nf"&gt;stream_code_generation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Implement an LRU cache&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Streaming calls are suitable for scenarios requiring real-time feedback (like IDE plugins), significantly enhancing user experience.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Best Practices and Application Scenarios
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: IDE Smart Completion Plugin
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Implementation Approach&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Listen to user input events&lt;/li&gt;
&lt;li&gt;Get current file context (20 lines before and after)&lt;/li&gt;
&lt;li&gt;Call API to get completion suggestions&lt;/li&gt;
&lt;li&gt;Display results in floating window&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Prompt Optimization Tips&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
# Current file: user_service.py
# Existing code:
class UserService:
    def __init__(self, db):
        self.db = db

    def get_user(self, user_id):
        # Cursor position
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Please complete the get_user method implementation with exception handling&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: Code Review Assistant
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Feature Design&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically detect code smells&lt;/li&gt;
&lt;li&gt;Provide refactoring suggestions&lt;/li&gt;
&lt;li&gt;Generate review reports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example Prompt&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Please review the following code, focusing on:
1. Potential null pointer exceptions
2. Performance bottlenecks
3. Security vulnerabilities
4. Code style issues

[Code to review]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: Technical Documentation Generation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Application Value&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically generate API documentation&lt;/li&gt;
&lt;li&gt;Add docstrings to functions&lt;/li&gt;
&lt;li&gt;Generate README files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Comparison with Traditional Methods&lt;/strong&gt;:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Manual Writing&lt;/th&gt;
&lt;th&gt;AI Generation&lt;/th&gt;
&lt;th&gt;Advantage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1 hour/module&lt;/td&gt;
&lt;td&gt;5 minutes/module&lt;/td&gt;
&lt;td&gt;🚀 12x improvement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Depends on manual effort&lt;/td&gt;
&lt;td&gt;Automatically unified&lt;/td&gt;
&lt;td&gt;✅ Standardized style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;50-70%&lt;/td&gt;
&lt;td&gt;90%+&lt;/td&gt;
&lt;td&gt;📈 More comprehensive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Scenario 4: Programming Education Platform
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Functional Modules&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Interactive Code Explanation&lt;/strong&gt;: Line-by-line code logic explanation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Error Diagnosis&lt;/strong&gt;: Analyze student code and provide improvement suggestions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exercise Generation&lt;/strong&gt;: Automatically generate problems based on knowledge points&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q1: Which programming languages does Doubao-Seed-Code support?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: The model supports 200+ programming languages, including but not limited to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mainstream Languages&lt;/strong&gt;: Python, Java, JavaScript, TypeScript, C++, C#, Go, Rust&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scripting Languages&lt;/strong&gt;: Shell, PowerShell, Lua, Ruby, PHP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frontend Technologies&lt;/strong&gt;: HTML, CSS, Vue, React&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databases&lt;/strong&gt;: SQL (MySQL, PostgreSQL, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Others&lt;/strong&gt;: Markdown, JSON, YAML, Dockerfile, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For niche languages, the model also has basic understanding and generation capabilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q2: How to improve code generation accuracy?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Follow these best practices:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Provide Detailed Context&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   ❌ Poor: Write a sorting function
   ✅ Good: Implement quicksort in Python with requirements:
        - Support custom comparison function
        - Handle empty list cases
        - Time complexity O(nlogn)
        - Include type annotations and docstring
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Step-by-Step Guidance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First have the model generate function signature&lt;/li&gt;
&lt;li&gt;Then request core logic implementation&lt;/li&gt;
&lt;li&gt;Finally add exception handling&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Adjust temperature Parameter&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code generation: 0.2-0.4 (more deterministic)&lt;/li&gt;
&lt;li&gt;Algorithm optimization: 0.5-0.7 (moderate creativity)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Q3: Are there rate limits for API calls?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Yes, specific limits depend on your subscription plan:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Plan Type&lt;/th&gt;
&lt;th&gt;QPM Limit&lt;/th&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;Monthly Calls&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Free Trial&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;60&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Professional&lt;/td&gt;
&lt;td&gt;300&lt;/td&gt;
&lt;td&gt;50&lt;/td&gt;
&lt;td&gt;1,000,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Note&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Exceeding limits will return a 429 error. Implement request queuing and retry mechanisms.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Q4: Who owns the copyright of generated code?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: According to Volcano Engine service agreement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;User owns full copyright&lt;/strong&gt;: Generated code belongs to the caller&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Commercial use allowed&lt;/strong&gt;: No additional authorization required&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;User responsibility&lt;/strong&gt;: Users must ensure generated code doesn't infringe third-party rights&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q5: How to handle sensitive code and data security?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A&lt;/strong&gt;: Security recommendations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Anonymization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove API keys, passwords, and other sensitive information&lt;/li&gt;
&lt;li&gt;Replace real business data with sample data&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Private Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise version supports private deployment&lt;/li&gt;
&lt;li&gt;Data stays within local network&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Audit Logs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable API call logging&lt;/li&gt;
&lt;li&gt;Regularly review usage records&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary and Action Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Value Summary
&lt;/h3&gt;

&lt;p&gt;Doubao-Seed-Code provides developers with &lt;strong&gt;comprehensive AI programming assistant capabilities&lt;/strong&gt;, covering the entire software development lifecycle from code generation to debugging optimization. Its core advantages include:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;High Accuracy&lt;/strong&gt;: Trained on massive real code&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Easy Integration&lt;/strong&gt;: Standard REST API, supports multiple SDKs&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;High Performance&lt;/strong&gt;: Optimized inference speed, supports real-time scenarios&lt;br&gt;&lt;br&gt;
✅ &lt;strong&gt;Continuous Evolution&lt;/strong&gt;: Regular updates, constantly improving capabilities  &lt;/p&gt;

&lt;h3&gt;
  
  
  Related Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📚 &lt;strong&gt;Official Documentation&lt;/strong&gt;: &lt;a href="https://www.volcengine.com/docs/82379/1949118" rel="noopener noreferrer"&gt;https://www.volcengine.com/docs/82379/1949118&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>doubao</category>
    </item>
    <item>
      <title>2025 Complete Guide: In-Depth Analysis of ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Wed, 12 Nov 2025 00:29:02 +0000</pubDate>
      <link>https://forem.com/sienna/2025-complete-guide-in-depth-analysis-of-ernie-45-vl-28b-a3b-thinking-multimodal-ai-model-402a</link>
      <guid>https://forem.com/sienna/2025-complete-guide-in-depth-analysis-of-ernie-45-vl-28b-a3b-thinking-multimodal-ai-model-402a</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Key Takeaways (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lightweight &amp;amp; Efficient&lt;/strong&gt;: Activates only 3B parameters while matching top-tier flagship model performance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Breakthrough Reasoning&lt;/strong&gt;: Achieves exceptional visual reasoning and STEM problem-solving through large-scale reinforcement learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Innovative Features&lt;/strong&gt;: Supports "Thinking with Images", visual grounding, tool calling, and video understanding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Easy Deployment&lt;/strong&gt;: Supports multiple inference frameworks including Transformers, vLLM, and FastDeploy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Source Friendly&lt;/strong&gt;: Licensed under Apache 2.0, allowing commercial use&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What is ERNIE-4.5-VL-28B-A3B-Thinking&lt;/li&gt;
&lt;li&gt;Core Technical Highlights&lt;/li&gt;
&lt;li&gt;Six Key Capabilities Explained&lt;/li&gt;
&lt;li&gt;Performance Benchmarks&lt;/li&gt;
&lt;li&gt;Quick Start Guide&lt;/li&gt;
&lt;li&gt;Deployment Options Comparison&lt;/li&gt;
&lt;li&gt;Fine-tuning and Training&lt;/li&gt;
&lt;li&gt;Frequently Asked Questions&lt;/li&gt;
&lt;li&gt;Summary and Recommendations&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  What is ERNIE-4.5-VL-28B-A3B-Thinking
&lt;/h2&gt;

&lt;p&gt;ERNIE-4.5-VL-28B-A3B-Thinking is Baidu's latest generation multimodal AI model, built upon the powerful ERNIE-4.5-VL-28B-A3B architecture. It's a large language model specifically optimized for vision-language understanding tasks, having absorbed massive amounts of high-quality visual-language reasoning data through extensive mid-training phases.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Expert Tip&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model's key feature is its MoE (Mixture of Experts) architecture. While the total parameter count is 28B, only 3B parameters are activated during inference, enabling it to maintain high performance while dramatically reducing computational costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Core Innovations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Large-scale Vision-Language Training&lt;/strong&gt;: Absorbed vast amounts of premium visual-language reasoning data during mid-training&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep Semantic Alignment&lt;/strong&gt;: Significantly enhanced semantic alignment between visual and language modalities&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Reinforcement Learning&lt;/strong&gt;: Employs GSPO and IcePop strategies combined with dynamic difficulty sampling for efficient learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enhanced Instruction Following&lt;/strong&gt;: Dramatically improved visual grounding performance and instruction execution capabilities&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Technical Highlights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Training Technology Innovations
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technical Feature&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;th&gt;Benefits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Multimodal RL&lt;/td&gt;
&lt;td&gt;GSPO + IcePop strategies&lt;/td&gt;
&lt;td&gt;Stabilizes MoE training, improves learning efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic Difficulty Sampling&lt;/td&gt;
&lt;td&gt;Adaptive training sample difficulty adjustment&lt;/td&gt;
&lt;td&gt;Accelerates convergence, enhances generalization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Large-scale Mid-training&lt;/td&gt;
&lt;td&gt;Massive visual-language reasoning data&lt;/td&gt;
&lt;td&gt;Boosts representation power and cross-modal understanding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Verifiable Task Learning&lt;/td&gt;
&lt;td&gt;RL on verifiable tasks&lt;/td&gt;
&lt;td&gt;Ensures reasoning accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Architectural Advantages
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MoE (Mixture of Experts) Architecture&lt;/strong&gt; enables the model to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Activate only necessary 3B parameters during inference&lt;/li&gt;
&lt;li&gt;Maintain 28B parameter knowledge capacity&lt;/li&gt;
&lt;li&gt;Significantly reduce inference costs and latency&lt;/li&gt;
&lt;li&gt;Achieve better energy efficiency&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important Note&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Although the model activates only 3B parameters, single-card deployment requires at least 80GB GPU memory. This is because the complete model weights need to be loaded, even though only a portion is activated during inference.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Six Key Capabilities Explained
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. 🧠 Visual Reasoning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-step complex reasoning&lt;/li&gt;
&lt;li&gt;Chart analysis and interpretation&lt;/li&gt;
&lt;li&gt;Causal relationship reasoning&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Application Scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complex chart data analysis&lt;/li&gt;
&lt;li&gt;Visual logic problem solving&lt;/li&gt;
&lt;li&gt;Scene understanding and inference&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Empowered by large-scale reinforcement learning, the model demonstrates exceptional multi-step reasoning capabilities in complex visual tasks. Whether analyzing intricate statistical charts or understanding causal relationships in images, ERNIE-4.5-VL-Thinking delivers accurate analytical results.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 🔬 STEM Reasoning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Breakthrough Performance:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Solving math problems from photos&lt;/li&gt;
&lt;li&gt;Physics formula recognition and calculation&lt;/li&gt;
&lt;li&gt;Geometric figure analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Practical Value:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Educational assistance tools&lt;/li&gt;
&lt;li&gt;Homework grading systems&lt;/li&gt;
&lt;li&gt;Scientific research data analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Leveraging powerful visual capabilities, the model achieves a performance leap in STEM tasks. It can directly recognize mathematical formulas and geometric figures from photos and perform accurate calculations and reasoning, handling even complex problems with ease.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 📍 Visual Grounding
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;More precise object localization&lt;/li&gt;
&lt;li&gt;Flexible instruction execution&lt;/li&gt;
&lt;li&gt;Complex industrial scenario adaptation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Typical Applications:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Industrial quality inspection&lt;/li&gt;
&lt;li&gt;Autonomous driving scene understanding&lt;/li&gt;
&lt;li&gt;Robot visual navigation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Responding to strong community demand, the model significantly enhances visual grounding performance. Improved instruction-following capabilities make grounding functions more accessible, easily triggering localization in complex industrial scenarios for dramatic efficiency gains.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. 🤔 Thinking with Images
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Innovative Functionality:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Thinks like humans&lt;/li&gt;
&lt;li&gt;Freely zooms image details&lt;/li&gt;
&lt;li&gt;Progressive information extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Workflow:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Input Image → Initial Analysis → Identify Key Regions → 
Zoom Detail Inspection → Synthesize Information → Generate Complete Answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is one of the model's most innovative features. When paired with tools like image zooming and image search, "Thinking with Images" dramatically elevates the model's ability to process fine-grained details and handle long-tail visual knowledge. The model thinks like a human, first observing the whole, then zooming into key regions for careful inspection, and finally synthesizing all information to provide an answer.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When processing high-resolution images or pictures with abundant details, enabling "Thinking with Images" can significantly improve recognition accuracy.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  5. 🛠️ Tool Utilization
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Supported Tool Types:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Image search&lt;/li&gt;
&lt;li&gt;Image zooming&lt;/li&gt;
&lt;li&gt;External knowledge base queries&lt;/li&gt;
&lt;li&gt;Calculator and other auxiliary tools&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handle long-tail knowledge&lt;/li&gt;
&lt;li&gt;Real-time information retrieval&lt;/li&gt;
&lt;li&gt;Enhanced problem-solving capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Empowered by robust tool-calling capabilities, the model can instantly use functions like image search to easily identify long-tail knowledge and achieve comprehensive information retrieval. These enhancements form a critical foundation for developing sophisticated multimodal agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. 🎬 Video Understanding
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Core Capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Outstanding temporal awareness&lt;/li&gt;
&lt;li&gt;Precise event localization&lt;/li&gt;
&lt;li&gt;Cross-frame content change recognition&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Application Domains:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Video content moderation&lt;/li&gt;
&lt;li&gt;Intelligent video editing&lt;/li&gt;
&lt;li&gt;Surveillance video analysis&lt;/li&gt;
&lt;li&gt;Sports event analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model possesses outstanding temporal awareness and event localization abilities, accurately identifying content changes across different time segments in videos, making video analysis smarter and more efficient.&lt;/p&gt;




&lt;h2&gt;
  
  
  Performance Benchmarks
&lt;/h2&gt;

&lt;p&gt;According to official benchmark results, ERNIE-4.5-VL-28B-A3B-Thinking performs excellently across multiple evaluation benchmarks. As a lightweight model activating only 3B parameters, its performance closely matches or even exceeds industry-leading flagship models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Comparison with Top Models
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Capability Dimension&lt;/th&gt;
&lt;th&gt;ERNIE-4.5-VL-Thinking&lt;/th&gt;
&lt;th&gt;Industry Top Models Average&lt;/th&gt;
&lt;th&gt;Advantage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Visual Reasoning&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;RL enhancement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;STEM Problems&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Visual breakthrough&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual Grounding&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Specialized optimization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool Calling&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Parameter Efficiency&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Only 3B activated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Video Understanding&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Strong temporal awareness&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;📊 &lt;strong&gt;Performance Highlights&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Official benchmark charts show the model approaches or exceeds industry-leading flagship models across multiple dimensions while maintaining significant parameter efficiency advantages. This means users can achieve top-tier performance at lower costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Key Performance Metrics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inference Speed&lt;/strong&gt;: Thanks to only 3B activated parameters, inference is 2-3x faster than equivalent full-parameter models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory Footprint&lt;/strong&gt;: While 80GB is needed to load the model, inference memory usage is far lower than traditional large models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accuracy&lt;/strong&gt;: Achieves SOTA levels across multiple vision-language understanding benchmarks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generalization&lt;/strong&gt;: Maintains strong performance on unseen tasks&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Quick Start Guide
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Method 1: Using Transformers Library (Recommended for Beginners)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Suitable For:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rapid prototyping&lt;/li&gt;
&lt;li&gt;Small-scale inference tasks&lt;/li&gt;
&lt;li&gt;Learning and experimentation&lt;/li&gt;
&lt;li&gt;Single or low-frequency calls&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Basic Code Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoTokenizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;

&lt;span class="c1"&gt;# Load model
&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;baidu/ERNIE-4.5-VL-28B-A3B-Thinking&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForCausalLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auto&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load processor
&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoProcessor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_image_preprocess&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Build messages
&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What color clothes is the girl wearing in the picture?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://paddlenlp.bj.bcebos.com/datasets/paddlemix/demo_images/example1.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Process input
&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_chat_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tokenize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;image_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_vision_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;videos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;video_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Generate response
&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;
&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;generated_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;device&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;output_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;generated_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]):])&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Parameter Explanations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;device_map="auto"&lt;/code&gt;: Automatically allocates model to available devices&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dtype=torch.bfloat16&lt;/code&gt;: Uses bfloat16 precision, balancing performance and accuracy&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;trust_remote_code=True&lt;/code&gt;: Allows execution of custom code from model repository&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max_new_tokens=1024&lt;/code&gt;: Controls maximum length of generated text&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Method 2: Using vLLM (Recommended for Production)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Suitable For:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High-concurrency inference services&lt;/li&gt;
&lt;li&gt;Production environment deployment&lt;/li&gt;
&lt;li&gt;Applications requiring high throughput&lt;/li&gt;
&lt;li&gt;API service construction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Installation Steps:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install uv package manager&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;uv

&lt;span class="c"&gt;# Install vLLM main branch&lt;/span&gt;
uv pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; vllm &lt;span class="nt"&gt;--pre&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--extra-index-url&lt;/span&gt; https://wheels.vllm.ai/nightly &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--extra-index-url&lt;/span&gt; https://download.pytorch.org/whl/cu129 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--index-strategy&lt;/span&gt; unsafe-best-match
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Start Service:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic startup (requires 80G memory)&lt;/span&gt;
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt;

&lt;span class="c"&gt;# If encountering memory shortage, add the following parameter&lt;/span&gt;
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.95
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Enable Reasoning Parser and Tool Calling:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; ernie45 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; ernie45 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;vLLM Advantages:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PagedAttention&lt;/strong&gt;: Efficient memory management, supports larger batches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Continuous Batching&lt;/strong&gt;: Dynamically batches requests, maximizes GPU utilization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized CUDA Kernels&lt;/strong&gt;: Specially optimized inference kernels for faster speed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI-Compatible API&lt;/strong&gt;: Provides OpenAI API-compatible interface&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Method 3: Using FastDeploy (Recommended for Enterprise)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Suitable For:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise-grade production deployment&lt;/li&gt;
&lt;li&gt;Requiring quantization acceleration&lt;/li&gt;
&lt;li&gt;Multi-instance load balancing&lt;/li&gt;
&lt;li&gt;Complete monitoring and management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Quick Start:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fastdeploy serve &lt;span class="nt"&gt;--model&lt;/span&gt; baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 131072 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-num-seqs&lt;/span&gt; 32 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--port&lt;/span&gt; 8180 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quantization&lt;/span&gt; wint8 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; ernie-45-vl-thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; ernie-45-vl-thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--mm-processor-kwargs&lt;/span&gt; &lt;span class="s1"&gt;'{"image_max_pixels": 12845056 }'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Parameter Details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--max-model-len 131072&lt;/code&gt;: Maximum supported sequence length&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--max-num-seqs 32&lt;/code&gt;: Maximum concurrent sequences&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--quantization wint8&lt;/code&gt;: Uses 8-bit integer quantization, reduces memory usage&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--mm-processor-kwargs&lt;/code&gt;: Multimodal processor parameters, controls maximum image pixels&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Expert Tip&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FastDeploy supports wint8 quantization, reducing memory requirements from 80GB to approximately 60GB while maintaining performance. This is the best choice for memory-constrained scenarios.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Deployment Options Comparison
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Detailed Comparison Table
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment Option&lt;/th&gt;
&lt;th&gt;Ease of Use&lt;/th&gt;
&lt;th&gt;Performance&lt;/th&gt;
&lt;th&gt;Concurrency&lt;/th&gt;
&lt;th&gt;Memory Requirement&lt;/th&gt;
&lt;th&gt;Quantization&lt;/th&gt;
&lt;th&gt;Suitable Scenarios&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Transformers&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐&lt;/td&gt;
&lt;td&gt;80GB+&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Development &amp;amp; Testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;vLLM&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;80GB+&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastDeploy&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;60GB+ (quantized)&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Enterprise&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Performance Comparison
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Transformers&lt;/th&gt;
&lt;th&gt;vLLM&lt;/th&gt;
&lt;th&gt;FastDeploy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single Inference Latency&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Throughput (req/s)&lt;/td&gt;
&lt;td&gt;1-5&lt;/td&gt;
&lt;td&gt;20-50&lt;/td&gt;
&lt;td&gt;20-50&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Efficiency&lt;/td&gt;
&lt;td&gt;Fair&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Startup Time&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Compatibility&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;td&gt;OpenAI-compatible&lt;/td&gt;
&lt;td&gt;Custom&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Selection Recommendations
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;If you are:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;AI Researcher/Student&lt;/strong&gt; → Choose &lt;strong&gt;Transformers&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Easy to experiment and debug&lt;/li&gt;
&lt;li&gt;✅ Full model access&lt;/li&gt;
&lt;li&gt;✅ Rich documentation and community support&lt;/li&gt;
&lt;li&gt;❌ Not optimal performance&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Startup/Individual Developer&lt;/strong&gt; → Choose &lt;strong&gt;vLLM&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Balanced performance and ease of use&lt;/li&gt;
&lt;li&gt;✅ OpenAI-compatible API&lt;/li&gt;
&lt;li&gt;✅ Active community&lt;/li&gt;
&lt;li&gt;✅ Free and open source&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Large Enterprise&lt;/strong&gt; → Choose &lt;strong&gt;FastDeploy&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Complete enterprise-grade support&lt;/li&gt;
&lt;li&gt;✅ Quantization optimization&lt;/li&gt;
&lt;li&gt;✅ Monitoring and management features&lt;/li&gt;
&lt;li&gt;✅ Long-term maintenance guarantee&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fine-tuning and Training
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Fine-tuning with ERNIEKit
&lt;/h3&gt;

&lt;p&gt;ERNIEKit is a training toolkit based on PaddlePaddle, specifically designed for the ERNIE series models, providing comprehensive training support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supported Training Scenarios:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Supervised Fine-Tuning (SFT)&lt;/li&gt;
&lt;li&gt;✅ LoRA Low-Rank Adaptation&lt;/li&gt;
&lt;li&gt;✅ DPO Alignment Training&lt;/li&gt;
&lt;li&gt;✅ Function Calling Training&lt;/li&gt;
&lt;li&gt;✅ Multi-GPU Distributed Training&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Quick Start Fine-tuning
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Download Model&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;huggingface-cli download baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--local-dir&lt;/span&gt; baidu/ERNIE-4.5-VL-28B-A3B-Thinking
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 2: Run SFT Training&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Basic SFT + LoRA (Recommended)&lt;/span&gt;
erniekit train examples/configs/ERNIE-4.5-VL-28B-A3B-Thinking/sft/run_sft_lora_8k.yaml

&lt;span class="c"&gt;# Function calling specialized training&lt;/span&gt;
erniekit train examples/configs/ERNIE-4.5-VL-28B-A3B-Thinking/sft_function_call/run_sft_8k.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Training Configuration Examples
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;LoRA Configuration Recommendations:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;lora_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;r&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;                    &lt;span class="c1"&gt;# LoRA rank, higher means more expressive but more memory&lt;/span&gt;
  &lt;span class="na"&gt;lora_alpha&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;16&lt;/span&gt;          &lt;span class="c1"&gt;# LoRA scaling factor&lt;/span&gt;
  &lt;span class="na"&gt;target_modules&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;         &lt;span class="c1"&gt;# Target modules for LoRA&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;q_proj&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;v_proj&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;k_proj&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;o_proj&lt;/span&gt;
  &lt;span class="na"&gt;lora_dropout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.05&lt;/span&gt;      &lt;span class="c1"&gt;# Dropout rate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Training Hyperparameter Recommendations:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;training_args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;learning_rate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1e-5&lt;/span&gt;     &lt;span class="c1"&gt;# Learning rate&lt;/span&gt;
  &lt;span class="na"&gt;num_train_epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;     &lt;span class="c1"&gt;# Number of epochs&lt;/span&gt;
  &lt;span class="na"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="na"&gt;gradient_accumulation_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="na"&gt;warmup_ratio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.1&lt;/span&gt;       &lt;span class="c1"&gt;# Warmup ratio&lt;/span&gt;
  &lt;span class="na"&gt;save_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;500&lt;/span&gt;         &lt;span class="c1"&gt;# Checkpoint save interval&lt;/span&gt;
  &lt;span class="na"&gt;logging_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;       &lt;span class="c1"&gt;# Logging interval&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data Preparation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Standard Data Format:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Describe this image"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"image_url"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"image_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"path/to/image.jpg"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This is an image of..."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Fine-tuning Best Practices
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practices&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Quality First&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensure correct training data format&lt;/li&gt;
&lt;li&gt;Include high-quality image-text pairs&lt;/li&gt;
&lt;li&gt;Sufficient data diversity&lt;/li&gt;
&lt;li&gt;Avoid data bias&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LoRA Configuration Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resource-constrained: r=8, alpha=16&lt;/li&gt;
&lt;li&gt;Balanced: r=16, alpha=32&lt;/li&gt;
&lt;li&gt;High-quality: r=32, alpha=64&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Learning Rate Adjustment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with smaller learning rate (1e-5)&lt;/li&gt;
&lt;li&gt;Use warmup to avoid training instability&lt;/li&gt;
&lt;li&gt;Monitor loss curves and adjust timely&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Validation and Monitoring&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Regular evaluation on validation set&lt;/li&gt;
&lt;li&gt;Use early stopping to avoid overfitting&lt;/li&gt;
&lt;li&gt;Track key metric changes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Memory Optimization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use gradient accumulation to reduce batch size&lt;/li&gt;
&lt;li&gt;Enable mixed precision training&lt;/li&gt;
&lt;li&gt;Consider using DeepSpeed ZeRO&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Training Hardware Requirements
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Training Method&lt;/th&gt;
&lt;th&gt;Minimum Memory&lt;/th&gt;
&lt;th&gt;Recommended Memory&lt;/th&gt;
&lt;th&gt;GPU Count&lt;/th&gt;
&lt;th&gt;Training Time (1000 samples)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LoRA (r=8)&lt;/td&gt;
&lt;td&gt;40GB&lt;/td&gt;
&lt;td&gt;80GB&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;2-4 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LoRA (r=16)&lt;/td&gt;
&lt;td&gt;48GB&lt;/td&gt;
&lt;td&gt;80GB&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3-6 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Full Fine-tune&lt;/td&gt;
&lt;td&gt;160GB+&lt;/td&gt;
&lt;td&gt;320GB+&lt;/td&gt;
&lt;td&gt;4+&lt;/td&gt;
&lt;td&gt;12-24 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🤔 Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q1: How much GPU memory is required to run the model?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Inference&lt;/strong&gt;: At least &lt;strong&gt;80GB GPU memory&lt;/strong&gt; per card (e.g., A100 or H100)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Quantized Inference&lt;/strong&gt;: Can be reduced to approximately &lt;strong&gt;60GB&lt;/strong&gt; using wint8 quantization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-tuning (LoRA)&lt;/strong&gt;: Requires at least &lt;strong&gt;40-80GB&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full Fine-tuning&lt;/strong&gt;: Requires &lt;strong&gt;160GB+&lt;/strong&gt;, multi-GPU training recommended&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Memory Optimization Suggestions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use quantization techniques (wint8)&lt;/li&gt;
&lt;li&gt;Enable gradient checkpointing&lt;/li&gt;
&lt;li&gt;Reduce batch size&lt;/li&gt;
&lt;li&gt;Use LoRA instead of full fine-tuning&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q2: What languages does the model support?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; The model is primarily optimized for &lt;strong&gt;Chinese and English&lt;/strong&gt;, with the strongest understanding and generation capabilities in these two languages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Language Support Details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 &lt;strong&gt;Chinese&lt;/strong&gt;: Excellent (primary optimization language)&lt;/li&gt;
&lt;li&gt;🟢 &lt;strong&gt;English&lt;/strong&gt;: Excellent (primary optimization language)&lt;/li&gt;
&lt;li&gt;🟡 &lt;strong&gt;Other Languages&lt;/strong&gt;: Basic support, effectiveness may not match Chinese/English&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q3: How to enable "Thinking with Images" functionality?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; "Thinking with Images" is automatically enabled when using tool-calling mode.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enabling Method:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Add parameters when starting vLLM&lt;/span&gt;
vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reasoning-parser&lt;/span&gt; ernie45 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tool-call-parser&lt;/span&gt; ernie45 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-auto-tool-choice&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model automatically determines when to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zoom image details&lt;/li&gt;
&lt;li&gt;Search related images&lt;/li&gt;
&lt;li&gt;Call other tools&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q4: Can it be used commercially?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; ✅ &lt;strong&gt;Yes, commercial use is allowed&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The model is licensed under &lt;strong&gt;Apache 2.0&lt;/strong&gt;, which permits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ Commercial use&lt;/li&gt;
&lt;li&gt;✅ Modification and distribution&lt;/li&gt;
&lt;li&gt;✅ Patent use&lt;/li&gt;
&lt;li&gt;✅ Private use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retain copyright notices&lt;/li&gt;
&lt;li&gt;Mark significant modifications&lt;/li&gt;
&lt;li&gt;Comply with license terms&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q5: What advantages does it have compared to other multimodal models?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Key advantages include:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Advantage Dimension&lt;/th&gt;
&lt;th&gt;Specific Performance&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Parameter Efficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only 3B activated parameters, 50%+ lower inference cost&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reasoning Capability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale RL training, excellent complex reasoning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native support for image search, zoom, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Visual Grounding&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Specially optimized grounding, suitable for industrial scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chinese Support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep optimization for Chinese, better Chinese performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open Source Friendly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0 license, barrier-free commercial use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Q6: Does it support video input?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; ✅ &lt;strong&gt;Full video understanding support&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Video Processing Capabilities:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temporal information understanding&lt;/li&gt;
&lt;li&gt;Event localization&lt;/li&gt;
&lt;li&gt;Cross-frame content change recognition&lt;/li&gt;
&lt;li&gt;Video summary generation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Usage Method:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Describe what happens in the video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path/to/video.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;image_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_vision_info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Q7: How to achieve optimal inference performance?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Recommended configuration and optimization strategies:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment Configuration:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vllm serve baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--dtype&lt;/span&gt; bfloat16 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-model-len&lt;/span&gt; 8192 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--max-num-seqs&lt;/span&gt; 32 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.95 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--enable-chunked-prefill&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Performance Optimization Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use vLLM or FastDeploy&lt;/strong&gt; instead of Transformers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable bfloat16 precision&lt;/strong&gt; for speed-accuracy balance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set concurrency appropriately&lt;/strong&gt; adjust &lt;code&gt;max-num-seqs&lt;/code&gt; based on memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Batch requests&lt;/strong&gt; use batching mode for bulk inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enable PagedAttention&lt;/strong&gt; enabled by default in vLLM, improves memory efficiency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use quantization&lt;/strong&gt; if memory-constrained, use wint8 quantization&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Performance Benchmark Reference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single inference latency: 200-500ms (depends on input length)&lt;/li&gt;
&lt;li&gt;Throughput: 20-50 requests/second (vLLM, single A100)&lt;/li&gt;
&lt;li&gt;Concurrency support: Up to 32 concurrent requests&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q8: How frequently is the model updated?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Baidu regularly updates the ERNIE series models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Get Update Information:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📱 &lt;a href="https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-Thinking" rel="noopener noreferrer"&gt;Hugging Face Model Page&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 &lt;a href="https://yiyan.baidu.com/blog/ernie4.5" rel="noopener noreferrer"&gt;ERNIE Official Blog&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/PaddlePaddle/ERNIE" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Recommendations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Follow official channels for latest versions&lt;/li&gt;
&lt;li&gt;Check Release Notes for improvements&lt;/li&gt;
&lt;li&gt;Validate compatibility in test environment before upgrading&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q9: How to handle inference errors or exceptions?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Common issues and solutions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Out of Memory (OOM):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Solution 1: Increase memory utilization&lt;/span&gt;
&lt;span class="nt"&gt;--gpu-memory-utilization&lt;/span&gt; 0.95

&lt;span class="c"&gt;# Solution 2: Reduce concurrency&lt;/span&gt;
&lt;span class="nt"&gt;--max-num-seqs&lt;/span&gt; 16

&lt;span class="c"&gt;# Solution 3: Use quantization&lt;/span&gt;
&lt;span class="nt"&gt;--quantization&lt;/span&gt; wint8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Loading Failure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Ensure trust_remote_code is added&lt;/span&gt;
&lt;span class="nt"&gt;--trust-remote-code&lt;/span&gt;

&lt;span class="c"&gt;# Check network connection and model download integrity&lt;/span&gt;
huggingface-cli download baidu/ERNIE-4.5-VL-28B-A3B-Thinking &lt;span class="nt"&gt;--resume-download&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Slow Inference:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check if using optimized inference framework (vLLM/FastDeploy)&lt;/li&gt;
&lt;li&gt;Verify GPU utilization is normal&lt;/li&gt;
&lt;li&gt;Consider using batch processing mode&lt;/li&gt;
&lt;li&gt;Check if input image resolution is too high&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Q10: How to evaluate fine-tuning effectiveness?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;A:&lt;/strong&gt; Recommended methods for evaluating fine-tuned models:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Quantitative Evaluation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Calculate metrics on validation set
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.metrics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f1_score&lt;/span&gt;

&lt;span class="c1"&gt;# For classification tasks
&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;accuracy_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;f1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;f1_score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;y_true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y_pred&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;average&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;weighted&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# For generation tasks
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;rouge&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Rouge&lt;/span&gt;
&lt;span class="n"&gt;rouge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Rouge&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rouge&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_scores&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;predictions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;references&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;avg&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Qualitative Evaluation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manual inspection of generation quality&lt;/li&gt;
&lt;li&gt;Compare outputs before and after fine-tuning&lt;/li&gt;
&lt;li&gt;Test edge cases and difficult samples&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Business Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User satisfaction&lt;/li&gt;
&lt;li&gt;Task completion rate&lt;/li&gt;
&lt;li&gt;Error rate reduction&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary and Recommendations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Core Advantages Summary
&lt;/h3&gt;

&lt;p&gt;ERNIE-4.5-VL-28B-A3B-Thinking represents a significant breakthrough in multimodal AI:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🎯 Technical Innovation&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MoE architecture achieves parameter efficiency breakthrough&lt;/li&gt;
&lt;li&gt;Large-scale reinforcement learning enhances reasoning capabilities&lt;/li&gt;
&lt;li&gt;Innovative "Thinking with Images" feature&lt;/li&gt;
&lt;li&gt;Native tool calling support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;⚡ Outstanding Performance&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3B activated parameters achieve top-tier model performance&lt;/li&gt;
&lt;li&gt;2-3x faster inference speed&lt;/li&gt;
&lt;li&gt;Significantly reduced memory footprint&lt;/li&gt;
&lt;li&gt;Leading performance across multiple benchmarks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🛠️ Comprehensive Features&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Visual reasoning and STEM problem solving&lt;/li&gt;
&lt;li&gt;Precise visual grounding capabilities&lt;/li&gt;
&lt;li&gt;Powerful video understanding&lt;/li&gt;
&lt;li&gt;Flexible tool calling mechanism&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;🚀 Flexible Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple deployment options supported&lt;/li&gt;
&lt;li&gt;Quantization optimization lowers barriers&lt;/li&gt;
&lt;li&gt;Comprehensive documentation and examples&lt;/li&gt;
&lt;li&gt;Active community support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;💼 Open Source Friendly&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache 2.0 license&lt;/li&gt;
&lt;li&gt;Commercial use supported&lt;/li&gt;
&lt;li&gt;Complete training toolchain&lt;/li&gt;
&lt;li&gt;Continuous version updates&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Application Scenario Analysis
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Application Domain&lt;/th&gt;
&lt;th&gt;Suitability&lt;/th&gt;
&lt;th&gt;Key Capabilities&lt;/th&gt;
&lt;th&gt;Typical Cases&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EdTech&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;STEM Reasoning&lt;/td&gt;
&lt;td&gt;Homework grading, intelligent tutoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Industrial QC&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Visual Grounding&lt;/td&gt;
&lt;td&gt;Defect detection, quality control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content Moderation&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Video Understanding&lt;/td&gt;
&lt;td&gt;Video review, content classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Customer Service&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Multimodal Understanding&lt;/td&gt;
&lt;td&gt;Image-text support, Q&amp;amp;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medical Imaging&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Visual Reasoning&lt;/td&gt;
&lt;td&gt;Image analysis, diagnostic assistance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Autonomous Driving&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Scene Understanding&lt;/td&gt;
&lt;td&gt;Environment perception, decision support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;E-commerce&lt;/td&gt;
&lt;td&gt;⭐⭐⭐⭐⭐&lt;/td&gt;
&lt;td&gt;Image Search&lt;/td&gt;
&lt;td&gt;Product recognition, recommendation systems&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Related Resource Links
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Official Channels:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🤖 &lt;a href="https://ernie.baidu.com/" rel="noopener noreferrer"&gt;ERNIE Bot Online&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤗 &lt;a href="https://huggingface.co/baidu" rel="noopener noreferrer"&gt;Hugging Face Model Page&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;💻 &lt;a href="https://github.com/PaddlePaddle/ERNIE" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📝 &lt;a href="https://yiyan.baidu.com/blog/ernie4.5" rel="noopener noreferrer"&gt;Official Blog&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://curateclick.com/blog/ernie-4.5-vl-28b-a3b-thinking-complete-guide" rel="noopener noreferrer"&gt;ERNIE-4.5-VL-28B-A3B-Thinking Multimodal AI Model Complete Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Gemini CLI Extensions: The Complete Developer's Guide to AI-Powered Command Line Customization (2025)</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Thu, 09 Oct 2025 13:14:51 +0000</pubDate>
      <link>https://forem.com/sienna/gemini-cli-extensions-the-complete-developers-guide-to-ai-powered-command-line-customization-g2b</link>
      <guid>https://forem.com/sienna/gemini-cli-extensions-the-complete-developers-guide-to-ai-powered-command-line-customization-g2b</guid>
      <description>&lt;h2&gt;
  
  
  🎯 Core Highlights (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Revolutionary Launch&lt;/strong&gt;: Google launched Gemini CLI extensions with 70+ ready-to-use integrations from industry leaders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Integration&lt;/strong&gt;: Install any extension with a single command: &lt;code&gt;gemini extensions install &amp;lt;URL&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise Ready&lt;/strong&gt;: Major partners including Stripe, Shopify, Postman, Figma, and Dynatrace provide official extensions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Ecosystem&lt;/strong&gt;: Build custom extensions using MCP servers, context files, and custom commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-Powered Intelligence&lt;/strong&gt;: Extensions teach Gemini CLI how to use tools effectively with built-in "playbooks"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Table of Contents
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;What are Gemini CLI Extensions?&lt;/li&gt;
&lt;li&gt;How to Install and Use Extensions&lt;/li&gt;
&lt;li&gt;Industry Partner Extensions&lt;/li&gt;
&lt;li&gt;Google-Created Extensions&lt;/li&gt;
&lt;li&gt;Building Your Own Extensions&lt;/li&gt;
&lt;li&gt;Extension Architecture Deep Dive&lt;/li&gt;
&lt;li&gt;Best Practices and Use Cases&lt;/li&gt;
&lt;li&gt;FAQ&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What are Gemini CLI Extensions?
&lt;/h2&gt;

&lt;p&gt;Gemini CLI extensions represent a paradigm shift in command-line development tools. Launched in October 2025, these extensions transform the &lt;a href="https://gemini-cli.xyz/extensions" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; from a simple AI assistant into a comprehensive, personalized development environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Key Features:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pre-packaged Intelligence&lt;/strong&gt;: Each extension contains built-in knowledge about how to use specific tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero Configuration&lt;/strong&gt;: Get meaningful results from the first command without complex setup&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open Ecosystem&lt;/strong&gt;: Anyone can build and share extensions via GitHub&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Integration&lt;/strong&gt;: Connect databases, design platforms, payment services, and more&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Professional Tip&lt;/strong&gt;&lt;br&gt;
Extensions go beyond basic MCP (Model Context Protocol) connections by adding intelligence layers that understand context and best practices for each tool.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How to Install and Use Extensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation Process
&lt;/h3&gt;

&lt;p&gt;Installing Gemini CLI extensions is remarkably straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install from GitHub URL&lt;/span&gt;
gemini extensions &lt;span class="nb"&gt;install &lt;/span&gt;https://github.com/username/extension-name

&lt;span class="c"&gt;# Install from local folder&lt;/span&gt;
gemini extensions &lt;span class="nb"&gt;install&lt;/span&gt; ./local-extension-folder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Extension Management Commands
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini extensions list&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;View installed extensions&lt;/td&gt;
&lt;td&gt;Lists all active extensions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini extensions remove &amp;lt;name&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Uninstall extension&lt;/td&gt;
&lt;td&gt;Remove specific extension&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;gemini extensions new &amp;lt;name&amp;gt; &amp;lt;type&amp;gt;&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create new extension&lt;/td&gt;
&lt;td&gt;Generate extension template&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Usage Workflow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[Discover Extension] --&amp;gt; B[Install with Single Command]
    B --&amp;gt; C[Extension Auto-Configures]
    C --&amp;gt; D[Use Natural Language Commands]
    D --&amp;gt; E[AI Executes with Context]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Industry Partner Extensions
&lt;/h2&gt;

&lt;p&gt;The launch includes official extensions from major technology companies, demonstrating enterprise-grade adoption:&lt;/p&gt;

&lt;h3&gt;
  
  
  Development &amp;amp; API Tools
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Postman Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate API request collections automatically&lt;/li&gt;
&lt;li&gt;Manage workspaces through natural language&lt;/li&gt;
&lt;li&gt;Evaluate API performance and documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Stripe Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interact with Stripe API seamlessly&lt;/li&gt;
&lt;li&gt;Access comprehensive payment knowledge base&lt;/li&gt;
&lt;li&gt;Automate payment workflow setup&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Security &amp;amp; Monitoring
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Snyk Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Integrate comprehensive security scanning&lt;/li&gt;
&lt;li&gt;Ensure code security from inception&lt;/li&gt;
&lt;li&gt;Automate vulnerability detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Dynatrace Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time application performance insights&lt;/li&gt;
&lt;li&gt;Root-cause analysis acceleration&lt;/li&gt;
&lt;li&gt;Availability monitoring from CLI&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Design &amp;amp; Content
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Figma Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Generate code from design frames&lt;/li&gt;
&lt;li&gt;Extract design system context&lt;/li&gt;
&lt;li&gt;Ensure design-code consistency&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Shopify Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access Shopify developer ecosystem&lt;/li&gt;
&lt;li&gt;Search documentation intelligently&lt;/li&gt;
&lt;li&gt;Build serverless Shopify functions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data &amp;amp; Analytics
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Elastic Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search and analyze Elasticsearch data&lt;/li&gt;
&lt;li&gt;Connect to Elastic Cloud Serverless&lt;/li&gt;
&lt;li&gt;Integrate with developer workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Google-Created Extensions
&lt;/h2&gt;

&lt;p&gt;Google has developed a comprehensive suite of extensions covering various development scenarios:&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud-Native Development
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Extension&lt;/th&gt;
&lt;th&gt;Primary Use Case&lt;/th&gt;
&lt;th&gt;Key Features&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud Run&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless deployment&lt;/td&gt;
&lt;td&gt;Local code to live URL in one step&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GKE&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Kubernetes management&lt;/td&gt;
&lt;td&gt;Cluster health, application deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;gcloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Google Cloud interaction&lt;/td&gt;
&lt;td&gt;Complete GCP environment control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Monitoring &amp;amp; troubleshooting&lt;/td&gt;
&lt;td&gt;Cloud environment insights&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Application Development
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Flutter Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create, build, and refactor Flutter apps&lt;/li&gt;
&lt;li&gt;AI-powered debugging assistance&lt;/li&gt;
&lt;li&gt;Maintenance automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Firebase Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backend setup and management&lt;/li&gt;
&lt;li&gt;Real-time database configuration&lt;/li&gt;
&lt;li&gt;Authentication system setup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Chrome DevTools Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Live browser automation&lt;/li&gt;
&lt;li&gt;Performance analysis&lt;/li&gt;
&lt;li&gt;In-depth debugging capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI &amp;amp; Data Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Genkit Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enhanced GenAI app development&lt;/li&gt;
&lt;li&gt;Flow management and debugging&lt;/li&gt;
&lt;li&gt;OpenTelemetry trace analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Looker Extension&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business data exploration&lt;/li&gt;
&lt;li&gt;Visualization generation&lt;/li&gt;
&lt;li&gt;Trend analysis automation&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Best Practice&lt;/strong&gt;&lt;br&gt;
Start with Google-created extensions to understand the ecosystem before building custom solutions.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Building Your Own Extensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Extension Components
&lt;/h3&gt;

&lt;p&gt;Gemini CLI extensions can bundle multiple components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Extension Structure:
├── MCP Servers (1 or more)
├── Context Files (GEMINI.md, custom types)
├── Excluded Tools (disable built-ins)
└── Custom Commands (slash commands)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creation Templates
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create MCP server extension&lt;/span&gt;
gemini extensions new my-extension mcp-server

&lt;span class="c"&gt;# Create custom commands extension&lt;/span&gt;
gemini extensions new my-extension custom-commands
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Development Workflow
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Choose Template&lt;/strong&gt;: Start with appropriate template type&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define Context&lt;/strong&gt;: Create instructional context files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Implement Tools&lt;/strong&gt;: Develop MCP server or custom commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Locally&lt;/strong&gt;: Validate functionality in development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Package &amp;amp; Share&lt;/strong&gt;: Publish to GitHub for community use&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Extension Architecture Deep Dive
&lt;/h2&gt;

&lt;h3&gt;
  
  
  MCP Integration
&lt;/h3&gt;

&lt;p&gt;Extensions leverage the Model Context Protocol (MCP) for tool connectivity:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Raw Connection&lt;/strong&gt;: MCP provides basic tool access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligence Layer&lt;/strong&gt;: Extensions add context and best practices&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Seamless Experience&lt;/strong&gt;: AI understands how to use tools effectively&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Context Files
&lt;/h3&gt;

&lt;p&gt;Context files provide crucial guidance to the AI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Example GEMINI.md structure&lt;/span&gt;
&lt;span class="gu"&gt;## Tool Purpose&lt;/span&gt;
Brief description of what this tool does

&lt;span class="gu"&gt;## Usage Patterns&lt;/span&gt;
Common workflows and best practices

&lt;span class="gu"&gt;## Examples&lt;/span&gt;
Specific use cases and command patterns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Custom Commands
&lt;/h3&gt;

&lt;p&gt;Slash commands encapsulate complex prompts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example custom command&lt;/span&gt;
/deploy-app &lt;span class="s2"&gt;"Deploy my application to production with monitoring"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Best Practices and Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Development Workflow Integration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Morning Routine Automation&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check system health, deploy updates, review metrics&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Check my GKE cluster health, deploy latest code to Cloud Run, and show me yesterday&lt;span class="s1"&gt;'s error rates
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cross-Platform Development&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Flutter app with Firebase backend&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Create a new Flutter app with Firebase authentication and Firestore integration
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Team Collaboration
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Code Review Process&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Automated security and quality checks&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; Review my latest commits &lt;span class="k"&gt;for &lt;/span&gt;security vulnerabilities and suggest improvements
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Performance Optimization
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Extension Combination&lt;/th&gt;
&lt;th&gt;Benefit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Full-stack debugging&lt;/td&gt;
&lt;td&gt;Chrome DevTools + Dynatrace&lt;/td&gt;
&lt;td&gt;Frontend and backend insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API development&lt;/td&gt;
&lt;td&gt;Postman + Stripe&lt;/td&gt;
&lt;td&gt;Complete payment integration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Security audit&lt;/td&gt;
&lt;td&gt;Snyk + Code Review&lt;/td&gt;
&lt;td&gt;Comprehensive vulnerability detection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Important Note&lt;/strong&gt;&lt;br&gt;
Extensions work best when combined thoughtfully. Avoid installing too many similar extensions that might conflict.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  🤔 Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Q: How do Gemini CLI extensions differ from regular CLI tools?
&lt;/h3&gt;

&lt;p&gt;A: Unlike traditional CLI tools that require manual configuration and learning, Gemini CLI extensions come with built-in intelligence. They understand context, follow best practices automatically, and integrate seamlessly with natural language commands.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can I use multiple extensions simultaneously?
&lt;/h3&gt;

&lt;p&gt;A: Yes, extensions are designed to work together. You can combine different extensions to create powerful workflows, such as using Figma for design extraction while simultaneously deploying with Cloud Run.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Are extensions secure for enterprise use?
&lt;/h3&gt;

&lt;p&gt;A: Extensions from verified partners like Stripe, Dynatrace, and Snyk undergo security reviews. For custom extensions, review the source code and ensure they follow security best practices before installation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: How do I contribute to the extension ecosystem?
&lt;/h3&gt;

&lt;p&gt;A: Create your extension using the provided templates, test thoroughly, publish to GitHub, and submit to the &lt;a href="https://gemini-cli.xyz/extensions" rel="noopener noreferrer"&gt;Gemini CLI Extensions gallery&lt;/a&gt; for community visibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: What's the difference between MCP servers and extensions?
&lt;/h3&gt;

&lt;p&gt;A: MCP servers provide raw tool connectivity, while extensions add intelligence, context, and best practices. Extensions can bundle MCP servers with additional guidance for optimal AI interaction.&lt;/p&gt;

&lt;h3&gt;
  
  
  Q: Can extensions work offline?
&lt;/h3&gt;

&lt;p&gt;A: Some extensions require internet connectivity for API access, but local extensions with custom commands and context files can function offline once installed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion and Next Steps
&lt;/h2&gt;

&lt;p&gt;Gemini CLI extensions represent a significant evolution in developer tooling, transforming the command line from a simple interface into an intelligent, personalized development environment. With over 70 extensions already available and major industry partners contributing official integrations, the ecosystem is rapidly maturing.&lt;/p&gt;

&lt;h3&gt;
  
  
  Immediate Action Items:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Explore the Gallery&lt;/strong&gt;: Visit the &lt;a href="https://gemini-cli.xyz/extensions" rel="noopener noreferrer"&gt;official extensions page&lt;/a&gt; to discover relevant tools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Start with Partners&lt;/strong&gt;: Install extensions from trusted partners like Stripe, Postman, or Figma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Experiment with Google Extensions&lt;/strong&gt;: Try Cloud Run or Firebase extensions for cloud development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build Custom Solutions&lt;/strong&gt;: Use templates to create extensions for your specific workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join the Community&lt;/strong&gt;: Contribute to the growing ecosystem by sharing your extensions&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The future of command-line development is here, and it's more intelligent, integrated, and accessible than ever before. Whether you're a solo developer or part of an enterprise team, Gemini CLI extensions offer the tools to build the personalized development environment you've always wanted.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🚀 &lt;strong&gt;Ready to Get Started?&lt;/strong&gt;&lt;br&gt;
Install your first extension today: &lt;code&gt;gemini extensions install https://github.com/postmanlabs/postman-mcp-server&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://curateclick.com/blog/gemini-cli-extensions" rel="noopener noreferrer"&gt;Gemini CLI Extensions Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>gemini</category>
    </item>
    <item>
      <title>Agentic Commerce Protocol (ACP)</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Sun, 05 Oct 2025 04:08:27 +0000</pubDate>
      <link>https://forem.com/sienna/agentic-commerce-protocol-acp-3j2a</link>
      <guid>https://forem.com/sienna/agentic-commerce-protocol-acp-3j2a</guid>
      <description>&lt;p&gt;&lt;strong&gt;&lt;a href="https://agentic-commerce-protocol.com/" rel="noopener noreferrer"&gt;Agentic Commerce Protocol&lt;/a&gt;&lt;/strong&gt; is an &lt;strong&gt;open standard&lt;/strong&gt; for programmatic commerce flows between buyers, AI agents, and businesses.&lt;/p&gt;

&lt;h2&gt;
  
  
  🔑 Key Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📖 &lt;strong&gt;Open Source&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Open source under Apache 2.0 license&lt;/li&gt;
&lt;li&gt;Community-designed&lt;/li&gt;
&lt;li&gt;Enables businesses to transact with any AI agent or payment processor&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🏢 &lt;strong&gt;Business-Friendly&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Businesses maintain customer relationships as the merchant of record&lt;/li&gt;
&lt;li&gt;Retain control over which products can be sold, how they're presented, and order fulfillment&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔄 &lt;strong&gt;Supports Complex Flows&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Supports various commerce types including physical/digital goods, subscriptions, and asynchronous purchases&lt;/li&gt;
&lt;li&gt;Flexible configuration options&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔌 &lt;strong&gt;Technology Compatibility&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Compatible with REST API and MCP (Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Integrates with existing commerce backends and payment processors&lt;/li&gt;
&lt;li&gt;Works with any technology stack&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔒 &lt;strong&gt;Security and PCI Compliance&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Securely passes payment credentials from buyers to AI agents&lt;/li&gt;
&lt;li&gt;Maintains security without exposing underlying payment credentials&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🎯 Benefits by Stakeholder
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Businesses&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reach more customers through AI agents&lt;/li&gt;
&lt;li&gt;Sell to high-intent buyers using existing commerce infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For AI Agents&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Embed commerce functionality into applications&lt;/li&gt;
&lt;li&gt;Enable users to transact directly with businesses without being the merchant of record&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;For Payment Providers&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Increase transaction volume by processing agentic transactions through AI agents&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  💡 Real-World Application
&lt;/h2&gt;

&lt;p&gt;Integrated with &lt;strong&gt;ChatGPT's Instant Checkout&lt;/strong&gt;, enabling agentic payment processing through Stripe and other ACP-compatible payment service providers.&lt;/p&gt;

&lt;p&gt;ACP aims to build a standardized protocol for AI-era commerce that benefits all participants in the ecosystem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://agentic-commerce-protocol.com/docs/" rel="noopener noreferrer"&gt;Agentic Commerce Protocol Documentation&lt;/a&gt;&lt;/p&gt;

</description>
      <category>acp</category>
    </item>
    <item>
      <title>Making Documentation Simple: The Complete Markdown Cheat Sheet Guide</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Sun, 05 Oct 2025 04:04:50 +0000</pubDate>
      <link>https://forem.com/sienna/making-documentation-simple-the-complete-markdown-cheat-sheet-guide-3gd0</link>
      <guid>https://forem.com/sienna/making-documentation-simple-the-complete-markdown-cheat-sheet-guide-3gd0</guid>
      <description>&lt;p&gt;In this information-rich era, whether you're a developer, technical writer, or content creator, there's one core skill you can't do without—&lt;strong&gt;writing clear and beautiful documentation&lt;/strong&gt;. Markdown is the magic tool that makes it all simple.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Do You Need &lt;a href="https://markdowncheatsheet.com/" rel="noopener noreferrer"&gt;Markdown Cheat Sheet&lt;/a&gt;?
&lt;/h2&gt;

&lt;p&gt;Imagine these scenarios:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're writing a GitHub README file and suddenly forget the table syntax&lt;/li&gt;
&lt;li&gt;You want to insert a code block in your documentation but aren't sure how to add syntax highlighting&lt;/li&gt;
&lt;li&gt;You're a Markdown beginner and need a systematic learning resource&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One-stop solution for all your Markdown syntax needs with using Markdown Cheat Sheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📚 1. &lt;a href="https://markdowncheatsheet.com/reference" rel="noopener noreferrer"&gt;Complete Syntax Reference Library&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;This isn't just a simple syntax list, but a &lt;strong&gt;comprehensive, systematic Markdown knowledge base&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Basic Syntax&lt;/strong&gt;: Headers, bold, italic, links, images&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced Features&lt;/strong&gt;: Tables, code blocks, task lists, blockquotes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Practical Tips&lt;/strong&gt;: Each syntax comes with clear examples and best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most thoughtful feature? &lt;strong&gt;Click any example to copy it instantly&lt;/strong&gt;, saving you the trouble of manual typing.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚡ 2. &lt;a href="https://markdowncheatsheet.com/editor" rel="noopener noreferrer"&gt;Real-Time Online Editor&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Theory is good, but practice is better. The website's &lt;strong&gt;real-time preview editor&lt;/strong&gt; lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write Markdown code on the left&lt;/li&gt;
&lt;li&gt;See rendered results instantly on the right&lt;/li&gt;
&lt;li&gt;Export to HTML or PDF&lt;/li&gt;
&lt;li&gt;Auto-save feature so you never lose content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This "what you see is what you get" experience makes learning Markdown intuitive and efficient.&lt;/p&gt;

&lt;h2&gt;
  
  
  Designed for Everyone
&lt;/h2&gt;

&lt;h3&gt;
  
  
  👨‍💻 Developers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;GitHub README file writing&lt;/li&gt;
&lt;li&gt;Technical documentation&lt;/li&gt;
&lt;li&gt;Code comments and explanations&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  ✍️ Technical Writers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Blog post creation&lt;/li&gt;
&lt;li&gt;Tutorial and guide writing&lt;/li&gt;
&lt;li&gt;Product documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🎓 Students and Educators
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Class note organization&lt;/li&gt;
&lt;li&gt;Assignment and report writing&lt;/li&gt;
&lt;li&gt;Learning material creation&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: Bringing Documentation Back to Simplicity
&lt;/h2&gt;

&lt;p&gt;In today's world where technical documentation is increasingly important, Markdown has become the de facto standard format. And &lt;strong&gt;Markdown Cheat Sheet&lt;/strong&gt; is the best companion to help you master this skill.&lt;/p&gt;

&lt;p&gt;Whether you are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔰 A newcomer just starting with Markdown&lt;/li&gt;
&lt;li&gt;💼 A professional needing to write documentation efficiently&lt;/li&gt;
&lt;li&gt;🎯 A creator wanting to improve documentation quality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No more syntax worries, no more formatting headaches&lt;/strong&gt;—visit &lt;a href="https://markdowncheatsheet.com" rel="noopener noreferrer"&gt;Markdown Cheat Sheet&lt;/a&gt; and make documentation writing simple and enjoyable.&lt;/p&gt;




&lt;p&gt;👉&lt;a href="https://markdowncheatsheet.com/editor" rel="noopener noreferrer"&gt;MarkDown Online Editor&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Generate images with Nano Banana</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Fri, 29 Aug 2025 11:57:36 +0000</pubDate>
      <link>https://forem.com/sienna/generate-images-with-nano-banana-1dhm</link>
      <guid>https://forem.com/sienna/generate-images-with-nano-banana-1dhm</guid>
      <description>&lt;p&gt;Nano Banana is Google Gemini’s new text-to-image editing feature that lets you create or modify pictures just by describing what you want in plain language—no manual tools or sliders needed.&lt;/p&gt;

&lt;p&gt;Integrated Nano Banana into project &lt;a href="https://qwq32.com" rel="noopener noreferrer"&gt;QWQ AI&lt;/a&gt; to generate images.&lt;/p&gt;

&lt;p&gt;Example：&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqgq91pz52m2dh9w5mo9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqgq91pz52m2dh9w5mo9.png" alt="dog with Nano Banana" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqng9gc3q67ek38l7mu1b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqng9gc3q67ek38l7mu1b.png" alt="dog with Nano Banana after" width="800" height="564"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;➡&lt;a href="https://qwq32.com/image-editor" rel="noopener noreferrer"&gt;image editor&lt;/a&gt;&lt;br&gt;
  &lt;a href="https://qwq32.com/text-to-image" rel="noopener noreferrer"&gt;text to image&lt;/a&gt;&lt;/p&gt;

</description>
      <category>nanobanana</category>
      <category>aiimage</category>
    </item>
    <item>
      <title>2025 Complete Guide: How to Use an AI Chinese Name Generator to Create a Meaningful Chinese Name</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Wed, 20 Aug 2025 03:05:24 +0000</pubDate>
      <link>https://forem.com/sienna/2025-complete-guide-how-to-use-an-ai-chinese-name-generator-to-create-a-meaningful-chinese-name-11lo</link>
      <guid>https://forem.com/sienna/2025-complete-guide-how-to-use-an-ai-chinese-name-generator-to-create-a-meaningful-chinese-name-11lo</guid>
      <description>&lt;h2&gt;
  
  
  🎯 TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enter your original name, gender, and birth date—the AI returns three Chinese names with Wu Xing analysis in 30 seconds.
&lt;/li&gt;
&lt;li&gt;Choose from four styles—traditional, modern, literary, or fusion—and lock in two- or three-character names.
&lt;/li&gt;
&lt;li&gt;Each result includes character meanings, five-element balance, and cultural backstories—ready for social media.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Is an AI Chinese Name Generator?
&lt;/h2&gt;

&lt;p&gt;An AI Chinese Name Generator is an online tool that combines &lt;strong&gt;deep learning + traditional Wu Xing theory&lt;/strong&gt;. It analyzes six dimensions to craft Chinese names that sound elegant, carry positive meanings, and balance the five elements:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gender energy&lt;/td&gt;
&lt;td&gt;Yin-yang balance&lt;/td&gt;
&lt;td&gt;Male names: rising tone; Female: softer characters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Birth BaZi&lt;/td&gt;
&lt;td&gt;Five-element compensation&lt;/td&gt;
&lt;td&gt;Lack of Wood → add “梓、森”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Style preference&lt;/td&gt;
&lt;td&gt;Personalization&lt;/td&gt;
&lt;td&gt;Traditional / Modern / Literary / Fusion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Meaning keywords&lt;/td&gt;
&lt;td&gt;Precise semantics&lt;/td&gt;
&lt;td&gt;Input “wisdom” → “睿、哲”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Character count&lt;/td&gt;
&lt;td&gt;Scenario adaptation&lt;/td&gt;
&lt;td&gt;Multiple choices&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5-Step Guide to Your Exclusive Chinese Name
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📊 Flowchart
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vzmmlmq44b97ttk7on3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5vzmmlmq44b97ttk7on3.png" alt="Flowchart" width="456" height="1302"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Next-Step Action List
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;🚀 &lt;strong&gt;Try it now&lt;/strong&gt;: Visit &lt;a href="https://namagenerator.com/en" rel="noopener noreferrer"&gt;Generator Nama China&lt;/a&gt;, enter your details, and get three candidates in 30 seconds.&lt;/li&gt;
&lt;/ol&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ &lt;strong&gt;Bottom line&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The AI Chinese Name Generator blends five-element theory, BaZi analysis, and cultural significance to create authentic Chinese names.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Kiro Steering Guide</title>
      <dc:creator>Sienna</dc:creator>
      <pubDate>Sat, 19 Jul 2025 13:22:38 +0000</pubDate>
      <link>https://forem.com/sienna/kiro-steering-guide-ae7</link>
      <guid>https://forem.com/sienna/kiro-steering-guide-ae7</guid>
      <description>&lt;h2&gt;
  
  
  What is Steering?
&lt;/h2&gt;

&lt;p&gt;Steering gives Kiro persistent knowledge about your project through markdown files in &lt;code&gt;.kiro/steering/&lt;/code&gt;. Unlike traditional approaches like Cursor's &lt;code&gt;.cursorrules&lt;/code&gt; which provide basic configuration, Steering represents a more advanced and sophisticated way to manage AI assistant context. While &lt;code&gt;.cursorrules&lt;/code&gt; offers simple rule-based guidance, Steering provides comprehensive, structured, and contextual project knowledge that evolves with your codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Steering vs .cursorrules Comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cursor's .cursorrules&lt;/strong&gt;: Simple configuration file with basic rules&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro's Steering&lt;/strong&gt;: Advanced markdown-based knowledge system with conditional loading, file references, and structured documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of explaining your conventions in every chat, steering files ensure Kiro consistently follows your established patterns, libraries, and standards.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Benefits
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Consistent Code Generation&lt;/strong&gt; - Every component, API endpoint, or test follows your team's established patterns and conventions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduced Repetition&lt;/strong&gt; - No need to explain project standards in each conversation. Kiro remembers your preferences.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Team Alignment&lt;/strong&gt; - All developers work with the same standards, whether they're new to the project or seasoned contributors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalable Project Knowledge&lt;/strong&gt; - Documentation that grows with your codebase, capturing decisions and patterns as your project evolves.&lt;/p&gt;

&lt;h2&gt;
  
  
  Default Steering Files
&lt;/h2&gt;

&lt;p&gt;Kiro automatically creates three foundational files that establish core project context:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product Overview&lt;/strong&gt; (&lt;code&gt;product.md&lt;/code&gt;) - Defines your product's purpose, target users, key features, and business objectives. This helps Kiro understand the "why" behind technical decisions and suggest solutions aligned with your product goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technology Stack&lt;/strong&gt; (&lt;code&gt;tech.md&lt;/code&gt;) - Documents your chosen frameworks, libraries, development tools, and technical constraints. When Kiro suggests implementations, it will prefer your established stack over alternatives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Structure&lt;/strong&gt; (&lt;code&gt;structure.md&lt;/code&gt;) - Outlines file organization, naming conventions, import patterns, and architectural decisions. This ensures generated code fits seamlessly into your existing codebase.&lt;/p&gt;

&lt;p&gt;These foundation files are included in every interaction by default, forming the baseline of Kiro's project understanding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating Custom Steering Files
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox31vwhxdxa8jlnytono.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fox31vwhxdxa8jlnytono.png" alt="Creating Custom Steering Files" width="409" height="1249"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extend Kiro's understanding with specialized guidance tailored to your project's unique needs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Navigate to the &lt;strong&gt;Steering&lt;/strong&gt; section in the Kiro panel&lt;/li&gt;
&lt;li&gt;Click the &lt;strong&gt;+&lt;/strong&gt; button to create a new &lt;code&gt;.md&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;Choose a descriptive filename (e.g., &lt;code&gt;api-standards.md&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Write your guidance using standard markdown syntax&lt;/li&gt;
&lt;li&gt;Use natural language to describe your requirements, then select the &lt;strong&gt;Refine&lt;/strong&gt; button and Kiro will format it&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Inclusion Modes
&lt;/h2&gt;

&lt;p&gt;Steering files can be configured to load at different times based on your needs. This flexibility helps optimize performance and ensures relevant context is available when needed.&lt;/p&gt;

&lt;p&gt;Configure inclusion modes by adding front matter to the top of your steering files. The front matter uses YAML syntax and must be placed at the very beginning of the file, enclosed by triple dashes (&lt;code&gt;---&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4evvbh6bn1m5skpbrn2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4evvbh6bn1m5skpbrn2.png" alt="Inclusion Modes" width="800" height="840"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Always Included (Default)
&lt;/h3&gt;

&lt;p&gt;These files are loaded into every Kiro interaction automatically. Use this mode for core standards that should influence all code generation and suggestions. Examples include your technology stack, coding conventions, and fundamental architectural principles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Project-wide standards, technology preferences, security policies, and coding conventions that apply universally.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conditional Inclusion
&lt;/h3&gt;

&lt;p&gt;Files are automatically included only when working with files that match the specified pattern. This keeps context relevant and reduces noise by loading specialized guidance only when needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Common patterns&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;"*.tsx"&lt;/code&gt; - React components and JSX files&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"app/api/**/*"&lt;/code&gt; - API routes and backend logic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"**/*.test.*"&lt;/code&gt; - Test files and testing utilities&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"src/components/**/*"&lt;/code&gt; - Component-specific guidelines&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;"*.md"&lt;/code&gt; - Documentation files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Domain-specific standards like component patterns, API design rules, testing approaches, or deployment procedures that only apply to certain file types.&lt;/p&gt;

&lt;h3&gt;
  
  
  Manual Inclusion
&lt;/h3&gt;

&lt;p&gt;Files are available on-demand by referencing them with &lt;code&gt;#steering-file-name&lt;/code&gt; in your chat messages. This gives you precise control over when specialized context is needed without cluttering every interaction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Usage&lt;/strong&gt;: Type &lt;code&gt;#troubleshooting-guide&lt;/code&gt; or &lt;code&gt;#performance-optimization&lt;/code&gt; in chat to include that steering file for the current conversation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best for&lt;/strong&gt;: Specialized workflows, troubleshooting guides, migration procedures, or context-heavy documentation that's only needed occasionally.&lt;/p&gt;

&lt;h2&gt;
  
  
  File References
&lt;/h2&gt;

&lt;p&gt;Link to live project files to keep steering current:&lt;/p&gt;

&lt;p&gt;Examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;API specs: &lt;code&gt;#[[file:api/openapi.yaml]]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Component patterns: &lt;code&gt;#[[file:components/ui/button.tsx]]&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Config templates: &lt;code&gt;#[[file:.env.example]]&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Keep Files Focused&lt;/strong&gt; One domain per file - API design, testing, or deployment procedures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Clear Names&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;api-rest-conventions.md&lt;/code&gt; - REST API standards&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;testing-unit-patterns.md&lt;/code&gt; - Unit testing approaches&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;components-form-validation.md&lt;/code&gt; - Form component standards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Include Context&lt;/strong&gt; Explain why decisions were made, not just what the standards are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Provide Examples&lt;/strong&gt; Use code snippets and before/after comparisons to demonstrate standards.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security First&lt;/strong&gt; Never include API keys, passwords, or sensitive data. Steering files are part of your codebase.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintain Regularly&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review during sprint planning and architecture changes&lt;/li&gt;
&lt;li&gt;Test file references after restructuring&lt;/li&gt;
&lt;li&gt;Treat steering changes like code changes - require reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common Steering File Strategies
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;API Standards&lt;/strong&gt; (&lt;code&gt;api-standards.md&lt;/code&gt;) - Define REST conventions, error response formats, authentication flows, and versioning strategies. Include endpoint naming patterns, HTTP status code usage, and request/response examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing Approach&lt;/strong&gt; (&lt;code&gt;testing-standards.md&lt;/code&gt;) - Establish unit test patterns, integration test strategies, mocking approaches, and coverage expectations. Document preferred testing libraries, assertion styles, and test file organization.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Code Style&lt;/strong&gt; (&lt;code&gt;code-conventions.md&lt;/code&gt;) - Specify naming patterns, file organization, import ordering, and architectural decisions. Include examples of preferred code structures, component patterns, and anti-patterns to avoid.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Security Guidelines&lt;/strong&gt; (&lt;code&gt;security-policies.md&lt;/code&gt;) - Document authentication requirements, data validation rules, input sanitization standards, and vulnerability prevention measures. Include secure coding practices specific to your application.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deployment Process&lt;/strong&gt; (&lt;code&gt;deployment-workflow.md&lt;/code&gt;) - Outline build procedures, environment configurations, deployment steps, and rollback strategies. Include CI/CD pipeline details and environment-specific requirements.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case Study: Language Preference Configuration
&lt;/h2&gt;

&lt;p&gt;Here's a practical example of how to configure Kiro to use Chinese for responses:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;File&lt;/strong&gt;: &lt;code&gt;language-preferences.md&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Communication Guidelines&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Kiro uses Chinese for output
&lt;span class="p"&gt;-&lt;/span&gt; Frontend page content uses English  
&lt;span class="p"&gt;-&lt;/span&gt; Code comments use Chinese

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Implementation Notes
&lt;/h2&gt;

&lt;p&gt;This configuration ensures consistent language usage across different aspects of the development workflow while maintaining appropriate language choices for each context.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://aicodingtools.blog/en/kiro/kiro-steering-guide" rel="noopener noreferrer"&gt;Kiro Steering Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kiro</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
