<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Evan Lin</title>
    <description>The latest articles on Forem by Evan Lin (@evanlin).</description>
    <link>https://forem.com/evanlin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409957%2Fc150d4a7-cb20-469d-a230-bac27232c577.jpeg</url>
      <title>Forem: Evan Lin</title>
      <link>https://forem.com/evanlin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/evanlin"/>
    <language>en</language>
    <item>
      <title>Gemini API File Search: Enhanced Multimodal Capabilities with Embedding 2, Including Open-Source LINE Bot Implementation</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Tue, 12 May 2026 04:17:48 +0000</pubDate>
      <link>https://forem.com/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</link>
      <guid>https://forem.com/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" alt="image-20260511221639333" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image source: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Google Blog - Gemini API File Search is now multimodal: build efficient, verifiable RAG&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Recap: RAG No Longer Means Assembling Legos
&lt;/h1&gt;

&lt;p&gt;In the past few years, whenever developers thought about RAG (Retrieval-Augmented Generation), the component list that came to mind probably looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chunker (LangChain? Write it yourself?)&lt;/li&gt;
&lt;li&gt;An embedding model (OpenAI text-embedding-3? Cohere? BGE?)&lt;/li&gt;
&lt;li&gt;A vector database (ChromaDB, FAISS, pgvector, Pinecone… choosing one is a battle in itself)&lt;/li&gt;
&lt;li&gt;A retrieval + rerank process&lt;/li&gt;
&lt;li&gt;And then the LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not to mention that multimodal RAG adds another layer: How do you embed images? Do you need to OCR first? Do you need two separate stores, one for text and one for images? How do you score mixed text-and-image search? These questions alone can eat up a sprint.&lt;/p&gt;

&lt;p&gt;Recently, Google released &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt; on the developer blog, turning the long pipeline above into "&lt;strong&gt;calling a managed API&lt;/strong&gt;", with &lt;strong&gt;images natively supported&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article will do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Explain the new features clearly, including what &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; is doing behind the scenes.&lt;/li&gt;
&lt;li&gt; Use an &lt;strong&gt;open-source&lt;/strong&gt; LINE Bot (&lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;) as a live demonstration to see how the new features are combined in actual production code — and share the two typical pitfalls I encountered during debugging to help everyone avoid them.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Three Major Highlights of the New Features
&lt;/h2&gt;

&lt;p&gt;According to the official blog, the core of this upgrade is three things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Native Multimodal File Search
&lt;/h3&gt;

&lt;p&gt;In the past, File Search was pure text retrieval, and images could only be indexed by OCRing them into text.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you can &lt;strong&gt;directly put images into the File Search Store&lt;/strong&gt;, and index them together with text. The engine behind it is &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; — text, images, videos, audio, and documents &lt;strong&gt;share the same vector space&lt;/strong&gt;, so you can "find text with images", "find images with text", or "find images with images" without having to align the spaces yourself.&lt;/p&gt;
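&lt;p&gt;Since every modality lands in the same vector space, cross-modal retrieval reduces to plain nearest-neighbor scoring. A minimal sketch (toy vectors, nothing model-specific) of the cosine similarity this kind of search effectively relies on:&lt;/p&gt;

```python
import math

def cosine(a, b):
    # One shared embedding space means cross-modal search is just
    # nearest-neighbor scoring, regardless of which modality produced a or b.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy stand-ins for a text embedding and an image embedding.
text_vec = [0.2, 0.9, 0.1]
image_vec = [0.25, 0.85, 0.05]
print(round(cosine(text_vec, image_vec), 3))  # → 0.996
```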

&lt;p&gt;For product builders, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Mixed text and image search is no longer a research topic&lt;/strong&gt;, it's an API call.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No need to maintain two stores&lt;/strong&gt; (one for text chunks and one for CLIP-style image embeddings).&lt;/li&gt;
&lt;li&gt;  Scientific charts, UI screenshots, reports, photo albums... these &lt;strong&gt;things that used to lose most of their meaning after OCR&lt;/strong&gt; can now retain the original visual information for retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Custom Metadata and Server-side Filtering
&lt;/h3&gt;

&lt;p&gt;Each file you put into the store can now be tagged with key-value labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"U1234abcd..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"department"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Legal"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Final"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the &lt;a href="https://google.aip.dev/160" rel="noopener noreferrer"&gt;google.aip.dev/160&lt;/a&gt; filter syntax (same format as most GCP list APIs) when querying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'department="Legal" AND status="Final"'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filtering happens &lt;strong&gt;first, on Google's side&lt;/strong&gt;, rather than retrieving a pile of results and discarding most of them. With the noise reduced, &lt;strong&gt;both speed and accuracy improve&lt;/strong&gt;, which is a lifesaver for multi-tenant SaaS: one store with metadata filters can separate tenants, with no need to maintain N isolated stores.&lt;/p&gt;

&lt;p&gt;My LINE Bot uses this directly to do &lt;strong&gt;per-user data isolation&lt;/strong&gt;: each time a file is uploaded, it's tagged with the LINE &lt;code&gt;user_id&lt;/code&gt;, and when querying, a filter is applied, so user A will never see user B's data in the Q&amp;amp;A.&lt;/p&gt;
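&lt;p&gt;A minimal sketch of that isolation pattern; the helper name and the quote check are my own additions, but the filter string follows the AIP-160 syntax shown above:&lt;/p&gt;

```python
def user_filter(user_id: str) -> str:
    # Build an AIP-160 metadata filter scoping retrieval to one LINE user.
    # Hypothetical sanity check, since user_id lands inside a quoted literal.
    if '"' in user_id:
        raise ValueError("user_id must not contain double quotes")
    return f'user_id="{user_id}"'

# Hypothetical upload config: the same key is attached at indexing time,
# so the filter at query time can only ever match this user's files.
upload_config = {
    "display_name": "uploaded-photo.jpg",
    "custom_metadata": [{"key": "user_id", "string_value": "U1234abcd"}],
}
print(user_filter("U1234abcd"))  # → user_id="U1234abcd"
```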

&lt;h3&gt;
  
  
  3. Page-level Citations
&lt;/h3&gt;

&lt;p&gt;Each cited snippet in the response will now include the &lt;strong&gt;page number&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“captures the page number for every piece of indexed information.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is critical for enterprise customers. "AI says Y is mentioned on page X of the contract" vs. "AI says Y is mentioned in the contract": the former can be accepted directly by legal or audit teams, while the latter requires someone to leaf through the document manually. Page numbers close the last gap in making LLM answers traceable to their sources.&lt;/p&gt;
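&lt;p&gt;A small, hypothetical formatter for surfacing those page numbers to users; the field names here are stand-ins, so map them from whatever your SDK version's grounding chunks actually expose:&lt;/p&gt;

```python
def format_citation(chunk: dict) -> str:
    # Field names are illustrative stand-ins for the citation data
    # carried in a response's grounding metadata.
    page = chunk.get("page_number")
    title = chunk.get("title", "unknown source")
    if page is None:
        return title
    return f"{title}, p. {page}"

print(format_citation({"title": "contract.pdf", "page_number": 5}))  # → contract.pdf, p. 5
```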




&lt;h2&gt;
  
  
  The Multimodal Engine: Gemini Embedding 2
&lt;/h2&gt;

&lt;p&gt;The core of the new feature is the &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;Gemini Embedding 2&lt;/a&gt; model. Here are its specifications, for your model-selection decisions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" alt="image-20260511221801984" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supported Input&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Text, images, videos, audio, documents&lt;/strong&gt; (same embedding space)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input token limit&lt;/td&gt;
&lt;td&gt;8,192 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output dimensions&lt;/td&gt;
&lt;td&gt;128–3,072 (using Matryoshka Representation Learning, smaller dimensions maintain similar accuracy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual support&lt;/td&gt;
&lt;td&gt;100+ languages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Several key benchmarks (recall@1):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Text-to-Image Search&lt;/strong&gt;: TextCaps &lt;strong&gt;89.6&lt;/strong&gt; / Docci &lt;strong&gt;93.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Image-to-Text Search&lt;/strong&gt;: TextCaps &lt;strong&gt;97.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multilingual (MTEB)&lt;/strong&gt;: mean &lt;strong&gt;69.9&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Video-Text Matching&lt;/strong&gt;: Vatex ndcg@10 &lt;strong&gt;68.8&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Speech-Text Retrieval&lt;/strong&gt;: MSEB mrr@10 &lt;strong&gt;73.9&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Several key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Matryoshka is not a buzzword&lt;/strong&gt;: you can store vectors at 3,072 dimensions, then truncate to 768 at retrieval time for faster scoring with similar quality. Storage and scoring costs can be optimized in stages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-modal scores are very real&lt;/strong&gt;: 97.4% recall@1 (image→text) means that if you have an image and want to find the corresponding descriptive text, you'll find it almost immediately. This can be directly implemented for use cases like "take a picture of a product label and find the corresponding page of the user manual".&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;100+ languages&lt;/strong&gt;: This is a very real difference for the Taiwan/Japan/Korea/Southeast Asia markets.&lt;/li&gt;
&lt;/ul&gt;
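&lt;p&gt;A minimal sketch of the Matryoshka trick: keep only the leading dimensions and renormalize before cosine scoring. The vector here is a toy stand-in for a real 3,072-dimension embedding:&lt;/p&gt;

```python
import math

def truncate_embedding(vec, dims):
    # Matryoshka-trained embeddings keep most of their quality when you
    # keep only the leading dimensions; renormalize for cosine scoring.
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1]  # toy stand-in for a 3,072-dim vector
short = truncate_embedding(full, 4)
print(len(short), round(sum(x * x for x in short), 6))  # → 4 1.0
```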




&lt;h2&gt;
  
  
  What Developers Really Care About: Price and Access Cost
&lt;/h2&gt;

&lt;p&gt;From the official tutorial article &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;, there are two lines that cost-sensitive developers should highlight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Fully managed, with no vector database overhead.”&lt;/p&gt;

&lt;p&gt;“Storage and query-time embeddings are free. You only pay for indexing and tokens.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;You don't pay for the vector database&lt;/strong&gt;, nor do you pay for the monthly salary of the people maintaining it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Storage is free&lt;/strong&gt;, and &lt;strong&gt;embedding calculations at query time are also free&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  You only have two things to pay for: &lt;strong&gt;the embedding fee for the initial indexing&lt;/strong&gt; and &lt;strong&gt;the LLM tokens consumed when generating the answer&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a friendly cost curve for personal side projects and early-stage startups: you don't have to decide on day one whether you can afford a vector DB's baseline cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Standard Workflow: A Complete RAG in 4 SDK Calls
&lt;/h2&gt;

&lt;p&gt;Distilled from the dev.to guide, here is the minimum viable workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Create a store (specify the multimodal embedding model)
&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-multimodal-rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models/gemini-embedding-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Upload files + custom metadata
&lt;/span&gt;&lt;span class="n"&gt;operation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_to_file_search_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_search_store_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-q1.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q1 Report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;custom_metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Upload is a long-running operation, needs to poll:
# operation = client.operations.get(operation)
&lt;/span&gt;
&lt;span class="c1"&gt;# 3. Feed file_search as a tool to generate_content
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What was the revenue growth rate in the first quarter of last year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FileSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;file_search_store_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;department=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; AND year=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;))],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get citations (including page numbers)
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;grounding_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grounding_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# or the corresponding file/page fields
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To return image citations to the user, you can also call &lt;code&gt;client.file_search_stores.download_media()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;No exaggeration: &lt;strong&gt;the entire multimodal RAG fits in under 30 lines of code&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Case: Putting These New Features into a LINE Bot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" alt="image-20260511221916359" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" alt="image-20260511221851736" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SDK examples alone feel abstract, so I built a working LINE Bot, open-sourced at &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Users drop &lt;strong&gt;PDFs / images / text files&lt;/strong&gt; into the LINE chat box → Bot indexes into the File Search Store.&lt;/li&gt;
&lt;li&gt;  Users type questions → Gemini finds answers from the data &lt;strong&gt;uploaded by the user themselves&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Users drop an image and ask a question → The same can be done for image-to-text retrieval.&lt;/li&gt;
&lt;li&gt;  Deployment target: GCP Cloud Run + Cloud Build automatic deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture is straightforward (key components):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LINE Webhook&lt;/td&gt;
&lt;td&gt;FastAPI receives message events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCS&lt;/td&gt;
&lt;td&gt;Persists original files (&lt;code&gt;uploads/{user_id}/{message_id}.{ext}&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini File Search Store&lt;/td&gt;
&lt;td&gt;The only index layer (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom metadata &lt;code&gt;user_id&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Multi-tenant isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI BackgroundTasks&lt;/td&gt;
&lt;td&gt;Avoid the LINE reply token 30-second limit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
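&lt;p&gt;The last row is worth a sketch: the webhook must acknowledge LINE quickly, so the slow indexing work runs off the request path. This stdlib-only sketch shows the same pattern the repo implements with FastAPI's BackgroundTasks (names and timings here are illustrative):&lt;/p&gt;

```python
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
indexed = []

def index_file(message_id):
    # Stand-in for the slow path: fetch content from LINE, save to GCS,
    # upload to the File Search Store, and poll the long-running operation.
    time.sleep(0.05)
    indexed.append(message_id)

def handle_webhook(message_id):
    # Acknowledge immediately; the indexing runs off the request path,
    # the same idea FastAPI's BackgroundTasks gives you in a handler.
    executor.submit(index_file, message_id)
    return "OK"

print(handle_webhook("msg-001"))  # → OK
executor.shutdown(wait=True)      # demo only; a server keeps the pool alive
print(indexed)                    # → ['msg-001']
```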

&lt;p&gt;Comparing to the three major new features mentioned earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multimodal&lt;/strong&gt;: Users drop images, drop PDFs, all go into the same store, and all consume the same pipeline during search.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata&lt;/strong&gt;: Files for each LINE user are tagged with &lt;code&gt;user_id&lt;/code&gt;, filtered during queries, achieving server-side forced isolation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: to later display "the answer comes from XX.pdf page 5" in LINE messages, just read &lt;code&gt;grounding_metadata&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire repo is about 600 lines of Python, and it delivers a "&lt;strong&gt;private multimodal knowledge-base chat bot of your own&lt;/strong&gt;".&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment in Practice: Commit → Auto-Deploy
&lt;/h2&gt;

&lt;p&gt;It's not enough for the open-source example to merely run; to demo it at a workshop, it needs to reach the level of "change code, push to GitHub, deploy automatically". This time I asked &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to be my co-pilot for wiring up CI/CD.&lt;/p&gt;

&lt;p&gt;I gave it a single sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me create a Cloud Build connection to GitHub, and trigger a build to deploy to Cloud Run after committing to main."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code first scanned &lt;code&gt;cloudbuild.yaml&lt;/code&gt;, the existing Cloud Run settings, Secret Manager, and Artifact Registry, listed the current problems, and then &lt;strong&gt;stopped to ask me the key decisions&lt;/strong&gt;: keep the existing service name or change the YAML? Does GitHub need authorization? After I answered, it created the missing resources in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build Artifact Registry repo&lt;/span&gt;
gcloud artifacts repositories create linebot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1

&lt;span class="c"&gt;# Secret migration: move from the current service to Secret Manager (via stdin, don't leave shell history)&lt;/span&gt;
gcloud run services describe linebot-gemini-file-search &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'value(...)'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | gcloud secrets create LINE_CHANNEL_SECRET &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Give Cloud Build / Compute SA the roles needed for deployment&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;role &lt;span class="k"&gt;in &lt;/span&gt;run.admin iam.serviceAccountUser artifactregistry.writer &lt;span class="se"&gt;\&lt;/span&gt;
            secretmanager.secretAccessor storage.objectAdmin logging.logWriter&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;gcloud projects add-iam-policy-binding your-cool-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:660825558664-compute@developer.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/&lt;/span&gt;&lt;span class="nv"&gt;$role&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;None
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Build trigger&lt;/span&gt;
gcloud builds triggers create github &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag-main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo-owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kkdai &lt;span class="nt"&gt;--repo-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--branch-pattern&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"^main$"&lt;/span&gt; &lt;span class="nt"&gt;--build-config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cloudbuild.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only thing that couldn't be automated was &lt;strong&gt;GitHub OAuth authorization&lt;/strong&gt;: Claude Code admitted outright that "this step can only be done by clicking in the Console" and provided the URL with step-by-step instructions. After a one-minute click-through, the trigger ran successfully.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall Log: Two Traps Tied Directly to the New Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: An Outdated Hardcoded Model ID
&lt;/h3&gt;

&lt;p&gt;Both &lt;code&gt;cloudbuild.yaml&lt;/code&gt; and the code defaulted to &lt;code&gt;gemini-3.1-flash&lt;/code&gt;, but checking the &lt;a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener noreferrer"&gt;Gemini API's current model ID list&lt;/a&gt; shows no such model exists. The correct ID for Gemini 3 Flash is &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this happened&lt;/strong&gt;: multimodal RAG is a very new feature; documents, tutorials, and examples are still appearing in large numbers, and the naming has shifted slightly along the way. It's easy for an early version of the repo to end up with an ID that looks plausible but doesn't actually exist.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Replace the ID throughout the repo with &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;, and confirm the embedding model is &lt;code&gt;models/gemini-embedding-2&lt;/code&gt; (it was already correct, so no trap there). After pushing, Cloud Build triggered automatically, and a new revision was live within three minutes.&lt;/p&gt;
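&lt;p&gt;A cheap way to dodge this class of trap is to verify any hardcoded model ID against the live model list before deploying. A minimal sketch: the check itself is plain Python, and the commented lines assume the google-genai SDK's &lt;code&gt;client.models.list()&lt;/code&gt; for fetching the real list (treat that call as an assumption and confirm it against the SDK docs):&lt;/p&gt;

```python
# Sketch: guard against shipping a model ID that doesn't exist.
# Fetching the live list is assumed to use the google-genai SDK, e.g.:
#   from google import genai
#   available = [m.name for m in genai.Client().models.list()]
# The check below is plain Python and works on any list of model names.

def is_known_model(model_id: str, available: list) -> bool:
    """True if model_id appears in the model list (API names are
    usually prefixed with 'models/')."""
    return model_id in available or f"models/{model_id}" in available

# Illustrative list only; fetch the real one as sketched above.
available = ["models/gemini-3-flash-preview", "models/gemini-embedding-2"]
print(is_known_model("gemini-3.1-flash", available))       # False (the bad ID)
print(is_known_model("gemini-3-flash-preview", available)) # True
```

&lt;p&gt;Running this check in CI (or at service startup) turns a silent 404 at request time into a loud failure at deploy time.&lt;/p&gt;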

&lt;h3&gt;
  
  
  Pitfall 2: Mysterious "Upload has already been terminated"
&lt;/h3&gt;

&lt;p&gt;This trap sits squarely on the &lt;strong&gt;image upload&lt;/strong&gt; path newly supported by File Search Store — it's also the one most worth sharing, because it shows how euphemistic the error messages of a new API can be.&lt;/p&gt;

&lt;p&gt;I sent a JPG from LINE to the Bot, tapped "store in database", and got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Failed to store: 400 Bad Request. {'message': 'Upload has already been terminated.', 'status': 'Bad Request'}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The message gave no clue to the cause, and Cloud Logging showed only the same error with no stack trace. After searching the &lt;a href="https://discuss.ai.google.dev/" rel="noopener noreferrer"&gt;Google AI Developers Forum&lt;/a&gt;, I found similar reports for several file types (.md / .xlsx / large CSVs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real culprit&lt;/strong&gt; is hidden in this seemingly innocent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (before modification)
&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before Python 3.13, &lt;code&gt;mimetypes.guess_extension("image/jpeg")&lt;/code&gt; &lt;strong&gt;returns &lt;code&gt;.jpe&lt;/code&gt;, not &lt;code&gt;.jpg&lt;/code&gt;&lt;/strong&gt;, because &lt;code&gt;.jpe&lt;/code&gt; sorts before &lt;code&gt;.jpg&lt;/code&gt; in the standard library's MIME table, a quirk that has existed for nearly twenty years.&lt;/p&gt;
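&lt;p&gt;The quirk is easy to reproduce in a few lines; what you see depends on the interpreter version:&lt;/p&gt;

```python
import mimetypes
import sys

# On Python < 3.13 this typically prints ".jpe"; on 3.13+ the stdlib
# switched to preferring the common extension, so it prints ".jpg".
suffix = mimetypes.guess_extension("image/jpeg")
print(sys.version_info[:2], suffix)
```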

&lt;p&gt;Gemini File Search Store doesn't recognize the &lt;code&gt;.jpe&lt;/code&gt; extension, but the API's "Upload has already been terminated" message is easy to misread — at first I suspected the upload had exceeded the size limit, been throttled by concurrency, or hit a race inside the SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Take the file extension directly from &lt;code&gt;display_name&lt;/code&gt; (the handlers already set it correctly, e.g. &lt;code&gt;image_&amp;lt;id&amp;gt;.jpg&lt;/code&gt;), and fall back to an explicit MIME lookup table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (after modification)
&lt;/span&gt;&lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[BG Store] uploading display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; mime=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tmp_suffix=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also, log &lt;code&gt;traceback.format_exc()&lt;/code&gt; in the &lt;code&gt;except&lt;/code&gt; block, so that the next time something goes wrong, Cloud Logging has the full stack.&lt;/p&gt;
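&lt;p&gt;A minimal sketch of that logging pattern (the wrapper and its names are illustrative, not the repo's actual code):&lt;/p&gt;

```python
import traceback

def log_failures(fn, *args, **kwargs):
    """Run a callable; on failure, print the full traceback so
    Cloud Logging captures more than the API's one-line message."""
    try:
        return fn(*args, **kwargs)
    except Exception:
        print(f"[BG Store] {fn.__name__} failed:\n{traceback.format_exc()}")
        raise

def flaky_upload():
    # Stand-in for the real upload call.
    raise RuntimeError("Upload has already been terminated.")

try:
    log_failures(flaky_upload)
except RuntimeError:
    pass  # the full stack was already printed above
```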

&lt;p&gt;&lt;strong&gt;The takeaway from this story&lt;/strong&gt;: when you run a new modality on a newly GA'd API, be sure to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;First confirm on the client side that the filename / extension you generate is the format the API expects&lt;/strong&gt;; don't trust the &lt;code&gt;mimetypes&lt;/code&gt; standard library to guess for you.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Write the stack trace into the log&lt;/strong&gt;, otherwise you'll be stuck sifting through vague forum threads that suggest things like "just try a different file".&lt;/li&gt;
&lt;li&gt; Compare the extensions you generate against the &lt;a href="https://ai.google.dev/gemini-api/docs/file-search" rel="noopener noreferrer"&gt;official list of formats supported by Gemini File Search&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;
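&lt;p&gt;Takeaway 3 can be turned into a small pre-upload guard. The allow-list below is illustrative; pull the real set from the official supported-format list:&lt;/p&gt;

```python
# Illustrative pre-upload guard: refuse extensions the store won't accept.
# SUPPORTED_SUFFIXES is a placeholder; the authoritative list lives in the
# Gemini File Search documentation.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp", ".pdf", ".txt", ".md"}

def check_suffix(suffix: str) -> str:
    """Normalize an extension and reject anything outside the allow-list."""
    s = suffix.lower()
    if s not in SUPPORTED_SUFFIXES:
        raise ValueError(f"extension {suffix!r} is not in the allow-list")
    return s

print(check_suffix(".JPG"))   # .jpg
try:
    check_suffix(".jpe")      # the very extension behind Pitfall 2
except ValueError as err:
    print(err)
```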




&lt;h2&gt;
  
  
  Summary: The Entry Fee for Multimodal RAG Has Never Been Lower
&lt;/h2&gt;

&lt;p&gt;This Gemini API File Search upgrade compresses a feature line that used to take three months to ship into &lt;strong&gt;dozens of lines of code plus a managed API&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Native multimodal support&lt;/strong&gt;: Text, images, videos, audio, and documents share the same embedding space, goodbye to the OCR transition layer.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata + server-side filters&lt;/strong&gt;: Multi-tenant SaaS no longer has to agonize over how many stores to shard into.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: Enterprise compliance scenarios finally have native grounding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Budget-friendly&lt;/strong&gt;: Storage and query embeddings are both free; you pay only for indexing + LLM tokens.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-modal scores of Embedding 2&lt;/strong&gt;: 97.4% recall@1 is not a demo number; it's a level that can directly support a product.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want a production-shaped end-to-end example, the &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt; repo welcomes PRs, and you're also welcome to fork it into a RAG application for your own domain — a Notion knowledge base, an employee-handbook Q&amp;amp;A bot, a photo album manager, a research-paper index... you're limited only by your imagination.&lt;/p&gt;

&lt;p&gt;If you want to get started, the recommended reading order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Google official blog: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Gemini Embedding 2 specification page: &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;deepmind.google/models/gemini/embedding&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Developer implementation guide: &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; My open-source example: &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;github.com/kkdai/linebot-multimodal-rag&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everyone is welcome to try out this powerful new multimodal RAG support!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>[GCP Practice][BwAI] AI-Powered Development: Quickly Deploy a LINE Bot Cloud Backup Tool with Gemini CLI</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Thu, 07 May 2026 04:36:40 +0000</pubDate>
      <link>https://forem.com/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</link>
      <guid>https://forem.com/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" alt="Preview Program 2026-05-05 12.38.54" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;In the upcoming &lt;strong&gt;Build With AI 2026&lt;/strong&gt; workshop, we're bringing a very practical project: the &lt;strong&gt;LINE Bot File Backup Robot&lt;/strong&gt;. It lets you upload images and files from your LINE chatroom straight to Google Drive, automatically organizing them into monthly folders.&lt;/p&gt;

&lt;p&gt;Traditionally, putting a project like this, which includes OAuth authorization, a Firestore database, and Cloud Run container deployment, on the cloud would often leave beginners struggling with lengthy &lt;code&gt;gcloud&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;But this time is different, because we have a secret weapon: &lt;strong&gt;Gemini CLI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article will document how we used AI as a DevOps engineer, completing the entire complex deployment process by "talking," and of course, including the various real pitfalls we encountered along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Preparation: Summoning the AI Assistant
&lt;/h2&gt;

&lt;p&gt;Before we start, besides the basic &lt;code&gt;gcloud&lt;/code&gt; installation and login, you only need to install &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prepare the following "confidential parameters" (all values in this article are mocked):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PROJECT_ID&lt;/strong&gt;: &lt;code&gt;your-cool-project-id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Channel Secret&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_SECRET_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Access Token&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_TOKEN_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After entering the project folder, I only said one sentence to Gemini CLI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me deploy to Cloud Run using gcloud, and stop and ask me if you need any information. Refer to the repo…"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next, it's time to witness miracles (and fix bugs).&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Deployment Process: AI Leading the Way
&lt;/h2&gt;

&lt;p&gt;Gemini CLI intelligently analyzed the &lt;code&gt;Dockerfile&lt;/code&gt; and &lt;code&gt;main.go&lt;/code&gt;, then immediately laid out a battle plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Environment Detection and API Enablement
&lt;/h3&gt;

&lt;p&gt;The AI first confirmed my current project settings in &lt;code&gt;gcloud&lt;/code&gt; and then enabled the necessary services in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  cloudbuild.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  artifactregistry.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Creating a Firestore Database (Encountering the First Pitfall)
&lt;/h3&gt;

&lt;p&gt;Our Bot needs to record the OAuth state token (an anti-forgery marker), so Firestore is needed. The AI tried to execute the command, but we immediately hit an error. &lt;em&gt;(See the pitfall record below for details)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After correction, the correct command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud firestore databases create &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Deploying Cloud Run First, Filling in the Blanks Later
&lt;/h3&gt;

&lt;p&gt;This is a classic "chicken or the egg" problem: Google OAuth needs to know your Cloud Run URL (Redirect URI), but your Cloud Run deployment needs to fill in the OAuth Client ID and Secret.&lt;/p&gt;

&lt;p&gt;Gemini CLI's strategy is great: &lt;strong&gt;Deploy with placeholders first!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT=your-cool-project-id,ChannelSecret=YOUR_LINE_SECRET_XXXX,ChannelAccessToken=YOUR_LINE_TOKEN_XXXX,GOOGLE_CLIENT_ID=PENDING,GOOGLE_CLIENT_SECRET=PENDING,GOOGLE_REDIRECT_URL=PENDING"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a successful deployment, we got a shiny new URL: &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Completing Google OAuth Settings and Environment Variable Updates
&lt;/h3&gt;

&lt;p&gt;With the URL in hand, I could go to "APIs &amp;amp; Services" in the Google Cloud Console to complete the settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create an &lt;strong&gt;OAuth consent screen&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Create credentials for a &lt;strong&gt;Web application&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Fill in the "Authorized redirect URI" with the URL we just got, plus &lt;code&gt;/oauth/callback&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After getting the real ID and Secret, I directly pasted the information to Gemini CLI, and it automatically updated the service for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_REDIRECT_URL=https://[YOUR_URL]/oauth/callback,GOOGLE_CLIENT_ID=real-client-id.apps.googleusercontent.com,GOOGLE_CLIENT_SECRET=real-secret-xxxx"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! Finally, just go to the LINE Developers Console and fill in the Webhook.&lt;/p&gt;




&lt;h2&gt;
  
  
  Blood-and-Tears Pitfall Log from the Deployment Process
&lt;/h2&gt;

&lt;p&gt;It looks smooth, but in reality the AI and I hit a few walls together. That, too, is the most authentic part of working with CLI tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Forgetting to Bind a Credit Card, the 390001 Error
&lt;/h3&gt;

&lt;p&gt;When executing the first &lt;code&gt;gcloud run deploy&lt;/code&gt;, the terminal spewed red text everywhere:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;FAILED_PRECONDITION: Billing account for project is not found...&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Cloud Run and Cloud Build require the project to have billing enabled. This was a brand-new test project, and I had forgotten to bind a billing account. &lt;strong&gt;Solution&lt;/strong&gt;: The AI immediately checked the project status for me (&lt;code&gt;gcloud beta billing projects describe&lt;/code&gt;) and asked whether I wanted to switch to a project with billing enabled or fix this one. I obediently went to the Console to bind my credit card, and the deployment continued.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: The Evolution of Command Parameter Syntax
&lt;/h3&gt;

&lt;p&gt;When creating the Firestore database, the AI initially suggested &lt;code&gt;--type=native-mode&lt;/code&gt; or &lt;code&gt;--type=native&lt;/code&gt;, but gcloud didn't appreciate it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;ERROR: argument --type: Invalid choice: 'native-mode'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: &lt;code&gt;gcloud&lt;/code&gt; CLI parameters change across versions. &lt;strong&gt;Solution&lt;/strong&gt;: Read the gcloud error message carefully; the currently accepted values are &lt;code&gt;firestore-native&lt;/code&gt; and &lt;code&gt;datastore-mode&lt;/code&gt;. After switching to &lt;code&gt;--type=firestore-native&lt;/code&gt;, it passed smoothly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: The Invisible "Drive API"
&lt;/h3&gt;

&lt;p&gt;Once everything was deployed, we hit a permission error while testing "upload to Google Drive". &lt;strong&gt;Reason&lt;/strong&gt;: This Bot uploads files to Drive, yet when we enabled the APIs in step one, we forgot the protagonist: the &lt;strong&gt;Google Drive API&lt;/strong&gt;! Without it, even a successful OAuth authorization leaves the program blocked. &lt;strong&gt;Solution&lt;/strong&gt;: I typed nothing but a cryptic &lt;code&gt;"3."&lt;/code&gt; (meaning the third checkpoint) into the terminal, and the AI immediately understood and delivered the finishing blow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;drive.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through Gemini CLI, the originally tedious and error-prone infrastructure construction work has become a "two-person pair programming" session.&lt;/p&gt;

&lt;p&gt;AI can remember lengthy gcloud parameters for you, sort out the deployment logic (deploy with PENDING values first, then update), and quickly adjust its strategy based on error messages when something fails.&lt;/p&gt;

&lt;p&gt;This is the core spirit that &lt;strong&gt;Build With AI 2026&lt;/strong&gt; wants to convey: let AI handle the tedious DevOps chores, so that developers can focus more energy on innovation in core business logic.&lt;/p&gt;

&lt;p&gt;If you are still manually typing long and ugly gcloud commands, I strongly recommend you install Gemini CLI and give it a try!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>GCP Hands-on: Deploying OpenAB - Building a Gemini ACP Bridge for Telegram on GCE</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 12:01:51 +0000</pubDate>
      <link>https://forem.com/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</link>
      <guid>https://forem.com/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" alt="image-20260502171732526" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Recently, in order to enable AI coding assistants (such as Claude Code or Gemini CLI) to be used directly on chat platforms, I started researching &lt;strong&gt;&lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt;&lt;/strong&gt;. This is a powerful bridge that can connect Slack, Discord, or Telegram to CLI tools that comply with the &lt;strong&gt;ACP (Agent Client Protocol)&lt;/strong&gt; standard.&lt;/p&gt;

&lt;p&gt;This article documents the complete practical process of deploying &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt; on Google Cloud, specifically how to bypass authentication restrictions, handle Telegram's HTTPS requirements, and resolve path and permission issues in containerized deployments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Reference Documentation&lt;/strong&gt;: &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;https://openabdev.github.io/openab/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Repo&lt;/strong&gt;: &lt;a href="https://github.com/openabdev/openab" rel="noopener noreferrer"&gt;https://github.com/openabdev/openab&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Deployment Decision: Why GCE instead of Cloud Run?
&lt;/h2&gt;

&lt;p&gt;Although Cloud Run is usually my first choice, for OpenAB, &lt;strong&gt;Google Compute Engine (GCE)&lt;/strong&gt; is the better solution, for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Stateful sessions:&lt;/strong&gt; OpenAB starts a child process (such as Gemini CLI) for each conversation thread. These processes must stay resident to preserve conversation context, and Cloud Run's autoscaling would kill them, interrupting conversations.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Authentication persistence&lt;/strong&gt;: The AI CLI's tokens are stored on local disk; GCE combined with a Persistent Disk ensures the login state survives restarts.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Practical Steps: Step-by-Step Deployment Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Writing an Automated Startup Script
&lt;/h3&gt;

&lt;p&gt;To standardize the deployment, we wrote a &lt;code&gt;setup-openab.sh&lt;/code&gt;. Its core task is to install Docker, create persistent directories, and dynamically generate &lt;code&gt;config.toml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The most critical part is the &lt;strong&gt;custom Docker image&lt;/strong&gt;. Since the official OpenAB image does not necessarily include every AI tool, we install Node.js and &lt;code&gt;@google/gemini-cli&lt;/code&gt; on the fly in the Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ghcr.io/openabdev/openab:latest&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; curl &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://deb.nodesource.com/setup_20.x | bash - &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nodejs &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 1000&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Using gcloud to Create a GCE Instance
&lt;/h3&gt;

&lt;p&gt;We chose the &lt;code&gt;e2-medium&lt;/code&gt; specification and passed sensitive information (such as Bot Token) through Metadata to avoid hardcoding it in the script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances create openab-server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1-b &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-11 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata-from-file&lt;/span&gt; startup-script&lt;span class="o"&gt;=&lt;/span&gt;setup-openab.sh &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;tg_bot_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_BOT_TOKEN

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configuring the Gemini API Key
&lt;/h3&gt;

&lt;p&gt;Unlike Kiro, which requires interactive login, &lt;code&gt;gemini-cli&lt;/code&gt; can directly read environment variables. We inject the API Key into OpenAB's &lt;code&gt;config.toml&lt;/code&gt; to make it run automatically in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[agent]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gemini"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"--acp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"AIzaSy..."&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Using Cloudflare Tunnel to Solve HTTPS Requirements
&lt;/h3&gt;

&lt;p&gt;Telegram Webhook strictly requires &lt;strong&gt;HTTPS&lt;/strong&gt;. Instead of setting up a complex Nginx + SSL, I chose to use &lt;strong&gt;Cloudflare Quick Tunnel&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Run on VM: &lt;code&gt;cloudflared tunnel --url http://localhost:8080&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Get a randomly generated HTTPS URL.&lt;/li&gt;
&lt;li&gt; Register Webhook: &lt;code&gt;curl "https://api.telegram.org/bot&amp;lt;TOKEN&amp;gt;/setWebhook?url=&amp;lt;CF_URL&amp;gt;/webhook/telegram&amp;amp;secret_token=&amp;lt;SECRET&amp;gt;"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Blood and Tears in the Migration Process: Technical Summary
&lt;/h2&gt;

&lt;p&gt;During deployment we had to debug several times. Here are the three biggest pitfalls:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Confusion of Image Sources
&lt;/h3&gt;

&lt;p&gt;At first I tried to pull &lt;code&gt;openabdev/openab&lt;/code&gt; from Docker Hub, but it kept failing. It turned out the project's current stable image is hosted on &lt;strong&gt;GitHub Container Registry (GHCR)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: You must use &lt;code&gt;ghcr.io/openabdev/openab:latest&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Hardcoded Configuration Path
&lt;/h3&gt;

&lt;p&gt;OpenAB's Dockerfile expects the configuration file at &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;. I initially mounted it at &lt;code&gt;/app/config.toml&lt;/code&gt;, which made the container crash with an error immediately after startup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Correct the Docker Volume mount path to &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
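&lt;p&gt;If you drive the container with Docker Compose instead of plain &lt;code&gt;docker run&lt;/code&gt;, the corrected mount might look like this sketch. The service layout is an assumption; only the image name and target path come from the article:&lt;/p&gt;

```yaml
services:
  openab:
    image: ghcr.io/openabdev/openab:latest
    volumes:
      # Must land on the path the Dockerfile expects, not /app/config.toml
      - ./config.toml:/etc/openab/config.toml:ro
```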

&lt;h3&gt;
  
  
  Pitfall 3: Security Secret Token Verification Failed
&lt;/h3&gt;

&lt;p&gt;Even with a correct URL, Telegram messages were still rejected by the Gateway; the log showed &lt;code&gt;invalid or missing secret_token&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reason&lt;/strong&gt;: &lt;code&gt;openab-gateway&lt;/code&gt; generates an internal secret token to reject unauthorized requests.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Extract that token from the Gateway container and pass it as the &lt;code&gt;secret_token&lt;/code&gt; parameter when calling &lt;code&gt;setWebhook&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
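&lt;p&gt;Putting this together, the webhook registration can be scripted roughly as follows. Where the Gateway stores its token is an assumption here (check your container's logs or documentation); the fallback value is purely illustrative:&lt;/p&gt;

```shell
# Hypothetical location for the generated token -- verify where your
# gateway actually keeps it. Falls back to an example value here.
SECRET=$(docker exec openab-gateway cat /etc/openab/secret_token 2>/dev/null || echo "EXAMPLE_SECRET")
TOKEN="YOUR_BOT_TOKEN"
CF_URL="https://example.trycloudflare.com"
WEBHOOK_CALL="https://api.telegram.org/bot${TOKEN}/setWebhook?url=${CF_URL}/webhook/telegram&secret_token=${SECRET}"
echo "$WEBHOOK_CALL"   # pass this URL to curl once the values are real
```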




&lt;h2&gt;
  
  
  Summary: The Perfect AI Bridging Solution
&lt;/h2&gt;

&lt;p&gt;Through this architecture, I built a fully self-hosted, secure, and efficient AI assistant on GCP. It relies on no expensive subscription; it uses Gemini's API capabilities directly, with Telegram as the interaction surface.&lt;/p&gt;

&lt;p&gt;If you also want a dedicated ACP bridge in the cloud, the GCE + Docker + Cloudflare Tunnel combination is a well-balanced, stable choice.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>GCP in Action: Building a Persistent AI Assistant with GCE, Hermes Agent, and Telegram</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 12:01:42 +0000</pubDate>
      <link>https://forem.com/gde/gcp-in-action-building-a-persistent-ai-assistant-with-gce-hermes-agent-and-telegram-1mlg</link>
      <guid>https://forem.com/gde/gcp-in-action-building-a-persistent-ai-assistant-with-gce-hermes-agent-and-telegram-1mlg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4z6ajwagaqahnm2rv2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4z6ajwagaqahnm2rv2k.png" alt="image-20260502161538962" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;After solving the LINE Bot's Vertex AI migration, I started thinking: Could there be an AI assistant that is "more proactive" and "has long-term memory"? At this time, I set my sights on &lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch's open-source &lt;strong&gt;Hermes Agent&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unlike a typical Chatbot, Hermes is designed as an "operating system that breathes". It can execute Shell commands, write Python scripts, manage long-term memory, and even stay in touch with you via different Gateways (Telegram, Discord) at any time.&lt;/p&gt;

&lt;p&gt;To make it available 24/7, I chose to deploy it on &lt;strong&gt;Google Compute Engine (GCE)&lt;/strong&gt;. This article will document the deployment process from scratch, as well as the pitfalls I encountered when configuring the latest &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Environment Parameter Preparation
&lt;/h2&gt;

&lt;p&gt;Before you start, please make sure you have these necessary parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PROJECT_ID&lt;/strong&gt;: &lt;code&gt;YOUR_PROJECT_ID&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LOCATION&lt;/strong&gt;: &lt;code&gt;global&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GOOGLE_API_KEY&lt;/strong&gt;: &lt;code&gt;YOUR_GOOGLE_API_KEY&lt;/code&gt; (Obtained from Google AI Studio)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 1: Create a GCE Instance
&lt;/h2&gt;

&lt;p&gt;Hermes Agent needs a reasonable amount of compute to handle tool use; the &lt;code&gt;e2-medium&lt;/code&gt; machine type is recommended.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances create hermes-agent-vm &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1-a &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ubuntu-2204-lts &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ubuntu-os-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--boot-disk-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30GB &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;startup-script&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'#!/bin/bash
        apt-get update
        apt-get install -y git curl python3-pip python3-venv nodejs npm
    '&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Install Hermes Agent
&lt;/h2&gt;

&lt;p&gt;After SSHing into the VM, use the official one-click installation script directly.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Enter the VM&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute ssh hermes-agent-vm &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1-a

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Execute the installation&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Configure Gemini 2.5 Flash (SOP Practice)
&lt;/h2&gt;

&lt;p&gt;This is where you are most likely to step on a landmine in the whole exercise: Hermes may default to model identifiers that are outdated or no longer exist.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a configuration file&lt;/strong&gt;: In &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt;, specify &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; exactly, and &lt;strong&gt;do not include the &lt;code&gt;google/&lt;/code&gt; prefix&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set the API Key&lt;/strong&gt;: Write the key and permission settings in &lt;code&gt;~/.hermes/.env&lt;/code&gt;:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
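&lt;p&gt;The article leaves the two files implicit; here is a minimal sketch of what they might contain. The YAML key names are assumptions (check the Hermes documentation for the real schema), and the &lt;code&gt;.env&lt;/code&gt; value is a placeholder; the important part is the bare model identifier with no &lt;code&gt;google/&lt;/code&gt; prefix:&lt;/p&gt;

```shell
# Key names below are assumptions, not confirmed Hermes schema --
# the point is the bare model id with no "google/" prefix.
mkdir -p ~/.hermes
cat > ~/.hermes/config.yaml <<'EOF'
model: gemini-2.5-flash
auxiliary:
  model: gemini-2.5-flash
EOF
cat > ~/.hermes/.env <<'EOF'
GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
EOF
```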




&lt;h2&gt;
  
  
  Step 4: Connect to Telegram and Background Persistence
&lt;/h2&gt;

&lt;p&gt;To prevent the Agent from disappearing after the SSH connection is lost, we use &lt;strong&gt;Systemd&lt;/strong&gt; to manage it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Create a Systemd service&lt;/strong&gt; (&lt;code&gt;/etc/systemd/system/hermes.service&lt;/code&gt;):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight systemd"&gt;&lt;code&gt;&lt;span class="k"&gt;[Unit]&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;Hermes Agent Gateway
&lt;span class="nt"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;network.target

&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="nt"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;simple
&lt;span class="nt"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;root
&lt;span class="nt"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;HOME=/root
&lt;span class="nt"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;PYTHONUNBUFFERED=1
&lt;span class="nt"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
&lt;span class="nt"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;always
&lt;span class="nt"&gt;RestartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;10

&lt;span class="k"&gt;[Install]&lt;/span&gt;
&lt;span class="nt"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;multi-user.target

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Start the service&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;hermes
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart hermes

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Blood and Tears in the Migration Process: Why Isn't My Agent Responding?
&lt;/h2&gt;

&lt;p&gt;Even with the correct configuration, I still encountered the dilemma of "the Agent reads messages but doesn't reply". After checking the logs (&lt;code&gt;journalctl -u hermes&lt;/code&gt;), I found several deep pitfalls:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: The 404 Ghost of Gemini 3.0
&lt;/h3&gt;

&lt;p&gt;Chasing the newest version, I configured &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;, and the logs promptly spewed a pile of &lt;strong&gt;404 Model Not Found&lt;/strong&gt; errors. &lt;strong&gt;Reason&lt;/strong&gt;: Hermes' internal &lt;code&gt;auxiliary_client.py&lt;/code&gt; hardcodes &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; as the default in several places, and when those auxiliary functions (such as title generation) fail, they break the reply logic of the entire Gateway. &lt;strong&gt;Solution&lt;/strong&gt;: Explicitly set all &lt;code&gt;auxiliary&lt;/code&gt; models to &lt;code&gt;gemini-2.5-flash&lt;/code&gt; in &lt;code&gt;config.yaml&lt;/code&gt;, or patch the source directly with &lt;code&gt;sed&lt;/code&gt;.&lt;/p&gt;
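&lt;p&gt;The &lt;code&gt;sed&lt;/code&gt; patch mentioned above can be sketched like this. It is demonstrated on a scratch copy so the substitution is easy to verify; the real file lives inside the Hermes install directory:&lt;/p&gt;

```shell
# Demonstrated on a scratch copy; run the same sed one-liner against the
# real auxiliary_client.py inside your Hermes installation.
cat > /tmp/auxiliary_client.py <<'EOF'
DEFAULT_AUX_MODEL = "gemini-3-flash-preview"
EOF
sed -i 's/gemini-3-flash-preview/gemini-2.5-flash/g' /tmp/auxiliary_client.py
cat /tmp/auxiliary_client.py
```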

&lt;h3&gt;
  
  
  Pitfall 2: Prefix Confusion of Model Identifiers
&lt;/h3&gt;

&lt;p&gt;Across SDKs, some expect &lt;code&gt;google/gemini-2.5-flash&lt;/code&gt; and others plain &lt;code&gt;gemini-2.5-flash&lt;/code&gt;. &lt;strong&gt;Experience&lt;/strong&gt;: with Hermes' Gemini provider, &lt;strong&gt;using the short name &lt;code&gt;gemini-2.5-flash&lt;/code&gt; directly is the safest&lt;/strong&gt;; adding &lt;code&gt;google/&lt;/code&gt; causes API routing errors instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Conflict between Systemd and "Processes Already Running"
&lt;/h3&gt;

&lt;p&gt;If you run &lt;code&gt;hermes gateway&lt;/code&gt; manually and then start the service, the system reports &lt;code&gt;Gateway already running (PID xxxx)&lt;/code&gt;. &lt;strong&gt;Solution&lt;/strong&gt;: add an &lt;code&gt;ExecStartPre=-/usr/bin/pkill -9 -f hermes&lt;/code&gt; line before &lt;code&gt;ExecStart&lt;/code&gt; to guarantee a clean environment on every start. Note the leading &lt;code&gt;-&lt;/code&gt;: systemd does not run Exec lines through a shell, so the shell idiom &lt;code&gt;|| true&lt;/code&gt; would not work here; the &lt;code&gt;-&lt;/code&gt; prefix is systemd's way of ignoring a non-zero exit (pkill exits non-zero when nothing matches).&lt;/p&gt;
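&lt;p&gt;In unit-file form, the guard looks like this. Systemd does not run Exec lines through a shell, so instead of &lt;code&gt;|| true&lt;/code&gt; the leading &lt;code&gt;-&lt;/code&gt; tells systemd to ignore pkill's exit status when no process matches:&lt;/p&gt;

```ini
[Service]
# Kill any manually started gateway before systemd launches its own copy;
# the leading "-" makes systemd ignore pkill's non-zero exit.
ExecStartPre=-/usr/bin/pkill -9 -f hermes
ExecStart=/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
```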




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Now, my dedicated Hermes Agent is running stably on GCE and is available via Telegram at any time. It can not only help me find information, but also run some simple computing scripts for me directly on the cloud VM.&lt;/p&gt;

&lt;p&gt;This deployment taught me: &lt;strong&gt;In the face of rapidly updating models, the official documentation (or MCP tool query) is the only truth&lt;/strong&gt;. Don't blindly pursue the latest version number; ensuring that the identifier matches the current API environment is the key to stable operation.&lt;/p&gt;

&lt;p&gt;If you also want a 24-hour AI digital double, follow this SOP and set up a machine of your own!&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini 3.1: Native TTS for Easier, More Powerful Summary Reading</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 10:08:42 +0000</pubDate>
      <link>https://forem.com/gde/gemini-31-native-tts-for-easier-more-powerful-summary-reading-2ep9</link>
      <guid>https://forem.com/gde/gemini-31-native-tts-for-easier-more-powerful-summary-reading-2ep9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuiw99cubzhnf5ex409y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuiw99cubzhnf5ex409y.png" alt="Finder 2026-04-16 21.43.57" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;In the previous hands-on post, we used Gemini 3.1 Flash Live for speech recognition and, through a workaround built on the Gemini 2.5 Live API, barely achieved a text-to-speech (TTS) function.&lt;/p&gt;

&lt;p&gt;But in April 2026, Google officially released &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini 3.1 Flash TTS&lt;/strong&gt;&lt;/a&gt;. This is a native model specifically designed for audio output, no longer requiring a Live WebSocket, and can directly output high-quality audio through the standard &lt;code&gt;generate_content&lt;/code&gt; process.&lt;/p&gt;

&lt;p&gt;As a developer, you naturally want to adopt the more elegant, native solution right away. This article shares how I upgraded the LINE Bot's spoken-summary feature to Gemini 3.1 Native TTS, and the "async pit" I fell into along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Upgrade: From Live API to Native TTS
&lt;/h2&gt;

&lt;p&gt;The previous read-aloud feature was emulated on top of the Gemini 2.5 Live API. It worked, but it had several shortcomings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;High complexity&lt;/strong&gt;: Requires managing the WebSocket connection lifecycle.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Model limitations&lt;/strong&gt;: Must use a specific &lt;code&gt;native-audio&lt;/code&gt; model, and primarily supports &lt;code&gt;us-central1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fixed return format&lt;/strong&gt;: The sampling rate is usually fixed at 16kHz.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The emergence of &lt;strong&gt;Gemini 3.1 Flash TTS&lt;/strong&gt; changed all this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Model name&lt;/strong&gt;: &lt;code&gt;gemini-3.1-flash-tts-preview&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Consistent interface&lt;/strong&gt;: Uses the familiar &lt;code&gt;generate_content_stream&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic parameters&lt;/strong&gt;: Supports detecting the sampling rate from the returned MIME type (usually raised to 24kHz, for better sound quality).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Core Code Evolution (tools/tts_tool.py)
&lt;/h2&gt;

&lt;p&gt;The new implementation has become more concise, with the focus on the &lt;code&gt;response_modalities=["audio"]&lt;/code&gt; setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GOOGLE_AI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="c1"&gt;# Add localization instructions to make the tone more natural
&lt;/span&gt;                &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please use Traditional Chinese with Taiwanese vocabulary, and read the following summary in a friendly and natural tone. ## Transcript:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;speech_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;prebuilt_voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Zephyr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pcm_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt; &lt;span class="c1"&gt;# Default value
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ⚠️ This is the big pit that almost made me stay up all night fixing it
&lt;/span&gt;        &lt;span class="n"&gt;response_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-tts-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="c1"&gt;# Get the sampling rate dynamically from the MIME type (e.g. audio/L16;rate=24000)
&lt;/span&gt;                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TTS Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="n"&gt;pcm_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Subsequently, it is also converted to m4a via ffmpeg and sent to LINE...
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
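&lt;p&gt;The snippet above calls a &lt;code&gt;parse_rate&lt;/code&gt; helper that is not shown. Here is a minimal sketch of what it and the duration math might look like, assuming the &lt;code&gt;audio/L16;rate=24000&lt;/code&gt; MIME shape from the comment; the regex and the 24 kHz fallback are my assumptions:&lt;/p&gt;

```python
import re

def parse_rate(mime_type: str, default: int = 24000) -> int:
    """Extract the sample rate from a MIME type like 'audio/L16;rate=24000'.

    Falls back to 24 kHz (an assumption) when no rate parameter is present.
    """
    match = re.search(r"rate=(\d+)", mime_type)
    return int(match.group(1)) if match else default

def pcm_duration_ms(pcm_bytes: bytes, sample_rate: int) -> int:
    """Duration of 16-bit mono PCM audio: 2 bytes per sample."""
    return int(len(pcm_bytes) / (sample_rate * 2) * 1000)
```

&lt;p&gt;For example, one second of 16-bit mono PCM at 24 kHz is 48,000 bytes, which the helper reports as 1000 ms.&lt;/p&gt;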






&lt;h2&gt;
  
  
  The Pitfall: The Missing &lt;code&gt;await&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;This upgrade encountered a very subtle &lt;code&gt;TypeError&lt;/code&gt;, which kept popping up after remote deployment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;TypeError: 'async for' requires an object with __aiter__ method, got coroutine&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  ❌ Incorrect Writing
&lt;/h3&gt;

&lt;p&gt;Following the examples, I intuitively assumed I could run &lt;code&gt;async for&lt;/code&gt; over the method call directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is wrong!
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Correct Solution
&lt;/h3&gt;

&lt;p&gt;In the asynchronous variant of the Google GenAI Python SDK, &lt;code&gt;generate_content_stream&lt;/code&gt; is itself an &lt;code&gt;async&lt;/code&gt; function that &lt;strong&gt;returns&lt;/strong&gt; an async iterator. You must first &lt;code&gt;await&lt;/code&gt; the call to obtain that iterator, and only then run &lt;code&gt;async for&lt;/code&gt; over it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Correct approach: two steps
&lt;/span&gt;&lt;span class="n"&gt;response_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This detail does not arise in synchronous code or in some older SDKs, but when consuming the asynchronous stream of 3.1 Flash TTS, it is the difference between the code running and failing outright.&lt;/p&gt;
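&lt;p&gt;The same two-step rule can be demonstrated with a plain &lt;code&gt;asyncio&lt;/code&gt; stand-in, no Gemini SDK required (the function names below are illustrative, not part of the SDK): an &lt;code&gt;async def&lt;/code&gt; that returns an async generator must itself be awaited before you can iterate.&lt;/p&gt;

```python
import asyncio

async def fake_generate_content_stream():
    """Stand-in for the SDK call: an async function that RETURNS an async iterator."""
    async def chunks():
        for text in ("Hello, ", "world!"):
            yield text
    return chunks()

async def main() -> str:
    # Wrong: `async for chunk in fake_generate_content_stream():` raises
    # TypeError, because the un-awaited coroutine is not an async iterator.
    stream = await fake_generate_content_stream()  # step 1: await the coroutine
    collected = []
    async for chunk in stream:                     # step 2: iterate the result
        collected.append(chunk)
    return "".join(collected)

print(asyncio.run(main()))  # prints: Hello, world!
```

&lt;p&gt;Skipping step 1 fails immediately with a &lt;code&gt;TypeError&lt;/code&gt; along the lines of "'async for' requires an object with __aiter__ method", which is exactly what the wrong version above produces.&lt;/p&gt;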




&lt;h2&gt;
  
  
  Localization Adjustment: Making the Bot Speak "Taiwanese"
&lt;/h2&gt;

&lt;p&gt;Although the summary itself is already in Traditional Chinese, the TTS model sometimes reads it with a non-native accent or unnatural word choices. We solved this problem through Prompt Engineering:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Please use &lt;strong&gt;Taiwanese vocabulary&lt;/strong&gt; in Traditional Chinese, and read it in a &lt;strong&gt;friendly and natural&lt;/strong&gt; tone..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After adding this instruction, Gemini's audio output comes much closer to Taiwanese speech habits in intonation and sentence breaks, which makes the read-aloud summaries feel far more approachable.&lt;/p&gt;
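&lt;p&gt;In our bot, this instruction is simply prepended to the summary text handed to the TTS call. A minimal sketch, where the constant and function names are ours and not part of any SDK:&lt;/p&gt;

```python
# Hypothetical helper: prepend the localization instruction to the summary
# text before it is sent to the TTS model.
TTS_STYLE_PROMPT = (
    "Please use Taiwanese vocabulary in Traditional Chinese, "
    "and read it in a friendly and natural tone:\n\n"
)

def build_tts_input(summary: str) -> str:
    """Combine the style instruction with the summary to be read aloud."""
    return TTS_STYLE_PROMPT + summary.strip()
```

&lt;p&gt;The combined string is then passed as the text content of the TTS generation request.&lt;/p&gt;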




&lt;h2&gt;
  
  
  Summary: Changes Brought by Native TTS
&lt;/h2&gt;

&lt;p&gt;After migrating from Live API to Native TTS:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;More stable connections&lt;/strong&gt;: no more long-lived WebSocket to maintain.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Better sound quality&lt;/strong&gt;: native support for a 24kHz sampling rate.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Easier maintenance&lt;/strong&gt;: roughly 30% less code, with more direct logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This experience also reminded me that even with a seemingly mature SDK, you should carefully check the return value types of its &lt;code&gt;async&lt;/code&gt; APIs.&lt;/p&gt;

&lt;p&gt;If you also want your LINE Bot to speak, Gemini 3.1 Flash TTS is definitely the best choice at the moment.&lt;/p&gt;

&lt;p&gt;The complete code has been updated to &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, see you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
    </item>
    <item>
      <title>GCP in Action: Migrating a LINE Bot from AI Studio to Vertex AI to Solve 429 Errors</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 10:08:31 +0000</pubDate>
      <link>https://forem.com/gde/gcp-in-action-migrating-a-line-bot-from-ai-studio-to-vertex-ai-to-solve-429-errors-47jo</link>
      <guid>https://forem.com/gde/gcp-in-action-migrating-a-line-bot-from-ai-studio-to-vertex-ai-to-solve-429-errors-47jo</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oswk3rx7cz9bcvye4uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oswk3rx7cz9bcvye4uy.png" alt="image-20260421011411264" width="713" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Recently, the LINE business card assistant bot (&lt;code&gt;linebot-namecard-python&lt;/code&gt;) deployed on Google Cloud Run suddenly went down. Checking the logs with &lt;code&gt;gcloud logging read&lt;/code&gt; turned up this merciless error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;google.api_core.exceptions.ResourceExhausted: 429 Your billing account has exceeded its monthly spending cap.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It turned out we had been using the API Key from Google AI Studio (the &lt;code&gt;google.generativeai&lt;/code&gt; package) for rapid development, and had silently maxed out the monthly free quota.&lt;/p&gt;

&lt;p&gt;As a developer who needs to launch a service, it's time to "level up" the architecture and migrate the model calls to the enterprise-grade &lt;strong&gt;Google Cloud Vertex AI&lt;/strong&gt;, directly using GCP's IAM permissions and billing system. This article will share the migration process and the various pitfalls encountered along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Upgrade: From AI Studio to Vertex AI
&lt;/h2&gt;

&lt;p&gt;To migrate a project from the Google AI Studio SDK to Vertex AI, there are three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Replace the dependency package&lt;/strong&gt;: In &lt;code&gt;requirements.txt&lt;/code&gt;, remove the old &lt;code&gt;google-generativeai&lt;/code&gt; (imported as &lt;code&gt;google.generativeai&lt;/code&gt;) and replace it with &lt;code&gt;google-cloud-aiplatform&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update environment variable settings&lt;/strong&gt;: In &lt;code&gt;config.py&lt;/code&gt;, we no longer need &lt;code&gt;GEMINI_API_KEY&lt;/code&gt;, but instead use GCP's &lt;code&gt;PROJECT_ID&lt;/code&gt; and &lt;code&gt;LOCATION&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Default to global
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt; &lt;strong&gt;Core code rewriting (gemini_utils.py)&lt;/strong&gt;: Although the Vertex AI SDK interface is similar, its handling of multimodal data (such as images) is slightly stricter. We need to convert a &lt;code&gt;PIL.Image&lt;/code&gt; into the &lt;code&gt;vertexai.generative_models.Part&lt;/code&gt; format.&lt;/li&gt;
&lt;/ol&gt;
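&lt;p&gt;A minimal sketch of that conversion, assuming the &lt;code&gt;google-cloud-aiplatform&lt;/code&gt; SDK is installed; the helper names are ours. A &lt;code&gt;PIL.Image&lt;/code&gt; is first serialized to JPEG bytes in memory, then wrapped with &lt;code&gt;Part.from_data&lt;/code&gt;:&lt;/p&gt;

```python
import io
import mimetypes

def guess_mime(path: str) -> str:
    """Best-effort MIME type from a file name (defaults to JPEG)."""
    return mimetypes.guess_type(path)[0] or "image/jpeg"

def pil_image_to_part(image):
    """Serialize a PIL.Image to JPEG bytes and wrap it as a Vertex AI Part.

    The SDK import is kept inside the function so the pure helper above
    can be used without google-cloud-aiplatform installed.
    """
    from vertexai.generative_models import Part

    buf = io.BytesIO()
    image.save(buf, format="JPEG")
    return Part.from_data(data=buf.getvalue(), mime_type="image/jpeg")
```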




&lt;h2&gt;
  
  
  Pitfall 1: Residual Old SDK Causing Cloud Run Startup Failure
&lt;/h2&gt;

&lt;p&gt;I cheerfully updated the environment variables with &lt;code&gt;gcloud run services update&lt;/code&gt;, only for the Cloud Run deployment to fail: the container couldn't even start.&lt;/p&gt;

&lt;p&gt;After checking the logs, I found:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;ModuleNotFoundError: No module named 'google.generativeai'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Although &lt;code&gt;gemini_utils.py&lt;/code&gt; has been rewritten, the main program &lt;code&gt;app/main.py&lt;/code&gt; still contains &lt;code&gt;import google.generativeai as genai&lt;/code&gt; and the initialization code &lt;code&gt;genai.configure(api_key=...)&lt;/code&gt;. Since the package has been removed from &lt;code&gt;requirements.txt&lt;/code&gt;, the container will naturally fail to find the module and crash during startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Globally grep the project, completely remove all references to the old SDK, and then repackage the Docker image using Cloud Build and push it again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall 2: Vertex AI Model Name and Region Restrictions (404 Not Found)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxectrt95bp3j5f92txsp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxectrt95bp3j5f92txsp.png" alt="Google Chrome 2026-04-21 01.12.46" width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the code cleaned up, the container started successfully, but when I sent a business card image on LINE, the bot threw a 500 error. Reviewing the logs again, this time it was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;google.api_core.exceptions.NotFound: 404 Publisher Model ... gemini-1.5-flash was not found or your project does not have access to it.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the biggest pit I encountered this time! In Google AI Studio, you can casually use the alias &lt;code&gt;gemini-1.5-flash&lt;/code&gt;; &lt;strong&gt;but in certain regions of Vertex AI (such as &lt;code&gt;asia-east1&lt;/code&gt; Taiwan), you must specify the exact version number&lt;/strong&gt;, such as &lt;code&gt;gemini-1.5-flash-002&lt;/code&gt;, otherwise the API will directly tell you that the model cannot be found.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Challenge: I want to try Gemini 3.0 Flash Preview!
&lt;/h3&gt;

&lt;p&gt;To solve this problem, I had an idea. Since I'm going to change it, why not upgrade directly to the latest &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;As a result, I wrote a test script and found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ❌ &lt;code&gt;asia-east1&lt;/code&gt; (Taiwan): 404 Not Found&lt;/li&gt;
&lt;li&gt;  ❌ &lt;code&gt;us-central1&lt;/code&gt; (Central US): 404 Not Found&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;&lt;code&gt;global&lt;/code&gt; (Global): SUCCESS!&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's right, currently this preview model on Vertex AI &lt;strong&gt;is only available in the &lt;code&gt;global&lt;/code&gt; region&lt;/strong&gt;.&lt;/p&gt;
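&lt;p&gt;The test script essentially probes each region in order until one succeeds. A hedged sketch of that loop: &lt;code&gt;try_generate&lt;/code&gt; stands in for a real request (for example, &lt;code&gt;vertexai.init&lt;/code&gt; for the location followed by a &lt;code&gt;generate_content&lt;/code&gt; call) and is injected so the selection logic stays self-contained.&lt;/p&gt;

```python
# Regions tried in the article, in order of preference.
CANDIDATE_LOCATIONS = ["asia-east1", "us-central1", "global"]

def first_available_location(try_generate, locations=CANDIDATE_LOCATIONS):
    """Return the first location where a test request succeeds, else None."""
    for location in locations:
        try:
            try_generate(location)  # perform one real request for this region
            return location
        except Exception as exc:  # e.g. google.api_core.exceptions.NotFound
            print(f"{location}: {exc}")
    return None
```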

&lt;p&gt;&lt;strong&gt;Final Solution&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Change the default region in &lt;code&gt;config.py&lt;/code&gt; to &lt;code&gt;global&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Call &lt;code&gt;vertexai.init(project="line-vertex", location="global")&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; The Cloud Run environment variable &lt;code&gt;--update-env-vars="LOCATION=global"&lt;/code&gt; must also be aligned.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary: Changes Brought by Vertex AI
&lt;/h2&gt;

&lt;p&gt;After some effort, the business card bot finally came back to life, now running the latest Gemini 3 Flash model. The migration from AI Studio to Vertex AI brought several significant benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;No more quota anxiety&lt;/strong&gt;: no longer limited by AI Studio's free quota or spending cap; usage is billed directly through GCP, which suits a production environment.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Better security&lt;/strong&gt;: the plaintext API Key is gone from the environment variables, replaced by GCP's Application Default Credentials (IAM) for authentication, making the architecture more secure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Stability&lt;/strong&gt;: Enterprise-grade SLA guarantee.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This experience also reminded me that when using Vertex AI on GCP, &lt;strong&gt;you must first check the official documentation to confirm the correspondence between "Region" and "Model Name"&lt;/strong&gt; to avoid being overwhelmed by 404 errors after deployment.&lt;/p&gt;

&lt;p&gt;If you also have a project that is about to move from AI Studio to Vertex AI, I hope this pitfall record can help you avoid some detours!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>[Gemini] Building a LINE E-commerce Chatbot That Can "Tell Stories from Images"</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:08:30 +0000</pubDate>
      <link>https://forem.com/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</link>
      <guid>https://forem.com/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" alt="image-20260225234804185" width="800" height="860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" alt="image-20260225234701217" width="800" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling?hl=zh-tw#multimodal" rel="noopener noreferrer"&gt;Gemini API - Function Calling with Multimodal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub: linebot-gemini-multimodel-funcal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#mm-fr" rel="noopener noreferrer"&gt;Vertex AI - Multimodal Function Response&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Complete code &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;I believe many people have used the combination of LINE Bot + Function Calling. When a user asks "What clothes did I buy last month?", the Bot calls the database query function, retrieves the order data, and then Gemini answers based on that JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Traditional process designed by developers:

User: "Help me see the jacket I bought before"
Bot: [Call get_order_history()]
Function returns: {"product_name": "Brown pilot jacket", "order_date": "2026-01-15", ...}
Gemini: "You bought a brown pilot jacket on January 15th for NT$1,890."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer is completely correct, but it always feels like something is missing—the user is talking about "that jacket," and Gemini is just restating the text in the JSON, with no way to "confirm" what the jacket looks like. If there happen to be three jackets in the database, the AI can't even determine which one is the one the user remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI can read text, but it can't see pictures&lt;/strong&gt;—this limitation has always been a blind spot in the traditional Function Calling architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" alt="Google Chrome 2026-02-26 10.34.51" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" alt="Google Chrome 2026-02-26 10.34.58" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This problem was truly solved only after Gemini introduced &lt;strong&gt;Multimodal Function Response&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Multimodal Function Response?
&lt;/h2&gt;

&lt;p&gt;The traditional Function Calling process is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON] → Gemini → [Text answer]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multimodal Function Response&lt;/strong&gt; changes that middle step. The function can not only return JSON, but also include images (JPEG/PNG/WebP) or documents (PDF) in the same response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" alt="Google Chrome 2026-02-25 23.04.28" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON + image bytes] → Gemini → [Text answer that has seen the image]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Gemini generates the next round of answers, it can "see" both the structured data and the image returned by the function, thereby generating richer and more accurate responses.&lt;/p&gt;
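&lt;p&gt;Concretely, the function response gains a &lt;code&gt;parts&lt;/code&gt; array alongside the usual JSON payload. The sketch below hand-rolls that structure as a plain dict mirroring the SDK's &lt;code&gt;FunctionResponse&lt;/code&gt; / &lt;code&gt;FunctionResponsePart&lt;/code&gt; / &lt;code&gt;FunctionResponseBlob&lt;/code&gt; types; treat the exact field names as an approximation and prefer the typed SDK objects in real code:&lt;/p&gt;

```python
import base64

def multimodal_function_response(name: str, data: dict, image_bytes: bytes,
                                 mime_type: str = "image/jpeg") -> dict:
    """Approximate shape of a function response carrying an inline image.

    Mirrors the SDK's FunctionResponse with a FunctionResponsePart whose
    inline_data is a FunctionResponseBlob; the bytes are base64-encoded
    here purely so the dict stays JSON-serializable.
    """
    return {
        "function_response": {
            "name": name,
            "response": data,
            "parts": [{
                "inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                },
            }],
        },
    }
```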

&lt;p&gt;The officially supported media formats at present:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Supported formats&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image/jpeg&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;, &lt;code&gt;image/webp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;application/pdf&lt;/code&gt;, &lt;code&gt;text/plain&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The application scenarios for this feature are very broad: e-commerce customer service (identifying product images), medical consultation (analyzing PDF inspection reports), design review (giving suggestions based on screenshots)... almost all scenarios that require "functions to return visual data for AI analysis" are applicable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Multimodal Function Response to create a &lt;strong&gt;LINE e-commerce customer service robot&lt;/strong&gt;, demonstrating the following scenario:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Help me see the jacket I bought before"&lt;br&gt;
Bot (traditional): "You bought a brown pilot jacket."&lt;br&gt;
Bot (Multimodal): "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon, with metal zipper pockets on the sides. This is your January 15th order ORD-2026-0115, for a total of NT$1,890, and it has been delivered." + &lt;strong&gt;Product photo&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is obvious: Gemini really "saw" the jacket, rather than just restating the text in the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not use Google ADK?
&lt;/h3&gt;

&lt;p&gt;Originally, this repo used Google ADK (Agent Development Kit) to manage the Agent. The &lt;code&gt;Runner&lt;/code&gt; and &lt;code&gt;Agent&lt;/code&gt; of ADK encapsulated the entire process of Function Calling, which was very convenient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But Multimodal Function Response requires manually including image bytes in the &lt;code&gt;parts&lt;/code&gt; of the function response, and ADK fully encapsulates this layer, leaving no way to intervene.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So this time, I directly used &lt;code&gt;google.genai.Client&lt;/code&gt; to implement the iterative cycle of function calls myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old architecture (ADK)
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="c1"&gt;# ADK handles all function calls for you, but you can't control the response content
&lt;/span&gt;
&lt;span class="c1"&gt;# New architecture (directly use google.genai)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Handle function calls yourself, include images yourself
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overall architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User
    │
    ▼ POST /
FastAPI Webhook Handler
    │
    ▼
EcommerceAgent.process_message(text, line_user_id)
    │
    ├─ ① Call Gemini (with conversation history)
    │
    ├─ ② Gemini decides to call a tool → function_call
    │
    ├─ ③ _execute_tool()
    │ ├─ Execute query function (search_products / get_order_history / get_product_details)
    │ └─ Read real product photos in the img/ directory (Unsplash JPEG)
    │
    ├─ ④ Construct Multimodal Function Response
    │ └─ FunctionResponsePart(inline_data=FunctionResponseBlob(data=image_bytes))
    │
    ├─ ⑤ Call Gemini again (Gemini sees the image + data)
    │
    └─ ⑥ Return (ai_text, image_bytes)
    │
    ▼
LINE Reply:
  TextSendMessage(text=ai_text)
  ImageSendMessage(url=BOT_HOST_URL/images/{uuid}) ← FastAPI /images endpoint provides

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to get product images?
&lt;/h3&gt;

&lt;p&gt;This demo uses real &lt;strong&gt;Unsplash clothing photos&lt;/strong&gt;. Each of the five products corresponds to an actual photo of the item, stored in the &lt;code&gt;img/&lt;/code&gt; directory. The reading logic is very simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_product_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read the product image and return JPEG bytes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each product in &lt;code&gt;PRODUCTS_DB&lt;/code&gt; has an &lt;code&gt;image_path&lt;/code&gt; field pointing to the corresponding image file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product ID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P001&lt;/td&gt;
&lt;td&gt;Brown pilot jacket&lt;/td&gt;
&lt;td&gt;tobias-tullius-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P002&lt;/td&gt;
&lt;td&gt;White cotton university T&lt;/td&gt;
&lt;td&gt;mediamodifier-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P003&lt;/td&gt;
&lt;td&gt;Dark blue denim jacket&lt;/td&gt;
&lt;td&gt;caio-coelho-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P004&lt;/td&gt;
&lt;td&gt;Beige knitted shawl&lt;/td&gt;
&lt;td&gt;milada-vigerova-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P005&lt;/td&gt;
&lt;td&gt;Light blue simple T-shirt&lt;/td&gt;
&lt;td&gt;cristofer-maximilian-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The image bytes read have two uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Embedded as a &lt;code&gt;FunctionResponseBlob&lt;/code&gt; for Gemini to analyze—real photos let Gemini describe the actual fabric texture and tailoring details&lt;/li&gt;
&lt;li&gt; Temporarily stored in the &lt;code&gt;image_cache&lt;/code&gt; dict and served to the LINE client for display via the FastAPI &lt;code&gt;/images/{uuid}&lt;/code&gt; endpoint&lt;/li&gt;
&lt;/ol&gt;
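&lt;p&gt;The &lt;code&gt;image_cache&lt;/code&gt; side is just a UUID-keyed in-memory dict. A minimal sketch (the names follow the article's description; the repo's actual implementation may differ):&lt;/p&gt;

```python
import uuid

# In-memory cache mapping a generated UUID to raw image bytes; the FastAPI
# /images/{uuid} endpoint looks images up here and returns them as JPEG.
image_cache: dict[str, bytes] = {}

def store_image(data: bytes) -> str:
    """Cache image bytes and return the UUID used in the public image URL."""
    image_id = str(uuid.uuid4())
    image_cache[image_id] = data
    return image_id

def build_image_url(bot_host_url: str, image_id: str) -> str:
    """URL handed to LINE's ImageSendMessage (BOT_HOST_URL from config)."""
    return f"{bot_host_url.rstrip('/')}/images/{image_id}"
```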




&lt;h2&gt;
  
  
  Detailed explanation of the core code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define tools (FunctionDeclaration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query the current user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s order history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time range: all / last_month / last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;enum&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... search_products, get_product_details
&lt;/span&gt;    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Function call cycle (up to 5 iterations)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_iteration&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Up to 5 times, to prevent infinite loops
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_SYSTEM_INSTRUCTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Find all function_call parts
&lt;/span&gt;        &lt;span class="n"&gt;fc_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# No function call → final text response
&lt;/span&gt;            &lt;span class="n"&gt;final_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# Has function call → execute tool, include image
&lt;/span&gt;        &lt;span class="n"&gt;tool_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fc_part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Construct Multimodal Function Response (the most critical step)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ⚠️ Note: Use FunctionResponseBlob here, not types.Blob!
&lt;/span&gt;        &lt;span class="n"&gt;multimodal_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# raw bytes, SDK handles base64 internally
&lt;/span&gt;                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Structured JSON data
&lt;/span&gt;        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Image is here! Gemini can "see" it after receiving it
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini will receive both &lt;code&gt;result_dict&lt;/code&gt; (order JSON) and &lt;code&gt;image_bytes&lt;/code&gt; (product image) in the next &lt;code&gt;generate_content&lt;/code&gt; call, and the generated answer can therefore describe the visual content of the image.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: LINE Bot simultaneously returns text + image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py
&lt;/span&gt;
&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ecommerce_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reply_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="c1"&gt;# Temporary storage
&lt;/span&gt;    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BOT_HOST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/images/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# FastAPI provides service
&lt;/span&gt;    &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ImageSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;original_content_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;preview_image_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LINE Bot's &lt;code&gt;reply_message&lt;/code&gt; supports returning multiple messages at once (up to 5), so text and images can be sent simultaneously.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;FunctionResponseBlob&lt;/code&gt; is not &lt;code&gt;Blob&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The most common pitfall: when constructing multimodal image parts, &lt;strong&gt;you cannot use &lt;code&gt;types.Blob&lt;/code&gt;&lt;/strong&gt;; you must use &lt;code&gt;types.FunctionResponseBlob&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong (rejected by Pydantic validation)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although both types have &lt;code&gt;mime_type&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; fields, the &lt;code&gt;inline_data&lt;/code&gt; field of &lt;code&gt;FunctionResponsePart&lt;/code&gt; is typed as &lt;code&gt;FunctionResponseBlob&lt;/code&gt;, so Pydantic validation rejects a &lt;code&gt;Blob&lt;/code&gt; outright. You can confirm this with &lt;code&gt;python -c "from google.genai import types; print(types.FunctionResponsePart.model_fields)"&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; cannot be created at the module level
&lt;/h3&gt;

&lt;p&gt;The original code directly created &lt;code&gt;aiohttp.ClientSession()&lt;/code&gt; at the module level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Old method: module level
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Will warn or error if there is no running event loop
&lt;/span&gt;&lt;span class="n"&gt;async_http_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;main.py&lt;/code&gt; is imported in pytest tests, no event loop is running, so this raises &lt;code&gt;RuntimeError: no running event loop&lt;/code&gt;. The fix is lazy initialization: create the session only the first time it is actually needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ New method: lazy init
&lt;/span&gt;&lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Called within the async route handler, guaranteeing an event loop
&lt;/span&gt;        &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncLineBotApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;channel_access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
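&lt;p&gt;The same lazy-init pattern can be reproduced with plain &lt;code&gt;asyncio&lt;/code&gt; to see why it matters: a loop-bound resource created at import time fails, while one deferred to first use inside a coroutine succeeds. (Generic sketch; &lt;code&gt;make_resource&lt;/code&gt; stands in for &lt;code&gt;aiohttp.ClientSession&lt;/code&gt;.)&lt;/p&gt;

```python
import asyncio

_resource = None

def make_resource():
    # Stand-in for aiohttp.ClientSession(): needs a running event loop.
    return asyncio.get_running_loop()

def get_resource():
    global _resource
    if _resource is None:
        _resource = make_resource()  # first call happens inside a coroutine
    return _resource

async def handler():
    # Route handlers always run inside the event loop, so lazy init is safe here.
    return get_resource()

loop = asyncio.run(handler())
```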



&lt;h3&gt;
  
  
  ❌ Pitfall 3: LINE Bot needs HTTPS URL to send images
&lt;/h3&gt;

&lt;p&gt;Gemini receives raw bytes, but LINE Bot's &lt;code&gt;ImageSendMessage&lt;/code&gt; requires a &lt;strong&gt;publicly accessible HTTPS URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution is to add a &lt;code&gt;/images/{image_id}&lt;/code&gt; endpoint in FastAPI, cache the image bytes in the &lt;code&gt;image_cache&lt;/code&gt; dict, and let LINE fetch the image through this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/images/{image_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;serve_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For local development, expose port 8000 with &lt;code&gt;ngrok&lt;/code&gt;; after deploying to Cloud Run, the service URL can be used directly.&lt;/p&gt;
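&lt;p&gt;One caveat of the in-memory &lt;code&gt;image_cache&lt;/code&gt; dict is that it grows without bound. A small TTL wrapper is enough for demo purposes (an illustrative sketch; the open-source repo may handle cleanup differently):&lt;/p&gt;

```python
import time

class TTLImageCache:
    """Dict-like cache that drops entries older than ttl_seconds."""

    def __init__(self, ttl_seconds: float = 600.0):
        self._ttl = ttl_seconds
        self._store = {}  # image_id -> (timestamp, bytes)

    def __setitem__(self, image_id: str, data: bytes):
        self._store[image_id] = (time.monotonic(), data)

    def get(self, image_id: str):
        entry = self._store.get(image_id)
        if entry is None:
            return None
        ts, data = entry
        if time.monotonic() - ts > self._ttl:
            del self._store[image_id]  # expired: evict and miss
            return None
        return data
```

Because `serve_image` already treats a `None` lookup as a 404, this drops in as a replacement for the plain dict.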




&lt;h2&gt;
  
  
  Demo Walkthrough
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mock database (default data for demo)
&lt;/h3&gt;

&lt;p&gt;The system ships with 5 built-in products (all with real Unsplash photos), and each LINE user is automatically assigned two demo orders the first time they query their order history:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Order number&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0115&lt;/td&gt;
&lt;td&gt;2026-01-15&lt;/td&gt;
&lt;td&gt;P001 Brown pilot jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0108&lt;/td&gt;
&lt;td&gt;2026-01-08&lt;/td&gt;
&lt;td&gt;P003 Dark blue denim jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
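&lt;p&gt;The first-query binding described above can be sketched like this (mock data for illustration; the product names and prices follow the article, but the field names are assumptions):&lt;/p&gt;

```python
# Mock catalog plus per-user demo-order binding (illustrative field names).

PRODUCTS = {
    "P001": {"name": "Brown pilot jacket", "price": 1890},
    "P003": {"name": "Dark blue denim jacket", "price": 1490},
}

DEMO_ORDERS = [
    {"order_id": "ORD-2026-0115", "date": "2026-01-15", "product_id": "P001"},
    {"order_id": "ORD-2026-0108", "date": "2026-01-08", "product_id": "P003"},
]

_user_orders: dict[str, list] = {}

def get_orders(line_user_id: str) -> list:
    # Bind the two demo orders the first time this user queries.
    if line_user_id not in _user_orders:
        _user_orders[line_user_id] = [dict(o) for o in DEMO_ORDERS]
    return _user_orders[line_user_id]
```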

&lt;h3&gt;
  
  
  Scenario 1: "Help me see the jacket I bought before"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: "Help me see the jacket I bought before"

[Gemini → function_call]
  get_order_history(time_range="all")

[_execute_tool execution]
  - get_order_history() returns two orders (P001, P003)
  - Read img/tobias-tullius-...-unsplash.jpg → Brown pilot jacket real photo bytes

[Multimodal Function Response]
  Part.from_function_response(
    name="get_order_history",
    response={"orders": [...], "order_count": 2},
    parts=[FunctionResponsePart(inline_data=FunctionResponseBlob(data=&amp;lt;photo&amp;gt;))]
  )

[Gemini responds after seeing the real photo]
  "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon with
   a glossy feel, and a metal zipper pocket on the left sleeve. This is your January 15, 2026
   order ORD-2026-0115, for a total of NT$1,890, status: delivered."

LINE displays: [Text] + [Brown pilot jacket real photo]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any dark blue jackets?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  search_products(description="dark blue jacket", color="dark blue")

[Gemini sees the real photo of the P003 dark blue denim jacket]
  "Yes! This dark blue denim jacket (P003) in the photo features a retro stitching design,
   a lapel with metal buttons, and a very complete garment feel, priced at NT$1,490, with 8 in stock."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: "What are the features of the P004 knitted shawl?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  get_product_details(product_id="P004")

[Gemini sees the real photo of the beige knitted shawl]
  "The photo shows a beige handmade crochet shawl, with a V-neck design and tassels at the bottom,
   you can see the light lace-like mesh weave, elegant texture, priced at NT$1,290."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Traditional Function Response vs Multimodal Function Response
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Function return&lt;/td&gt;
&lt;td&gt;Pure JSON&lt;/td&gt;
&lt;td&gt;JSON + image/PDF bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini perception&lt;/td&gt;
&lt;td&gt;Text data&lt;/td&gt;
&lt;td&gt;Text + visual content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;"You bought a brown pilot jacket"&lt;/td&gt;
&lt;td&gt;"You can see the nylon texture in the photo, with a zipper pocket on the left sleeve..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API difference&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response, parts=[...])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applicable scenarios&lt;/td&gt;
&lt;td&gt;Pure text data queries&lt;/td&gt;
&lt;td&gt;Scenarios that require visual recognition/confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation gave me a new understanding of Gemini's Function Calling capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem Multimodal Function Response truly solves&lt;/strong&gt; is letting an AI agent receive visual information as part of the very act of calling an external system, instead of fetching text first and uploading images in a separate step. This will be a foundational capability in visually driven domains such as e-commerce, medicine, and design.&lt;/p&gt;

&lt;p&gt;However, there are still a few limitations worth noting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image URLs cannot be used directly&lt;/strong&gt;: Gemini's &lt;code&gt;FunctionResponseBlob&lt;/code&gt; requires raw bytes; a URL cannot be passed in directly (unlike images supplied in the prompt). If the image lives at a URL, download it to bytes first, e.g. with &lt;code&gt;requests.get()&lt;/code&gt;, before passing it in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;display_name is optional&lt;/strong&gt;: The official documentation examples include &lt;code&gt;display_name&lt;/code&gt; and a &lt;code&gt;$ref&lt;/code&gt; JSON reference, but in my testing with google-genai 1.49.0 everything works without &lt;code&gt;display_name&lt;/code&gt;; Gemini can still see and analyze the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model support&lt;/strong&gt;: Officially, only the Gemini 3 series is listed as supported, but in my testing &lt;code&gt;gemini-2.0-flash&lt;/code&gt; handled it normally as well, with the same API structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
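&lt;p&gt;The workaround for the first limitation can be sketched with the standard library alone (the helper names here are mine, not part of the SDK; the article's &lt;code&gt;requests.get()&lt;/code&gt; works the same way):&lt;/p&gt;

```python
import mimetypes
import urllib.request

def guess_image_mime(url: str) -> str:
    """Guess a MIME type from the URL path; default to JPEG."""
    return mimetypes.guess_type(url)[0] or "image/jpeg"

def image_url_to_blob_args(url: str, timeout: float = 10.0) -> dict:
    """Download an image URL into the raw-bytes form FunctionResponseBlob needs."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
        # Prefer the server's Content-Type header; fall back to the URL extension.
        content_type = resp.headers.get("Content-Type")
        mime = (content_type or guess_image_mime(url)).split(";")[0].strip()
    return {"mime_type": mime, "data": data}

# Then: types.FunctionResponseBlob(**image_url_to_blob_args(product_image_url))
```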

&lt;p&gt;There are many directions to extend this in the future: let users send their own product photos for the Bot to compare, include PDF catalogs in the function response for Gemini to read directly, or, in medical scenarios, let the Bot analyze report images converted from DICOM. As long as visual data can be obtained from an external system, Multimodal Function Response can make the AI's answers more in-depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The takeaway from this LINE Bot implementation fits in one sentence: &lt;strong&gt;let the function response carry the image, and Gemini's answers are upgraded from "restating data" to "telling a story based on the picture"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core API is only a few lines, though plenty of details are needed to make the whole flow work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The complete way for Gemini to see the image returned by the function
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]},&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;# ← Not types.Blob!
&lt;/span&gt;                &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
    </item>
    <item>
      <title>Gemini Tool Combo: Building a LINE Meetup Helper with Maps Grounding and Places API in a Single API Call</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:59 +0000</pubDate>
      <link>https://forem.com/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</link>
      <guid>https://forem.com/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" alt="image-20260327164715459" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API tooling updates: context circulation, tool combos and Maps grounding for Gemini 3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://developers.google.com/maps/documentation/places/web-service/nearby-search" rel="noopener noreferrer"&gt;Google Places API (New) - searchNearby&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub: linebot-spot-finder&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Complete code &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (Meeting Helper LINE Bot Spot Finder)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;The combination of LINE Bot + Gemini is already very common. Whether it's using Google Search Grounding to let the model look up real-time information or Function Calling to let it invoke custom logic, each is mature when used on its own.&lt;/p&gt;

&lt;p&gt;But what if you want to achieve both "map location context" and "query real ratings" &lt;strong&gt;in the same question&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Taking restaurant search as an example, the traditional approach usually looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Help me find a hot pot restaurant nearby with a rating of 4 stars or above"

Solution A (using only Maps Grounding):
Gemini has map context, but the ratings come from the model's own description, so accuracy is not guaranteed.

Solution B (using only Places API):
You can get real ratings, but there is no map context, and Gemini doesn't know where the user is.

To have both, you usually need to make two API calls, or manually connect them yourself.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Letting the AI search maps and call external APIs in a single request&lt;/strong&gt; had always been an awkward gap in the old Gemini API architecture.&lt;/p&gt;

&lt;p&gt;That changed on March 17, 2026, when Google released the &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API Tooling Updates&lt;/a&gt; (by Mariano Cocirio), which provided an official solution to this problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are Tool Combinations?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" alt="image-20260327163136077" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google announced three core features in this &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;update&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Combinations&lt;/strong&gt; Developers can now attach built-in tools (such as Google Search, Google Maps) and custom Function Declarations simultaneously in a &lt;strong&gt;single Gemini API call&lt;/strong&gt;. The model decides which tool to call and when to call it, and finally integrates the results to generate an answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Maps Grounding&lt;/strong&gt; Gemini can now directly perceive map data, not just text descriptions of "location", but truly has spatial context—knowing where the user is and what's nearby.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Context Circulation&lt;/strong&gt; Context now flows naturally across multi-turn tool calls, so the model fully remembers the results of the first tool call when making the second.&lt;/p&gt;

&lt;p&gt;The key to this change is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old approach (two tools cannot coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleSearch&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# New approach (the same Tool object, both coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A one-line change opens up an entirely new way to combine tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Tool Combinations to transform the existing &lt;strong&gt;linebot-spot-finder&lt;/strong&gt;, upgrading it from "only Maps Grounding for rough answers" to "Google Maps context + Places API real data":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After the user sends their GPS location, they enter: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."&lt;/p&gt;

&lt;p&gt;Bot (old Maps Grounding): "There are several hot pot restaurants nearby, and the ratings are good." (the AI's own description, which may not be accurate)&lt;/p&gt;

&lt;p&gt;Bot (new Tool Combo): "Lao Wang Hot Pot | 100 Shimin Avenue, Xinyi District, Taipei City | Rating 4.6 (312) | Reviews: Large portions, great value for money, suitable for group dining; efficient service, fast serving."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference: Gemini now receives both the map context (where you are) and the &lt;strong&gt;real structured data&lt;/strong&gt; (rating numbers, review text) from the Places API, so its answer moves from a vague description to grounded, concrete information.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overall Message Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User sends GPS location
    │
    ▼
handle_location() → session.metadata stores lat/lng
    │
    └──► Returns Quick Reply (restaurant / gas station / parking lot)

LINE User sends text question (e.g. "Find a hot pot restaurant with a rating of 4 stars or above")
    │
    ▼
handle_text()
    │
    ├── session has lat/lng?
    │ Yes → tool_combo_search(query, lat, lng) ← Focus of this article
    │ No → fallback: Gemini Chat + Google Search
    │
    └──► Returns natural language answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
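&lt;p&gt;The routing in &lt;code&gt;handle_text()&lt;/code&gt; comes down to one coordinate check. A minimal sketch, with a plain dict standing in for the real session object (the helper name is mine):&lt;/p&gt;

```python
def pick_search_path(session: dict) -> str:
    """Route a text message: Tool Combo when GPS is known, fallback otherwise."""
    meta = session.get("metadata", {})
    if meta.get("lat") is not None and meta.get("lng") is not None:
        return "tool_combo_search"       # Maps Grounding + Places API combo
    return "google_search_fallback"      # Gemini Chat + Google Search only

# handle_location() stores the user's GPS into session["metadata"] beforehand,
# so later text messages from the same user can take the Tool Combo path.
```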



&lt;h3&gt;
  
  
  Tool Combo Agentic Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool_combo_search(query, lat, lng)
         │
         ▼
  Step 1: generate_content()
  tools = [google_maps + search_nearby_restaurants]
         │
         ▼
  response.candidates[0].content.parts has function_call?
       ╱ ╲
      Yes   No
      │     │
      ▼     ▼
  _execute_function()  Directly returns response.text
  → _call_places_api()
    (Places API searchNearby)
    Returns rating, address, reviews
      │
      ▼
  Collect into a single Content(role="user")
  Add to history
      │
      ▼
  Step 3: generate_content(contents=history)
  Gemini integrates map context + Places data
      │
      ▼
  Returns final.text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
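&lt;p&gt;The loop above can be sketched as plain control flow. This illustration hides the real google-genai calls behind a duck-typed &lt;code&gt;client&lt;/code&gt;; the names &lt;code&gt;tool_combo_loop&lt;/code&gt; and &lt;code&gt;execute_fn&lt;/code&gt; are mine, not the SDK's:&lt;/p&gt;

```python
def tool_combo_loop(client, query, execute_fn, max_rounds=3):
    """Sketch of the agentic loop: generate, run any function calls, generate again.

    client.generate(history) must return (function_calls, text): function_calls
    is a list of (name, args) pairs, empty when the model answers directly.
    execute_fn(name, args) performs the real external call (e.g. Places API).
    """
    history = [("user", query)]
    for _ in range(max_rounds):
        calls, text = client.generate(history)
        if not calls:
            return text  # No function_call in the parts: answer directly.
        # Collect every function result into one tool turn, mirroring the
        # single Content(role="user") appended to history in the diagram.
        for name, args in calls:
            history.append(("tool", name, execute_fn(name, args)))
    calls, text = client.generate(history)  # Final integration pass.
    return text
```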



&lt;h3&gt;
  
  
  Why not put lat/lng in Function Declaration?
&lt;/h3&gt;

&lt;p&gt;This is an important design decision.&lt;/p&gt;

&lt;p&gt;If you add &lt;code&gt;lat&lt;/code&gt;/&lt;code&gt;lng&lt;/code&gt; to the parameters of &lt;code&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/code&gt;, Gemini will fill in the coordinates itself—but it fills in the "approximate location" inferred from the conversation, not the user's actual GPS coordinates, and the error can be as high as several kilometers.&lt;/p&gt;

&lt;p&gt;The correct approach is to let the Python dispatcher extract the precise coordinates from &lt;code&gt;session.metadata&lt;/code&gt; and &lt;strong&gt;inject&lt;/strong&gt; them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Inject from session, don't let Gemini guess
&lt;/span&gt;            &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Core Code Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define Function Declaration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search for nearby restaurants using Google Places API, and return the rating, address, and user reviews.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lat/lng is automatically included by the system and does not need to be provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Restaurant type or keyword, such as: hot pot, hot pot, Italian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Minimum rating threshold (1–5), default 4.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radius_m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search radius (meters), default 1000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description explicitly tells the model that "lat/lng is included by the system", which keeps the model from inventing coordinates in the args.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Places API Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://places.googleapis.com/v1/places:searchNearby&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.displayName,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.rating,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.userRatingCount,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.formattedAddress,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;includedTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResultCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locationRestriction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;circle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;center&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radiusMeters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-Api-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_MAPS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-FieldMask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;restaurants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;reviews&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;displayName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formattedAddress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userRatingCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Tool Combo Main Function (Agentic Loop)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_combo_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;http_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;enriched_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current location: latitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, longitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please answer in traditional Chinese using Taiwanese terminology, and do not use markdown format.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tool_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps grounding
&lt;/span&gt;                &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 1 ──────────────────────────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 2：Processing function_call ──────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;function_response_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# ── Step 3 ────────────────────────────────────────────────────
&lt;/span&gt;        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_function_response()&lt;/code&gt; does not accept the &lt;code&gt;id&lt;/code&gt; parameter
&lt;/h3&gt;

&lt;p&gt;This was the easiest pitfall to fall into. The error only surfaces when a &lt;strong&gt;real model call&lt;/strong&gt; is made, so unit tests almost never catch it.&lt;/p&gt;

&lt;p&gt;Following the official example, I originally wrote:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——TypeError occurs at runtime
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← This parameter does not exist!
&lt;/span&gt;    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual signature of &lt;code&gt;from_function_response&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(*, name: str, response: dict, parts: Optional[list] = None) -&amp;gt; Part
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no &lt;code&gt;id&lt;/code&gt; parameter at all. Every time the model actually triggers a function_call, this line throws a &lt;code&gt;TypeError&lt;/code&gt;, the code silently falls into the Step 3 exception handler and returns an error message, and the Places API results never make it back to Gemini.&lt;/p&gt;

&lt;p&gt;The fix is to construct &lt;code&gt;types.FunctionResponse&lt;/code&gt; directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can immediately confirm the parameter list with &lt;code&gt;python -c "from google.genai import types; help(types.Part.from_function_response)"&lt;/code&gt;.&lt;/p&gt;
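&lt;p&gt;The failure mode is easy to reproduce without the SDK. A standalone stand-in with the same keyword-only signature (this is a mock, not the real &lt;code&gt;types.Part&lt;/code&gt;) shows the exact &lt;code&gt;TypeError&lt;/code&gt;:&lt;/p&gt;

```python
import inspect

# Stand-in for the SDK's Part.from_function_response: the real method's
# signature is keyword-only (*, name, response, parts=None), per help().
def from_function_response(*, name, response, parts=None):
    return {"name": name, "response": response, "parts": parts}

params = inspect.signature(from_function_response).parameters
assert "id" not in params          # the parameter simply does not exist

try:
    from_function_response(id="call-1", name="f", response={})
except TypeError as exc:
    print("TypeError:", exc)       # ...got an unexpected keyword argument 'id'
```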

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;include_server_side_tool_invocations=True&lt;/code&gt; causes Pydantic to explode
&lt;/h3&gt;

&lt;p&gt;After seeing it in an official documentation example, I assumed I should add this parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="n"&gt;include_server_side_tool_invocations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← The installed SDK version does not support it
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;google-genai 1.49.0&lt;/code&gt;, this field is not among the model fields of &lt;code&gt;GenerateContentConfig&lt;/code&gt;, so Pydantic immediately raises an &lt;code&gt;extra_forbidden&lt;/code&gt; validation error. Simply remove it and everything works as expected.&lt;/p&gt;
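&lt;p&gt;If the code needs to survive SDK upgrades and downgrades, one defensive option is to filter kwargs against the config model's declared fields before constructing it. This is a sketch: &lt;code&gt;FakeConfig&lt;/code&gt; stands in for &lt;code&gt;types.GenerateContentConfig&lt;/code&gt;, and &lt;code&gt;model_fields&lt;/code&gt; is the Pydantic v2 attribute name:&lt;/p&gt;

```python
# Hedged sketch: drop config kwargs the installed SDK doesn't declare,
# instead of letting Pydantic raise extra_forbidden.
def safe_config_kwargs(config_cls, **kwargs):
    allowed = set(getattr(config_cls, "model_fields", {}))
    return {k: v for k, v in kwargs.items() if k in allowed}

class FakeConfig:  # stand-in for types.GenerateContentConfig
    model_fields = {"tools": None, "temperature": None}

kwargs = safe_config_kwargs(
    FakeConfig,
    tools=["..."],
    include_server_side_tool_invocations=True,  # silently dropped
)
print(kwargs)  # {'tools': ['...']}
```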

&lt;h3&gt;
  
  
  ❌ Pitfall 3: &lt;code&gt;textQuery&lt;/code&gt; is a parameter of &lt;code&gt;searchText&lt;/code&gt;, not &lt;code&gt;searchNearby&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I thought "if there is a keyword, then bring it into the Places API", and intuitively added it to the request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——Invalid field for searchNearby endpoint
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;searchNearby&lt;/code&gt; only accepts fields such as &lt;code&gt;includedTypes&lt;/code&gt; and &lt;code&gt;locationRestriction&lt;/code&gt;; &lt;code&gt;textQuery&lt;/code&gt; belongs to the &lt;code&gt;searchText&lt;/code&gt; endpoint. Adding the field may not raise an error (in some versions), but the keyword simply has no effect.&lt;/p&gt;

&lt;p&gt;The correct approach is to describe the keyword in the Function Declaration so Gemini can refer to it: the model translates the user's intent in &lt;code&gt;enriched_query&lt;/code&gt;, Maps Grounding handles the keyword semantics, and the Places API is responsible only for returning real rating data.&lt;/p&gt;
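&lt;p&gt;A sketch of what that separation looks like. The field names follow the Gemini function-calling schema and the Places API, but this is not the repo's exact declaration: the keyword lives only in the declaration's description, while the HTTP body keeps strictly to fields &lt;code&gt;searchNearby&lt;/code&gt; accepts:&lt;/p&gt;

```python
# Sketch only: the repo's SEARCH_NEARBY_RESTAURANTS_FN may differ.
SEARCH_NEARBY_FN = {
    "name": "search_nearby_restaurants",
    "description": (
        "Find restaurants near the user. Put the cuisine the user asked "
        "for (e.g. 'hot pot') in 'keyword'; it guides the model's intent "
        "only and is never sent to searchNearby as textQuery."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "keyword": {"type": "string"},
            "min_rating": {"type": "number"},
            "radius_m": {"type": "integer"},
        },
    },
}

def build_search_nearby_body(lat, lng, radius_m=1500):
    # Only fields the searchNearby endpoint accepts; no textQuery.
    return {
        "includedTypes": ["restaurant"],
        "locationRestriction": {
            "circle": {
                "center": {"latitude": lat, "longitude": lng},
                "radius": radius_m,
            }
        },
    }

body = build_search_nearby_body(25.0441, 121.5598)
assert "textQuery" not in body
```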

&lt;h3&gt;
  
  
  ❌ Pitfall 4: No guard for &lt;code&gt;response.candidates[0]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;When the model hits safety filtering, RECITATION, or another abnormal termination, &lt;code&gt;candidates&lt;/code&gt; may be an empty list, so accessing &lt;code&gt;response.candidates[0]&lt;/code&gt; directly raises an &lt;code&gt;IndexError&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ No guard
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← If candidates is empty, it will explode
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Add guard
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Demo Display
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" alt="image-20260327163200329" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: "Find a hot pot restaurant with a rating of 4 stars or above for group dining"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: GPS location (Xinyi District, Taipei City, 25.0441, 121.5598)

User enters: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."

[Step 1: Gemini receives query + map context]
  → Detects the need for restaurant data, emit function_call:
    search_nearby_restaurants(keyword="hot pot", min_rating=4.0)

[Step 2: Python calls Places API]
  → lat=25.0441, lng=121.5598 injected from session
  → Returns 3 restaurants with a rating ≥ 4.0, including review text

[Step 3: Gemini integrates Maps context + Places data]
  → "Lao Wang Hot Pot｜100 Shimin Avenue, Xinyi District｜⭐ 4.6 (312)
      Review summary: Large portions, great value for money, a top choice for friends to dine; fast service, fresh dishes.
     ... (3 restaurants in total)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any high-value Japanese restaurants?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User enters: "Are there any high-value Japanese restaurants nearby?"

[Step 1: Gemini]
  → function_call: search_nearby_restaurants(keyword="Japanese cuisine", min_rating=4.0)

[Step 2: Places API]
  → Returns 2 Japanese restaurants that meet the rating criteria

[Step 3: Gemini]
  → "There are two recommendations:
      Washoku ○○｜...｜⭐ 4.4｜Reviews: Weekday lunch set is only 280 yuan, very fresh.
      ..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo Script Quick Test
&lt;/h3&gt;

&lt;p&gt;No LINE Bot needed; you can test directly on your local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only test Tool Combo (main function)&lt;/span&gt;
python demo.py combo

&lt;span class="c"&gt;# Run all three functions&lt;/span&gt;
python demo.py all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Old Architecture vs. New Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Old Architecture (Maps Grounding only)&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;New Architecture (Tool Combo)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; (built-in)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; + &lt;code&gt;search_nearby_restaurants&lt;/code&gt; (custom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rating Data&lt;/td&gt;
&lt;td&gt;Described by Gemini itself (may be inaccurate)&lt;/td&gt;
&lt;td&gt;Places API real numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviews&lt;/td&gt;
&lt;td&gt;AI generated&lt;/td&gt;
&lt;td&gt;Real user reviews (up to 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Call Count&lt;/td&gt;
&lt;td&gt;1 call&lt;/td&gt;
&lt;td&gt;1 call (Step 1) + 1 call (Step 3) = 2 calls, but transparent to the user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Filtering&lt;/td&gt;
&lt;td&gt;Relies on the prompt&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;min_rating&lt;/code&gt;, &lt;code&gt;radius_m&lt;/code&gt; precise control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation has given me a clearer understanding of the potential of Gemini Tool Combinations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem Tool Combinations truly solves&lt;/strong&gt; is that Grounding and Function Calling are no longer mutually exclusive. Previously, to combine map context with real external data, you had to wire the two APIs together manually at the application layer, or rely on Gemini's text generation to "simulate" external data (unreliable). Now the model itself knows when to use map context and when to call the Places API; developers only need to attach the tools.&lt;/p&gt;

&lt;p&gt;However, there are also a few things to note about this implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;lat/lng&lt;/code&gt; injection pattern is crucial&lt;/strong&gt;: don't let the model guess the coordinates; inject them from the session, or positioning accuracy will be very poor. The same pattern applies to any function-calling scenario that carries session state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The cost of two &lt;code&gt;generate_content&lt;/code&gt; calls&lt;/strong&gt;: The agentic loop of Tool Combo requires two model calls, and the token consumption is about 1.5–2 times that of a single call. This needs to be especially considered for scenarios with high latency requirements.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK version differences&lt;/strong&gt;: different versions of &lt;code&gt;google-genai&lt;/code&gt; support different &lt;code&gt;GenerateContentConfig&lt;/code&gt; fields, so new fields like &lt;code&gt;include_server_side_tool_invocations&lt;/code&gt; should only be used after confirming the installed version; the resulting Pydantic validation errors are otherwise hard to track down.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
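&lt;p&gt;To make the second point concrete, the two-call agentic loop can be sketched in plain Python. This is a minimal sketch with stubbed names; &lt;code&gt;model_call&lt;/code&gt; and &lt;code&gt;execute_tool&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;generate_content&lt;/code&gt; and the Places API wrapper, not the repo's actual code:&lt;/p&gt;

```python
# Minimal sketch of the Tool Combo agentic loop (hypothetical stubs).
def agentic_loop(model_call, execute_tool, contents):
    # Call 1: the model either answers directly or requests a function call.
    first = model_call(contents)
    fc = first.get("function_call")
    if fc is None:
        return first["text"], 1  # grounding alone was enough: one call

    # Run the declared function locally (in the real bot: hit the Places API).
    tool_result = execute_tool(fc["name"], fc["args"])

    # Call 2: feed the function result back so the model writes the final answer.
    followup = contents + [{"function_response": {"name": fc["name"],
                                                  "response": tool_result}}]
    final = model_call(followup)
    return final["text"], 2  # two model calls, hence the ~1.5-2x token cost
```

The returned call count makes the cost trade-off explicit: a query that never triggers the tool stays at one call, while any tool-using query pays for two.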

&lt;p&gt;Future directions that can be extended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Connect the Postback quick replies (click the "Find Restaurant" button) to Tool Combo, so that each entry can get real ratings&lt;/li&gt;
&lt;li&gt;  Add the &lt;code&gt;searchText&lt;/code&gt; endpoint to support more complex keyword searches (e.g. Michelin recommendations)&lt;/li&gt;
&lt;li&gt;  Tool Combo combined with other built-in tools (such as &lt;code&gt;google_search&lt;/code&gt;) to achieve more complex multi-tool chaining&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The core concept of this change can be stated in one sentence: &lt;strong&gt;Put Google Maps grounding and the Places API function tool in the same &lt;code&gt;types.Tool&lt;/code&gt;, and Gemini will coordinate the two in a single conversation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key code is only these few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is all the magic of Tool Combo
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps context
&lt;/span&gt;    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But to make it really work, you also need to get a few details right: how &lt;code&gt;FunctionResponse&lt;/code&gt; is constructed, the guard on &lt;code&gt;candidates&lt;/code&gt;, the correct fields for the Places API endpoint, and injecting &lt;code&gt;lat/lng&lt;/code&gt; from the session instead of letting the model guess.&lt;/p&gt;
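&lt;p&gt;The &lt;code&gt;candidates&lt;/code&gt; guard in particular is easy to get wrong: a safety-blocked or empty response has no candidates, and indexing it blindly raises. A defensive extraction helper might look like this (a plain-Python sketch using duck typing, not the exact code from the repo):&lt;/p&gt;

```python
def extract_function_call(response):
    """Safely pull the first function_call part out of a generate_content
    response. Returns None for empty or blocked responses instead of raising."""
    candidates = getattr(response, "candidates", None)
    if not candidates:
        return None
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) if content else None
    if not parts:
        return None
    for part in parts:
        fc = getattr(part, "function_call", None)
        if fc is not None:
            return fc
    return None
```

With a guard like this, Step 1's response can be checked for a function call without risking an &lt;code&gt;IndexError&lt;/code&gt; on degenerate responses.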

&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Gemini 3.1: Real-World Voice Recognition with Flash Live: Making Your LINE Bot Understand You</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:26 +0000</pubDate>
      <link>https://forem.com/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</link>
      <guid>https://forem.com/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" alt="image-20260328203306501" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Google released &lt;strong&gt;Gemini 3.1 Flash Live&lt;/strong&gt; at the end of &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live/" rel="noopener noreferrer"&gt;March 2026&lt;/a&gt;, focusing on "making audio AI more natural and reliable." The model is built for real-time two-way voice conversations, with low latency, interruptibility, and multi-language support.&lt;/p&gt;

&lt;p&gt;I happened to have a LINE Bot project (&lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;linebot-helper-python&lt;/a&gt;) on hand, which already handles text, images, URLs, PDFs, and YouTube, but completely ignores voice messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends a voice message
Bot: (Silence)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time, I'll add voice support and share a few pitfalls I encountered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Decision: Flash Live or Standard Gemini API?
&lt;/h2&gt;

&lt;p&gt;The first question: Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time streaming&lt;/strong&gt;, but LINE's voice messages are &lt;strong&gt;pre-recorded m4a files&lt;/strong&gt;, not real-time audio streams.&lt;/p&gt;

&lt;p&gt;Using Flash Live to process pre-recorded files is like using a live streaming camera to take photos – technically feasible, but the wrong tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I decided to use the standard Gemini API&lt;/strong&gt;: pass the audio bytes as inline data and get the transcribed text back in a single call. It's simpler and better suited to this scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" alt="image-20260328203340798" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integration Approach
&lt;/h3&gt;

&lt;p&gt;This repo already has a complete Orchestrator architecture, which automatically routes to different Agents (Chat, Content, Location, Vision, GitHub) based on the message content. The goal for voice messages is clear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Convert voice to text, and then treat it as a regular text message and pass it into the Orchestrator – allowing all existing features to automatically support voice input.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;User says "Help me search for nearby gas stations" → transcribed into text → Orchestrator determines it's a location query → LocationAgent processes it. No need to implement separate logic for voice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends AudioMessage (m4a)
    │
    ▼ handle_audio_message()
    │
    ├─ ① LINE SDK downloads audio bytes
    │ get_message_content(message_id) → iter_content()
    │
    ├─ ② Gemini transcription
    │ tools/audio_tool.py → transcribe_audio()
    │ model: gemini-3.1-flash-lite-preview
    │
    ├─ ③ Reply #1: "You said: {transcription}"
    │ reply_message() (consumes reply token)
    │
    └─ ④ Reply #2: Orchestrator routing
            handle_text_message_via_orchestrator(push_user_id=user_id)
            ↓
            push_message() (reply token already used, use push instead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why two replies?
&lt;/h3&gt;

&lt;p&gt;The reply is split in two so the user can &lt;strong&gt;see the transcription result immediately&lt;/strong&gt;, without having to wait for the Orchestrator to finish before knowing whether the Bot understood them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Code Explanation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Audio Transcription Tool (tools/audio_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-lite-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Transcribe audio bytes to text using Gemini.
    LINE voice messages are always m4a, MIME type is always audio/mp4.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;audio_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="n"&gt;audio_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Design principle: The function itself does not catch exceptions, allowing the upper-level handler to handle error responses uniformly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Handler Main Flow (main.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_audio_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle audio (voice) messages — transcribe and route through Orchestrator.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# Track if the reply token has been used
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Download audio
&lt;/span&gt;        &lt;span class="n"&gt;message_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_message_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_content&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;

        &lt;span class="c1"&gt;# Transcription
&lt;/span&gt;        &lt;span class="n"&gt;transcription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Empty transcription (silent or too short)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unable to recognize voice content, please re-record.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #1: Let the user confirm the transcription result (consumes reply token)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You said: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #2: Send to Orchestrator, using push_message (token already used)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error handling audio for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing voice message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;replied&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# reply token has been consumed, use push instead
&lt;/span&gt;            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Enabling Orchestrator to Support External Text Input
&lt;/h3&gt;

&lt;p&gt;The original &lt;code&gt;handle_text_message_via_orchestrator&lt;/code&gt; directly reads &lt;code&gt;event.message.text&lt;/code&gt;. AudioMessage doesn't have &lt;code&gt;.text&lt;/code&gt;, so add two optional parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← External text input (voice transcription)
&lt;/span&gt;    &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Use push_message when set
&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_orchestrator_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;reply_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing your question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;text is not None&lt;/code&gt; (rather than &lt;code&gt;text or ...&lt;/code&gt;) is intentional: if the voice transcription yields an empty string, the empty string should pass through (to be intercepted by the upper-level &lt;code&gt;if not transcription.strip()&lt;/code&gt; check) rather than falling back to &lt;code&gt;event.message.text&lt;/code&gt;.&lt;/p&gt;
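&lt;p&gt;The distinction boils down to a minimal sketch (the function and variable names here are illustrative, not the project's actual code):&lt;/p&gt;

```python
def pick_text(transcription, fallback):
    """Return the transcription unless it is None.

    An empty string is a valid result (the audio contained no words)
    and must NOT fall back; only None means "no transcription at all".
    """
    return transcription if transcription is not None else fallback

# Empty transcription passes through, so the caller's
# `if not transcription.strip()` check can intercept it.
assert pick_text("", "typed text") == ""
# Only None falls back to the original message text.
assert pick_text(None, "typed text") == "typed text"
# The truthiness-based version would wrongly fall back:
assert ("" or "typed text") == "typed text"
```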




&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_text()&lt;/code&gt; does not accept positional arguments
&lt;/h3&gt;

&lt;p&gt;The first TypeError encountered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error (TypeError: Part.from_text() takes 1 positional argument but 2 were given)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this version of the SDK, &lt;code&gt;text&lt;/code&gt; in &lt;code&gt;Part.from_text()&lt;/code&gt; is a keyword-only argument; alternatively, use the &lt;code&gt;Part(text=...)&lt;/code&gt; constructor directly, which is safer.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: LINE reply token can only be used once
&lt;/h3&gt;

&lt;p&gt;LINE's reply token is &lt;strong&gt;one-time use&lt;/strong&gt;. Once &lt;code&gt;reply_message()&lt;/code&gt; is called, the token is invalidated.&lt;/p&gt;

&lt;p&gt;This project's voice flow replies twice:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reply #1 (display transcription text) → &lt;strong&gt;consumes token&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Reply #2 (Orchestrator result) → &lt;strong&gt;the token is already invalid; LINE returns a 400 error&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution is to make the Orchestrator handler support &lt;code&gt;push_message&lt;/code&gt; mode (via the &lt;code&gt;push_user_id&lt;/code&gt; parameter), so Reply #2 goes out as a &lt;code&gt;push_message&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;Error handling deserves the same care: if the Orchestrator throws an exception after Reply #1 has succeeded, &lt;code&gt;reply_message&lt;/code&gt; can no longer be used in the except block, so it too must switch to &lt;code&gt;push_message&lt;/code&gt;. This is the purpose of the &lt;code&gt;replied&lt;/code&gt; flag in the code.&lt;/p&gt;
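&lt;p&gt;The reply-or-push decision reduces to a few lines. A self-contained sketch with a fake LINE client (names are illustrative; the real project passes &lt;code&gt;push_user_id&lt;/code&gt; into the Orchestrator handler):&lt;/p&gt;

```python
import asyncio

class FakeLineAPI:
    """Stand-in for the LINE SDK client (illustrative only)."""
    def __init__(self):
        self.calls = []
    async def reply_message(self, token, messages):
        self.calls.append(("reply", token))
    async def push_message(self, user_id, messages):
        self.calls.append(("push", user_id))

async def send(api, reply_token, push_user_id, messages):
    """Use reply_message while the one-time token is unused,
    otherwise fall back to push_message."""
    if push_user_id:
        await api.push_message(push_user_id, messages)
    else:
        await api.reply_message(reply_token, messages)

async def demo():
    api = FakeLineAPI()
    # Reply #1: token still fresh, reply_message consumes it.
    await send(api, "token-abc", None, ["You said: ..."])
    # Reply #2: token already used, so switch to push_message.
    await send(api, "token-abc", "U1234", ["Orchestrator result"])
    return api.calls

calls = asyncio.run(demo())
# calls == [("reply", "token-abc"), ("push", "U1234")]
```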

&lt;h3&gt;
  
  
  ❌ Pitfall 3: Gemini Flash Live is not suitable for pre-recorded files
&lt;/h3&gt;

&lt;p&gt;Not a real "pitfall", but worth clarifying:&lt;/p&gt;

&lt;p&gt;Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time two-way streaming&lt;/strong&gt;, which carries the overhead of connection establishment and streaming protocols. LINE voice messages are complete pre-recorded m4a files that can be processed in a single pass.&lt;/p&gt;

&lt;p&gt;Using &lt;code&gt;client.aio.models.generate_content()&lt;/code&gt; directly with inline audio bytes is simpler, and the latency is acceptable. Save Flash Live for scenarios that truly require real-time conversation.&lt;/p&gt;
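&lt;p&gt;A one-shot transcription call looks roughly like this – a sketch based on the google-genai SDK, with the model name and prompt taken from this article; the project name, mime type, and omitted error handling are assumptions:&lt;/p&gt;

```python
async def transcribe_m4a(audio_bytes: bytes):
    """One-shot transcription of a pre-recorded m4a clip via the
    standard generate_content API (no Live session needed)."""
    from google import genai          # imported lazily for this sketch
    from google.genai import types
    client = genai.Client(vertexai=True, project="my-project",
                          location="us-central1")
    response = await client.aio.models.generate_content(
        model="gemini-3.1-flash-lite-preview",
        contents=[
            # LINE voice messages arrive as m4a (MPEG-4 audio container).
            types.Part.from_bytes(data=audio_bytes, mime_type="audio/mp4"),
            types.Part(text="Please transcribe the above audio content "
                            "into text completely, preserving the original "
                            "language, and do not add any explanations "
                            "or prefixes."),
        ],
    )
    return (response.text or "").strip()
```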




&lt;h2&gt;
  
  
  Effect Demonstration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Voice Command Query
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me search for cafes near Taipei Main Station

Bot Reply #1: You said: Help me search for cafes near Taipei Main Station
Bot Reply #2: [LocationAgent replies with a list of nearby cafes]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: Voice Question
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] What's the difference between Gemini and GPT-4

Bot Reply #1: You said: What's the difference between Gemini and GPT-4
Bot Reply #2: [ChatAgent with Google Search Grounding replies with comparison results]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: Voice Send URL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me summarize this article https://example.com/article

Bot Reply #1: You said: Help me summarize this article https://example.com/article
Bot Reply #2: [ContentAgent fetches and summarizes the article]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The text transcribed from voice goes straight into the Orchestrator; all existing URL detection and intent determination work as usual, with zero extra logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Traditional Text Input vs. Voice Input
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Text Input&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Voice Input&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input Format&lt;/td&gt;
&lt;td&gt;TextMessage&lt;/td&gt;
&lt;td&gt;AudioMessage (m4a)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-processing&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Gemini transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reply token&lt;/td&gt;
&lt;td&gt;Direct use&lt;/td&gt;
&lt;td&gt;Reply #1 consumes, Reply #2 changes to push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator&lt;/td&gt;
&lt;td&gt;Direct routing&lt;/td&gt;
&lt;td&gt;Route after transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported Functions&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;All (no additional settings required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Handling&lt;/td&gt;
&lt;td&gt;reply_message&lt;/td&gt;
&lt;td&gt;replied flag determines reply/push&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;What I am most satisfied with in this integration is that &lt;strong&gt;I hardly need to change the Orchestrator itself&lt;/strong&gt;. As long as the voice is converted to text at the input end, all the routing logic, Agent calls, and error handling are automatically inherited.&lt;/p&gt;

&lt;p&gt;Gemini's multimodal audio understanding performs very stably in this scenario – Traditional Chinese, Taiwanese-accented speech, and sentences mixed with English are all transcribed accurately most of the time.&lt;/p&gt;

&lt;p&gt;Future directions for extension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multi-language automatic detection&lt;/strong&gt;: Instruct Gemini to preserve the original language during transcription (Japanese voice → Japanese transcription), and let the Orchestrator decide whether to translate&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Group voice support&lt;/strong&gt;: Currently limited to 1:1 chats; voice messages in groups are ignored for now&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Long recording summary&lt;/strong&gt;: Recordings over a certain length go straight to ContentAgent for summarization instead of being treated as commands&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Extension: 🔊 Read Summary Aloud – Make the Bot Speak
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" alt="Preview Program 2026-03-28 20.33.53" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voice recognition allows the Bot to "understand" what the user is saying. After this is done, the next question naturally arises:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can the Bot respond by speaking?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Gemini Live API has a setting &lt;code&gt;response_modalities: ["AUDIO"]&lt;/code&gt;, which can directly output an audio PCM stream. I connected it to another scenario – &lt;strong&gt;reading summaries aloud&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function Design
&lt;/h3&gt;

&lt;p&gt;Each time the Bot summarizes a URL, YouTube video, or PDF, a "🔊 Read Aloud" QuickReply button appears below the message. When the user presses it, the Bot feeds the summary text into Gemini Live TTS, converts the PCM audio to m4a, and sends it back with &lt;code&gt;AudioSendMessage&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;URL summary complete
    │
    ▼ [🔊 Read Aloud] QuickReply button
    │
User presses the button → PostbackEvent
    │
    ▼ handle_read_aloud_postback()
    │
    ├─ ① Retrieve the summary text from summary_store (10 minutes TTL)
    │
    ├─ ② Gemini Live API → PCM audio
    │ model: gemini-live-2.5-flash-native-audio
    │ response_modalities: ["AUDIO"]
    │
    ├─ ③ ffmpeg transcoding: PCM → m4a
    │ s16le, 16kHz, mono → AAC
    │
    └─ ④ AudioSendMessage sent to the user
            original_content_url: /audio/{uuid}
            duration: {ms}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
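&lt;p&gt;The QuickReply button in step ① of the flow above is just a small payload attached to the summary message. A sketch of building it as raw Messaging API JSON (the &lt;code&gt;read_aloud:{summary_id}&lt;/code&gt; postback data format is an assumption for illustration):&lt;/p&gt;

```python
def make_read_aloud_quick_reply(summary_id: str):
    """Build the LINE quick-reply payload that attaches a
    '🔊 Read Aloud' postback button to a summary message."""
    return {
        "items": [
            {
                "type": "action",
                "action": {
                    "type": "postback",
                    "label": "🔊 Read Aloud",
                    # Carries the summary_store key; the postback
                    # handler looks the text up within the 10-min TTL.
                    "data": f"read_aloud:{summary_id}",
                },
            }
        ]
    }

payload = make_read_aloud_quick_reply("demo-id")
assert payload["items"][0]["action"]["data"] == "read_aloud:demo-id"
```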



&lt;h3&gt;
  
  
  Core Code (tools/tts_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VERTEX_PROJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_modalities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_client_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
            &lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;pcm_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;32000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 16kHz × 16-bit mono
&lt;/span&gt;
    &lt;span class="c1"&gt;# PCM → m4a (temp file mode, avoid moov atom problem)
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;m4a_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.m4a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s16le&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
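&lt;p&gt;The &lt;code&gt;duration_ms&lt;/code&gt; formula in the code above deserves a sanity check: 16,000 samples/sec × 2 bytes/sample × 1 channel = 32,000 bytes per second of raw s16le PCM. A minimal verification:&lt;/p&gt;

```python
def pcm_duration_ms(num_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Duration of raw s16le PCM in milliseconds."""
    bytes_per_second = sample_rate * sample_width * channels  # 32000 here
    return int(num_bytes / bytes_per_second * 1000)

# One second of 16 kHz 16-bit mono PCM is exactly 32,000 bytes.
assert pcm_duration_ms(32000) == 1000
# A 5-second clip:
assert pcm_duration_ms(5 * 32000) == 5000
```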






&lt;h2&gt;
  
  
  Pitfalls of Read Aloud Function
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 4: Completely Different Model Name
&lt;/h3&gt;

&lt;p&gt;The first attempt at Gemini Live TTS was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-live-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Extrapolating from the &lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt; name used for voice recognition, the result was an immediate 1008 (policy violation) disconnect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/publishers/google/
models/gemini-3.1-flash-live-preview` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Listing the available models on Vertex AI revealed that the model naming rules for Live/native audio are completely different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is &lt;strong&gt;no Live version&lt;/strong&gt; of Gemini 3.1 on Vertex AI. Live/native audio is currently only available in the 2.5 generation, and the naming format is &lt;code&gt;gemini-live-{version}-{variant}-native-audio&lt;/code&gt;, a completely separate line from the general models &lt;code&gt;gemini-{version}-flash-{variant}&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 5: &lt;code&gt;GOOGLE_CLOUD_LOCATION=global&lt;/code&gt; Causes Live API to Disconnect
&lt;/h3&gt;

&lt;p&gt;After changing to the correct model name, the error message was still the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/...` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time the model name was correct, but &lt;code&gt;locations/global&lt;/code&gt; was suspicious – we had clearly set &lt;code&gt;us-central1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Investigating the source code of the Google GenAI SDK revealed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# _api_client.py
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;env_location&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# ← here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;location or env_location&lt;/code&gt; – if the &lt;code&gt;location&lt;/code&gt; passed in is an empty string, it falls through to the environment variable; and if that is also empty (with no API key set), the SDK defaults to &lt;code&gt;global&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The root cause of the problem is the environment variable of Cloud Run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GOOGLE_CLOUD_LOCATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"global"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; was set to the string &lt;code&gt;"global"&lt;/code&gt;, so &lt;code&gt;os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")&lt;/code&gt; returned &lt;code&gt;"global"&lt;/code&gt;, not &lt;code&gt;"us-central1"&lt;/code&gt; – the SDK then dutifully connected to the global endpoint, where &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt; has no BidiGenerateContent support.&lt;/p&gt;
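&lt;p&gt;The subtlety is that the second argument of &lt;code&gt;os.getenv&lt;/code&gt; only applies when the variable is completely unset, not when it holds an unwanted value:&lt;/p&gt;

```python
import os

# What Cloud Run injected:
os.environ["GOOGLE_CLOUD_LOCATION"] = "global"
loc = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
assert loc == "global"          # the default never kicks in

# The default applies only when the variable is unset entirely:
del os.environ["GOOGLE_CLOUD_LOCATION"]
loc = os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")
assert loc == "us-central1"
```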

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Standard API&lt;/th&gt;
&lt;th&gt;Live API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;global&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;❌ Model not here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;us-central1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Solution: hardcode the Live API's location instead of reading it from the env var:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Affected by GOOGLE_CLOUD_LOCATION=global
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Hardcoded, not affected by env var
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Live API needs a regional endpoint
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Voice Recognition vs. Read Summary Aloud
&lt;/h2&gt;

&lt;p&gt;The two functions use completely different Gemini APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Voice Recognition&lt;/th&gt;
&lt;th&gt;Read Summary Aloud&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;Audio → Text&lt;/td&gt;
&lt;td&gt;Text → Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Standard &lt;code&gt;generate_content&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Live API &lt;code&gt;BidiGenerateContent&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;Follows env var&lt;/td&gt;
&lt;td&gt;Hardcoded &lt;code&gt;us-central1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Format&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;PCM → ffmpeg → m4a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LINE Message Type&lt;/td&gt;
&lt;td&gt;Input: &lt;code&gt;AudioMessage&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Output: &lt;code&gt;AudioSendMessage&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
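The PCM → ffmpeg → m4a step from the table can be sketched as a command builder. This is a hypothetical helper, not code from the repo; the file names and the raw-PCM parameters (24 kHz, 16-bit, mono) are assumptions about the Live API output:

```python
def pcm_to_m4a_cmd(pcm_path: str, m4a_path: str, sample_rate: int = 24000) -> list:
    """Build the ffmpeg command that wraps raw PCM output into an m4a file."""
    return [
        "ffmpeg", "-y",
        "-f", "s16le",            # raw 16-bit little-endian PCM
        "-ar", str(sample_rate),  # assumed Live API output rate: 24 kHz
        "-ac", "1",               # mono
        "-i", pcm_path,
        "-c:a", "aac",            # the m4a container carries AAC audio
        m4a_path,
    ]

cmd = pcm_to_m4a_cmd("summary.pcm", "summary.m4a")
print(" ".join(cmd))
# Execute with subprocess.run(cmd, check=True) once summary.pcm exists.
```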




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The release of Gemini 3.1 Flash Live makes audio AI worth taking seriously. This time, both voice recognition and read-aloud summaries were integrated into the LINE Bot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Voice Recognition&lt;/strong&gt;: Standard Gemini API; one-shot transcription of a pre-recorded m4a, wired into the existing Orchestrator&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Read Summary Aloud&lt;/strong&gt;: Gemini Live TTS; the summary text is converted to PCM, transcoded to m4a with ffmpeg, and returned as an &lt;code&gt;AudioSendMessage&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most troublesome part was not the feature itself, but &lt;strong&gt;finding the correct model name&lt;/strong&gt; and &lt;strong&gt;tracing the SDK's location logic&lt;/strong&gt;. Neither is stated prominently in the documentation; the answers only came from listing the available models and reading the SDK source code.&lt;/p&gt;

&lt;p&gt;The full code is on &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;; feel free to take a look.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building an Agent Skill Hub: From Skill Development to Automated Multilingual Documentation Deployment on GitHub Pages</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:20 +0000</pubDate>
      <link>https://forem.com/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</link>
      <guid>https://forem.com/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" alt="image-20260322225856161" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/agent-skill-hub" rel="noopener noreferrer"&gt;Agent Skill Hub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plateaukao.github.io/whisperASR/" rel="noopener noreferrer"&gt;whisperASR Reference Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/pages" rel="noopener noreferrer"&gt;GitHub Pages Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents how, while developing the &lt;strong&gt;Agent Skill Hub (2026 Skill Library)&lt;/strong&gt;, I built a skill-description specification from scratch and created a bilingual (Chinese and English) GitHub Pages documentation site inspired by minimalist aesthetics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;With the rise of AI Agents (such as OpenClaw or Gemini CLI), the key question has become how to let an Agent quickly understand and execute specific tasks. Instead of writing long prompts every time, it is better to package common operations into standardized &lt;strong&gt;Skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To facilitate community sharing and Agent consumption, I created &lt;code&gt;agent-skill-hub&lt;/code&gt;. But code alone is not enough; the project also needs a decent "facade": a documentation site that is both aesthetically pleasing and technically detailed.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Standardize Skill Descriptions (SKILL.md)
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;agent-skill-hub&lt;/code&gt;, each skill (such as &lt;code&gt;gcp-helper&lt;/code&gt; or &lt;code&gt;n8n-executor&lt;/code&gt;) has a &lt;code&gt;SKILL.md&lt;/code&gt;. The structure of this file is crucial because it's not just for humans to read, but also for LLMs to read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name &amp;amp; Description&lt;/strong&gt;: Let the Agent know what this is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to Use&lt;/strong&gt;: Define trigger scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Pattern&lt;/strong&gt;: Provide standard instruction examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Mistakes&lt;/strong&gt;: Reduce errors caused by Agent hallucinations.&lt;/li&gt;
&lt;/ul&gt;
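Because SKILL.md is meant to be machine-readable, its structure is easy to lint. A minimal sketch of such a check, using the section names listed above (the validator itself and the sample document are hypothetical):

```python
# Section headings taken from the SKILL.md structure described above
REQUIRED_SECTIONS = ["Name", "Description", "When to Use", "Core Pattern", "Common Mistakes"]

def missing_sections(skill_md: str) -> list:
    """Return the required SKILL.md sections that do not appear in the text."""
    return [s for s in REQUIRED_SECTIONS if s not in skill_md]

doc = """# gcp-helper
## Description
Runs gcloud commands safely.
## When to Use
When the user asks about GCP resources.
"""
print(missing_sections(doc))  # → ['Name', 'Core Pattern', 'Common Mistakes']
```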




&lt;h2&gt;
  
  
  🎨 Step 2: Design Style — Tribute to Minimalist Aesthetics
&lt;/h2&gt;

&lt;p&gt;When designing the pages under the &lt;code&gt;docs&lt;/code&gt; directory, I referenced the style of &lt;strong&gt;whisperASR&lt;/strong&gt;. Its dark background with a bright teal accent fits modern developer aesthetics well:&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual Element Highlights:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Gradient Title&lt;/strong&gt;: Use &lt;code&gt;linear-gradient&lt;/code&gt; to give the title a polished, high-end look.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Teal Accent Color&lt;/strong&gt;: Use &lt;code&gt;#14b8a6&lt;/code&gt; as the highlight color for key buttons and titles.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Card-style Layout&lt;/strong&gt;: Clearly present the icons and introductions of each skill, with good responsive design.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🌐 Step 3: Multilingual Support and Automatic Switching
&lt;/h2&gt;

&lt;p&gt;To make the site usable by developers worldwide, I adopted a directory-based language layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/
├── index.html (Language detection and redirection)
├── en/ (English version)
│   └── skills/
└── zh/ (Traditional Chinese version)
    └── skills/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added a simple JavaScript snippet to the root directory's &lt;code&gt;index.html&lt;/code&gt;, which automatically redirects to the correct language based on the user's browser settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userLanguage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zh&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./zh/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./en/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Step 4: GitHub Pages Deployment Process
&lt;/h2&gt;

&lt;p&gt;In 2026, the most recommended deployment method is to serve the site from the &lt;code&gt;docs/&lt;/code&gt; directory of the &lt;code&gt;main&lt;/code&gt; branch: no separate &lt;code&gt;gh-pages&lt;/code&gt; branch is needed, and development and documentation stay synchronized in one repository.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare the Directory Structure
&lt;/h3&gt;

&lt;p&gt;Create all the necessary directories at once using the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; docs/en/skills docs/zh/skills docs/assets/css

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Git Commit and Push
&lt;/h3&gt;

&lt;p&gt;After completing HTML/CSS development, execute the standard Git process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add docs/
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"docs: add GitHub Pages documentation in English and Chinese"&lt;/span&gt;
git push origin main

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Enable GitHub Pages Settings
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;strong&gt;Settings &amp;gt; Pages&lt;/strong&gt; in the GitHub repository.&lt;/li&gt;
&lt;li&gt; Under &lt;strong&gt;Build and deployment&lt;/strong&gt;, in &lt;strong&gt;Branch&lt;/strong&gt;, select the &lt;code&gt;main&lt;/code&gt; branch and the &lt;code&gt;/docs&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Save&lt;/strong&gt;, and the website will be online in a few minutes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" alt="image-20260322225932252" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ Why can't the webpage style (CSS) be loaded?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; HTML files under subdirectories (such as &lt;code&gt;en/skills/&lt;/code&gt;) must reference the stylesheet with a relative path that matches their depth. &lt;strong&gt;Correction:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- In the home page index.html --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- In the skill detail page --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
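The rule behind the correction is mechanical: one `../` per directory level below `docs/`. A small sketch of that rule (hypothetical helper and file names):

```python
def css_href(depth: int) -> str:
    """Relative href from a page `depth` directories below docs/ to the shared stylesheet."""
    return "../" * depth + "assets/css/style.css"

print(css_href(1))  # e.g. en/index.html      → ../assets/css/style.css
print(css_href(2))  # e.g. en/skills/a.html   → ../../assets/css/style.css
```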



&lt;h3&gt;
  
  
  ❓ How to ensure that the Agent can correctly read the document?
&lt;/h3&gt;

&lt;p&gt;The HTML retains semantic tags (&lt;code&gt;article&lt;/code&gt;, &lt;code&gt;h2&lt;/code&gt;, &lt;code&gt;pre&lt;/code&gt;, &lt;code&gt;code&lt;/code&gt;) so that an Agent performing RAG (Retrieval-Augmented Generation), or reading the page directly, can capture the core logic more accurately.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through this project, I came to appreciate the importance of treating documentation as a product. A good AI skill library needs not only solid program logic but also a clear, intuitive, multilingual-friendly navigation system.&lt;/p&gt;

&lt;p&gt;If you also want to give your AI project a professional facade, feel free to borrow this &lt;code&gt;docs/&lt;/code&gt; structure. Happy Coding! 🦞&lt;/p&gt;




</description>
      <category>agents</category>
      <category>automation</category>
      <category>documentation</category>
      <category>github</category>
    </item>
    <item>
      <title>Security Declaration for AI Agents: Deep Dive into A2AS (Agent-to-Agent Security) Certification Mechanism</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:10 +0000</pubDate>
      <link>https://forem.com/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</link>
      <guid>https://forem.com/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" alt="A2AS-CERTIFIED" width="110" height="20"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.a2as.org/certified/agents/kkdai/linebot-adk" rel="noopener noreferrer"&gt;linebot-adk Project Certification Page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents an interesting Pull Request I received while maintaining &lt;strong&gt;linebot-adk (LINE Bot Agent Development Kit)&lt;/strong&gt;: adding the &lt;strong&gt;A2AS security certificate&lt;/strong&gt; to the project. This is not just a YAML file, but a significant milestone for AI Agents to move towards "industrial-grade security" in 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" alt="Google Chrome 2026-03-26 22.45.44" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;When we develop Agents like &lt;code&gt;linebot-adk&lt;/code&gt; that have Tool Use (Function Calling) capabilities, the biggest concern for users is often: "Will this Agent issue commands without my permission?" or "What data can it access?".&lt;/p&gt;

&lt;p&gt;Traditionally, we could only write explanations in &lt;code&gt;README.md&lt;/code&gt;, but that is for humans to read, not for systems to verify. This is where &lt;strong&gt;A2AS (Agent-to-Agent Security)&lt;/strong&gt; comes in; it has been hailed as the "HTTPS of the AI world".&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Understanding the BASIC Model of A2AS
&lt;/h2&gt;

&lt;p&gt;A2AS is not just a name; it has a complete &lt;strong&gt;BASIC security model&lt;/strong&gt; behind it, designed to solve the trust issue between AI Agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;(B)ehavior Certificates&lt;/strong&gt;: Declarative certificates that clearly define the behavior boundaries of the Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(A)uthenticated Prompts&lt;/strong&gt;: Ensures that the source of prompts is trustworthy and traceable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(S)ecurity Boundaries&lt;/strong&gt;: Uses structured tags (such as &lt;code&gt;&amp;lt;a2as:user&amp;gt;&lt;/code&gt;) to isolate untrusted input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(I)n-Context Defenses&lt;/strong&gt;: Embeds defense logic in prompts to reject malicious injections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(C)odified Policies&lt;/strong&gt;: Writes business rules into code and enforces them during inference.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎨 Step 2: Deconstructing a2as.yaml – The Agent's ID Card
&lt;/h2&gt;

&lt;p&gt;In PR #1 received by &lt;code&gt;linebot-adk&lt;/code&gt;, the core change was the addition of &lt;code&gt;a2as.yaml&lt;/code&gt;. This file acts as the Agent's "digital signature", making the code's capabilities explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kkdai/linebot-adk&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main.py&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;multi_tool_agent/agent.py&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;issued&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A2AS.org&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://a2as.org/certified/agents/kkdai/linebot-adk&lt;/span&gt;

&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;root_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;instance&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;get_weather&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_current_time&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why is this important?
&lt;/h3&gt;

&lt;p&gt;This certificate is directly linked to the content of our &lt;code&gt;main.py&lt;/code&gt;. When the certificate declares &lt;code&gt;tools: [get_weather, get_current_time]&lt;/code&gt;, it means this is a &lt;strong&gt;limited-authorization&lt;/strong&gt; Agent. If it tries to execute &lt;code&gt;delete_database&lt;/code&gt;, the security monitoring system can immediately detect that it is outside the certificate scope.&lt;/p&gt;
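That out-of-scope detection can be illustrated with a simple gate. This is a hypothetical sketch of the idea, not A2AS's actual enforcement code; the tool set mirrors the a2as.yaml above:

```python
# Tool list mirrored from the a2as.yaml manifest above
DECLARED_TOOLS = {"get_weather", "get_current_time"}

def is_within_certificate(tool_name: str) -> bool:
    """Allow a tool call only if the certificate declares it."""
    return tool_name in DECLARED_TOOLS

print(is_within_certificate("get_weather"))      # → True
print(is_within_certificate("delete_database"))  # → False
```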




&lt;h2&gt;
  
  
  🌐 Step 3: Combining Code Logic
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;linebot-adk&lt;/code&gt;, we used Google's &lt;strong&gt;ADK (Agent Development Kit)&lt;/strong&gt; to build the Agent. The A2AS certificate maps cleanly onto our program architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool Declaration and Implementation
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;multi_tool_agent/agent.py&lt;/code&gt;, we defined two tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the weather
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the time
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The A2AS certificate will register these &lt;code&gt;function&lt;/code&gt;s in the &lt;code&gt;tools&lt;/code&gt; block, ensuring that the Agent's capability boundaries are transparent and auditable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Runner and Execution Loop
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;main.py&lt;/code&gt;, we start the Agent through &lt;code&gt;Runner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;APP_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;manifest.subject.scope&lt;/code&gt; in the certificate marks &lt;code&gt;main.py&lt;/code&gt;, which means the entire startup process (including FastAPI's Webhook processing) is within the A2AS compliant scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Step 4: Why is this the "HTTPS of the AI world"?
&lt;/h2&gt;

&lt;p&gt;Imagine if you want a "travel agent Agent" to talk to a "hotel reservation Agent".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without A2AS&lt;/strong&gt;: The travel Agent can only "blindly trust" the hotel Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With A2AS&lt;/strong&gt;: The travel Agent can first check the other party's &lt;code&gt;a2as.yaml&lt;/code&gt; certificate. If the other party claims to have the right to "modify orders" but the certificate doesn't say so, the travel Agent can refuse the transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This &lt;strong&gt;"verify first, then execute"&lt;/strong&gt; model is the trust network that A2AS wants to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ What if the certificate expires or the Commit Hash doesn't match?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; A2AS certificates are bound to a specific Git Commit. When you modify the logic of &lt;code&gt;agent.py&lt;/code&gt; but don't update the certificate, the verification will fail. &lt;strong&gt;Correction:&lt;/strong&gt; Every time you modify the core functions of the Agent (such as adding a Tool or changing the Model), you must regenerate and sign &lt;code&gt;a2as.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❓ Does using A2AS increase latency?
&lt;/h3&gt;

&lt;p&gt;No. A2AS is mainly a declarative, structured specification. During inference, its structured tags (the S in the BASIC model) help the LLM distinguish instructions from data, which reduces hallucinations caused by instruction/data confusion and can even improve execution efficiency.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through the introduction of this A2AS certificate, &lt;code&gt;linebot-adk&lt;/code&gt; is no longer just a simple LINE Bot example; it has become a transparent Agent that meets the 2026 security standards. In an era where AI agents are gradually penetrating our lives, "transparency" is the best defense.&lt;/p&gt;

&lt;p&gt;If you are also developing AI Agents, you might as well go to &lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org&lt;/a&gt; and add that badge of trust to your project. Happy Coding! 🦞&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Deploying OpenClaw on Google Cloud VM: Avoiding Sudo and NVM Pitfalls</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 01 Mar 2026 14:04:54 +0000</pubDate>
      <link>https://forem.com/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</link>
      <guid>https://forem.com/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" alt="OpenClaw on GCP" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;(Image generated by &lt;a href="https://github.com/kkdai/nanobanana" rel="noopener noreferrer"&gt;Nano Banana&lt;/a&gt; - Gemini Image Generation)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/openclaw-tools-skills-tutorial/" rel="noopener noreferrer"&gt;OpenClaw Practical Tutorial: Chinese FAQ and Recommended Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/2026-02-04-is-openclaw-safe-security-guide/" rel="noopener noreferrer"&gt;OpenClaw Security Guide: Security Enhancement Recommendations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtu.be/FC3Wo3ew130" rel="noopener noreferrer"&gt;YouTube Tutorial: Deploying OpenClaw on GCP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents the complete solution process for the permission, environment variable, and process persistence issues encountered when installing &lt;strong&gt;OpenClaw (2026 Latest Version)&lt;/strong&gt; in a Debian/Ubuntu environment on Google Cloud Platform (GCP).&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface
&lt;/h1&gt;

&lt;p&gt;The AI Agent field has been very popular recently. &lt;strong&gt;OpenClaw&lt;/strong&gt;, an open-source AI agent that can run around the clock, has impressed people with its powerful system access and browsing capabilities. For security reasons, deploying it on a cloud VM (such as a GCE instance) is ideal: it keeps the agent online 24/7 and isolates it from sensitive local data.&lt;/p&gt;

&lt;p&gt;However, in GCP's default Debian/Ubuntu environment, the permission setup differs slightly from a typical desktop Linux, so following the official install script often runs into pitfalls.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Basic Installation Process of OpenClaw on GCP
&lt;/h2&gt;

&lt;p&gt;Before we get into troubleshooting, let's quickly go through the standard installation logic:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a VM Instance
&lt;/h3&gt;

&lt;p&gt;Create a new VM in the GCP Console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine type&lt;/strong&gt;: Recommended &lt;code&gt;e2-small&lt;/code&gt; or &lt;code&gt;e2-medium&lt;/code&gt; (depending on your Agent load).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operating system&lt;/strong&gt;: Recommended to choose &lt;strong&gt;Ubuntu 24.04 LTS&lt;/strong&gt; or &lt;strong&gt;Debian 12&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard disk&lt;/strong&gt;: Recommended 20GB or more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Connect and Basic Updates
&lt;/h3&gt;

&lt;p&gt;After entering the VM via SSH, first perform a system update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt upgrade -y
sudo apt install -y git curl build-essential

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Officially Install OpenClaw
&lt;/h3&gt;

&lt;p&gt;The official website provides a one-click installation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://openclaw.ai/install.sh | bash

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;But!&lt;/strong&gt; If you directly execute the above script, you will usually encounter the following two serious permission and path problems on GCP.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Problem 1: "HAL 9000" Style Denial of sudo-rs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; When executing the official installation script, the following error is encountered with &lt;code&gt;sudo-rs&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;sudo-rs: I'm sorry evanslin. I'm afraid I can't do that&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Interaction Restriction&lt;/strong&gt;: The script executed via &lt;code&gt;curl ... | bash&lt;/code&gt; cannot obtain password input from the terminal when &lt;code&gt;sudo&lt;/code&gt; is required.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;No Password Account&lt;/strong&gt;: GCP defaults to using SSH Key login, and the user account usually does not have a physical password set, leading to &lt;code&gt;sudo&lt;/code&gt; authentication failure.&lt;/li&gt;
&lt;/ol&gt;
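&lt;p&gt;The stdin problem is easy to reproduce without any installer: when a script is piped into bash, stdin &lt;em&gt;is&lt;/em&gt; the script stream, so any interactive read (such as a password prompt) swallows script text instead of waiting for the user. A minimal, generic demonstration (not the OpenClaw installer itself):&lt;/p&gt;

```shell
# When a script is piped into bash, stdin is the script stream, so an
# interactive "read" (like sudo's password prompt) consumes the next
# line of the script instead of waiting for keyboard input.
out=$(printf 'read -r answer; echo "got: $answer"\necho LAST-LINE\n' | bash)
echo "$out"   # prints: got: echo LAST-LINE
```

&lt;p&gt;Note that &lt;code&gt;echo LAST-LINE&lt;/code&gt; never executes; it was eaten as "input". This is exactly why a piped installer cannot prompt for a &lt;code&gt;sudo&lt;/code&gt; password.&lt;/p&gt;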

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use &lt;strong&gt;NVM (Node Version Manager)&lt;/strong&gt; to install Node.js, and build the environment under the user directory, completely avoiding the &lt;code&gt;sudo&lt;/code&gt; requirement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Install NVM
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload shell configuration
source ~/.bashrc

# 2. Install Node.js
nvm install node # Recommended version v25.7.0+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Problem 2: NVM Path and Environment Variables
&lt;/h2&gt;

&lt;p&gt;Using NVM avoids &lt;code&gt;sudo&lt;/code&gt;, but a new problem appears: when you log in again, or run commands from a non-interactive shell, the system may fail to find the &lt;code&gt;node&lt;/code&gt; or &lt;code&gt;openclaw&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;This is because the NVM path is dynamically loaded. It is recommended to ensure that the following content exists in &lt;code&gt;~/.bashrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] &amp;amp;&amp;amp; \. "$NVM_DIR/bash_completion"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
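&lt;p&gt;Watch the spacing around the brackets: &lt;code&gt;[&lt;/code&gt; is an ordinary command (&lt;code&gt;test&lt;/code&gt;), so &lt;code&gt;[-s file]&lt;/code&gt; without spaces is "command not found" and the NVM lines silently never load. A self-contained check of the correct form:&lt;/p&gt;

```shell
# "[" is a regular command, so it must be separated from its arguments
# by spaces: "[ -s file ]" works, "[-s file]" fails with "command not
# found" -- turning the NVM lines in ~/.bashrc into silent no-ops.
tmpfile=$(mktemp)
echo data > "$tmpfile"

result="not loaded"
if [ -s "$tmpfile" ]; then   # true: file exists and is non-empty
  result="loaded"
fi
echo "$result"               # prints: loaded

rm -f "$tmpfile"
```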






&lt;h2&gt;
  
  
  🛠️ Problem 3: How to Make OpenClaw Run 24/7 Stably?
&lt;/h2&gt;

&lt;p&gt;After installation, to keep the Agent running after the SSH window closes, I switched from GCP's Web SSH to the local &lt;code&gt;gcloud&lt;/code&gt; CLI, and ran into another small pitfall.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Why Can't gcloud ssh Find openclaw?
&lt;/h3&gt;

&lt;p&gt;This is usually because GCP's &lt;code&gt;gcloud compute ssh&lt;/code&gt; may create a new username based on your &lt;strong&gt;local account name&lt;/strong&gt;, instead of using the account you used when installing on the VM (e.g., &lt;code&gt;evanslin&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification method:&lt;/strong&gt; Please enter the following in the "Web SSH" and "Local gcloud SSH" windows respectively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whoami

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; If the web version shows &lt;code&gt;evanslin&lt;/code&gt; but the gcloud version shows something like &lt;code&gt;evan_lin_yourdomain_com&lt;/code&gt;, the two sessions have completely different home directories, so your NVM and OpenClaw setups naturally "disappear".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; When executing the &lt;code&gt;gcloud&lt;/code&gt; command, &lt;strong&gt;explicitly specify&lt;/strong&gt; the account to log in to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh evanslin@openclaw-evanlin

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will ensure that you return to the correct environment!&lt;/p&gt;
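&lt;p&gt;A quick follow-up sanity check after logging in with the explicit account: both values below should match what the Web SSH session reports. (A minimal diagnostic sketch, not an OpenClaw command.)&lt;/p&gt;

```shell
# If $HOME differs between the two sessions, you are in the wrong
# account, and NVM / OpenClaw will appear to be "missing" again.
echo "user: $(id -un)"
echo "home: $HOME"
```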

&lt;h3&gt;
  
  
  2. Use tmux and a Startup Script for Reliable Execution
&lt;/h3&gt;

&lt;p&gt;To make sure environment variables load correctly in any SSH session (web or gcloud), and to keep OpenClaw running stably in the background, the following "scripted" startup method is recommended.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Create a Startup Script
&lt;/h4&gt;

&lt;p&gt;In a window where you can normally execute &lt;code&gt;openclaw&lt;/code&gt; (usually Web SSH), create a startup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt; 'EOF' &amp;gt; ~/start_openclaw.sh
#!/bin/bash
# 1. Force loading NVM path
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"

# 2. Automatically correct PATH (please adjust the path according to your Node version)
export PATH="$HOME/.nvm/versions/node/v25.7.0/bin:$PATH"

# 3. Execute command
openclaw "$@"
EOF

# Grant execution permission
chmod +x ~/start_openclaw.sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
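&lt;p&gt;One caveat in the script above: the &lt;code&gt;v25.7.0&lt;/code&gt; path is hardcoded, so it breaks on the next &lt;code&gt;nvm install&lt;/code&gt;. A sketch of a version-agnostic alternative, demonstrated against a throwaway fake layout so it is safe to run anywhere:&lt;/p&gt;

```shell
# Pick the newest node bin directory under an nvm-style layout instead
# of hardcoding a version. Demonstrated on a temporary fake tree.
nvm_root=$(mktemp -d)
mkdir -p "$nvm_root/versions/node/v24.1.0/bin" \
         "$nvm_root/versions/node/v25.7.0/bin"

# sort -V orders version strings numerically; tail takes the newest.
latest_bin=$(ls -d "$nvm_root/versions/node/"*/bin | sort -V | tail -n1)
echo "$latest_bin"   # ends with .../v25.7.0/bin

# In the real startup script, replace the hardcoded line with:
#   export PATH="$latest_bin:$PATH"
rm -rf "$nvm_root"
```

&lt;p&gt;In your actual &lt;code&gt;~/start_openclaw.sh&lt;/code&gt;, point the glob at &lt;code&gt;$HOME/.nvm&lt;/code&gt; instead of the temporary directory.&lt;/p&gt;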



&lt;h4&gt;
  
  
  Step 2: Verify the Script
&lt;/h4&gt;

&lt;p&gt;From now on, use this script no matter where you log in from. Test it in the &lt;code&gt;gcloud ssh&lt;/code&gt; window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/start_openclaw.sh gateway

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it runs successfully, the path has been wired up correctly!&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Combine with tmux to Survive Disconnections
&lt;/h4&gt;

&lt;p&gt;Now we combine the script with &lt;code&gt;tmux&lt;/code&gt; to achieve true 24/7 background operation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open a new session&lt;/strong&gt;: &lt;code&gt;tmux new -s openclaw&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Execute the script inside&lt;/strong&gt;: &lt;code&gt;~/start_openclaw.sh gateway&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Perfectly detach&lt;/strong&gt;: Press &lt;code&gt;Ctrl + B&lt;/code&gt; and release, then press &lt;code&gt;D&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Reconnect at any time&lt;/strong&gt;: Next time you log in, execute &lt;code&gt;tmux a -t openclaw&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
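&lt;p&gt;The four steps above can be collapsed into one idempotent launcher. This is a sketch using the session name and script path from this article; &lt;code&gt;tmux new -d -s&lt;/code&gt; simply combines "create session" with "start detached":&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Start the gateway in a detached tmux session, but only if a session
# with this name is not already running, so the script is safe to re-run.
SESSION=openclaw
CMD="$HOME/start_openclaw.sh gateway"

if ! command -v tmux >/dev/null; then
  status="tmux not installed"
elif tmux has-session -t "$SESSION" 2>/dev/null; then
  status="already running"
else
  tmux new -d -s "$SESSION" "$CMD" || true
  status="launch attempted"
fi
echo "$status"
```

&lt;p&gt;You can even call this from &lt;code&gt;cron&lt;/code&gt; or a login hook as a crude watchdog, since re-running it never spawns a second copy.&lt;/p&gt;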




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The key to deploying OpenClaw on GCP is &lt;strong&gt;"user directory first"&lt;/strong&gt;: installing via NVM sidesteps the system-level &lt;code&gt;sudo-rs&lt;/code&gt; restriction, makes the installation smoother, and makes it easy to switch Node.js versions as OpenClaw's requirements evolve.&lt;/p&gt;

&lt;p&gt;After successful deployment, don't forget to use &lt;code&gt;openclaw onboard&lt;/code&gt; to start configuring your API Keys and communication channels (such as Telegram or Discord).&lt;/p&gt;

&lt;p&gt;I hope this note can help developers who are also working hard on GCP. See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>google</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
