<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: HatmanStack</title>
    <description>The latest articles on Forem by HatmanStack (@hatmanstack).</description>
    <link>https://forem.com/hatmanstack</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3729182%2F011ba3e8-dcee-418f-a84e-f1890b44b575.png</url>
      <title>Forem: HatmanStack</title>
      <link>https://forem.com/hatmanstack</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/hatmanstack"/>
    <language>en</language>
    <item>
      <title>Multimodal Rerankers: The Fix for Object Storage RAG</title>
      <dc:creator>HatmanStack</dc:creator>
      <pubDate>Thu, 05 Mar 2026 21:09:47 +0000</pubDate>
      <link>https://forem.com/hatmanstack/multimodal-rerankers-the-fix-for-object-storage-rag-2662</link>
      <guid>https://forem.com/hatmanstack/multimodal-rerankers-the-fix-for-object-storage-rag-2662</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Filtered HNSW search on object storage has a precision problem that existing solutions can't touch. At small scale, an adaptive boost works. At large scale, multimodal cross-encoders that process images and text through joint cross-attention are the architecture that fixes this.&lt;/p&gt;

&lt;p&gt;I've been running &lt;a href="https://github.com/HatmanStack/RAGStack-Lambda" rel="noopener noreferrer"&gt;RAGStack-Lambda&lt;/a&gt; on S3 Vectors with a multimodal corpus, ~60% images with metadata. In my &lt;a href="https://dev.to/hatmanstack/why-filtered-queries-return-lower-relevancy-in-s3-vectors-and-what-to-do-about-it-2587"&gt;last post&lt;/a&gt;, I documented why filtered queries consistently return ~10% lower relevancy, sometimes surfacing the wrong results entirely. The root cause is HNSW graph disconnection from post-filtering compounded by quantization noise in smaller candidate pools.&lt;/p&gt;

&lt;p&gt;I solved it at my scale with an adaptive boost that keeps filtered results ~5% above unfiltered, scaling dynamically with how aggressively the filter shrinks the candidate pool. At ~1500 documents, that's enough. This post is about what comes next, not for me, but for anyone building multimodal RAG on object-storage vector databases at scale.&lt;/p&gt;
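&lt;p&gt;A minimal sketch of what such a boost can look like (the names and the exact scaling form below are illustrative assumptions, not the actual RAGStack-Lambda code):&lt;/p&gt;

```python
# Illustrative sketch of an adaptive boost (assumed names and scaling
# form; not the actual RAGStack-Lambda implementation).

def adaptive_boost(score: float, candidate_pool: int, corpus_size: int,
                   base_boost: float = 0.05) -> float:
    """Boost a filtered result's score in proportion to how aggressively
    the filter shrank the candidate pool."""
    selectivity = candidate_pool / corpus_size   # 1.0 means no shrinkage
    boost = base_boost * (2.0 - selectivity)     # grows as the pool shrinks
    return score * (1.0 + boost)

# A tight filter (150 of 1500 docs survive) earns a larger boost
# than a permissive one (1200 of 1500 survive).
tight = adaptive_boost(0.80, candidate_pool=150, corpus_size=1500)    # 0.876
loose = adaptive_boost(0.80, candidate_pool=1200, corpus_size=1500)   # 0.848
```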

&lt;h2&gt;
  
  
  &lt;strong&gt;The Object Storage Problem&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;ACORN (Approximate nearest neighbor Constraint-Optimized Retrieval Network) is the proven solution for filtered HNSW search. When a first-hop neighbor fails the filter, ACORN checks that neighbor's neighbors, a two-hop expansion that keeps the graph connected through filtered-out nodes. It's predicate-agnostic, meaning you don't need to know your filters at index time. Weaviate, Qdrant, and Elasticsearch have all adopted it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch&lt;/strong&gt;: ACORN assumes the graph is in memory. The two-hop expansion requires cheap random access to neighbor lists. S3 Vectors stores the graph on object storage, so every additional hop is an S3 read, and the added latency and cost would undermine the entire reason you're using S3 Vectors in the first place.&lt;/p&gt;

&lt;p&gt;Text rerankers don't help either. I tried oversampling at 3x and reranking with Cohere Rerank 3.5 via Bedrock. Results got worse: the reranker was evaluating metadata strings like &lt;code&gt;"people: judy wilson, topic: family_photos"&lt;/code&gt; rather than the natural-language passages cross-encoders are trained on. If your corpus is majority images, text rerankers can't see what matters.&lt;/p&gt;

&lt;p&gt;So the graph-level fix requires AWS to build something new. I've filed a &lt;a href="https://repost.aws/questions/QUjrm6KygfTBiwaaKpwqY_lQ/filtered-query-relevancy-degradation-in-s3-vectors-and-a-potential-architectural-fix" rel="noopener noreferrer"&gt;feature request&lt;/a&gt; for filter-aware traversal. But from the application layer, you need a different approach entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Multimodal Cross-Encoders: The Architecture That Fits&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Multimodal cross-attention has existed in research for years, but over the last few months a new class of production-ready rerankers has finally made the architecture viable for joint image-text processing at scale. It is architecturally different from both bi-encoders and text cross-encoders.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a3d84ptl6gsaone966y.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3a3d84ptl6gsaone966y.jpeg" alt="Multimodal Architecture Diagram" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bi-encoder (standard retrieval):&lt;/strong&gt; Images and text are embedded independently at ingestion time. At query time, you're comparing pre-computed vectors with a distance calculation. Fast, cheap, but the query never "sees" the image: it only sees where the image landed in vector space.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal cross-encoder (reranking):&lt;/strong&gt; At query time, the candidate image is chunked into patches and passed through a vision encoder to produce visual tokens. The query becomes text tokens. Both are concatenated into a single sequence and fed through a Vision-Language Model with full cross-attention. The query tokens attend directly to spatial features in the image. An MLP head outputs a single relevance score.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The critical difference: the text of your query is directly interacting with the pixels of the candidate image. It's not comparing two independent embeddings: it's reasoning about whether this specific image is relevant to this specific query. This is the piece that was missing when I tried text rerankers.&lt;/p&gt;
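&lt;p&gt;The contrast between the two interfaces can be sketched in a few lines. The cross-encoder here is deliberately a stub: a real implementation would be a VLM forward pass, which is exactly the step the bi-encoder path never performs.&lt;/p&gt;

```python
import math

def bi_encoder_score(query_vec, image_vec):
    """Standard retrieval: cosine similarity between two embeddings that
    were computed independently. The query never sees the image itself,
    only where the image landed in vector space."""
    dot = sum(q * v for q, v in zip(query_vec, image_vec))
    norms = (math.sqrt(sum(q * q for q in query_vec))
             * math.sqrt(sum(v * v for v in image_vec)))
    return dot / norms

def cross_encoder_score(query_text, image_bytes):
    """Reranking: one joint forward pass in which query tokens attend to
    image patches and an MLP head emits a relevance score. Stubbed here;
    a real implementation is a VLM inference call."""
    raise NotImplementedError("replace with a multimodal reranker call")
```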

&lt;h2&gt;
  
  
  &lt;strong&gt;The Two-Stage Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77s0ln3xh4cvp7i9oxv9.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77s0ln3xh4cvp7i9oxv9.jpeg" alt="RAG Architecture" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1&lt;/strong&gt; stays cheap. S3 Vectors does approximate retrieval with all its existing limitations: graph holes, quantization noise, the works. But instead of needing Stage 1 to be precise, you just need it to get the right answer somewhere in a larger candidate set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 2&lt;/strong&gt; is where precision happens. The multimodal cross-encoder actually looks at each candidate image alongside the query text. It doesn't care that S3 Vectors returned a visually similar but wrong photo: it can see the image and read the query, so it can reason about whether this is actually a photo of a rare blue bird in the jungle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 3&lt;/strong&gt; is unchanged. Better candidates in, better generation out.&lt;/p&gt;

&lt;p&gt;This architecture decouples retrieval cost from retrieval precision. S3 Vectors gives you the 90% cost reduction on storage. The reranker handles the precision that the graph can't deliver under filtering. You stop asking the vector database to do something its storage medium won't allow.&lt;/p&gt;
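&lt;p&gt;The orchestration itself is small. Here &lt;code&gt;vector_search&lt;/code&gt; and &lt;code&gt;rerank_score&lt;/code&gt; are hypothetical stand-ins for an S3 Vectors query and a multimodal reranker call; only the pattern is the point.&lt;/p&gt;

```python
def two_stage_retrieve(query, vector_search, rerank_score, k=10, oversample=5):
    # Stage 1: cheap, imprecise ANN retrieval. The oversampled pool only
    # needs to contain the right answers somewhere.
    candidates = vector_search(query, top_k=k * oversample)
    # Stage 2: expensive, precise scoring of every candidate.
    scored = sorted(candidates, key=lambda c: rerank_score(query, c), reverse=True)
    return scored[:k]

# Toy demo: "documents" are ints and relevance is closeness to the query.
docs = list(range(100))
hits = two_stage_retrieve(
    5,
    vector_search=lambda q, top_k: docs[:top_k],  # stand-in for S3 Vectors
    rerank_score=lambda q, d: -abs(d - q),        # stand-in for the reranker
    k=3,
)
```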

&lt;h2&gt;
  
  
  &lt;strong&gt;The Cost Question&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;A VLM forward pass per candidate is orders of magnitude more expensive than a cosine similarity calculation. If you're oversampling at 5x with a target k of 10, that's 50 VLM inference calls per query on a self-hosted GPU endpoint.&lt;/p&gt;

&lt;p&gt;At small scale, this doesn't make sense. The adaptive boost costs nothing and handles the problem well enough. You'd be adding a GPU endpoint to solve a problem that a multiplication operation already addresses.&lt;/p&gt;

&lt;p&gt;At large scale, the math inverts. The boost becomes unreliable, the failure modes become invisible, and the cost of wrong results in downstream generation (hallucinations, eroded user trust, bad decisions) exceeds the cost of a reranking endpoint. The GPU cost is also amortized differently at scale: a SageMaker endpoint running inference for thousands of queries per hour is a different proposition than one sitting idle for a dozen queries a day.&lt;/p&gt;

&lt;p&gt;The crossover point depends on corpus size, query volume, and tolerance for imprecision. But for anyone building multimodal RAG on object storage at enterprise scale, this architecture is where the industry is heading.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;What's Available Now&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Three models have appeared in the last few months:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Jina Reranker M0&lt;/strong&gt;: Built on Qwen-VL. Outputs a scalar relevance score from concatenated query and image/text documents. Open weights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Llama Nemotron-Rerank-VL&lt;/strong&gt;: Nvidia's cross-encoder optimized for reranking visual documents against text queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Qwen3-VL Reranker&lt;/strong&gt;: Open-weight model tailored for vision-language reranking pipelines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None are available through Bedrock yet. They're all self-hosted, which means adding a SageMaker endpoint or GPU-backed instance. Training a custom multimodal reranker for a specific domain is also viable: fine-tune a lightweight VLM with contrastive loss on positive and hard-negative query-image-text pairs from your own corpus.&lt;/p&gt;
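&lt;p&gt;As a sketch of that fine-tuning objective, here is an InfoNCE-style contrastive loss over one positive and a set of hard negatives. The exact loss form is an assumption (the post above doesn't prescribe one), and the scores would come from the VLM being tuned:&lt;/p&gt;

```python
import math

def contrastive_loss(pos_score, neg_scores, temperature=0.05):
    """InfoNCE-style objective: -log softmax of the positive pair's score
    against the hard negatives'. Minimizing it pushes the positive's
    score above every negative's."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)                                   # for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]
```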

&lt;h2&gt;
  
  
  &lt;strong&gt;Where This Leaves Us&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The filtered search problem on object storage has three layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Graph connectivity&lt;/em&gt;: needs an infrastructure fix from AWS. ACORN's approach doesn't transfer to object storage without adaptation. I've filed the &lt;a href="https://repost.aws/questions/QUjrm6KygfTBiwaaKpwqY_lQ/filtered-query-relevancy-degradation-in-s3-vectors-and-a-potential-architectural-fix" rel="noopener noreferrer"&gt;feature request&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Score calibration&lt;/em&gt;: the adaptive boost handles this now. It keeps filtered results surfacing above unfiltered regardless of selectivity. At small to medium scale, this is the right answer.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Relevance evaluation&lt;/em&gt;: multimodal cross-encoders are the first architecture that can actually determine whether an image is relevant to a query, not just whether its vector is close. This is the layer that matters at scale, and the models just arrived.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running multimodal RAG on S3 Vectors today, the adaptive boost is probably sufficient. If you're planning for millions of vectors with filtered search across images and text, the two-stage architecture with a multimodal reranker is the path forward. The pieces exist now: they just haven't been assembled for this specific problem yet.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
    </item>
    <item>
      <title>How to Deploy a Gradio App on AWS — Two Approaches Compared</title>
      <dc:creator>HatmanStack</dc:creator>
      <pubDate>Fri, 06 Feb 2026 21:08:27 +0000</pubDate>
      <link>https://forem.com/hatmanstack/how-to-deploy-a-gradio-app-on-aws-two-approaches-compared-410n</link>
      <guid>https://forem.com/hatmanstack/how-to-deploy-a-gradio-app-on-aws-two-approaches-compared-410n</guid>
      <description>&lt;p&gt;Gradio makes it easy to build ML demo interfaces, but deploying them to production is another story. Hosting platforms like HuggingFace Spaces work for prototyping, but when builds start failing due to dependency drift and you need reliability, you need your own infrastructure.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll learn two ways to deploy a Gradio application on AWS:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;App Runner&lt;/strong&gt; — an always-on managed service ($0.12/day)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda with container images&lt;/strong&gt; — a serverless, pay-per-use approach (pennies per invocation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both approaches use real configuration files from a working deployment. By the end, you'll understand the cost and architectural tradeoffs well enough to choose the right one for your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account&lt;/li&gt;
&lt;li&gt;A working Gradio application&lt;/li&gt;
&lt;li&gt;Basic familiarity with Docker and AWS services&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Option 1: AWS App Runner
&lt;/h2&gt;

&lt;p&gt;App Runner is a managed service for web applications and containers. You point it at a repository or container registry, and it handles scaling, load balancing, and TLS. Most of the configuration lives in an &lt;code&gt;apprunner.yaml&lt;/code&gt; file in your repo's root directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
  &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python312&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pre-run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;echo "Installing dependencies..."&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip3 install --upgrade pip&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pip3 install -r requirements.txt&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;python3 app.py&lt;/span&gt;
    &lt;span class="na"&gt;network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7860&lt;/span&gt;
      &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;APP_PORT&lt;/span&gt;
    &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GRADIO_SERVER_NAME&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GRADIO_SERVER_PORT&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7860"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;RATE_LIMIT&lt;/span&gt;
        &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;20"&lt;/span&gt;
    &lt;span class="na"&gt;secrets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MY_SECRET&lt;/span&gt;
        &lt;span class="na"&gt;value-from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;arn:aws:secretsmanager:us-west-2:&amp;lt;your-secret-arn&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing about your Gradio code changes. The custom configuration lets you specify Python 3.12 and other settings not available through the App Runner console.&lt;/p&gt;

&lt;h2&gt;
  
  
  App Runner Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;You're billed $0.0064 per vCPU-hour and $0.007 per GB-hour. You can scale down to 0.25 vCPU and 0.5 GB of memory, which works out to roughly $0.12 per day for an always-on service that auto-scales under load.&lt;/p&gt;
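&lt;p&gt;The arithmetic behind that figure:&lt;/p&gt;

```python
vcpu_rate, mem_rate = 0.0064, 0.007               # $/vCPU-hour, $/GB-hour
daily = (0.25 * vcpu_rate + 0.5 * mem_rate) * 24  # smallest instance size
print(f"${daily:.3f}/day")                        # $0.122/day
```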

&lt;p&gt;One thing to remember: grant the Instance Security Role permissions to communicate with other AWS services. If your Gradio app calls Bedrock, Secrets Manager, or S3, you need to add those permissions to the container's security role — not just the deployment role.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 2: AWS Lambda with Container Images
&lt;/h2&gt;

&lt;p&gt;While App Runner's always-on cost is reasonable, Lambda is a better fit for applications with bursty or infrequent traffic. But Lambda has a 250 MB size limit for the deployment package and layers combined, and aggressively trimming Gradio's dependencies to fit a zip deployment isn't practical.&lt;/p&gt;

&lt;p&gt;Instead, you can use a container image with the &lt;a href="https://github.com/awslabs/aws-lambda-web-adapter" rel="noopener noreferrer"&gt;AWS Lambda Web Adapter&lt;/a&gt;, which lets Lambda run any HTTP application — including Gradio.&lt;/p&gt;

&lt;p&gt;You need two files in your repo: a Dockerfile and a buildspec.yaml for CodeBuild.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;  The Dockerfile

  FROM public.ecr.aws/docker/library/python:3.12-slim

  WORKDIR /app

  COPY requirements.txt .
  RUN pip install --no-cache-dir --upgrade pip
  RUN pip install --no-cache-dir -r requirements.txt

  COPY . .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then append the Lambda Web Adapter, which translates Lambda invocations into HTTP requests, to the end of the Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="s"&gt;COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.9.0 /lambda-adapter /opt/extensions/lambda-adapter&lt;/span&gt;

  &lt;span class="s"&gt;CMD ["python3", "app.py"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key line is the COPY --from that pulls in the Lambda Web Adapter. This adapter sits between Lambda's invocation model and your HTTP application, translating Lambda events into standard HTTP requests that Gradio understands.&lt;/p&gt;

&lt;p&gt;The CodeBuild Spec&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.2&lt;/span&gt;

  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;AWS_REGION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-west-2"&lt;/span&gt;
      &lt;span class="na"&gt;AWS_ACCOUNT_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;your-account-id&amp;gt;"&lt;/span&gt;
      &lt;span class="na"&gt;IMAGE_REPO_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production/gradio-demo"&lt;/span&gt;
      &lt;span class="na"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest"&lt;/span&gt;

  &lt;span class="na"&gt;phases&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;pre_build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;echo Logging in to Amazon ECR...&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com&lt;/span&gt;

    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;IMAGE_URI=$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com/$IMAGE_REPO_NAME:$IMAGE_TAG&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker build -t $IMAGE_URI .&lt;/span&gt;

    &lt;span class="na"&gt;post_build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;commands&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker push $IMAGE_URI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before running the build, create a repository in ECR to store the container image. In CodeBuild, create a new project using an S3 bucket or GitHub as the source, and select an EC2 compute environment (not Lambda compute — Lambda build containers don't include Docker).&lt;/p&gt;

&lt;h2&gt;
  
  
  Configuring Gradio for Lambda
&lt;/h2&gt;

&lt;p&gt;One critical change: your Gradio app must listen on 0.0.0.0 port 8080 so the Lambda Web Adapter can route traffic to it. Update your launch call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  server_port &lt;span class="o"&gt;=&lt;/span&gt; int&lt;span class="o"&gt;(&lt;/span&gt;os.environ.get&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"AWS_LAMBDA_HTTP_PORT"&lt;/span&gt;, 8080&lt;span class="o"&gt;))&lt;/span&gt;
  demo.launch&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;server_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"0.0.0.0"&lt;/span&gt;, &lt;span class="nv"&gt;server_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;server_port&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploying the Lambda Function
&lt;/h2&gt;

&lt;p&gt;In Lambda, create a new function using the container image approach. Select your ECR image and enable a Function URL — that's all you need to get the Gradio app accessible over HTTPS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda Cost Optimization
&lt;/h2&gt;

&lt;p&gt;AWS recommends running container images with 2048 or 4096 MB of memory, but Gradio typically consumes 125–300 MB during operation. Setting the Lambda function to 512 MB works well and provides a buffer.&lt;/p&gt;

&lt;p&gt;Here's how the costs compare:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
  ┌────────────────────────────────────────────┬────────────────────┐
  │               Configuration                │        Cost        │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 4096 MB, always-on (EventBridge keep-warm) │ ~$5.76/day         │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 512 MB, always-on (EventBridge keep-warm)  │ ~$0.71/day         │
  ├────────────────────────────────────────────┼────────────────────┤
  │ 512 MB, on-demand (cold starts)            │ ~$0.002/invocation │
  └────────────────────────────────────────────┴────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
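&lt;p&gt;The always-on rows can be reproduced from Lambda's per-GB-second price (the rate below assumes the common x86 on-demand price of about $0.0000166667 per GB-second):&lt;/p&gt;

```python
gb_second = 0.0000166667            # assumed x86 on-demand rate, $/GB-s
day_4096 = 4.0 * gb_second * 86400  # ~5.76, matching the 4096 MB row
day_512 = 0.5 * gb_second * 86400   # ~0.72, within rounding of the 512 MB row
```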



&lt;p&gt;The tradeoff with on-demand is cold starts — a few extra seconds on the first request. But once the Gradio frontend loads, successive calls just grab container role credentials and are fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Cold Starts
&lt;/h2&gt;

&lt;p&gt;The Lambda Web Adapter checks your app's readiness by polling &lt;code&gt;/&lt;/code&gt; (the root path). Gradio and FastAPI provide a dedicated health endpoint at &lt;code&gt;/healthz&lt;/code&gt; that responds faster during startup. Set this in your Lambda environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nv"&gt;AWS_LWA_READINESS_CHECK_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/healthz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This reduces the chance of the adapter timing out before your app finishes initializing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Should You Choose?
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
  ┌──────────────────┬────────────────┬───────────────────────────────────┐
  │      Factor      │   App Runner   │              Lambda               │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cost (always-on) │ ~$0.12/day     │ ~$0.71/day (512 MB)               │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cost (on-demand) │ Not applicable │ Pennies per invocation            │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Cold starts      │ None           │ A few seconds                     │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Scaling          │ Automatic      │ Automatic                         │
  ├──────────────────┼────────────────┼───────────────────────────────────┤
  │ Setup complexity │ Lower          │ Higher (Docker + CodeBuild + ECR) │
  └──────────────────┴────────────────┴───────────────────────────────────┘

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Choose App Runner if you want simplicity, consistent response times, and your app gets steady traffic.&lt;/p&gt;

&lt;p&gt;Choose Lambda if your traffic is bursty, you want to minimize costs during idle periods, or you're already invested in the serverless ecosystem.&lt;/p&gt;

&lt;p&gt;Either service works well for Gradio deployments. The right choice depends on your traffic pattern and how much you care about cold starts versus idle costs.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Connect Google Forms to Snowflake Using Cloud Run</title>
      <dc:creator>HatmanStack</dc:creator>
      <pubDate>Fri, 06 Feb 2026 20:58:25 +0000</pubDate>
      <link>https://forem.com/hatmanstack/how-to-connect-google-forms-to-snowflake-using-cloud-run-5112</link>
      <guid>https://forem.com/hatmanstack/how-to-connect-google-forms-to-snowflake-using-cloud-run-5112</guid>
      <description>&lt;p&gt;Every major cloud provider has tools to collect survey data within their own ecosystem. But what if you need form responses to land in a data warehouse on a different platform? That takes a little&lt;br&gt;
  wiring.&lt;/p&gt;

&lt;p&gt;In this tutorial, you'll build a pipeline that automatically sends Google Forms responses to a Snowflake table. The architecture uses four components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Google Forms&lt;/strong&gt; — collects the data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apps Script&lt;/strong&gt; — triggers on each form submission&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google Cloud Run&lt;/strong&gt; — runs a Node.js service that connects to Snowflake&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Snowflake Node Connector&lt;/strong&gt; — inserts the data using parameterized queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By the end, every form submission will automatically appear in your Snowflake warehouse within a couple of minutes.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A Google Cloud account with billing enabled&lt;/li&gt;
&lt;li&gt;A Snowflake account (a free trial works)&lt;/li&gt;
&lt;li&gt;Basic familiarity with Node.js and Docker&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Step 1: Set Up the Google Form and Apps Script
&lt;/h2&gt;

&lt;p&gt;Create a Google Form and navigate to the &lt;strong&gt;Responses&lt;/strong&gt; tab. Click the Google Sheets icon to create a linked spreadsheet — this is where form responses land before we forward them to Snowflake.&lt;/p&gt;

&lt;p&gt;Open the linked spreadsheet, then go to &lt;strong&gt;Extensions → Apps Script&lt;/strong&gt;. Create a function that runs on each form submission:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;myFunction&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;ss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;SpreadsheetApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getActiveSpreadsheet&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;ss&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getSheets&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;sheet&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getRange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;A2:E&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;range&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="na"&gt;column&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ascending&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="c1"&gt;// We'll fill this in after deploying Cloud Run&lt;/span&gt;
    &lt;span class="kd"&gt;var&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;contentType&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;application/json&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;headers&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-PW-AccessToken&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;TOKEN&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-PW-Application&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;developer_api&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;X-PW-UserEmail&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;lt;YOUR_EMAIL&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="nx"&gt;UrlFetchApp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script sorts entries by timestamp (newest first) so the latest submission always sits in row 2, the row the connector reads. It then makes an HTTP request to our Cloud Run service. Leave the &lt;code&gt;url&lt;/code&gt; empty for now; we'll fill it in after deploying the connector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Create a Google Cloud Service Account
&lt;/h2&gt;

&lt;p&gt;Service accounts let applications authenticate with Google APIs without using a personal login. Our Cloud Run service needs one to read from the Google Sheet.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;In Google Cloud Console, go to IAM &amp;amp; Admin → Service Accounts → Create Service Account&lt;/li&gt;
&lt;li&gt;Save the generated email address — you'll need it later to share the Google Sheet&lt;/li&gt;
&lt;li&gt;Go to the Keys tab → Add Key → JSON. This downloads a credential file to your machine&lt;/li&gt;
&lt;li&gt;Enable the Google Sheets API in your project (APIs &amp;amp; Services → Enable APIs)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Keep the downloaded JSON file — we'll include it in our Cloud Run deployment as creds.json.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Prepare Your Snowflake Table
&lt;/h2&gt;

&lt;p&gt;Before the connector can insert data, the destination table must exist. In Snowflake, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;  &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;DEMO_DB&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;SCHEMA&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;DEMO_DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;PUBLIC&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;DEMO_DB&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;PUBLIC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SHEETS&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;TS&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;NAME&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DAYS&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;DIET&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;PAY&lt;/span&gt; &lt;span class="n"&gt;STRING&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Adjust the column names and types to match your form's fields.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Build the Node.js Connector
&lt;/h2&gt;

&lt;p&gt;The connector is a small Express server that reads the latest form entry from Google Sheets and inserts it into Snowflake. Create three files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="nx"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;js&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;path&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;googleapis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sheets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;snow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;snowflake-sdk&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;express&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;express&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;getInvite&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;auth&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleAuth&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;keyFile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;creds.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;scopes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://www.googleapis.com/auth/spreadsheets&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="nx"&gt;google&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;auth&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;sheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;spreadsheets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;spreadsheetId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;your-sheets-id&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// From the sheet URL&lt;/span&gt;
      &lt;span class="na"&gt;range&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;A2:E2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;values&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;No data found.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;connection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;snow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;createConnection&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;account&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;locator&amp;gt;.&amp;lt;cloud-provider&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// e.g., xh45729.us-east-2.aws&lt;/span&gt;
      &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;your-username&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;&amp;lt;your-password&amp;gt;&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;warehouse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;COMPUTE_WH&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;DEMO_DB&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;PUBLIC&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ACCOUNTADMIN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;connection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Parameterized query prevents SQL injection — never concatenate&lt;/span&gt;
    &lt;span class="c1"&gt;// user-supplied values directly into SQL strings&lt;/span&gt;
    &lt;span class="nx"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;sqlText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;INSERT INTO DEMO_DB.PUBLIC.SHEETS (TS, NAME, DAYS, DIET, PAY) VALUES (?, ?, ?, ?, ?)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;binds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;PORT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="mi"&gt;8080&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;listen&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Listening on port &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;port&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nx"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;getInvite&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Adding Data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your Google Sheet ID is the long string in the sheet's URL between /d/ and /edit:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;https://docs.google.com/spreadsheets/d/&amp;lt;sheets-id&amp;gt;/edit#gid=0&lt;/code&gt;&lt;/p&gt;
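&lt;p&gt;If you want to pull the ID out programmatically instead of copying it by hand, a small sketch in plain Node (the URL below is a made-up example, not a real sheet):&lt;/p&gt;

```javascript
// Extract the spreadsheet ID from a Google Sheets URL.
// The ID is the segment between /d/ and the next slash.
function extractSheetId(url) {
  const match = url.match(/\/d\/([a-zA-Z0-9_-]+)/);
  return match ? match[1] : null;
}

// Hypothetical example URL
const url = 'https://docs.google.com/spreadsheets/d/1AbC-dEf_123/edit#gid=0';
console.log(extractSheetId(url)); // 1AbC-dEf_123
```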

&lt;p&gt;The Snowflake account identifier (&lt;code&gt;locator.region.cloud&lt;/code&gt;) appears in the bottom-left of your Snowflake console and in the login URL. For example: &lt;code&gt;xh45729.us-east-2.aws&lt;/code&gt;. Check the &lt;a href="https://docs.snowflake.com/en/user-guide/admin-account-identifier" rel="noopener noreferrer"&gt;Snowflake account identifier docs&lt;/a&gt; if you're unsure; the format varies by deployment.&lt;/p&gt;

&lt;p&gt;Notice the parameterized query with ? placeholders and the binds array. This is important: parameterized queries prevent SQL injection by letting the database driver handle escaping. Never build SQL strings by concatenating user input directly.&lt;br&gt;
&lt;/p&gt;
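&lt;p&gt;A quick sanity check, sketched in plain Node rather than the snowflake-sdk itself (the sample values are made up): every &lt;code&gt;?&lt;/code&gt; in the statement should have exactly one entry in &lt;code&gt;binds&lt;/code&gt;:&lt;/p&gt;

```javascript
// Sketch: verify a parameterized statement before handing it to the driver.
// The driver substitutes each ? with the matching bind value, escaping it safely.
function checkStatement(stmt) {
  const placeholders = (stmt.sqlText.match(/\?/g) || []).length;
  if (placeholders !== stmt.binds.length) {
    throw new Error('placeholder count does not match bind count');
  }
  return true;
}

// Hypothetical form row
const stmt = {
  sqlText: 'INSERT INTO DEMO_DB.PUBLIC.SHEETS (TS, NAME, DAYS, DIET, PAY) VALUES (?, ?, ?, ?, ?)',
  binds: ['3/5/2026 21:09:47', 'Judy Wilson', '3', 'vegetarian', 'card']
};
console.log(checkStatement(stmt)); // true
```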

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;  Dockerfile

  FROM node:20-slim

  WORKDIR /usr/src/app

  &lt;span class="c"&gt;# Copy dependency manifests first — Docker caches this layer&lt;/span&gt;
  &lt;span class="c"&gt;# so npm install only re-runs when dependencies change&lt;/span&gt;
  COPY package*.json ./

  RUN npm install --production

  COPY . .

  EXPOSE 8080
  CMD ["node", "index.js"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="kr"&gt;package&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;json&lt;/span&gt;

  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node-sheets-to-snow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;version&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;2.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;description&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Google Sheets to Snowflake connector&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;main&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;scripts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;engines&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;node&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;&amp;gt;=20.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;dependencies&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;^4.21.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;googleapis&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;^144.0.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;snowflake-sdk&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;^2.0.2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 5: Deploy to Google Cloud Run
&lt;/h2&gt;

&lt;p&gt;Place all four files (index.js, Dockerfile, package.json, creds.json) in a directory. Open Cloud Shell from the Google Cloud Console (the terminal icon in the top right), upload the files, and deploy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  &lt;span class="nb"&gt;cd &lt;/span&gt;your-project-directory
  gcloud run deploy &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cloud Run will ask for a service name and region. It builds the container, deploys it, and returns a URL. Copy that URL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Wire Everything Together
&lt;/h2&gt;

&lt;p&gt;Two final connections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Share the Google Sheet with your service account. Click the Share button on the spreadsheet and add the service account email from Step 2. This gives the Cloud Run service permission to read
form responses.&lt;/li&gt;
&lt;li&gt;Add the Cloud Run URL to your Apps Script. Go back to Extensions → Apps Script and paste the URL into the url variable in your function.&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Testing It
&lt;/h1&gt;

&lt;p&gt;Submit a response through your Google Form. Within one to two minutes, the data should appear in your Snowflake table. The slight delay comes from Cloud Run's cold start — the container spins down when idle and takes a moment to restart on the first request.&lt;/p&gt;

&lt;p&gt;You can verify by running in Snowflake:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;  SELECT &lt;span class="k"&gt;*&lt;/span&gt; FROM DEMO_DB.PUBLIC.SHEETS ORDER BY TS DESC LIMIT 5&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete source code is available on &lt;a href="https://github.com/HatmanStack/snow-node-sheets-gpc" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>Why Filtered Queries Return Lower Relevancy in S3 Vectors (And What To Do About It)</title>
      <dc:creator>HatmanStack</dc:creator>
      <pubDate>Fri, 30 Jan 2026 00:15:03 +0000</pubDate>
      <link>https://forem.com/hatmanstack/why-filtered-queries-return-lower-relevancy-in-s3-vectors-and-what-to-do-about-it-2587</link>
      <guid>https://forem.com/hatmanstack/why-filtered-queries-return-lower-relevancy-in-s3-vectors-and-what-to-do-about-it-2587</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; The ~10% relevancy drop on filtered S3 Vector queries isn't a bug — it's quantization noise plus graph disconnection from post-filtering. Boost or re-rank to fix it.&lt;/p&gt;




&lt;p&gt;I've been running &lt;a href="https://github.com/HatmanStack/RAGStack-Lambda" rel="noopener noreferrer"&gt;RAGStack-Lambda&lt;/a&gt; with ~1500 documents in a knowledge base. After revamping my metadata for S3 Vectors, something weird happened: filtered queries started returning the wrong results. I'd search for a specific person with explicit filters and get back a picture of a &lt;em&gt;different&lt;/em&gt; person. The visual similarity was overpowering my metadata filters.&lt;/p&gt;

&lt;p&gt;After digging in, I found filtered results consistently score ~10% lower in relevancy than unfiltered queries — even for the same content. This isn't a bug. It's a predictable consequence of how S3 Vectors is architected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Trade-Off You're Making
&lt;/h2&gt;

&lt;p&gt;S3 Vectors can cut your vector database costs by 90%. A billion vectors runs ~$46/month versus $660+ on Pinecone. The catch? You're trading precision for price.&lt;/p&gt;

&lt;p&gt;Two mechanisms cause the relevancy drop:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Quantization Noise
&lt;/h3&gt;

&lt;p&gt;S3 Vectors uses aggressive 4-bit Product Quantization to compress vectors — shrinking them by 64x so they can live on object storage instead of RAM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unfiltered search:&lt;/strong&gt; With millions of candidates, the sheer volume drowns out the approximation error. Strong matches still surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filtered search:&lt;/strong&gt; Your candidate pool shrinks. The algorithm evaluates vectors that are further away in the space. Suddenly that quantization error is a significant chunk of your distance calculation. The ~10% drop corresponds to the noise floor of 4-bit quantization.&lt;/p&gt;
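&lt;p&gt;A toy sketch in plain Node (illustrative numbers, scalar rather than product quantization) of that noise floor: snap each component to one of 16 levels, i.e. 4 bits, and compare distances:&lt;/p&gt;

```javascript
// Toy 4-bit scalar quantization: snap each component to one of 16 levels in [0, 1].
// (S3 Vectors' actual scheme is product quantization; this just shows the noise floor.)
function quantize(v) {
  return v.map(function (x) { return Math.round(x * 15) / 15; });
}

function dist(a, b) {
  let s = 0;
  for (let i = 0; i !== a.length; i += 1) {
    s += (a[i] - b[i]) * (a[i] - b[i]);
  }
  return Math.sqrt(s);
}

const query = [0.12, 0.83, 0.40, 0.55];
const doc = [0.10, 0.80, 0.44, 0.52];

const exact = dist(query, doc);
const approx = dist(quantize(query), quantize(doc));
console.log('exact:', exact.toFixed(4), 'quantized:', approx.toFixed(4));
```

&lt;p&gt;For close neighbors the quantized distance visibly diverges from the exact one; with a deep candidate pool that error averages out, with a filtered pool it doesn't.&lt;/p&gt;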

&lt;h3&gt;
  
  
  2. The Disconnected Graph Problem
&lt;/h3&gt;

&lt;p&gt;S3 Vectors uses HNSW (Hierarchical Navigable Small World) — a graph where vectors connect to their neighbors. Search works by traversing edges to find the nearest match.&lt;/p&gt;

&lt;p&gt;When you filter, you're turning off nodes. Remove 90% of vectors and you create holes in the graph. The search algorithm gets trapped — the "bridge" edges to better regions have been filtered out. It settles for local minima instead of finding your actual best match.&lt;/p&gt;

&lt;p&gt;This is why I was getting the wrong person's photo. Visually similar, passed the traversal, but wrong.&lt;/p&gt;
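&lt;p&gt;A toy sketch in plain Node (made-up scores and tags) of the post-filtering failure mode: the ANN stage picks its candidates before the filter runs, so strong matches that satisfy your filter can be crowded out of the pool entirely:&lt;/p&gt;

```javascript
// Toy corpus: score is vector similarity, tag is the metadata we filter on.
const corpus = [
  { id: 'a', score: 0.95, tag: 'other' },
  { id: 'b', score: 0.93, tag: 'other' },
  { id: 'c', score: 0.91, tag: 'judy' },
  { id: 'd', score: 0.62, tag: 'judy' },
  { id: 'e', score: 0.90, tag: 'judy' }
];

// Post-filtering: take the ANN top-annK FIRST, then apply the metadata filter.
function postFilter(docs, tag, annK) {
  const sorted = docs.slice().sort(function (x, y) { return y.score - x.score; });
  const annTop = sorted.slice(0, annK);
  return annTop.filter(function (d) { return d.tag === tag; });
}

// With annK = 3 only 'c' survives; 'e' (0.90) never entered the candidate pool.
console.log(postFilter(corpus, 'judy', 3).map(function (d) { return d.id; })); // [ 'c' ]
```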

&lt;h2&gt;
  
  
  The Fix (What I Thought)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Short-term&lt;/strong&gt;: A 1.25x boost for filtered results normalized my scores. Crude but effective.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-term&lt;/strong&gt;: Re-ranking. Oversample (request 3-5x your target k), then re-rank with a cross-encoder or Bedrock’s rerank API. Use S3 Vectors for cheap retrieval and smarter compute for precision.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix (What Actually Worked)
&lt;/h2&gt;

&lt;p&gt;I spent time implementing the “&lt;em&gt;sophisticated&lt;/em&gt;” solution: oversample filtered results at 3x, rerank with Cohere Rerank 3.5 via Bedrock, merge the slices fairly.&lt;/p&gt;

&lt;p&gt;Results got worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem&lt;/strong&gt;: rerankers are designed for text documents. My knowledge base is ~60% images with metadata. The reranker was evaluating synthesized text like “people: judy wilson, topic: family_photos” — not what cross-encoders are optimized for.&lt;/p&gt;

&lt;p&gt;Meanwhile, the raw vector similarity scores from visual embeddings were actually good relevance signals. I was replacing useful information with noise.&lt;/p&gt;

&lt;p&gt;I tried reranking both filtered and unfiltered slices. I tried reranking only filtered. I tried dropping visual-only results from unfiltered. Each “&lt;em&gt;improvement&lt;/em&gt;” made things worse or traded one problem for another.&lt;/p&gt;

&lt;p&gt;The 1.25x boost I dismissed as “&lt;em&gt;crude&lt;/em&gt;”? It’s still running in production.&lt;/p&gt;
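&lt;p&gt;For the curious, the whole fix fits in a few lines. A sketch in plain Node (the 1.25 factor is what worked for my corpus; treat it as a starting point, not a constant):&lt;/p&gt;

```javascript
// Multiply filtered-query scores by a constant so they merge fairly
// with unfiltered scores, then take the overall top-k.
const FILTER_BOOST = 1.25;

function mergeSlices(filtered, unfiltered, k) {
  const boosted = filtered.map(function (r) {
    return { id: r.id, score: r.score * FILTER_BOOST };
  });
  const all = boosted.concat(unfiltered);
  all.sort(function (x, y) { return y.score - x.score; });
  return all.slice(0, k);
}

// Made-up scores for illustration
const filtered = [{ id: 'f1', score: 0.70 }, { id: 'f2', score: 0.64 }];
const unfiltered = [{ id: 'u1', score: 0.82 }, { id: 'u2', score: 0.79 }];
console.log(mergeSlices(filtered, unfiltered, 3));
```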

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>ai</category>
      <category>opensource</category>
    </item>
    <item>
      <title>RAGStack-Lambda: Scale-to-Zero RAG with Multimodal Search</title>
      <dc:creator>HatmanStack</dc:creator>
      <pubDate>Fri, 23 Jan 2026 20:48:40 +0000</pubDate>
      <link>https://forem.com/hatmanstack/ragstack-lambda-scale-to-zero-rag-with-multimodal-search-d0j</link>
      <guid>https://forem.com/hatmanstack/ragstack-lambda-scale-to-zero-rag-with-multimodal-search-d0j</guid>
      <description>&lt;p&gt;Most RAG architectures charge you $300+/month for vector databases that run whether you're querying or not. RAGStack-Lambda scales to zero. $7-10/month for 1,000 documents.&lt;/p&gt;

&lt;p&gt;The trick is S3 Vectors + Lambda + Bedrock. You trade sub-50ms latency for hundreds of milliseconds. For chat interfaces and document Q&amp;amp;A, that's fine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Text Search
&lt;/h2&gt;

&lt;p&gt;Amazon Nova embeddings put text, images, and video frames in the same vector space. Upload a photo, search with natural language, get semantically relevant results.&lt;/p&gt;

&lt;p&gt;For video: frames get visual embeddings and audio gets transcribed into 30-second chunks with speaker identification. Every chunk carries timestamp metadata. Query by what's said or what's shown — citations link directly to that segment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Smarter Retrieval
&lt;/h2&gt;

&lt;p&gt;RAGStack doesn't just embed your content. It analyzes it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metadata extraction&lt;/strong&gt; examines each document and pulls structured fields automatically — topic, document type, date range, whatever's relevant.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter generation&lt;/strong&gt; samples your knowledge base and creates few-shot examples based on what it finds. No manual curation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-slice queries&lt;/strong&gt; run parallel retrievals using those generated filters. Instead of one broad search, you get multiple targeted queries returning more relevant results.&lt;/p&gt;
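&lt;p&gt;The multi-slice pattern, sketched in plain Node with a stubbed query function standing in for the real retrieval call (filter names and scores are made up):&lt;/p&gt;

```javascript
// Run one retrieval per generated filter in parallel, then merge by score.
// queryFn is a stand-in for the real vector-store call.
async function multiSlice(queryFn, filters, k) {
  const slices = await Promise.all(filters.map(function (f) { return queryFn(f); }));
  const merged = [].concat.apply([], slices);
  merged.sort(function (x, y) { return y.score - x.score; });
  return merged.slice(0, k);
}

// Stubbed retrieval: each filter returns its own candidates.
function fakeQuery(filter) {
  const data = {
    topic: [{ id: 't1', score: 0.9 }],
    date: [{ id: 'd1', score: 0.8 }, { id: 'd2', score: 0.7 }]
  };
  return Promise.resolve(data[filter] || []);
}

multiSlice(fakeQuery, ['topic', 'date'], 2).then(function (top) {
  console.log(top.map(function (r) { return r.id; })); // [ 't1', 'd1' ]
});
```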

&lt;h2&gt;
  
  
  The Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;One-click AWS Marketplace deployment&lt;/li&gt;
&lt;li&gt;Framework-agnostic web component (one script tag)&lt;/li&gt;
&lt;li&gt;MCP server for Claude Desktop, Cursor, VS Code&lt;/li&gt;
&lt;li&gt;Everything runs in your account — no external control plane&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://github.com/HatmanStack/RAGStack-Lambda" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; | &lt;a href="//dhrmkxyt1t9pb.cloudfront.net"&gt;Demo&lt;/a&gt; | &lt;a href="https://portfolio.hatstack.fun/read/post/RAGStack-Lambda" rel="noopener noreferrer"&gt;Blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Login: &lt;a href="mailto:guest@hatstack.fun"&gt;guest@hatstack.fun&lt;/a&gt; / Guest@123&lt;/p&gt;

</description>
      <category>aws</category>
      <category>rag</category>
      <category>ai</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
