<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Gleb Otochkin</title>
    <description>The latest articles on Forem by Gleb Otochkin (@gleb_otochkin).</description>
    <link>https://forem.com/gleb_otochkin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3599230%2F235f9bc7-2df4-45ca-ac02-f8aaf243c969.png</url>
      <title>Forem: Gleb Otochkin</title>
      <link>https://forem.com/gleb_otochkin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/gleb_otochkin"/>
    <language>en</language>
    <item>
      <title>AlloyDB Agentic RAG Application with MCP Toolbox [Part 2]</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Tue, 25 Nov 2025 21:18:50 +0000</pubDate>
      <link>https://forem.com/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-2-2l28</link>
      <guid>https://forem.com/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-2-2l28</guid>
      <description>&lt;p&gt;&lt;em&gt;This is Part 2 of the AlloyDB Agentic RAG application codelab, please start with &lt;a href="https://dev.to/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-1-1l8k"&gt;Part 1&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  7. Deploy the MCP Toolbox to Cloud Run
&lt;/h2&gt;

&lt;p&gt;Now we can deploy the MCP Toolbox to Cloud Run. There are several ways to deploy the MCP Toolbox. The simplest is to run it from the command line, but if we want a scalable and reliable service, Cloud Run is a better option.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prepare Client ID
&lt;/h3&gt;

&lt;p&gt;To use the booking functionality of the application we need to prepare an OAuth 2.0 Client ID using the Cloud Console. Without it we cannot sign in to the application with our Google credentials to make a booking and record it in the database.&lt;/p&gt;

&lt;p&gt;In the Cloud Console go to APIs and Services and click on "OAuth consent screen". It opens the OAuth Overview page where we click Get Started.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12v8w92mqx23bwf06lsi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F12v8w92mqx23bwf06lsi.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the next page we provide the application name and user support email, then click Next.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30zj9hwlx64xgv3l5dzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30zj9hwlx64xgv3l5dzq.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the next screen we choose Internal for our application and click Next again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wo7pvd4om7zsf23iit0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wo7pvd4om7zsf23iit0.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we provide a contact email and click Next again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i6uiib40qlqhn17cmfn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5i6uiib40qlqhn17cmfn.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we agree to the Google API services policies and push the Create button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7w5xe4ua0k4g1hd6mi2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg7w5xe4ua0k4g1hd6mi2.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will lead us to the page where we can create an OAuth client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febvyvnvvbxzuzp8k50sj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febvyvnvvbxzuzp8k50sj.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On this screen we choose "Web Application" from the dropdown menu, enter "Cymbal Air" as the application name, and push the Add URI button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwbflc4sgcvkhz3latek.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwbflc4sgcvkhz3latek.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The URIs represent trusted sources for the application and depend on where you reach the application from. We put "&lt;a href="http://localhost:8081" rel="noopener noreferrer"&gt;http://localhost:8081&lt;/a&gt;" as the authorized URI and "&lt;a href="http://localhost:8081/login/google" rel="noopener noreferrer"&gt;http://localhost:8081/login/google&lt;/a&gt;" as the redirect URI. Those values work if you open "&lt;a href="http://localhost:8081" rel="noopener noreferrer"&gt;http://localhost:8081&lt;/a&gt;" in your browser, for example when you connect through an SSH tunnel from your computer. I will show how to do that later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyfbplf27bf7e3jq83w4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsyfbplf27bf7e3jq83w4.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After pushing the "Create" button you get a popup window with your client's credentials, and the credentials are recorded in the system. You can always copy the client ID later to use when you start your application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1opm1e587ngfztfldrq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1opm1e587ngfztfldrq.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Later you will see where to provide that client ID.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Service Account
&lt;/h3&gt;

&lt;p&gt;We need a dedicated service account for our Cloud Run service with all required privileges. The service needs access to AlloyDB and Secret Manager. For the service account name we are going to use toolbox-identity.&lt;/p&gt;

&lt;p&gt;Open another Cloud Shell tab using the "+" sign at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new cloud shell tab execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create toolbox-identity

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.client"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pay attention to any errors. The commands create a service account for the Cloud Run service and grant it the privileges to work with Secret Manager, the database, and service usage.&lt;/p&gt;
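&lt;p&gt;If you want to confirm the grants, you can list the role bindings for the service account. This is an optional check, not a required codelab step; it assumes gcloud is authenticated and PROJECT_ID is still set from the commands above.&lt;/p&gt;

```shell
# Optional check: list the roles granted to toolbox-identity in the project
gcloud projects get-iam-policy "$PROJECT_ID" \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com" \
  --format="value(bindings.role)"
```

&lt;p&gt;You should see the three roles granted above: roles/alloydb.client, roles/serviceusage.serviceUsageConsumer and roles/secretmanager.secretAccessor.&lt;/p&gt;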

&lt;p&gt;Close the tab by pressing Ctrl+D or executing the command "exit" in the tab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prepare MCP Toolbox Configuration
&lt;/h3&gt;

&lt;p&gt;Prepare the configuration file for the MCP Toolbox. You can read about all configuration options in the &lt;a href="https://googleapis.github.io/genai-toolbox/getting-started/configure/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, but here we are going to use the sample tools.yaml file and replace some values, such as the cluster and instance names, the AlloyDB password, and the project ID, with our actual values.&lt;/p&gt;

&lt;p&gt;Export the AlloyDB password:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PGPASSWORD=&amp;lt;noted AlloyDB password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Export the client ID we prepared in the previous step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export CLIENT_ID=&amp;lt;noted OAuth 2.0 client ID for our application&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prepare the configuration file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
ADBCLUSTER=alloydb-aip-01
sed -e "s/project: retrieval-app-testing/project: $(gcloud config get-value project)/g" \
-e "s/cluster: my-alloydb-cluster/cluster: $ADBCLUSTER/g" \
-e "s/instance: my-alloydb-instance/instance: $ADBCLUSTER-pr/g" \
-e "s/password: postgres/password: $PGPASSWORD\\n    ipType: private/g" \
-e "s/^ *clientId: .*/    clientId: $CLIENT_ID/g" \
cymbal-air-toolbox-demo/tools.yaml &amp;gt;~/tools.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
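&lt;p&gt;If you are curious what the sed pipeline above actually does, here is a self-contained sketch that applies the same substitutions to a small stand-in snippet. The values are placeholders for illustration; the real tools.yaml ships with the demo repository.&lt;/p&gt;

```shell
# Create a stand-in snippet mimicking the relevant keys of the sample tools.yaml
# (placeholder values; the real file comes from the cymbal-air-toolbox-demo repo)
printf '%s\n' \
  'sources:' \
  '  my-pg-instance:' \
  '    kind: alloydb-postgres' \
  '    project: retrieval-app-testing' \
  '    cluster: my-alloydb-cluster' \
  '    instance: my-alloydb-instance' \
  '    password: postgres' \
  'authServices:' \
  '  my_google_service:' \
  '    kind: google' \
  '    clientId: placeholder.apps.googleusercontent.com' \
  > /tmp/tools-sample.yaml

# Apply the same substitutions as in the codelab, with example values
PGPASSWORD=example-pass
CLIENT_ID=example-id.apps.googleusercontent.com
sed -e "s/project: retrieval-app-testing/project: my-project/g" \
    -e "s/cluster: my-alloydb-cluster/cluster: alloydb-aip-01/g" \
    -e "s/instance: my-alloydb-instance/instance: alloydb-aip-01-pr/g" \
    -e "s/password: postgres/password: $PGPASSWORD\\n    ipType: private/g" \
    -e "s/^ *clientId: .*/    clientId: $CLIENT_ID/g" \
    /tmp/tools-sample.yaml > /tmp/tools-rewritten.yaml
cat /tmp/tools-rewritten.yaml
```

&lt;p&gt;The rewritten snippet shows the project, cluster and instance replaced, the password substituted with an extra ipType: private line added after it, and the client ID filled in.&lt;/p&gt;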



&lt;p&gt;If you look at the section of the file defining the target data source, you will see that we also added a line to use a private IP for the connection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sources:
  my-pg-instance:
    kind: alloydb-postgres
    project: gleb-test-short-003-471020
    region: us-central1
    cluster: alloydb-aip-01
    instance: alloydb-aip-01-pr
    database: assistantdemo
    user: postgres
    password: L23F...
    ipType: private
authServices:
  my_google_service:
    kind: google
    clientId: 96828*******-***********.apps.googleusercontent.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a secret using the tools.yaml configuration as a source.&lt;/p&gt;

&lt;p&gt;In the VM ssh console execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud secrets create tools --data-file=tools.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ gcloud secrets create tools --data-file=tools.yaml
Created version [1] of the secret [tools].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploy the MCP Toolbox as a Cloud Run Service
&lt;/h3&gt;

&lt;p&gt;Now everything is ready to deploy the MCP Toolbox as a service to Cloud Run. For local testing you can run "./toolbox --tools-file=./tools.yaml", but if we want our application to run in the cloud, deploying to Cloud Run makes much more sense.&lt;/p&gt;

&lt;p&gt;In the VM SSH session execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export IMAGE=us-central1-docker.pkg.dev/database-toolbox/toolbox/toolbox:latest
gcloud run deploy toolbox \
    --image $IMAGE \
    --service-account toolbox-identity \
    --region us-central1 \
    --set-secrets "/app/tools.yaml=tools:latest" \
    --args="--tools-file=/app/tools.yaml","--address=0.0.0.0","--port=8080" \
    --network default \
    --subnet default \
    --no-allow-unauthenticated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ export IMAGE=us-central1-docker.pkg.dev/database-toolbox/toolbox/toolbox:latest
gcloud run deploy toolbox \
    --image $IMAGE \
    --service-account toolbox-identity \
    --region us-central1 \
    --set-secrets "/app/tools.yaml=tools:latest" \
    --args="--tools-file=/app/tools.yaml","--address=0.0.0.0","--port=8080" \
    --network default \
    --subnet default \
    --no-allow-unauthenticated
Deploying container to Cloud Run service [toolbox] in project [gleb-test-short-002-470613] region [us-central1]
✓ Deploying new service... Done.                                                                                                                                                                                                
  ✓ Creating Revision...                                                                                                                                                                                                        
  ✓ Routing traffic...                                                                                                                                                                                                          
Done.                                                                                                                                                                                                                           
Service [toolbox] revision [toolbox-00001-l9c] has been deployed and is serving 100 percent of traffic.
Service URL: https://toolbox-868691532292.us-central1.run.app

student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
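&lt;p&gt;Once the deployment finishes, you can also print the service URL from the command line instead of looking it up in the console. This is an optional check using the same service name and region as the deploy command:&lt;/p&gt;

```shell
# Optional: print the URL of the deployed toolbox service
gcloud run services describe toolbox --region us-central1 --format="value(status.url)"
```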



&lt;h3&gt;
  
  
  Verify The Service
&lt;/h3&gt;

&lt;p&gt;Now we can check that the service is up and we can access the endpoint. We use the gcloud utility to get the retrieval service endpoint and the authentication token. Alternatively you can check the service URL in the Cloud Console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5p4j3foaz96x1kh0qv1j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5p4j3foaz96x1kh0qv1j.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can copy the value and use it in the curl command in place of the $(gcloud run services list --filter="(toolbox)" --format="value(URL)") part.&lt;/p&gt;

&lt;p&gt;Here is how to get the URL dynamically from the command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" $(gcloud  run services list --filter="(toolbox)" --format="value(URL)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" $(gcloud  run services list --filter="(toolbox)" --format="value(URL)")
🧰 Hello, World! 🧰student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we see the "Hello World" message it means our service is up and serving requests.&lt;/p&gt;
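&lt;p&gt;For a scripted, non-interactive verification you can assert on the HTTP status code instead of reading the greeting. This is a sketch that assumes the same gcloud setup as the curl command above:&lt;/p&gt;

```shell
# Verify the service answers an authenticated request with HTTP 200
TOOLBOX_URL=$(gcloud run services list --filter="(toolbox)" --format="value(URL)")
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  "$TOOLBOX_URL")
if [ "$STATUS" = "200" ]; then
  echo "toolbox is up"
else
  echo "unexpected status: $STATUS"
fi
```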

&lt;h2&gt;
  
  
  8. Deploy Sample Application
&lt;/h2&gt;

&lt;p&gt;Now that we have the retrieval service up and running, we can deploy a sample application. The application is an online airport assistant which can give you information about flights and airports and even book a flight, based on the flight and airport data in our database.&lt;/p&gt;

&lt;p&gt;The application can be deployed locally, on a VM in the cloud, or on another service like Cloud Run or Kubernetes. Here we are going to show how to deploy it on the VM first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prepare the environment
&lt;/h3&gt;

&lt;p&gt;We continue to work on our VM using the same SSH session. To run the application we need some Python modules, which we already installed when we initialized the database earlier. Let's switch to our Python virtual environment and change to the app directory.&lt;/p&gt;

&lt;p&gt;In the VM SSH session execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source ~/.venv/bin/activate
cd cymbal-air-toolbox-demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output (redacted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ source ~/.venv/bin/activate
cd cymbal-air-toolbox-demo
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run Assistant Application
&lt;/h3&gt;

&lt;p&gt;Before starting the application we need to set some environment variables. The basic functionality of the application, such as querying flights and airport amenities, requires only TOOLBOX_URL, which points the application to the retrieval service. We can get it using a gcloud command.&lt;/p&gt;

&lt;p&gt;In the VM SSH session execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export TOOLBOX_URL=$(gcloud  run services list --filter="(toolbox)" --format="value(URL)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output (redacted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~/cymbal-air-toolbox-demo$ export BASE_URL=$(gcloud  run services list --filter="(toolbox)" --format="value(URL)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To use the more advanced capabilities of the application, like booking and changing flights, we need to sign in to the application with our Google account, and for that we need to provide the CLIENT_ID environment variable using the OAuth client ID from the Prepare Client ID chapter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export CLIENT_ID=215....apps.googleusercontent.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output (redacted):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~/cymbal-air-toolbox-demo$ export CLIENT_ID=215....apps.googleusercontent.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now we can run our application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python run_app.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~/cymbal-air-toolbox-demo/llm_demo$ python run_app.py
INFO:     Started server process [2900]
INFO:     Waiting for application startup.
Loading application...
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connect to the Application
&lt;/h3&gt;

&lt;p&gt;You have several ways to connect to the application running on the VM. For example, you can open port 8081 on the VM using firewall rules in the VPC, or create a load balancer with a public IP. Here we are going to use an SSH tunnel to the VM forwarding a local port to the VM port 8081.&lt;/p&gt;

&lt;h4&gt;
  
  
  Connecting From Local Machine
&lt;/h4&gt;

&lt;p&gt;To connect from a local machine we need to run an SSH tunnel. It can be done using gcloud compute ssh:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8081:localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student-macbookpro:~ student$ gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081
Warning: Permanently added 'compute.7064281075337367021' (ED25519) to the list of known hosts.
Linux instance-1.us-central1-c.c.gleb-test-001.internal 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can open the browser and use &lt;a href="http://localhost:8081" rel="noopener noreferrer"&gt;http://localhost:8081&lt;/a&gt; to connect to our application. We should see the application screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq626srbeg1hx2j5opvhd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq626srbeg1hx2j5opvhd.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Connecting From Cloud Shell
&lt;/h4&gt;

&lt;p&gt;Alternatively we can use Google Cloud Shell to connect. Open another Cloud Shell tab using the "+" sign at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new tab get the origin and redirect URIs for your web client by executing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "origin:"; echo "https://8080-$WEB_HOST"; echo "redirect:"; echo "https://8080-$WEB_HOST/login/google"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ echo "origin:"; echo "https://8080-$WEB_HOST"; echo "redirect:"; echo "https://8080-$WEB_HOST/login/google"
origin:
https://8080-cs-35704030349-default.cs-us-east1-rtep.cloudshell.dev
redirect:
https://8080-cs-35704030349-default.cs-us-east1-rtep.cloudshell.dev/login/google
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the origin and redirect URIs as the "Authorized JavaScript origins" and "Authorized redirect URIs" for the credentials created in the "Prepare Client ID" chapter, replacing or adding to the originally provided &lt;a href="http://localhost:8081" rel="noopener noreferrer"&gt;http://localhost:8081&lt;/a&gt; values.&lt;/p&gt;

&lt;p&gt;Click on "Cymbal Air" on the OAuth 2.0 client IDs page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0f2x0oxul43jm0bsd1i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj0f2x0oxul43jm0bsd1i.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Put the origin and redirect URIs for the Cloud Shell and push the Save button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fder0lbhcma5bgj902tnp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fder0lbhcma5bgj902tnp.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new cloud shell tab start the tunnel to your VM by executing the gcloud command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it shows the error "Cannot assign requested address", please ignore it.&lt;/p&gt;

&lt;p&gt;Here is the expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ gcloud compute ssh instance-1 --zone=us-central1-a -- -L 8080:localhost:8081
bind [::1]:8081: Cannot assign requested address
Linux instance-1.us-central1-a.c.gleb-codelive-01.internal 6.1.0-21-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.90-1 (2024-05-03) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sat May 25 19:15:46 2024 from 35.243.235.73
student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It opens port 8080 in your Cloud Shell, which can be used for the "Web preview".&lt;/p&gt;

&lt;p&gt;Click on the "Web preview" button on the right top of your Cloud Shell and from the drop down menu choose "Preview on port 8080"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob7rbsc63ybscwt51dlb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fob7rbsc63ybscwt51dlb.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It opens a new tab in your web browser with the application interface. You should be able to see the "Cymbal Air Customer Service Assistant" page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3qw89fy5mrdfybtutw3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp3qw89fy5mrdfybtutw3.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Sign into the Application
&lt;/h3&gt;

&lt;p&gt;When everything is set up and the application is open, we can use the "Sign in" button at the top right of the application screen to provide our credentials. This is optional and required only if you want to try the booking functionality of the application.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9nusigc7llkqriip22u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9nusigc7llkqriip22u.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will open a pop-up window where we can choose our credentials.&lt;/p&gt;

&lt;p&gt;After signing in, the application is ready and you can start posting your requests in the field at the bottom of the window.&lt;/p&gt;

&lt;p&gt;This demo showcases the Cymbal Air customer service assistant. Cymbal Air is a fictional passenger airline. The assistant is an AI chatbot that helps travelers manage flights and look up information about Cymbal Air's hub at San Francisco International Airport (SFO).&lt;/p&gt;

&lt;p&gt;Without signing in (without CLIENT_ID) it can answer user questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When is the next flight to Denver?&lt;/li&gt;
&lt;li&gt;Are there any luxury shops around gate C28?&lt;/li&gt;
&lt;li&gt;Where can I get coffee near gate A6?&lt;/li&gt;
&lt;li&gt;Where can I buy a gift?&lt;/li&gt;
&lt;li&gt;Please find a flight from SFO to Denver departing today&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When you are signed in to the application you can try other capabilities, like booking flights or checking whether your assigned seat is a window or an aisle seat.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8og5ucualbesprrwctvl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8og5ucualbesprrwctvl.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The application uses the latest Google foundation models to generate responses and augments them with information about flights and amenities from the operational AlloyDB database. You can read more about this demo application on the &lt;a href="https://github.com/GoogleCloudPlatform/genai-databases-retrieval-app" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; page of the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  9. Clean up environment
&lt;/h2&gt;

&lt;p&gt;Now that all tasks are completed, we can clean up our environment.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delete Cloud Run Service
&lt;/h3&gt;

&lt;p&gt;In Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run services delete toolbox --region us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (gleb-test-short-004)$ gcloud run services delete toolbox --region us-central1
Service [toolbox] will be deleted.

Do you want to continue (Y/n)?  Y

Deleting [toolbox]...done.
Deleted service [toolbox].
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Delete the service account for the Cloud Run service.&lt;/p&gt;

&lt;p&gt;In Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com --quiet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (gleb-test-short-004)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-222]
student@cloudshell:~ (gleb-test-short-004)$ gcloud iam service-accounts delete toolbox-identity@$PROJECT_ID.iam.gserviceaccount.com --quiet
deleted service account [toolbox-identity@gleb-test-short-004.iam.gserviceaccount.com]
student@cloudshell:~ (gleb-test-short-004)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Destroy the AlloyDB instances and cluster when you are done with the lab.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delete AlloyDB cluster and all instances
&lt;/h3&gt;

&lt;p&gt;If you've used the trial version of AlloyDB, do not delete the trial cluster if you plan to test other labs and resources using it; you will not be able to create another trial cluster in the same project.&lt;/p&gt;

&lt;p&gt;The cluster is destroyed with the &lt;code&gt;--force&lt;/code&gt; option, which also deletes all the instances belonging to the cluster.&lt;/p&gt;

&lt;p&gt;In Cloud Shell, define the project and environment variables again if you've been disconnected and the previous settings are lost:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud config set project &amp;lt;your project id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Delete the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;📝 Note: The command takes 3-5 minutes to execute&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.   
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Delete AlloyDB Backups
&lt;/h3&gt;

&lt;p&gt;Delete all AlloyDB backups for the cluster:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📝 Note: The command will destroy all data backups for the cluster with the name specified in the environment variable&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.    
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
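&lt;p&gt;The deletion loop works on full resource names returned by &lt;code&gt;gcloud alloydb backups list&lt;/code&gt;, while &lt;code&gt;gcloud alloydb backups delete&lt;/code&gt; expects only the backup ID, which is why the loop applies &lt;code&gt;basename&lt;/code&gt; to each name. A minimal sketch with a hypothetical resource name:&lt;/p&gt;

```shell
# Hypothetical full backup resource name, in the format returned by
# `gcloud alloydb backups list --format="value(name)"`
full_name="projects/my-project/locations/us-central1/clusters/alloydb-aip-01/backups/backup-20240101"

# basename strips the path prefix, keeping only the backup ID
backup_id=$(basename "$full_name")
echo "$backup_id"   # prints "backup-20240101"
```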



&lt;p&gt;Now we can destroy our VM.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delete GCE VM
&lt;/h3&gt;

&lt;p&gt;In Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-001-402417)$ export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
Deleted 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Delete the service account for the GCE VM and the retrieval service.&lt;/p&gt;

&lt;p&gt;In Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete compute-aip@$PROJECT_ID.iam.gserviceaccount.com --quiet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (gleb-test-short-004)$ PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts delete compute-aip@$PROJECT_ID.iam.gserviceaccount.com --quiet
Your active configuration is: [cloudshell-222]
deleted service account [compute-aip@gleb-test-short-004.iam.gserviceaccount.com]
student@cloudshell:~ (gleb-test-short-004)$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  10. Congratulations
&lt;/h2&gt;

&lt;p&gt;Congratulations on completing the codelab.&lt;/p&gt;

&lt;h3&gt;
  
  
  What we've covered
&lt;/h3&gt;

&lt;p&gt;✅ How to deploy AlloyDB Cluster&lt;br&gt;
✅ How to connect to AlloyDB&lt;br&gt;
✅ How to configure and deploy MCP Toolbox Service&lt;br&gt;
✅ How to deploy a sample application using the deployed service&lt;/p&gt;

</description>
      <category>database</category>
      <category>agents</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>AlloyDB Agentic RAG Application with MCP Toolbox [Part 1]</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Tue, 25 Nov 2025 21:17:28 +0000</pubDate>
      <link>https://forem.com/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-1-1l8k</link>
      <guid>https://forem.com/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-1-1l8k</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2p7zxoe7w7r6qs6tamq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2p7zxoe7w7r6qs6tamq.gif" width="960" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this codelab, you will learn how to create an AlloyDB cluster, deploy the MCP toolbox, and configure it to use AlloyDB as a data source. You'll then build a sample interactive RAG application that uses the deployed toolbox to ground its requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9siuq8u95fwbq2hhahbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9siuq8u95fwbq2hhahbd.png" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can get more information about the MCP Toolbox on the &lt;a href="https://googleapis.github.io/genai-toolbox/getting-started/introduction/" rel="noopener noreferrer"&gt;documentation page&lt;/a&gt; and the sample Cymbal Air application &lt;a href="https://github.com/GoogleCloudPlatform/cymbal-air-toolbox-demo" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This lab is part of a lab collection dedicated to AlloyDB AI features. You can read more on the &lt;a href="https://cloud.google.com/alloydb/ai?utm_campaign=CDR_0x370c34a8_default_b417241442&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;AlloyDB AI page&lt;/a&gt; in documentation and see &lt;a href="https://codelabs.developers.google.com/?product=alloydbforpostgresql&amp;amp;utm_campaign=CDR_0x370c34a8_default_b440061544&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;other labs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A basic understanding of the Google Cloud Console&lt;/li&gt;
&lt;li&gt;Basic skills in command line interface and Google Cloud shell&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What you'll learn
&lt;/h3&gt;

&lt;p&gt;✅ How to deploy AlloyDB Cluster with Vertex AI integration&lt;br&gt;
✅ How to connect to AlloyDB&lt;br&gt;
✅ How to configure and deploy MCP Toolbox Service&lt;br&gt;
✅ How to deploy a sample application using the deployed service&lt;/p&gt;
&lt;h3&gt;
  
  
  What you'll need
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;A Google Cloud Account and Google Cloud Project&lt;/li&gt;
&lt;li&gt;A web browser such as &lt;a href="https://www.google.com/chrome/" rel="noopener noreferrer"&gt;Chrome&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  2. Setup and Requirements
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Self-paced environment setup
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Sign in to the &lt;a href="http://console.cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt; and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must &lt;a href="https://accounts.google.com/SignUp" rel="noopener noreferrer"&gt;create one&lt;/a&gt;.
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxdeebwp2t0k79zdulzg.png" width="328" height="62"&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6bdbsih6l8mrc77p59q.png" width="466" height="267"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;Project name&lt;/strong&gt; is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;Project ID&lt;/strong&gt; is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as &lt;code&gt;PROJECT_ID&lt;/code&gt;). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.&lt;/li&gt;
&lt;li&gt;For your information, there is a third value, a &lt;strong&gt;Project Number&lt;/strong&gt;, which some APIs use. Learn more about all three of these values in the &lt;a href="https://cloud.google.com/resource-manager/docs/creating-managing-projects#before_you_begin" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
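&lt;p&gt;For reference, project IDs follow a fixed pattern: 6-30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. The helper below only illustrates that rule and is not part of the lab:&lt;/p&gt;

```shell
# Validate a project ID against the documented naming rule:
# 6-30 chars, lowercase letters / digits / hyphens, starts with a
# letter, does not end with a hyphen.
is_valid_project_id() {
  echo "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$' && echo valid || echo invalid
}

is_valid_project_id "gleb-test-short-004"   # prints "valid"
is_valid_project_id "123project"            # prints "invalid" (starts with a digit)
```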

&lt;p&gt;&lt;strong&gt;⚠️ Caution:&lt;/strong&gt; A project ID is globally unique and can't be used by anyone else after you've selected it. You are the only user of that ID. Even if a project is deleted, the ID can't be used again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📝 Note:&lt;/strong&gt; If you use a Gmail account, you can leave the default location set to No organization. If you use a Google Workspace account, choose a location that makes sense for your organization.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Next, you'll need to &lt;a href="https://console.cloud.google.com/billing" rel="noopener noreferrer"&gt;enable billing&lt;/a&gt; in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the &lt;a href="http://cloud.google.com/free" rel="noopener noreferrer"&gt;$300 USD Free Trial&lt;/a&gt; program.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Start Cloud Shell
&lt;/h3&gt;

&lt;p&gt;While Google Cloud can be operated remotely from your laptop, in this codelab you will be using &lt;a href="https://cloud.google.com/cloud-shell/" rel="noopener noreferrer"&gt;Google Cloud Shell&lt;/a&gt;, a command line environment running in the Cloud.&lt;/p&gt;

&lt;p&gt;From the &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt;, click the Cloud Shell icon on the top right toolbar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5h9wq6v5eeel72kq8o1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg5h9wq6v5eeel72kq8o1.png" alt="Activate the Cloud Shell" width="404" height="245"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k2orjqvfodd0dzn9wrt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0k2orjqvfodd0dzn9wrt.png" alt="Screenshot of Google Cloud Shell terminal showing that the environment has connected" width="634" height="100"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Before you begin
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Enable API
&lt;/h3&gt;


&lt;blockquote&gt;
&lt;p&gt;Please be aware that some resources you enable are going to incur some cost if you are not using the promotional tier. Under normal circumstances, if all resources are destroyed upon completion of the lab, the total cost should not exceed $5. We recommend checking your billing to make sure the exercise is acceptable for you.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Inside Cloud Shell, make sure that your project ID is set up:&lt;/p&gt;

&lt;p&gt;Usually the project ID is shown in parentheses in the command prompt in Cloud Shell, as shown in the picture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuqg42iau7ev4hs2zdcpr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuqg42iau7ev4hs2zdcpr.png" width="800" height="133"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud config set project [YOUR-PROJECT-ID]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then set the PROJECT_ID environment variable to your Google Cloud project ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Enable all necessary services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       vpcaccess.googleapis.com \
                       aiplatform.googleapis.com \
                       cloudbuild.googleapis.com \
                       artifactregistry.googleapis.com \
                       run.googleapis.com \
                       iam.googleapis.com \
                       secretmanager.googleapis.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected output&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (gleb-test-short-004)$ gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       vpcaccess.googleapis.com \
                       aiplatform.googleapis.com \
                       cloudbuild.googleapis.com \
                       artifactregistry.googleapis.com \
                       run.googleapis.com \
                       iam.googleapis.com \
                       secretmanager.googleapis.com
Operation "operations/acf.p2-404051529011-664c71ad-cb2b-4ab4-86c1-1f3157d70ba1" finished successfully.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Deploy AlloyDB Cluster
&lt;/h2&gt;

&lt;p&gt;Create the AlloyDB cluster and primary instance. The following procedure describes how to create an AlloyDB cluster and instance using the Google Cloud SDK. If you prefer the console approach, you can follow the documentation instead.&lt;/p&gt;

&lt;p&gt;Before creating an AlloyDB cluster, we need an available private IP range in our VPC to be used by the future AlloyDB instance. If we don't have one, we need to create it and assign it for use by internal Google services; after that we will be able to create the cluster and instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create private IP range
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;📝 Note:&lt;/strong&gt; This step is required only if you don't already have an unused private IP range assigned to work with Google internal services.&lt;/p&gt;

&lt;p&gt;We need to configure Private Service Access in our VPC for AlloyDB. The assumption here is that we have the "default" VPC network in the project and that it is going to be used for all actions.&lt;/p&gt;

&lt;p&gt;Create the private IP range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
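&lt;p&gt;The &lt;code&gt;--prefix-length=24&lt;/code&gt; flag reserves a /24 block. As a quick sanity check of the size this allocates (8 host bits, doubled out in a loop to stay POSIX-portable):&lt;/p&gt;

```shell
# A /24 range leaves 32-24 = 8 host bits, i.e. 2^8 = 256 addresses
PREFIX_LENGTH=24
HOST_BITS=$(( 32 - PREFIX_LENGTH ))

# Compute 2^HOST_BITS by repeated doubling
ADDRESSES=1
i=0
while [ "$i" -ne "$HOST_BITS" ]; do
  ADDRESSES=$(( ADDRESSES * 2 ))
  i=$(( i + 1 ))
done
echo "$ADDRESSES"   # prints 256
```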



&lt;p&gt;Create private connection using the allocated IP range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;📝 Note:&lt;/strong&gt; The second command takes a couple of minutes to execute&lt;/p&gt;

&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range].

student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project-402417)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create AlloyDB Cluster
&lt;/h3&gt;

&lt;p&gt;In this section we are creating an AlloyDB cluster in the us-central1 region.&lt;/p&gt;

&lt;p&gt;Define a password for the postgres user. You can define your own password or use a random function to generate one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PGPASSWORD=`openssl rand -hex 12`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the PostgreSQL password for future use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo $PGPASSWORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will need that password to connect to the instance as the postgres user. I suggest writing it down or copying it somewhere so you can use it later.&lt;/p&gt;

&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD
bbefbfde7601985b0dee5723
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
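&lt;p&gt;As a sanity check, &lt;code&gt;openssl rand -hex 12&lt;/code&gt; emits 12 random bytes encoded as two hex digits each, so the generated password is always 24 characters long:&lt;/p&gt;

```shell
# 12 random bytes, hex-encoded to two characters per byte
PGPASSWORD=$(openssl rand -hex 12)
echo "${#PGPASSWORD}"   # prints 24
```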



&lt;h3&gt;
  
  
  Create a Free Trial Cluster
&lt;/h3&gt;

&lt;p&gt;If you haven't used AlloyDB before, you can create a free &lt;a href="https://cloud.google.com/alloydb/docs/free-trial-cluster" rel="noopener noreferrer"&gt;trial&lt;/a&gt; cluster:&lt;/p&gt;

&lt;p&gt;Define the region and AlloyDB cluster name. We are going to use the us-central1 region and alloydb-aip-01 as the cluster name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the command to create the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an AlloyDB primary instance for our cluster in the same Cloud Shell session. If you are disconnected, you will need to define the region and cluster name environment variables again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📝 Note:&lt;/strong&gt; The instance creation usually takes 6-10 minutes to complete&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --cluster=$ADBCLUSTER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create AlloyDB Standard Cluster
&lt;/h3&gt;

&lt;p&gt;If this is not your first AlloyDB cluster in the project, proceed with creating a standard cluster.&lt;/p&gt;

&lt;p&gt;Define the region and AlloyDB cluster name. We are going to use the us-central1 region and alloydb-aip-01 as the cluster name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the command to create the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION 
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.   
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create an AlloyDB primary instance for our cluster in the same Cloud Shell session. If you are disconnected, you will need to define the region and cluster name environment variables again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;📝 Note:&lt;/strong&gt; The instance creation usually takes 6-10 minutes to complete&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --cluster=$ADBCLUSTER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grant Necessary Permissions to AlloyDB
&lt;/h3&gt;

&lt;p&gt;Add Vertex AI permissions to the AlloyDB service agent.&lt;/p&gt;

&lt;p&gt;Open another Cloud Shell tab using the "+" sign at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the new Cloud Shell tab execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Close the tab by executing the command "exit" in the tab:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Prepare GCE Virtual Machine
&lt;/h2&gt;

&lt;p&gt;We are going to use a Google Compute Engine (GCE) VM as our platform to work with the database and deploy different parts of the sample application. Using a VM gives us more flexibility in installed components and direct access to the private AlloyDB IP for data preparation steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Service Account
&lt;/h3&gt;

&lt;p&gt;Since we will use our VM to deploy the MCP Toolbox as a service and deploy or host the sample application, the first step is to create a Google Service Account (GSA). The GSA will be used by the GCE VM, and we will need to grant it the necessary privileges to work with other services.&lt;/p&gt;

&lt;p&gt;In the Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create compute-aip --project $PROJECT_ID

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudbuild.builds.editor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/artifactregistry.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/run.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.viewer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.client"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/serviceusage.serviceUsageConsumer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:compute-aip@$PROJECT_ID.iam.gserviceaccount.com \
    --role roles/secretmanager.admin
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  Deploy GCE VM
&lt;/h3&gt;

&lt;p&gt;Create a GCE VM in the same region and VPC as the AlloyDB cluster.&lt;/p&gt;
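&lt;p&gt;As an optional convenience, the repeated add-iam-policy-binding calls for the service account can be scripted as a loop. A minimal dry-run sketch — it prints each gcloud command instead of executing it (remove the echo to run them for real; "my-project" is a placeholder project ID):&lt;/p&gt;

```shell
# Dry-run sketch: print one add-iam-policy-binding command per role.
# Remove the "echo" to execute the commands for real.
grant_roles() {
  local project="$1"
  local sa="compute-aip@${project}.iam.gserviceaccount.com"
  local role
  for role in roles/cloudbuild.builds.editor roles/artifactregistry.admin \
              roles/storage.admin roles/run.admin roles/iam.serviceAccountUser \
              roles/alloydb.viewer roles/alloydb.client roles/aiplatform.user \
              roles/serviceusage.serviceUsageConsumer roles/secretmanager.admin; do
    echo gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:$sa" --role="$role"
  done
}
grant_roles "my-project"   # hypothetical project ID
```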



&lt;p&gt;In Cloud Shell execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ZONE=us-central1-a
PROJECT_ID=$(gcloud config get-value project)
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --service-account=compute-aip@$PROJECT_ID.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ ZONE=us-central1-a
PROJECT_ID=$(gcloud config get-value project)
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --service-account=compute-aip@$PROJECT_ID.iam.gserviceaccount.com
Your active configuration is: [cloudshell-10282]
Created [https://www.googleapis.com/compute/v1/projects/gleb-test-short-002-470613/zones/us-central1-a/instances/instance-1].
NAME: instance-1
ZONE: us-central1-a
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE: 
INTERNAL_IP: 10.128.0.2
EXTERNAL_IP: 34.28.55.32
STATUS: RUNNING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Postgres Client
&lt;/h3&gt;

&lt;p&gt;Install the PostgreSQL client software on the deployed VM.&lt;/p&gt;

&lt;p&gt;Connect to the VM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🗒️ Note: The first SSH connection to the VM can take longer, since the process includes creating an RSA key pair for the secure connection and propagating the public key to the project.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh instance-1 --zone=us-central1-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Updating project ssh metadata...working..Updated [https://www.googleapis.com/compute/v1/projects/test-project-402417].                                                                                                                                                         
Updating project ssh metadata...done.                                                                                                                                                                                                                                              
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5110295539541121102' (ECDSA) to the list of known hosts.
Linux instance-1 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the software by running the command inside the VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt-get update
sudo apt-get install --yes postgresql-client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ sudo apt-get update
sudo apt-get install --yes postgresql-client
Get:1 file:/etc/apt/mirrors/debian.list Mirrorlist [30 B]
Get:4 file:/etc/apt/mirrors/debian-security.list Mirrorlist [39 B]
Hit:7 https://packages.cloud.google.com/apt google-compute-engine-bookworm-stable InRelease
Get:8 https://packages.cloud.google.com/apt cloud-sdk-bookworm InRelease [1652 B]
Get:2 https://deb.debian.org/debian bookworm InRelease [151 kB]
Get:3 https://deb.debian.org/debian bookworm-updates InRelease [55.4 kB]
...redacted...
update-alternatives: using /usr/share/postgresql/15/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (15+248) ...
Processing triggers for man-db (2.11.2-2) ...
Processing triggers for libc-bin (2.36-9+deb12u7) ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Connect to the AlloyDB Instance
&lt;/h3&gt;

&lt;p&gt;Connect to the primary instance from the VM using psql.&lt;/p&gt;

&lt;p&gt;Continue with the opened SSH session to your VM. If you have been disconnected then connect again using the same command as above.&lt;/p&gt;

&lt;p&gt;Use the previously noted $PGPASSWORD and the cluster name to connect to AlloyDB from the GCE VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PGPASSWORD=&amp;lt;Noted password&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ PROJECT_ID=$(gcloud config get-value project)
REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"
psql (15.13 (Debian 15.13-0+deb12u1), server 16.8)
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Exit from the psql session keeping the SSH connection up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres=&amp;gt; exit
student@instance-1:~$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Initialize the database
&lt;/h2&gt;

&lt;p&gt;We are going to use our client VM as a platform to populate our database with data and host our application. The first step is to create a database and populate it with data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Database
&lt;/h3&gt;

&lt;p&gt;Create a database with the name "assistantdemo".&lt;/p&gt;

&lt;p&gt;In the GCE VM session execute:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📝 &lt;strong&gt;Note:&lt;/strong&gt; If your SSH session was terminated, you need to reset your environment variables, such as:&lt;/p&gt;

&lt;p&gt;export PGPASSWORD=&lt;/p&gt;

&lt;p&gt;export REGION=us-central1&lt;/p&gt;

&lt;p&gt;export ADBCLUSTER=alloydb-aip-01&lt;/p&gt;

&lt;p&gt;export INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE assistantdemo"  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE assistantdemo"
CREATE DATABASE
student@instance-1:~$  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Prepare Python Environment
&lt;/h3&gt;

&lt;p&gt;To continue, we are going to use prepared Python scripts from a GitHub repository, but before doing that we need to install the required software.&lt;/p&gt;

&lt;p&gt;In the GCE VM execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt install -y python3.11-venv git
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ sudo apt install -y python3.11-venv git
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  git-man liberror-perl patch python3-distutils python3-lib2to3 python3-pip-whl python3-setuptools-whl
Suggested packages:
  git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui gitk gitweb git-cvs git-mediawiki git-svn ed diffutils-doc
The following NEW packages will be installed:
  git git-man liberror-perl patch python3-distutils python3-lib2to3 python3-pip-whl python3-setuptools-whl python3.11-venv
0 upgraded, 9 newly installed, 0 to remove and 2 not upgraded.
Need to get 12.4 MB of archives.
After this operation, 52.2 MB of additional disk space will be used.
Get:1 file:/etc/apt/mirrors/debian.list Mirrorlist [30 B]
...redacted...
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-24.0
(.venv) student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify Python version.&lt;/p&gt;

&lt;p&gt;In the GCE VM execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python -V
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) student@instance-1:~$ python -V
Python 3.11.2
(.venv) student@instance-1:~$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install MCP Toolbox Locally
&lt;/h3&gt;

&lt;p&gt;MCP Toolbox for Databases (later in the text MCP toolbox or toolbox) is an open source MCP server that works with different data sources. It helps you develop tools faster by providing a level of abstraction for different data sources and adding features like authentication and connection pooling. You can read about all the features on the official page.&lt;/p&gt;

&lt;p&gt;We are going to use the MCP toolbox to initialize our sample dataset and, later, as an MCP server to handle data source requests from our application during the Retrieval Augmented Generation (RAG) flow.&lt;/p&gt;
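&lt;p&gt;To illustrate the level of abstraction the toolbox provides, here is a hypothetical sketch of a toolbox configuration file. The source and tool names, the parameter, and the SQL statement are made up for illustration; the field layout follows the toolbox documentation but may differ between versions, so consult the official docs before using it:&lt;/p&gt;

```yaml
# Hypothetical example of a toolbox tools.yaml configuration.
sources:
  my-alloydb:                     # made-up source name
    kind: alloydb-postgres
    project: my-project           # placeholder project ID
    region: us-central1
    cluster: alloydb-aip-01
    instance: alloydb-aip-01-pr
    database: assistantdemo
    user: postgres
    password: ${PGPASSWORD}
tools:
  search-airports:                # made-up tool name
    kind: postgres-sql
    source: my-alloydb
    description: Search airports by country.
    parameters:
      - name: country
        type: string
        description: Country name to filter on.
    statement: SELECT iata, name FROM airports WHERE country = $1;
```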

&lt;p&gt;Let's install the MCP toolbox locally to populate the assistantdemo database.&lt;/p&gt;

&lt;p&gt;In the GCE VM execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export VERSION=0.16.0
curl -O https://storage.googleapis.com/genai-toolbox/v$VERSION/linux/amd64/toolbox
chmod +x toolbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(.venv) student@instance-1:~$ export VERSION=0.16.0
curl -O https://storage.googleapis.com/genai-toolbox/v$VERSION/linux/amd64/toolbox
chmod +x toolbox
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  133M  100  133M    0     0   158M      0 --:--:-- --:--:-- --:--:--  158M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Run Toolbox for Data Initialization
&lt;/h3&gt;

&lt;p&gt;In the GCE VM execute:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📝 Note: If your SSH session was terminated due to inactivity or any other reason, you need to set your environment variables again, such as:&lt;/p&gt;

&lt;p&gt;export PGPASSWORD=&lt;/p&gt;

&lt;p&gt;REGION=us-central1&lt;/p&gt;

&lt;p&gt;ADBCLUSTER=alloydb-aip-01&lt;/p&gt;

&lt;p&gt;INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Export environment variables for database population:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export ALLOYDB_POSTGRES_PROJECT=$(gcloud config get-value project)
export ALLOYDB_POSTGRES_REGION="us-central1"
export ALLOYDB_POSTGRES_CLUSTER="alloydb-aip-01"
export ALLOYDB_POSTGRES_INSTANCE="alloydb-aip-01-pr"
export ALLOYDB_POSTGRES_DATABASE="assistantdemo"
export ALLOYDB_POSTGRES_USER="postgres"
export ALLOYDB_POSTGRES_PASSWORD=$PGPASSWORD
export ALLOYDB_POSTGRES_IP_TYPE="private"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
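&lt;p&gt;Before starting the toolbox, you can optionally sanity-check that every variable the prebuilt configuration expects is set. A small helper sketch (the variable list mirrors the exports above):&lt;/p&gt;

```shell
# Optional check: report any ALLOYDB_POSTGRES_* variable that is unset
# or empty before launching the toolbox.
check_toolbox_env() {
  local missing=0 v
  for v in ALLOYDB_POSTGRES_PROJECT ALLOYDB_POSTGRES_REGION \
           ALLOYDB_POSTGRES_CLUSTER ALLOYDB_POSTGRES_INSTANCE \
           ALLOYDB_POSTGRES_DATABASE ALLOYDB_POSTGRES_USER \
           ALLOYDB_POSTGRES_PASSWORD ALLOYDB_POSTGRES_IP_TYPE; do
    if [ -z "${!v}" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
check_toolbox_env && echo "environment looks complete" \
  || echo "set the missing variables before starting the toolbox"
```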



&lt;p&gt;Start the toolbox for database initialization. It runs locally and connects seamlessly to the destination database on AlloyDB so we can fill it with sample data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./toolbox --prebuilt alloydb-postgres
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output. You should see "Server ready to serve!" in the last line of the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ cexport ALLOYDB_POSTGRES_PROJECT=$PROJECT_ID
export ALLOYDB_POSTGRES_REGION="us-central1"
export ALLOYDB_POSTGRES_CLUSTER="alloydb-aip-01"
export ALLOYDB_POSTGRES_INSTANCE="alloydb-aip-01-pr"
export ALLOYDB_POSTGRES_DATABASE="assistantdemo"
export ALLOYDB_POSTGRES_USER="postgres"
export ALLOYDB_POSTGRES_PASSWORD=$PGPASSWORD
export ALLOYDB_POSTGRES_IP_TYPE="private"
student@instance-1:~$ ./toolbox --prebuilt alloydb-postgres
2025-09-02T18:30:58.957655886Z INFO "Using prebuilt tool configuration for alloydb-postgres" 
2025-09-02T18:30:59.507306664Z INFO "Initialized 1 sources." 
2025-09-02T18:30:59.50748379Z INFO "Initialized 0 authServices." 
2025-09-02T18:30:59.507618807Z INFO "Initialized 2 tools." 
2025-09-02T18:30:59.507726704Z INFO "Initialized 2 toolsets." 
2025-09-02T18:30:59.508258894Z INFO "Server ready to serve!" 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Do not exit or close this tab of the Cloud Shell until data population is complete.&lt;/p&gt;

&lt;h3&gt;
  
  
  Populate Database
&lt;/h3&gt;

&lt;p&gt;Open another Cloud Shell tab using the sign "+" at the top.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfa0xtibrr37jfg8fmnk.png" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And connect to the instance-1 VM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh instance-1 --zone=us-central1-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Linux instance-1 6.1.0-37-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.140-1 (2025-05-22) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Tue Sep  2 21:44:07 2025 from 35.229.111.9
student@instance-1:~$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clone the GitHub repository with the code for the retrieval service and sample application.&lt;/p&gt;

&lt;p&gt;In the GCE VM execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone  https://github.com/GoogleCloudPlatform/cymbal-air-toolbox-demo.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ git clone  https://github.com/GoogleCloudPlatform/cymbal-air-toolbox-demo.git
Cloning into 'cymbal-air-toolbox-demo'...
remote: Enumerating objects: 3481, done.
remote: Counting objects: 100% (47/47), done.
remote: Compressing objects: 100% (41/41), done.
remote: Total 3481 (delta 16), reused 7 (delta 5), pack-reused 3434 (from 3)
Receiving objects: 100% (3481/3481), 57.96 MiB | 6.04 MiB/s, done.
Resolving deltas: 100% (2549/2549), done.
student@instance-1:~$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the output carefully for any errors.&lt;/p&gt;

&lt;p&gt;Prepare Python environment and install requirement packages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source .venv/bin/activate
cd cymbal-air-toolbox-demo
pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the Python path to the repository root folder and run the script to populate the database with the sample dataset. The first command adds the path to our Python modules to the environment, and the second command populates the database with the data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PYTHONPATH=$HOME/cymbal-air-toolbox-demo
python data/run_database_init.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output (redacted). You should see "database init done" at the end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ source .venv/bin/activate
(.venv) student@instance-1:~$ 
(.venv) student@instance-1:~$ cd cymbal-air-toolbox-demo/
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ pip install -r requirements.txt
python run_database_init.py
Collecting fastapi==0.115.0 (from -r requirements.txt (line 1))
  Downloading fastapi-0.115.0-py3-none-any.whl.metadata (27 kB)
Collecting google-auth==2.40.3 (from -r requirements.txt (line 2))
  Downloading google_auth-2.40.3-py2.py3-none-any.whl.metadata (6.2 kB)
Collecting google-cloud-aiplatform==1.97.0 (from google-cloud-aiplatform[evaluation]==1.97.0-&amp;gt;-r requirements.txt (line 3))
  Downloading google_cloud_aiplatform-1.97.0-py2.py3-none-any.whl.metadata (36 kB)
Collecting itsdangerous==2.2.0 (from -r requirements.txt (line 4))
  Downloading itsdangerous-2.2.0-py3-none-any.whl.metadata (1.9 kB)
Collecting jinja2==3.1.5 (from -r requirements.txt (line 5))
  Downloading jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-community==0.3.25 (from -r requirements.txt (line 6))
  Downloading langchain_community-0.3.25-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain==0.3.25 (from -r requirements.txt (line 7))
...

(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ 
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ export PYTHONPATH=$HOME/cymbal-air-toolbox-demo
python data/run_database_init.py
Airports table initialized
Amenities table initialized
Flights table initialized
Tickets table initialized
Policies table initialized
database init done.
(.venv) student@instance-1:~/cymbal-air-toolbox-demo$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can close this tab now.&lt;/p&gt;

&lt;p&gt;In the VM session execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in the Cloud Shell session press Ctrl+D or execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;exit
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the first tab, where the MCP Toolbox is running, press Ctrl+C to exit from the toolbox session.&lt;/p&gt;

&lt;p&gt;The database has been populated with sample data for the application.&lt;/p&gt;

&lt;p&gt;You can verify it by connecting to the database and checking the number of rows in the airports table. You can use the psql utility as before, or AlloyDB Studio. Here is how you can check it using psql.&lt;/p&gt;

&lt;p&gt;In the ssh session to instance-1 VM execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PGPASSWORD=&amp;lt;Noted AlloyDB password&amp;gt;

REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres dbname=assistantdemo" -c "SELECT COUNT(*) FROM airports"  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected console output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;student@instance-1:~$ REGION=us-central1
ADBCLUSTER=alloydb-aip-01
INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres dbname=assistantdemo" -c "SELECT COUNT(*) FROM airports"
 count 
-------
  7698
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The database is ready and we can move on to MCP Toolbox deployment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You've completed Part 1 of the AlloyDB Agentic RAG application codelab, please continue to &lt;a href="https://dev.to/googleai/alloydb-agentic-rag-application-with-mcp-toolbox-part-2-2l28"&gt;Part 2&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>database</category>
      <category>agents</category>
      <category>mcp</category>
      <category>ai</category>
    </item>
    <item>
      <title>Cloud SQL vs. Specialized Databases: Choosing Your Vector Search Solution</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Wed, 29 Oct 2025 04:32:59 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/cloud-sql-vs-specialized-databases-choosing-your-vector-search-solution-4cmb</link>
      <guid>https://forem.com/gleb_otochkin/cloud-sql-vs-specialized-databases-choosing-your-vector-search-solution-4cmb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9731hvyp876md1c1eit.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa9731hvyp876md1c1eit.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;The popularity of vector search has exploded with the introduction of Generative AI, where it serves as the main engine for semantic search.&lt;br&gt;&lt;br&gt;
This demand initially led to specialized vector databases like Pinecone, Milvus and others, but now, most mainstream databases have also incorporated vector search capabilities.&lt;br&gt;&lt;br&gt;
This leaves developers with a critical decision: adopt a new, specialized database or use their existing one? The wrong choice can lead to poor performance and costly refactoring. This post explains why the best solution is often the simplest one — leveraging your current relational database.&lt;br&gt;&lt;br&gt;
In this post I explain when you don’t need a specialized vector solution and your relational database might be a better choice. I will use Cloud SQL for MySQL as one of the relational databases with full vector support.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simplified architecture and operations
&lt;/h3&gt;

&lt;p&gt;Using your existing database like Cloud SQL for MySQL for vectors greatly simplifies your architecture and daily operations for three key reasons.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, unified data management means your vectors live with your operational data. This streamlines everything from security and backups to monitoring, eliminating the need to manage a separate database and a complex data pipeline.&lt;/li&gt;
&lt;li&gt;Second, you use standard tools. There are no new SDKs or languages to learn, as you can manage vectors using the standard SQL you already know.&lt;/li&gt;
&lt;li&gt;Finally, this creates a flat learning curve. Your team can adopt vector search immediately without having to learn a new distributed system or syntax.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data consistency
&lt;/h3&gt;

&lt;p&gt;Data in applications is not static, and the source data for the vector embeddings is often subject to change too. When that happens, the corresponding vector has to be updated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you use Cloud SQL as your vector store, you can update the vector in the same transaction that inserts, deletes or updates the corresponding data. This eliminates any possible discrepancy between the vector and the data it represents. If a transaction is rolled back, data consistency stays intact.&lt;/li&gt;
&lt;li&gt;When your vectors are in the Cloud SQL for MySQL, your application doesn’t need to wait for a data pipeline (ETL) to sync changes from your application database to your vector store. This removes a potential point of failure and eliminates data lag. You always have the latest data and vectors for your search.&lt;/li&gt;
&lt;li&gt;With a separate store for vectors, you need to capture the changes in the database and transfer them to the vector database, or make the vector store part of the application itself. That can create additional problems, such as keeping caches in sync across application services or correctly handling rollbacks at the application level.&lt;/li&gt;
&lt;/ul&gt;
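&lt;p&gt;As a sketch of the first point, here is what a same-transaction update of a row and its vector could look like. The "products" table, its columns, and the precomputed embedding value are hypothetical; string_to_vector() is one of the Cloud SQL for MySQL vector functions — check the official docs for the exact signature:&lt;/p&gt;

```shell
# Hypothetical table "products" with a "description" column and a vector
# "embedding" column; the embedding value is assumed to be precomputed.
TXN="
START TRANSACTION;
UPDATE products
   SET description = 'Lightweight travel umbrella',
       embedding   = string_to_vector('[0.21, 0.08, -0.44]')
 WHERE id = 42;
COMMIT;
"
echo "$TXN"
# To run it for real, pass it to the mysql client, e.g. (hypothetical host/user):
#   mysql -h "$HOST" -u app_user -p appdb -e "$TXN"
```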

&lt;h3&gt;
  
  
  Hybrid Search
&lt;/h3&gt;

&lt;p&gt;One of the main benefits of keeping the data and vectors together is the ability to use filtering and hybrid searches using non-vectorized data.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For example, if you search for a product by its description using embedding vectors, you can also add a filter on the product brand or combine the search with other preferences from a user profile.&lt;/li&gt;
&lt;li&gt;Pre-filtering combines B-tree indexes on other columns with vector search, reducing the search space for the vectors. It can drastically improve performance and reduce the memory required.&lt;/li&gt;
&lt;li&gt;Post-filtering, by contrast, is applied after the vector search returns its results; the filters can remove the bulk of the returned dataset and, in some cases, reduce the number of results to zero.&lt;/li&gt;
&lt;/ul&gt;
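&lt;p&gt;A minimal sketch of such a pre-filtered hybrid query (the schema is illustrative, and vector_distance() is a placeholder for whatever distance function your engine provides):&lt;/p&gt;

```sql
-- Illustrative only: filter first on a B-tree indexed column,
-- then rank the remaining rows by vector distance.
SELECT product_id, name
  FROM products
 WHERE brand = 'ExampleBrand'
 ORDER BY vector_distance(description_embedding, @query_embedding)
 LIMIT 10;
```

&lt;p&gt;Because the brand filter is applied first, the distance computation only touches the rows that survive it.&lt;/p&gt;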

&lt;h3&gt;
  
  
  Lower Total Cost of Ownership
&lt;/h3&gt;

&lt;p&gt;This is more of a business reason than a technical one, but that doesn’t make it any less important. Moving your vectors into Cloud SQL for MySQL can significantly reduce your bill. Here is the reasoning behind that statement:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No data transfer costs. All your data lives in the same database, so you don’t need to maintain a pipeline or pay for additional resources.&lt;/li&gt;
&lt;li&gt;Consolidation of resources. All your data is stored, backed up and managed in the same environment. Instead of provisioning a new database just for vectors, you can utilize your existing resources.&lt;/li&gt;
&lt;li&gt;Reduced engineering hours. Your engineers spend less time learning, deploying and maintaining separate systems to keep your vectors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  When a Vector Database is a better choice
&lt;/h3&gt;

&lt;p&gt;Cloud SQL for MySQL can be one of the better choices to store and work with your vectors but sometimes it might be prudent to consider a specialized vector database.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Some vector databases offer advanced vector-specific features that are not available in Cloud SQL. For example, the concept of namespaces in Pinecone can be appealing for some workloads.&lt;/li&gt;
&lt;li&gt;Massive scale, with billions of vectors. In that case a specialized solution such as Vertex AI Vector Search might be a more feasible destination.&lt;/li&gt;
&lt;li&gt;Decoupling can also make sense when the vectors serve several different applications in a microservices deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Want to know more?
&lt;/h3&gt;

&lt;p&gt;This is just the first blog in a series of articles about vector search in Cloud SQL written by me and my colleagues. If you want to know more about KNN and ANN and what stands behind them, please read the blog &lt;a href="https://goo.gle/4oPxMXK" rel="noopener noreferrer"&gt;Vector Search: Demystifying ANN and KNN&lt;/a&gt; written by Shu Zhou.&lt;/p&gt;




</description>
      <category>data</category>
      <category>googlecloudsql</category>
      <category>rags</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Automating AlloyDB Operations</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Sat, 30 Aug 2025 02:58:21 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/automating-alloydb-operations-2ef4</link>
      <guid>https://forem.com/gleb_otochkin/automating-alloydb-operations-2ef4</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19d8rhd39rycscrb0qn4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19d8rhd39rycscrb0qn4.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;One of the best features of cloud services is the management API. Let’s imagine you need to implement some automated tasks. For example, you want your database instance, or multiple instances, to stop at certain times, scale up, or do something else like start an on-demand backup. In the case of an in-house deployment, you need to program everything yourself from start to finish. And believe me, I’ve done it. It may sound easy, but it is not.&lt;/p&gt;

&lt;p&gt;However, virtually every cloud service comes with an API to handle all those tasks. That API is documented, aligned with what the service can do, and in some cases, it also has a client SDK. &lt;a href="https://cloud.google.com/alloydb?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;AlloyDB&lt;/a&gt; is no exception, and it has a documented API that can be used for automation. In one of my &lt;a href="https://medium.com/google-cloud/alloydb-autoscaling-is-easy-538332b6e9ad" rel="noopener noreferrer"&gt;previous blogs&lt;/a&gt;, I’ve written how to scale up primary instances using CPU monitoring. Here, I am going to show how you can automate some other tasks.&lt;/p&gt;

&lt;h3&gt;
  
  
  APIs
&lt;/h3&gt;

&lt;p&gt;The AlloyDB API documentation is available on the main &lt;a href="https://cloud.google.com/alloydb/docs/reference/rest?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;reference page&lt;/a&gt;. There you can find the reference for versions v1 and v1beta, as well as shared types such as “Date”.&lt;/p&gt;

&lt;p&gt;Expanding the API reference might seem overwhelming at first with its long list of resources and types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjtqtzh9w1wjhfvn48rh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjtqtzh9w1wjhfvn48rh.png" width="800" height="612"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In reality, it’s quite straightforward. To help visualize the structure, let’s create a graph of the main AlloyDB API resources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kxa65g8sbsa7zgxfcp4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kxa65g8sbsa7zgxfcp4.png" width="800" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At a high level, the AlloyDB resource hierarchy begins with a Project, which contains one or more Locations. Nested under a specific location is the Cluster, which in turn contains resources like Instances, Backups, and Users.&lt;/p&gt;

&lt;p&gt;All changes to these resources are done through Operations. Any request that modifies a resource, such as creating an instance or a backup, triggers an operation that you can monitor.&lt;/p&gt;

&lt;p&gt;While this is a simplified view, it covers the basics. For this post, we’ll focus on just a few of these key resources and show how to manage them using Go and the REST API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cluster
&lt;/h3&gt;

&lt;p&gt;In a project, a cluster name is unique per location: you cannot have two clusters with the same name in the same region. So, to identify the cluster or clusters we want to modify, we have to specify a project and a location as the root resources. For an AlloyDB cluster, the location is represented by a region. You can find the location resource definition in the &lt;a href="https://cloud.google.com/alloydb/docs/reference/rest/v1beta/projects.locations?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;reference&lt;/a&gt;. Here is an example of how the location resource can be defined in code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// List of available locations
type Locations struct {
 Locations []Location `json:"locations"`
}

type Location struct {
 Name string `json:"name"`
 LocationId string `json:"locationId"`
 DisplayName string `json:"displayName"`
}

...
// Get list of all locations instances for clusters with defined name in all locations
   locationsURL := fmt.Sprintf("%s/projects/%s/locations", apiURL, project)

   resp, err := client.Get(locationsURL)
   if err != nil {
    return nil, fmt.Errorf("failed to get all locations for project %s: %v", project, err)
   }
...
// Get list of locations 
locations := Locations{}
   err = json.Unmarshal(locationsListBody, &amp;amp;locations)
   if err != nil {
    return nil, fmt.Errorf("failed to unmarshal all locations: %v", err)
   }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The examples in this post use the Go language to make direct HTTP requests to the API. You can find the full source code for the Cloud Run function &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/infrastructure/alloydb-management-function" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once we know the location, we can identify our cluster using the cluster name as a parameter. To address all clusters in the project we use the alias ALL, which translates to “-” in the request URL.&lt;/p&gt;

&lt;p&gt;Our sample code focuses on instance management, so we don’t explicitly define a Cluster type (or struct) in Go. Instead, we simply use the cluster’s name directly in the request URL to target the correct resources.&lt;/p&gt;

&lt;p&gt;However, if you were performing actions on the cluster itself, like creating a new one, you would need to define that Cluster type in your code to properly structure the API request. You can find the full description of the cluster API resource in the &lt;a href="https://cloud.google.com/alloydb/docs/reference/rest/v1beta/projects.locations.clusters?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;
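&lt;p&gt;As a minimal sketch of that URL construction (the function name and the ALL alias handling are mine, not necessarily how the sample repo structures it):&lt;/p&gt;

```go
package main

import "fmt"

const apiURL = "https://alloydb.googleapis.com/v1beta"

// instancesURL builds the REST path for listing instances in a cluster.
// The ALL alias maps to "-", the API wildcard for all clusters in a location.
func instancesURL(project, location, cluster string) string {
	if cluster == "ALL" {
		cluster = "-"
	}
	return fmt.Sprintf("%s/projects/%s/locations/%s/clusters/%s/instances",
		apiURL, project, location, cluster)
}

func main() {
	fmt.Println(instancesURL("test-project-123", "us-central1", "ALL"))
}
```

&lt;p&gt;A GET request to the resulting URL returns the instances of every cluster in the given location.&lt;/p&gt;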

&lt;h3&gt;
  
  
  Instance
&lt;/h3&gt;

&lt;p&gt;Each cluster has one or more instances: one is the primary, and the rest belong to read pools. Some operations, like backup, require the primary instance to be available. To build a proper request body, the instance is defined as a type (struct) in the Go code.&lt;/p&gt;

&lt;p&gt;Here is how we would define the instances in the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// List of AlloyDB instances from API response
type Instances struct {
 Instances []Instance `json:"instances"`
}

// A single instance from the list
type Instance struct {
 Name string `json:"name"`
 DisplayName string `json:"displayName"`
 Uid string `json:"uid"`
 CreateTime string `json:"createTime"`
 UpdateTime string `json:"updateTime"`
 DeleteTime string `json:"deleteTime"`
 State string `json:"state"`
 InstanceType string `json:"instanceType"`
 Description string `json:"description"`
 IpAddress string `json:"ipAddress"`
 Reconciling bool `json:"reconciling"`
 Etag string `json:"etag"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have two types: one for the instance itself and one for a list of instances, where each member is a single instance. The list type helps when we want to perform an operation on all instances of a cluster. Now, let’s talk about operations, or what we can do with an instance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operations
&lt;/h3&gt;

&lt;p&gt;What can we do with an instance? We can create, delete, change shape, and, more recently, start and stop them. You can read about starting and stopping instances in one of my &lt;a href="https://medium.com/google-cloud/automating-alloydb-starts-and-stops-a794ca5a83c1" rel="noopener noreferrer"&gt;previous blogs&lt;/a&gt;. For a complete list of all available methods, please refer to the reference &lt;a href="https://cloud.google.com/alloydb/docs/reference/rest/v1beta/projects.locations.clusters.instances?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To illustrate, let’s delete an instance using curl. This is done by sending a DELETE &lt;a href="https://cloud.google.com/alloydb/docs/reference/rest/v1beta/projects.locations.clusters.instances/delete?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;request&lt;/a&gt; to the instance URL.&lt;/p&gt;

&lt;p&gt;For this example, we’ll assume the following details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Project ID: test-project-123&lt;/li&gt;
&lt;li&gt;Region: us-central1&lt;/li&gt;
&lt;li&gt;Cluster Name: my-cluster&lt;/li&gt;
&lt;li&gt;Instance Name: my-instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given these parameters, the request would look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X DELETE -H "Authorization: Bearer $(gcloud auth print-access-token)" https://alloydb.googleapis.com/v1beta/projects/test-project-123/locations/us-central1/clusters/my-cluster/instances/my-instance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that I used an OAuth token to authenticate the request. The command works if you have the gcloud SDK on your machine and are authenticated to the cloud.&lt;/p&gt;

&lt;p&gt;When you send such a request, it returns the ID of the operation it triggered. You can monitor the operation status using a “GET” request to the operations endpoint.&lt;/p&gt;
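&lt;p&gt;A hedged sketch of that polling step (the JSON body here is illustrative, not a verbatim API response, and the helper name is mine):&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Operation holds the fields we care about from a long-running operation response.
type Operation struct {
	Name string `json:"name"`
	Done bool   `json:"done"`
}

// operationStatusURL builds the GET URL for polling an operation by its full name.
func operationStatusURL(name string) string {
	return fmt.Sprintf("https://alloydb.googleapis.com/v1beta/%s", name)
}

func main() {
	// Illustrative response body from a DELETE request.
	body := []byte(`{"name": "projects/test-project-123/locations/us-central1/operations/operation-123", "done": false}`)
	op := new(Operation)
	if err := json.Unmarshal(body, op); err != nil {
		panic(err)
	}
	fmt.Println(operationStatusURL(op.Name))
}
```

&lt;p&gt;Polling that URL until the operation reports done tells you when the request has completed.&lt;/p&gt;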

&lt;p&gt;It’s also worth mentioning the failover operation, which is useful for managing High Availability (HA) instances by allowing you to switch between zones.&lt;/p&gt;

&lt;p&gt;There are other AlloyDB resources, such as backups and users, which you can include in your tool and automate, but for now I want to focus on the way you initiate one operation or another.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Function
&lt;/h3&gt;

&lt;p&gt;Technically you could use Google Cloud Scheduler to send HTTPS requests directly to AlloyDB API endpoints, but this approach has limitations. One such limitation is that you often don’t know the exact name of a resource in advance.&lt;/p&gt;

&lt;p&gt;For example, if you want to stop all AlloyDB instances in a project, you first need to query the API to get a list of those instances before you can send a ‘stop’ request for each one. The same is true for managing backups, where you must know a backup’s unique name to interact with it.&lt;/p&gt;

&lt;p&gt;A more flexible solution is to use a serverless function (like Cloud Functions or Cloud Run) that is triggered by a Pub/Sub message. The message payload can dynamically specify the desired action, the target resources, and any other parameters you need.&lt;/p&gt;

&lt;p&gt;Here is an example of what such a message payload might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "project": "test-project-123",
    "location": "us-central1",
    "operation": "STOP",
    "cluster": "my-cluster",
    "instance": "my-instance",
    "retention": 0
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This message initiates a STOP operation on the my-instance instance of the my-cluster AlloyDB cluster in the us-central1 region.&lt;/p&gt;

&lt;p&gt;The retention field here is for a future implementation of backup management, where you can specify a retention period for your manual backups.&lt;/p&gt;

&lt;p&gt;In the code, the message would be represented by structs like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;type PubSubMessage struct {
 Data []byte `json:"data"`
}
type Parameters struct {
 Project string `json:"project"`
 Location string `json:"location"`
 Operation string `json:"operation"`
 Cluster string `json:"cluster"`
 Instance string `json:"instance"`
 Retention int `json:"retention"`
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When we create a function, we specify an Eventarc trigger that invokes the function whenever a Pub/Sub message is published to a topic. The web console allows us to create the Pub/Sub topic at the same time we define the trigger for the function.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadpddyn1oq565t43l9x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fadpddyn1oq565t43l9x6.png" width="800" height="332"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After defining all the metadata for the function, we can add the source code. As a reminder, the sample code is available for download from &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/infrastructure/alloydb-management-function" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl4vdyiyhw7pnnklp1xv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyl4vdyiyhw7pnnklp1xv.png" width="800" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, whenever you publish a message to the alloydb-mgmt-topic using the JSON format discussed earlier, you can start, stop, scale, or delete a specific instance or all instances in a project. This can also be combined with monitoring to, for example, scale an instance up or down, as was described in one of the &lt;a href="https://medium.com/google-cloud/alloydb-autoscaling-is-easy-538332b6e9ad" rel="noopener noreferrer"&gt;previous blogs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;The AlloyDB service in Google Cloud gives you a great start, with automated services and features that require minimal management. However, as your business grows and requires unique capabilities, the AlloyDB API provides the ability to manage and automate all kinds of tasks based on your requirements.&lt;/p&gt;

&lt;p&gt;Try the sample &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/tree/main/infrastructure/alloydb-management-function" rel="noopener noreferrer"&gt;function code&lt;/a&gt; with the REST API and HTTP requests, and also check the examples in the previously published blogs about &lt;a href="https://medium.com/google-cloud/automating-alloydb-starts-and-stops-a794ca5a83c1" rel="noopener noreferrer"&gt;start and stop&lt;/a&gt; automation and vertical &lt;a href="https://medium.com/google-cloud/alloydb-autoscaling-is-easy-538332b6e9ad" rel="noopener noreferrer"&gt;autoscaling&lt;/a&gt;. Google also provides a &lt;a href="https://cloud.google.com/go/docs/reference/cloud.google.com/go/alloydb/latest/apiv1?utm_campaign=CDR_0x370c34a8_default_b441262463&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Go SDK&lt;/a&gt; for AlloyDB, along with SDKs for other languages. Please try the code and let me know if you would like to see a version of the sample function based on the Go SDK.&lt;/p&gt;




</description>
      <category>data</category>
      <category>api</category>
      <category>automation</category>
      <category>googlecloudplatform</category>
    </item>
    <item>
      <title>B-tree indexes for JSON in PostgreSQL</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Thu, 17 Jul 2025 02:28:46 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/b-tree-indexes-for-json-in-postgresql-22li</link>
      <guid>https://forem.com/gleb_otochkin/b-tree-indexes-for-json-in-postgresql-22li</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls6z2qoyv2iocv7rxk2w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fls6z2qoyv2iocv7rxk2w.png" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;PostgreSQL is the Swiss Army knife of databases: it supports all kinds of data. But different data types create different challenges. If you are using PostgreSQL to store JSON data, you have probably heard about the potential performance problems of querying it. The general rule of thumb is to use the JSONB data type when you can. But sometimes you want JSON rather than JSONB, to preserve information such as duplicated keys, null values, or the order of keys in the document. So how can we query the data efficiently if we use JSON? In this blog I will talk about using B-tree indexes on JSON data.&lt;/p&gt;

&lt;h3&gt;
  
  
  JSON and B-tree indexes
&lt;/h3&gt;

&lt;p&gt;Let’s start with the basics and talk about JSON in Postgres. It can be stored either as the JSON data type, which keeps the document as text, or as JSONB, which uses a binary format. You may have heard that you need JSONB to use indexes on your data. That’s not entirely true. While JSONB has more indexing options, the JSON data type still supports some indexing. For example, you can create a B-tree index on known JSON keys, and that index behaves like any other B-tree expression index in Postgres.&lt;br&gt;&lt;br&gt;
Here is an example of how you can create one.&lt;br&gt;&lt;br&gt;
I have a table &lt;em&gt;jproducts&lt;/em&gt; with two columns, &lt;em&gt;id&lt;/em&gt; and &lt;em&gt;product&lt;/em&gt;, where id is the primary key and product is a JSON column. The table has about 29,000 rows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; \d ecomm.jproducts
              Table "ecomm.jproducts"
 Column  |  Type  | Collation | Nullable | Default
---------+--------+-----------+----------+---------
 id      | bigint |           | not null |
 product | json   |           |          |
Indexes:
    "jproducts_pkey" PRIMARY KEY, btree (id)

testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the list of keys in my JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;          key | value_type
------------------------+------------
 brand | string
 category | string
 cost | number
 department | string
 distribution_center_id | number
 id | number
 name | string
 product_description | string
 product_image_uri | string
 retail_price | number
 sku | string
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
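&lt;p&gt;A hedged sketch of how such a key/type listing can be produced (assuming every row shares the same flat structure):&lt;/p&gt;

```sql
-- List the distinct top-level keys and their JSON value types
SELECT DISTINCT j.key, json_typeof(j.value) AS value_type
FROM ecomm.jproducts, json_each(product) AS j(key, value)
ORDER BY j.key;
```

&lt;p&gt;json_each expands each document into key/value rows, and json_typeof reports the type of each value.&lt;/p&gt;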



&lt;p&gt;Let’s say we know that our application is going to filter data by brand and it is going to be executed often enough. If we run the query without an index then we can see it takes about 42 ms.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; explain analyze select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product @&amp;gt; jsonb_build_object('brand','Victor');
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Gather (cost=1000.00..7143.47 rows=2036 width=32) (actual time=0.310..41.958 rows=178 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   -&amp;gt; Parallel Seq Scan on jproducts (cost=0.00..5939.87 rows=848 width=32) (actual time=0.068..39.653 rows=59 loops=3)
         Filter: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Victor'::text)
         Rows Removed by Filter: 9647
 Planning Time: 0.051 ms
 Execution Time: 41.998 ms
(8 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even if 42 ms looks reasonable enough, it can create performance problems when the load scales to hundreds of requests per second. What if we create an index on the key ‘brand’?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create index jproducts_brand on ecomm.jproducts using btree ((product-&amp;gt;&amp;gt;'brand'));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if we execute the same query we can see the index is used:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; explain analyze select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' = 'Victor';
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32) (actual time=0.051..0.481 rows=178 loops=1)
   Recheck Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Victor'::text)
   Heap Blocks: exact=164
   -&amp;gt; Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0) (actual time=0.024..0.024 rows=178 loops=1)
         Index Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Victor'::text)
 Planning Time: 0.153 ms
 Execution Time: 0.455 ms
(7 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our new index reduces the execution time by almost &lt;strong&gt;100 times&lt;/strong&gt;. That is a significant performance boost. But will the index be used for every operation on the ‘brand’ key? Let’s change our query a bit and use the LIKE operator or the UPPER function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; explain analyze select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' like '%Victor%';
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3550.82 rows=9 width=32) (actual time=0.079..42.731 rows=178 loops=1)
   Filter: ((product -&amp;gt;&amp;gt; 'brand'::text) ~~'%Victor%'::text)
   Rows Removed by Filter: 28942
 Planning Time: 0.053 ms
 Execution Time: 42.768 ms
(5 rows)

testdb=&amp;gt; explain analyze select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where upper(product-&amp;gt;&amp;gt;'brand') = 'VICTOR';
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3623.96 rows=146 width=32) (actual time=1.221..54.989 rows=178 loops=1)
   Filter: (upper((product -&amp;gt;&amp;gt; 'brand'::text)) = 'VICTOR'::text)
   Rows Removed by Filter: 28942
 Planning Time: 0.052 ms
 Execution Time: 55.028 ms
(5 rows)

testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The index is not used, which is exactly the behaviour of any other B-tree index on an expression: the planner can only use it when the query contains the same expression that was specified during index creation. If we want the index to be used for a query with the UPPER function, then the function has to be part of the index definition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create index jproducts_brand_upper on ecomm.jproducts using btree (UPPER(product-&amp;gt;&amp;gt;'brand'));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And then the index will indeed be used if we have a query with the UPPER function.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; explain analyze select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where upper(product-&amp;gt;&amp;gt;'brand') = 'VICTOR';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.41 rows=146 width=32) (actual time=0.090..0.910 rows=178 loops=1)
   Recheck Cond: (upper((product -&amp;gt;&amp;gt; 'brand'::text)) = 'VICTOR'::text)
   Heap Blocks: exact=164
   -&amp;gt; Bitmap Index Scan on jproducts_brand_upper (cost=0.00..5.38 rows=146 width=0) (actual time=0.023..0.023 rows=178 loops=1)
         Index Cond: (upper((product -&amp;gt;&amp;gt; 'brand'::text)) = 'VICTOR'::text)
 Planning Time: 0.149 ms
 Execution Time: 0.952 ms
(7 rows)

testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s talk about the planner and statistics for the index. Does PostgreSQL gather statistics for the keys in the JSON, and would that impact the planner’s decision on whether or not to use the index? I analyzed the table before creating our index but didn’t do it after.&lt;br&gt;&lt;br&gt;
Let’s check the planner’s expectation about the number of rows for the brands ‘Victor’ and ‘Verona Q’.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; explain select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' = 'Victor';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32)
   Recheck Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Victor'::text)
   -&amp;gt; Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Victor'::text)
(4 rows)

testdb=&amp;gt; explain select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' = 'Verona Q';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..488.04 rows=146 width=32)
   Recheck Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Verona Q'::text)
   -&amp;gt; Bitmap Index Scan on jproducts_brand (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Verona Q'::text)
(4 rows)

testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The planner assumes the number of rows is roughly the same for both brands and equal to 146. By default, PostgreSQL doesn’t gather statistics for the keys in your JSON, but it can do so for expressions once you create an index on an expression. We have created the index, so we can help the planner by refreshing the statistics.&lt;br&gt;&lt;br&gt;
Let’s analyze the table and repeat one of our queries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; analyze ecomm.jproducts;
ANALYZE
testdb=&amp;gt; explain select product-&amp;gt;&amp;gt;'department' from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' = 'Verona Q';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=16.30..2162.20 rows=1034 width=32)
   Recheck Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Verona Q'::text)
   -&amp;gt; Bitmap Index Scan on jproducts_brand (cost=0.00..16.04 rows=1034 width=0)
         Index Cond: ((product -&amp;gt;&amp;gt; 'brand'::text) = 'Verona Q'::text)
(4 rows)

testdb=&amp;gt; select count(*) from ecomm.jproducts where product-&amp;gt;&amp;gt;'brand' = 'Verona Q';
 count
-------
  1034
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the planner knows exactly how many rows of each particular brand we have, and that helps it make the right decision. It may still prefer a sequential scan (seq scan) because that can be cheaper when the cardinality (the number of rows returned by the query block) is high: by default, reading a page sequentially is estimated at a quarter of the cost of the random page reads used by an index scan. You can see (and change) these costs using the following parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; show seq_page_cost ;
 seq_page_cost
---------------
 1
(1 row)

testdb=&amp;gt; show random_page_cost ;
 random_page_cost
------------------
 4
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can demonstrate this using an index on the &lt;em&gt;product-&amp;gt;&amp;gt;'department'&lt;/em&gt; key.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;testdb=&amp;gt; create index jproducts_department on ecomm.jproducts using btree ((product-&amp;gt;&amp;gt;'department'));
CREATE INDEX
Time: 93.934 ms
testdb=&amp;gt; explain select product-&amp;gt;&amp;gt;'brand' from ecomm.jproducts where product-&amp;gt;&amp;gt;'department' = 'Women';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on jproducts (cost=5.42..491.90 rows=146 width=32)
   Recheck Cond: ((product -&amp;gt;&amp;gt; 'department'::text) = 'Women'::text)
   -&amp;gt; Bitmap Index Scan on jproducts_department (cost=0.00..5.38 rows=146 width=0)
         Index Cond: ((product -&amp;gt;&amp;gt; 'department'::text) = 'Women'::text)
(4 rows)

Time: 63.887 ms
testdb=&amp;gt; analyze ecomm.jproducts;
ANALYZE
Time: 166.026 ms
testdb=&amp;gt; explain select product-&amp;gt;&amp;gt;'brand' from ecomm.jproducts where product-&amp;gt;&amp;gt;'department' = 'Women';
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on jproducts (cost=0.00..3869.77 rows=15989 width=32)
   Filter: ((product -&amp;gt;&amp;gt; 'department'::text) = 'Women'::text)
(2 rows)

Time: 58.981 ms
testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that right after creating the index we get the same default row estimate for each department, and the index is chosen to fetch the data. But as soon as we analyze the table again, the planner updates its statistics and recognizes that more than half of the table needs to be scanned. The overall cost of the index scan is then higher than a sequential scan of the table: we are retrieving most of the table’s pages, and an index scan has a higher cost per page. As a result, the planner chooses the sequential scan and ignores the index.&lt;/p&gt;
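To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in plain Python with made-up page counts. This is not PostgreSQL’s full cost model (which also includes CPU and index-traversal terms), only the page-cost part driven by the two parameters shown above:

```python
# Default planner page costs (seq_page_cost and random_page_cost shown above).
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0

def seq_scan_pages_cost(table_pages):
    # A sequential scan touches every page of the table once, sequentially.
    return table_pages * SEQ_PAGE_COST

def index_scan_pages_cost(matching_pages):
    # Simplified: an index scan pays the random-read price for each
    # heap page it has to fetch.
    return matching_pages * RANDOM_PAGE_COST

table_pages = 1000
print(seq_scan_pages_cost(table_pages))   # full table: 1000.0
print(index_scan_pages_cost(50))          # selective predicate: 200.0, index wins
print(index_scan_pages_cost(600))         # most pages match: 2400.0, seq scan wins
```

With the default 4x cost ratio, once a predicate has to fetch more than roughly a quarter of the table’s pages, the sequential scan comes out cheaper in this simplified model.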

&lt;p&gt;That’s great, but what do we do if we don’t know which keys our application queries will use? Can we create an index for each key? We can, but it might not be the best decision. Each index adds significant overhead to all operations on the data: every inserted, deleted or updated row has to update the indexes, and the obsolete index entries later become subject to the vacuuming process. Here is a simple example of the impact of our two indexes on an insert.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Without any indexes on JSON
testdb=&amp;gt; insert into ecomm.jproducts select id, to_json(t) from products t;
INSERT 0 29120
Time: 954.380 ms
testdb=&amp;gt;

-- Creating indexes 
testdb=&amp;gt; create index jproducts_brand on ecomm.jproducts using btree ((product-&amp;gt;&amp;gt;'brand'));
CREATE INDEX
Time: 69.097 ms
testdb=&amp;gt; create index jproducts_department on ecomm.jproducts using btree ((product-&amp;gt;&amp;gt;'department'));
CREATE INDEX
Time: 63.080 ms
testdb=&amp;gt;

-- Insert with two indexes
testdb=&amp;gt; insert into ecomm.jproducts select id, to_json(t) from products t;
INSERT 0 29120
Time: 3008.885 ms (00:03.009)
testdb=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just two indexes tripled the insert time, and that doesn’t take into account the full impact of deletes and updates, which produce additional obsolete tuples to be vacuumed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Let’s recap what we’ve discussed here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can use indexes on expressions for your JSON data when you know which JSON keys your application or queries are going to use.&lt;/li&gt;
&lt;li&gt;The indexes follow the same rules as any other b-tree indexes on expressions — your query should use a compatible expression.&lt;/li&gt;
&lt;li&gt;You need to analyze your table after creating an index so the planner has correct statistics for your indexed keys and can make the right decision about using the index.&lt;/li&gt;
&lt;li&gt;Be mindful of index overhead and its potential impact on your DML and maintenance operations. Create only the indexes you really need.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if you don’t know which keys are going to be used in your application, or which new keys may appear in your JSON data? Then it may make sense to look into the JSONB data type and GIN indexes. That is what we are going to discuss in the next blog. Stay tuned.&lt;/p&gt;




</description>
      <category>index</category>
      <category>postgres</category>
      <category>json</category>
      <category>performance</category>
    </item>
    <item>
      <title>Automating AlloyDB starts and stops</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Fri, 13 Jun 2025 00:21:42 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/automating-alloydb-starts-and-stops-3f2p</link>
      <guid>https://forem.com/gleb_otochkin/automating-alloydb-starts-and-stops-3f2p</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mga4tb2lrpa74zg4n4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4mga4tb2lrpa74zg4n4f.png" width="800" height="416"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Introduction
&lt;/h3&gt;

&lt;p&gt;Development and production environments have different requirements, and that applies to the database world too. There are plenty of cases where a development database doesn’t need to be up 24x7. Of course, some zealous developer may wait until 2 AM on a Saturday morning to work on ideas that eluded them during business hours. But for many companies the ability to stop resources when nobody uses them is not only a matter of saving money but also a responsible choice that saves energy and reduces emissions.&lt;br&gt;&lt;br&gt;
Until recently, one way to achieve this was to delete unused instances and create them again when they were needed. That didn’t affect the data, since data is stored at the cluster level, but the recreated instances would get new IP addresses, parameters had to be reapplied, and all connections based on those IPs had to be updated.&lt;br&gt;&lt;br&gt;
Now the AlloyDB API has an interface to stop and start primary instances and read pools without deleting them, preserving all network configuration and parameters. That might be a game changer for some and can help manage cloud resources. Probably even more important is automating that AlloyDB management so the instances can be started and stopped on a schedule. In this blog I will show how you can use Cloud Run functions, Cloud Pub/Sub and Cloud Scheduler to achieve this goal. I will go step by step, and hopefully it can serve as a working example for developing your own solution.&lt;/p&gt;
&lt;h3&gt;
  
  
  Prepare
&lt;/h3&gt;

&lt;p&gt;Let’s assume you have a brand new project for your development environment where you decide to deploy an AlloyDB cluster and schedule it to stop every night at 9 PM. First, all the required APIs need to be enabled. Here is the list of services to enable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AlloyDB&lt;/li&gt;
&lt;li&gt;Network services&lt;/li&gt;
&lt;li&gt;Cloud Build — to build the image with the function code&lt;/li&gt;
&lt;li&gt;Artifact Registry — to store the image&lt;/li&gt;
&lt;li&gt;Cloud Run functions — to run the function&lt;/li&gt;
&lt;li&gt;Eventarc — to trigger the function execution&lt;/li&gt;
&lt;li&gt;Cloud Pub/Sub — to publish the message that triggers execution&lt;/li&gt;
&lt;li&gt;Cloud Scheduler — to send the message to Cloud Pub/Sub at the right time&lt;/li&gt;
&lt;li&gt;Compute Engine and Resource Manager APIs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The easiest way to enable all of these is to either go to the Google Cloud Console or use Google Cloud Shell and run a command like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud services enable alloydb.googleapis.com \
                       servicenetworking.googleapis.com \
                       cloudbuild.googleapis.com \
                       artifactregistry.googleapis.com \
                       run.googleapis.com \
                       pubsub.googleapis.com \
                       eventarc.googleapis.com \
                       cloudscheduler.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Having the APIs enabled, we can create an AlloyDB cluster, a primary instance and a couple of read pools for the tests. You can follow the instructions in the &lt;a href="https://cloud.google.com/run/docs/quickstarts/functions/deploy-functions-console?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. By the way, did you know that if you haven’t used AlloyDB in your project you can create a free trial AlloyDB cluster? You can read more about free trial clusters &lt;a href="https://cloud.google.com/alloydb/docs/free-trial-cluster?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s say your cluster is created in the &lt;em&gt;us-central1&lt;/em&gt; region with the name &lt;em&gt;my-cluster&lt;/em&gt; and the primary instance name &lt;em&gt;my-cluster-primary&lt;/em&gt;. The cluster is up and running; now it is time to work on our automation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Cloud Run Function
&lt;/h3&gt;

&lt;p&gt;The AlloyDB &lt;a href="https://cloud.google.com/alloydb/docs/instance-start-stop-restart?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;start/stop feature&lt;/a&gt; is controlled by the instance’s activation policy attribute. For a running instance it is “&lt;em&gt;ALWAYS&lt;/em&gt;”; to stop the instance we change it to “&lt;em&gt;NEVER&lt;/em&gt;”, and to start it again we set it back to &lt;em&gt;“ALWAYS”&lt;/em&gt;. You can use the gcloud command to check the current activation policy for an instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gleb@cloudshell:~ (gleb-test-short-001-461915)$ gcloud alloydb instances describe my-cluster-primary --cluster my-cluster --region us-central1 --format="value(activationPolicy)"
ALWAYS
gleb@cloudshell:~ (gleb-test-short-001-461915)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To update the activation policy of the AlloyDB instances we are going to build a function triggered by the Eventarc service. You can build the function from the Google Cloud Console following the process described in the &lt;a href="https://cloud.google.com/run/docs/quickstarts/functions/deploy-functions-console?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;, or use the command line. In this blog I am showing the command line approach. In Cloud Shell (or on a machine with the Google Cloud SDK installed) execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir alloydb-mgmt-fnc
cd alloydb-mgmt-fnc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a &lt;em&gt;requirements.txt&lt;/em&gt; file in the alloydb-mgmt-fnc directory with the following contents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;functions-framework==3.*
google-auth
requests
cloudevents
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We then create a file with the code for our function. For the AlloyDB start/stop operations I’ve prepared sample Python code that uses HTTP requests to the AlloyDB API to update the activation policy of an instance based on parameters passed in a Pub/Sub message. As soon as a message is published to the defined topic, the Eventarc service triggers the function. We will look at the exact structure of the Pub/Sub message later.&lt;/p&gt;

&lt;p&gt;Here is the Python code I’ve used for the function. Save it in a file with the name &lt;em&gt;main.py&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py
import base64
import json
import logging
import sys
import time

import google.auth
import google.auth.transport.requests
import requests
import functions_framework
from cloudevents.http import CloudEvent

# Set up basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

API_BASE_URL = "https://alloydb.googleapis.com/v1beta"

def get_access_token():
    """Gets a Google Cloud access token with the necessary scopes."""
    credentials, project = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    return credentials.token

def log_instance_details(instance_json: dict):
    """Logs all available properties from the instance's JSON response."""
    instance_name = instance_json.get('name', 'Unknown').split('/')[-1]
    logging.info("-----------------------------------------")
    logging.info(f"--- Detailed Properties for Instance: {instance_name} ---")
    for key, value in instance_json.items():
        logging.info(f" - {key}: {value}")
    logging.info("-----------------------------------------")

# Register the entry point as a CloudEvent handler so the Functions
# Framework routes the Eventarc-delivered Pub/Sub event to it.
@functions_framework.cloud_event
def control_alloydb_cluster(cloudevent: CloudEvent):
    """
    Cloud Function triggered by Pub/Sub to start/stop AlloyDB instances via HTTP.
    Args:
         cloudevent (cloudevent.http.CloudEvent): The CloudEvent representing the Pub/Sub message.
    """
    try:
        data_payload = cloudevent.data
        if isinstance(data_payload, bytes):
            data_payload = json.loads(data_payload.decode('utf-8'))

        pubsub_message_b64 = data_payload["message"]["data"]
        pubsub_message_json_str = base64.b64decode(pubsub_message_b64).decode('utf-8')
        message_json = json.loads(pubsub_message_json_str)

        project_id = message_json['project']
        region = message_json['region']
        cluster_name = message_json['cluster']
        operation = message_json['operation'].lower()

    except (KeyError, json.JSONDecodeError, UnicodeDecodeError, TypeError) as e:
        logging.error(f"Invalid message format or missing keys: {e}")
        return 'Invalid message format', 400

    if operation not in ['start', 'stop']:
        logging.error(f"Invalid operation '{operation}'. Must be 'start' or 'stop'.")
        return f"Invalid operation '{operation}'. Must be 'start' or 'stop'.", 400

    try:
        access_token = get_access_token()
        headers = {"Authorization": f"Bearer {access_token}"}

        list_url = f"{API_BASE_URL}/projects/{project_id}/locations/{region}/clusters/{cluster_name}/instances"
        logging.info(f"Listing instances from: {list_url}")

        response = requests.get(list_url, headers=headers)
        response.raise_for_status()

        response_json = response.json()
        all_instances = response_json.get('instances', [])

        if not all_instances:
            logging.warning(f"No instances found for cluster {cluster_name}")
            return 'No instances found', 200

        logging.info("Found the following instances. Logging their full details:")
        for instance in all_instances:
            log_instance_details(instance)

        primary_instance = None
        read_pool_instances = []

        for instance in all_instances:
            if instance.get('instanceType') == 'PRIMARY':
                primary_instance = instance
            elif instance.get('instanceType') == 'READ_POOL':
                read_pool_instances.append(instance)

        if not primary_instance:
            logging.error(f"No primary instance found for cluster {cluster_name}")
            return 'No primary instance found', 404

        logging.info(f"Proceeding with '{operation}' operation...")
        if operation == 'start':
            start_cluster(access_token, primary_instance, read_pool_instances)
        elif operation == 'stop':
            stop_cluster(access_token, primary_instance, read_pool_instances)

        logging.info(f"Successfully processed '{operation}' for cluster {cluster_name}")
        return 'Operation completed successfully.', 200

    except requests.exceptions.HTTPError as e:
        logging.error(f"An HTTP error occurred: {e.response.status_code} {e.response.text}")
        return 'An internal HTTP error occurred', 500
    except Exception as e:
        logging.error(f"An unexpected error occurred: {e}", exc_info=True)
        return 'An unexpected internal error occurred', 500

def start_cluster(token, primary, read_pools):
    """Starts the primary instance first, then the read pools."""
    logging.info(f"Starting primary instance: {primary['name']}")
    update_instance_activation_http(token, primary['name'], "ALWAYS")
    logging.info("Waiting for 10 seconds before starting read pools...")
    time.sleep(10)
    for rp in read_pools:
        logging.info(f"Starting read pool instance: {rp['name']}")
        update_instance_activation_http(token, rp['name'], "ALWAYS")
    logging.info("Start operation initiated for all instances.")

def stop_cluster(token, primary, read_pools):
    """Stops the read pool instances first, then the primary instance."""
    for rp in read_pools:
        logging.info(f"Stopping read pool instance: {rp['name']}")
        update_instance_activation_http(token, rp['name'], "NEVER")
    logging.info("Waiting for 10 seconds before stopping the primary...")
    time.sleep(10)
    logging.info(f"Stopping primary instance: {primary['name']}")
    update_instance_activation_http(token, primary['name'], "NEVER")
    logging.info("Stop operation initiated for all instances.")

def update_instance_activation_http(token, instance_name, policy):
    """Updates the activation policy via an HTTP PATCH request and polls the operation."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json"
    }
    patch_url = f"{API_BASE_URL}/{instance_name}?updateMask=activation_policy"
    payload = json.dumps({"activation_policy": policy})

    logging.info(f"Sending PATCH to {patch_url} with policy {policy}")
    response = requests.patch(patch_url, headers=headers, data=payload)
    response.raise_for_status()

    operation = response.json()
    op_name = operation['name']
    op_url = f"{API_BASE_URL}/{op_name}"

    logging.info(f"Initiated operation {op_name}. Polling for completion...")

    # WARNING: This polling loop may cause the Cloud Function to time out if the
    # AlloyDB operation takes longer than the function's configured timeout.
    # For production, consider a "fire-and-forget" approach or a more robust
    # workflow using Cloud Tasks to check the operation status.
    while True:
        op_response = requests.get(op_url, headers=headers)
        op_response.raise_for_status()
        op_status = op_response.json()
        if op_status.get('done', False):
            logging.info(f"Operation {op_name} completed.")
            if 'error' in op_status:
                logging.error(f"Operation failed: {op_status['error']}")
            break
        logging.info("Operation not done yet, waiting 5 seconds...")
        time.sleep(5)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We now have all the build components: the function source code in the &lt;em&gt;main.py&lt;/em&gt; file and the required packages in the &lt;em&gt;requirements.txt&lt;/em&gt; file. We can deploy the function using the &lt;em&gt;gcloud&lt;/em&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run deploy alloydb-mgmt-fnc \
      --source . \
      --function control_alloydb_cluster \
      --base-image python313 \
      --region us-central1 \
      --no-allow-unauthenticated
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you can see, I’ve named the function &lt;em&gt;alloydb-mgmt-fnc&lt;/em&gt; and the entry point is the &lt;em&gt;control_alloydb_cluster&lt;/em&gt; function. You might also notice I’ve explicitly disallowed unauthenticated requests: an unauthenticated endpoint would allow anybody with network access to stop or start your AlloyDB instances. You can read more about service authentication in the &lt;a href="https://cloud.google.com/run/docs/authenticating/overview?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. It usually takes around five minutes to build the image and deploy the function to the Cloud Run service.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create Trigger
&lt;/h3&gt;

&lt;p&gt;As soon as the function is ready we can move forward with our other components. Since we plan to use Google Pub/Sub to pass the parameters for our function we need to create a topic. We can create it using Google Cloud Console or a command line. Here is an example of creating a topic with the name &lt;em&gt;alloydb-mgmt&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud pubsub topics create alloydb-mgmt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we need a service to subscribe to the topic and trigger the function passing the message to the function. That role is played by an eventarc trigger. In the following code I create an eventarc trigger in the same region as my function and with a subscription to the Pub/Sub topic defined earlier. Pay attention to the service account. It might be very useful to have dedicated service accounts for each component but here for simplicity I use the default service account for Compute service. You can read about best practices for using service accounts &lt;a href="https://cloud.google.com/iam/docs/best-practices-service-accounts?utm_campaign=CDR_0x370c34a8_awareness_b422470105&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
EVENTARC_SERVICE_ACCOUNT=${PROJECT_NUMBER}-compute@developer.gserviceaccount.com
TOPIC_NAME=$(gcloud pubsub topics describe alloydb-mgmt --format="value(name)")
gcloud eventarc triggers create alloydb-mgmt-fnc-trigger \
    --location=us-central1 \
    --destination-run-service=alloydb-mgmt-fnc \
    --destination-run-region=us-central1 \
    --event-filters="type=google.cloud.pubsub.topic.v1.messagePublished" \
    --transport-topic=${TOPIC_NAME} \
    --service-account=${EVENTARC_SERVICE_ACCOUNT}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As you probably remember, we created the function as a service requiring authentication. For Pub/Sub to pass parameters to our function, its service account needs to be able to generate authentication tokens. That is exactly what we do in the next step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")
PUBSUB_SERVICE_ACCOUNT=service-${PROJECT_NUMBER}@gcp-sa-pubsub.iam.gserviceaccount.com
PUBSUB_ROLE="roles/iam.serviceAccountTokenCreator"
gcloud projects add-iam-policy-binding ${PROJECT_ID} --member="serviceAccount:${PUBSUB_SERVICE_ACCOUNT}" --role="${PUBSUB_ROLE}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are almost there and can already test our function by passing a JSON message to the Pub/Sub topic. The message structure is simple. We provide the project, region, cluster name and operation. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"project":"your-project-id","region":"us-central1","cluster":"my-cluster","operation":"stop"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can publish it directly to the Pub/Sub topic; the Eventarc subscription picks it up and triggers the function execution. Here is how to do it with the gcloud command (don’t forget to replace the placeholder “your-project-id” with your real project ID).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;topic_name=$(gcloud pubsub topics describe alloydb-mgmt --format="value(name)")
gcloud pubsub topics publish ${topic_name} --message='{"project":"your-project-id","region":"us-central1","cluster":"my-cluster","operation":"stop"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
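If you want to check locally what the function will receive, here is a small sketch (plain Python, standard library only; the payload values are the same placeholders as above) that builds the base64 Pub/Sub envelope the way Eventarc delivers it and decodes it the same way main.py does:

```python
import base64
import json

# The JSON payload we publish to the topic (placeholder project id).
message = {"project": "your-project-id", "region": "us-central1",
           "cluster": "my-cluster", "operation": "stop"}

# Pub/Sub base64-encodes the payload; Eventarc wraps it so the function
# finds it under envelope["message"]["data"].
envelope = {"message": {"data": base64.b64encode(
    json.dumps(message).encode("utf-8")).decode("utf-8")}}

# Decode it the same way control_alloydb_cluster does.
decoded = json.loads(
    base64.b64decode(envelope["message"]["data"]).decode("utf-8"))
print(decoded["operation"])  # stop
```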



&lt;p&gt;Shortly after publishing the message you can check the status of your cluster instances in the console and see that one of the read pool instances is running the stopping operation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOPJlwvbl3Sbyn5KPI1huMw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AOPJlwvbl3Sbyn5KPI1huMw.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Remember that the stop operation starts with the read pools, stopping them one by one, and only then stops the primary. Starting works in reverse order, with the primary first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scheduling
&lt;/h3&gt;

&lt;p&gt;The only thing left is to schedule our operations, which we can do with Cloud Scheduler. Let’s schedule the stop operation to run at 9 PM EDT every night.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TOPIC_NAME=$(gcloud pubsub topics describe alloydb-mgmt --format="value(name)")
CRON_SCHEDULE="0 21 * * *"
gcloud scheduler jobs create pubsub stop-my-cluster \
  --location=us-central1 \
  --schedule="${CRON_SCHEDULE}" \
  --time-zone="America/New_York" \
  --topic=${TOPIC_NAME} \
  --message-body='{"project":"your-project-id","region":"us-central1","cluster":"my-cluster","operation":"stop"}' 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now every night at 9 PM EDT the scheduler will publish the message to the topic, where Eventarc will pick it up and pass it to the function, which will stop all the instances in our cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;Using a similar approach you can schedule AlloyDB to start all the instances every morning at 8 AM, just before you get your first cup of coffee and log in to your instance as a developer. Try it, test it and let us know if you find any issues. I would also love to know if you prefer examples using the Google Cloud Console instead of the command line.&lt;/p&gt;




</description>
      <category>automation</category>
      <category>postgres</category>
      <category>data</category>
      <category>alloydb</category>
    </item>
    <item>
      <title>State-of-the-art text embedding in AlloyDB with the latest Gemini model</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Thu, 29 May 2025 17:02:03 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/state-of-the-art-text-embedding-in-alloydb-with-the-latest-gemini-model-1f6n</link>
      <guid>https://forem.com/gleb_otochkin/state-of-the-art-text-embedding-in-alloydb-with-the-latest-gemini-model-1f6n</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkov62rhquzqnol3oqdc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffkov62rhquzqnol3oqdc.png" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I think most readers are aware of large language models (LLMs) and their ability to work with the semantic meaning of a phrase rather than its exact wording. An LLM embedding is a numeric representation of that meaning in the form of a numeric array. Those arrays can be compared with one another, and the difference between them is called the “distance”. The closer the “distance” between the embeddings, the more similar the semantic meaning of the corresponding phrases.&lt;/p&gt;
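As a toy illustration of what that “distance” means, here is a sketch in plain Python (with made-up three-dimensional vectors; real embedding models return hundreds or thousands of dimensions) computing cosine distance, one common metric:

```python
import math

def cosine_distance(a, b):
    # 1 minus cosine similarity: near 0 means very similar meaning,
    # near 1 means unrelated, near 2 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Tiny made-up "embeddings": the first two point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.0]
airplane = [0.0, 0.1, 0.9]
print(cosine_distance(cat, kitten))    # small distance: similar meaning
print(cosine_distance(cat, airplane))  # larger distance: different meaning
```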

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ig55n72f5qlmf9yhk8u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ig55n72f5qlmf9yhk8u.png" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are multiple embedding models around. Are they all the same? Of course not. Some work better than others, and we have different ways to compare their quality. For example, the Massive Text Embedding Benchmark (MTEB) (Multilingual) leaderboard ranks hundreds of embedding models, and you can have a look at it &lt;a href="https://huggingface.co/spaces/mteb/leaderboard" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Google recently released the latest embedding model — Gemini Embedding text model. You can read about the model in the Google &lt;a href="https://developers.googleblog.com/en/gemini-embedding-text-model-now-available-gemini-api/?utm_campaign=CDR_0x370c34a8_default_b419856561&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;blog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And the model is available right now in AlloyDB. You can use the standard &lt;a href="https://cloud.google.com/alloydb/docs/ai/work-with-embeddings?utm_campaign=CDR_0x370c34a8_default_b419856561&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;embedding&lt;/a&gt; function to call the new model out of the box. Let’s briefly check the new model.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How to call the embedding model&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;You can call the model using something like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select embedding('gemini-embedding-001', 'What is AlloyDB?');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works out of the box and generates embeddings, returning the result for a single request at about the same speed as the current text embedding model &lt;em&gt;text-embedding-005&lt;/em&gt;. The exact timing depends on the request itself and multiple other factors that can affect communication between the instance and the model endpoint.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quickstart_db=&amp;gt; explain analyze select embedding('text-embedding-005', 'What is AlloyDB?');
QUERY PLAN
---------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 87.099 ms
Execution Time: 0.012 ms
(3 rows)
Time: 87.827 ms
quickstart_db=&amp;gt; explain analyze select embedding('gemini-embedding-001', 'What is AlloyDB?');
QUERY PLAN
---------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=32) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 77.402 ms
Execution Time: 0.016 ms
(3 rows)
Time: 78.135 ms
quickstart_db=&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But keep in mind that the &lt;em&gt;gemini-embedding&lt;/em&gt; model returns a 3072-dimensional vector by default, versus 768 dimensions for &lt;em&gt;text-embedding-005&lt;/em&gt;. This means you will need to use the vector(3072) column data type for the new model, and it will consume more space.&lt;br&gt;
&lt;/p&gt;
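
&lt;p&gt;As a rough back-of-envelope check (a sketch assuming 4 bytes per dimension for the raw vector data; the real table size also includes the text column and storage overhead), the fourfold jump in dimensions explains most of the size difference between the two tables below:&lt;/p&gt;

```python
# Rough estimate of raw vector storage for 2000-row test tables,
# assuming 4 bytes (float4) per dimension; table headers, the text
# column and TOAST overhead are not counted.
ROWS = 2000

def vector_kib(dims, rows=ROWS):
    return dims * 4 * rows / 1024  # KiB of raw vector data

print(vector_kib(768))   # 6000.0 KiB for text-embedding-005
print(vector_kib(3072))  # 24000.0 KiB for gemini-embedding-001
```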

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quickstart_db=&amp;gt; create table t1 as select id, my_text, embedding('text-embedding-005', my_text) from t0;
SELECT 2000
quickstart_db=&amp;gt; SELECT pg_size_pretty(pg_total_relation_size('t1'));
pg_size_pretty
----------------
8496 kB
(1 row)
Time: 1.030 ms
quickstart_db=&amp;gt; create table t2 as select id, my_text, embedding('gemini-embedding-001', my_text) from t0;
SELECT 2000
quickstart_db=&amp;gt; SELECT pg_size_pretty(pg_total_relation_size('t2'));
pg_size_pretty
----------------
28 MB
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What about the quality of the responses? The previously mentioned MTEB gives standardized and comprehensive data, but what about real-life experience? Let’s try to find an answer to one of the most important questions in the universe.&lt;/p&gt;

&lt;h3&gt;
  
  
  Battle of Bagels
&lt;/h3&gt;

&lt;p&gt;I am creating a table expression with some basic information about bagel vendors and asking which ones are the truest bagels.&lt;br&gt;&lt;br&gt;
First we try it using the &lt;em&gt;text-embedding-005&lt;/em&gt; model and a short random selection of bagel shops.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH bagels_vendors(brand, description, location) AS (
  VALUES
    ('Brooklyn Bagel &amp;amp; Coffee Company', 'Offers large, hand-rolled bagels with a good balance of chewy and soft texture.', 'Multiple locations in Manhattan and Brooklyn'),
    ('Ess-a-Bagel', 'Offers large, chewy, and dense bagels, considered a classic New York style.', 'Midtown East, Manhattan'),
    ('Tompkins Square Bagels', 'Popular spot with a wide variety of creative and classic bagel flavors and toppings.', 'East Village, Manhattan'),
    ('St-Viateur Bagel', 'Iconic Montreal bagel shop since 1957, known for hand-rolled bagels baked in a wood-fired oven.', 'Montreal, Quebec'),
    ('Fairmount Bagel', 'A long-standing Montreal institution since 1949, famous for its slightly denser and wood-fired bagels.', 'Montreal, Quebec'),
    ('Bagel Etc.', 'A popular Montreal spot since 1982, known for its neon-and-vinyl diner aesthetic and bagel-centric breakfast.', 'Montreal, Quebec')
),
combined_bagels (brand, description, location,text_embedding_005) AS (
  SELECT brand, description, location,embedding('text-embedding-005','brand name: '||brand||' description: '||description)::vector
    FROM bagels_vendors
)
select brand, location,text_embedding_005 &amp;lt;=&amp;gt; embedding ('text-embedding-005','The only correct way to make bagels')::vector as distance from combined_bagels order by distance;

              brand | location | distance
---------------------------------+----------------------------------------------+---------------------
 Brooklyn Bagel &amp;amp; Coffee Company | Multiple locations in Manhattan and Brooklyn | 0.3357732955292745
 Ess-a-Bagel | Midtown East, Manhattan | 0.34294438809905525
 Tompkins Square Bagels | East Village, Manhattan | 0.36784331284309657
 Fairmount Bagel | Montreal, Quebec | 0.3744778134972202
 St-Viateur Bagel | Montreal, Quebec | 0.3906381759225491
 Bagel Etc. | Montreal, Quebec | 0.42936727199347535
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see that the first three places are all taken by New York-style bagels. What if we replace the model with the new &lt;em&gt;gemini-embedding-001&lt;/em&gt;?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WITH bagels_vendors(brand, description, location) AS (
  VALUES
    ('Brooklyn Bagel &amp;amp; Coffee Company', 'Offers large, hand-rolled bagels with a good balance of chewy and soft texture.', 'Multiple locations in Manhattan and Brooklyn'),
    ('Ess-a-Bagel', 'Offers large, chewy, and dense bagels, considered a classic New York style.', 'Midtown East, Manhattan'),
    ('Tompkins Square Bagels', 'Popular spot with a wide variety of creative and classic bagel flavors and toppings.', 'East Village, Manhattan'),
    ('St-Viateur Bagel', 'Iconic Montreal bagel shop since 1957, known for hand-rolled bagels baked in a wood-fired oven.', 'Montreal, Quebec'),
    ('Fairmount Bagel', 'A long-standing Montreal institution since 1949, famous for its slightly denser and wood-fired bagels.', 'Montreal, Quebec'),
    ('Bagel Etc.', 'A popular Montreal spot since 1982, known for its neon-and-vinyl diner aesthetic and bagel-centric breakfast.', 'Montreal, Quebec')
),
combined_bagels (brand, description, location,gemini_embedding_001) AS (
  SELECT brand, description, location,embedding('gemini-embedding-001','brand name: '||brand||' description: '||description)::vector
    FROM bagels_vendors
)
select brand, location,gemini_embedding_001 &amp;lt;=&amp;gt; embedding('gemini-embedding-001','The only correct way to make bagels')::vector as distance from combined_bagels order by distance;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here the Montreal-style bagels move up a bit. Not to first place, but St-Viateur Bagel takes an honorable third place.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;              brand | location | distance
---------------------------------+----------------------------------------------+---------------------
 Ess-a-Bagel | Midtown East, Manhattan | 0.32417611253786394
 Brooklyn Bagel &amp;amp; Coffee Company | Multiple locations in Manhattan and Brooklyn | 0.35497361421585083
 St-Viateur Bagel | Montreal, Quebec | 0.3588242530822754
 Fairmount Bagel | Montreal, Quebec | 0.3619239710614687
 Tompkins Square Bagels | East Village, Manhattan | 0.3629311159767006
 Bagel Etc. | Montreal, Quebec | 0.40006683469017823
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We still have the same two shops in first and second place, but in a different order and with a bigger difference in distance. That difference might look small, but in real applications it can change behavior. In my testing with a couple of other datasets, the new model seemed to give better results. And of course, you should definitely test it on your own data.&lt;/p&gt;

&lt;h3&gt;
  
  
  Try it out
&lt;/h3&gt;

&lt;p&gt;In my opinion, Montreal-style bagels are the best and should be among the top-rated bagels. The new Gemini embedding model gives them better rankings than the old one, and it is, of course, right.&lt;/p&gt;

&lt;p&gt;Try it with your data and see if you can get more accurate results using the new Gemini embedding model in AlloyDB for your text embedding needs today. And if you want some hands-on experience with embeddings, try one of the &lt;a href="https://codelabs.developers.google.com/alloydb-ai-embedding?utm_campaign=CDR_0x370c34a8_default_b419856561&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;embedding codelabs&lt;/a&gt; for AlloyDB.&lt;/p&gt;




</description>
      <category>alloydb</category>
      <category>ai</category>
      <category>vectorembeddings</category>
      <category>gemini</category>
    </item>
    <item>
      <title>AlloyDB Omni on Mac with Podman</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Fri, 28 Feb 2025 16:30:56 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/alloydb-omni-on-mac-with-podman-1i1p</link>
      <guid>https://forem.com/gleb_otochkin/alloydb-omni-on-mac-with-podman-1i1p</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjqllyeo7spfjmi1rces.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjqllyeo7spfjmi1rces.png" width="800" height="445"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Do you like to sink deep into an armchair with a laptop on your lap and do some development or run some tests for PostgreSQL? I do it all the time, sometimes from my living room and sometimes from an airport or sitting on a plane. And it is really handy to have a database engine running locally on your laptop. Which database do you use in such a case? I think AlloyDB Omni is a perfect candidate. Let me share my setup here.&lt;/p&gt;

&lt;p&gt;I use &lt;a href="https://podman.io/" rel="noopener noreferrer"&gt;Podman&lt;/a&gt; as the main platform to run containers on my Mac. I like the fact that it is an open source project distributed under the &lt;a href="https://www.apache.org/licenses/LICENSE-2.0" rel="noopener noreferrer"&gt;Apache License 2.0&lt;/a&gt;. You can use it completely free, everybody can participate and contribute to make the product better, and it works perfectly well on my Mac.&lt;/p&gt;

&lt;p&gt;When you install Podman you need to create a default VM to host your containers. On macOS and Windows that VM is required because containers depend on the Linux kernel. I would suggest assigning about 3 GB of memory to the VM to leave some room for advanced AlloyDB features, and I usually don’t use more than 50 GB of storage for my sample databases.&lt;/p&gt;

&lt;p&gt;Installing AlloyDB Omni is quite straightforward if you only want to test the process. You run the following command in your terminal.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman run --name my-omni \
    -e POSTGRES_PASSWORD=MyVeryStrongPassword \
    -d google/alloydbomni:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It works out of the box and you can start testing or developing right away. Let’s connect and create a database named &lt;em&gt;quickstart_db&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;otochkin@Glebs-MacBook-Pro ~ % podman exec -it my-omni psql -h localhost -U postgres
psql (15.7)
Type "help" for help.

postgres=# create database quickstart_db;
CREATE DATABASE
postgres=# \l+ quickstart_db
                                                                   List of databases
     Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges | Size | Tablespace | Description
---------------+----------+----------+-----------------+---------+-------+-----------+-----------+-------------------+-------+------------+-------------
 quickstart_db | postgres | UTF8 | icu | C | C | und-x-icu | | | 12 MB | pg_default |
(1 row)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s great, but… This works for a simple test and is probably good enough to verify something or run a quick check when you don’t need to come back to it later. However, it has a couple of issues. It requires access through the container itself, which is not the most convenient way, and it stores all the data on the ephemeral layer of the container. Every time you recreate the container, it wipes out all the data and starts from a blank slate.&lt;/p&gt;

&lt;p&gt;To make it right we need to add at least a couple of parameters. To provide access through the network we can map port 5432 to a local port. It can be the same 5432 or any other port; let’s make it 5433 for my first container. In that case the parameter looks like -p 5433:5432. The second parameter specifies a directory where I want to keep my data. It allows me to recreate containers, upgrade, or modify container parameters without losing my data. I’ve created a directory &lt;em&gt;~/Podman/my-omni&lt;/em&gt; on my Mac exactly for that purpose. Let’s see how we can combine all the parameters.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman run -d --name my-omni \
-e POSTGRES_PASSWORD=MyVeryStrongPassword \
-v ~/Podman/my-omni:/var/lib/postgresql/data \
-p 5433:5432 \
docker.io/google/alloydbomni:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is our AlloyDB Omni container in the Podman GUI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uu3vnrfm1lt0ydc40bi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3uu3vnrfm1lt0ydc40bi.png" width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now everything looks much better. I can connect using the &lt;em&gt;psql&lt;/em&gt; utility and create my database again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;otochkin@Glebs-MacBook-Pro ~ % psql -h localhost -p 5433 -U postgres
Password for user postgres:
psql (17.4 (Postgres.app), server 15.7)
Type "help" for help.

postgres=# create database quickstart_db;
CREATE DATABASE
postgres=# \l+ quickstart_db
                                                                   List of databases
     Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges | Size | Tablespace | Description
---------------+----------+----------+-----------------+---------+-------+-----------+-----------+-------------------+-------+------------+-------------
 quickstart_db | postgres | UTF8 | icu | C | C | und-x-icu | | | 12 MB | pg_default |
(1 row)

postgres=#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What is next? You know AlloyDB is packed with features such as automatic memory management and the columnar engine. To make the columnar engine work properly it is recommended to make shared memory available to the container by adding the &lt;em&gt;shm-size&lt;/em&gt; flag. Let’s remove the old container and create a new one with &lt;em&gt;shm-size&lt;/em&gt; set to 1 GB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;podman stop my-omni
podman rm my-omni
podman run -d --name my-omni \
-e POSTGRES_PASSWORD=MyVeryStrongPassword \
-v ~/Podman/my-omni:/var/lib/postgresql/data \
-p 5433:5432 \
--shm-size=1g \
docker.io/google/alloydbomni:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can enable and use the columnar engine on our AlloyDB Omni. By the way, if you connect to it you will see that our &lt;em&gt;quickstart_db&lt;/em&gt; is still there.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;otochkin@Glebs-MacBook-Pro ~ % psql -h localhost -p 5433 -U postgres
Password for user postgres:
psql (17.4 (Postgres.app), server 15.7)
Type "help" for help.

postgres=# \l+ quickstart_db
                                                                   List of databases
     Name | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges | Size | Tablespace | Description
---------------+----------+----------+-----------------+---------+-------+-----------+-----------+-------------------+-------+------------+-------------
 quickstart_db | postgres | UTF8 | icu | C | C | und-x-icu | | | 12 MB | pg_default |
(1 row)

postgres=#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use a &lt;a href="https://codelabs.developers.google.com/alloydb-omni-columnar" rel="noopener noreferrer"&gt;codelab&lt;/a&gt; to test the columnar engine and see how it works. I am not going to reproduce all the steps from the lab here (you can do that yourself), but I can show the query from the lab and information about the columnar store.&lt;br&gt;&lt;br&gt;
Here is what we got in the columnar store after we enabled the columnar engine and ran the queries several times.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quickstart_db=# SELECT database_name, schema_name, relation_name, column_name FROM g_columnar_recommended_columns;
 database_name | schema_name | relation_name | column_name
---------------+-------------+--------------------------------------+-------------
 quickstart_db | public | insurance_producers_licensed_in_iowa | city
 quickstart_db | public | insurance_producers_licensed_in_iowa | expirydate
 quickstart_db | public | insurance_producers_licensed_in_iowa | loa_has_ah
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the query itself. During my tests the execution time was about 45 ms without the columnar engine and about 7.5 ms with it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quickstart_db=# SELECT city, count(*) FROM insurance_producers_licensed_in_iowa WHERE loa_has_ah ='Yes' and expirydate &amp;gt; now() + interval '6 MONTH' GROUP BY city ORDER BY count(*) desc limit 5;
    city | count
-------------+-------
 TAMPA | 1996
 OMAHA | 1686
 KANSAS CITY | 1221
 AUSTIN | 1178
 MIAMI | 1082
(5 rows)

Time: 7.630 ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we can see in the execution plan that it indeed uses the columnar engine to speed up the execution. The performance difference is visible even on my Mac with its very fast CPU and storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;quickstart_db=# explain analyze SELECT city, count(*) FROM insurance_producers_licensed_in_iowa WHERE loa_has_ah ='Yes' and expirydate &amp;gt; now() + interval '6 MONTH' GROUP BY city ORDER BY count(*) desc limit 5;
                                                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit (cost=2335.09..2335.10 rows=5 width=17) (actual time=15.165..15.173 rows=5 loops=1)
   -&amp;gt; Sort (cost=2335.09..2348.24 rows=5261 width=17) (actual time=15.160..15.167 rows=5 loops=1)
         Sort Key: (count(*)) DESC
         Sort Method: top-N heapsort Memory: 25kB
         -&amp;gt; HashAggregate (cost=2195.10..2247.71 rows=5261 width=17) (actual time=13.167..14.439 rows=7639 loops=1)
               Group Key: city
               Batches: 1 Memory Usage: 1169kB
               -&amp;gt; Append (cost=20.00..1712.64 rows=96492 width=9) (actual time=13.160..13.166 rows=96751 loops=1)
                     -&amp;gt; Custom Scan (columnar scan) on insurance_producers_licensed_in_iowa (cost=20.00..1708.62 rows=96491 width=9) (actual time=13.156..13.158 rows=96751 loops=1)
                           Filter: ((loa_has_ah = 'Yes'::text) AND (expirydate &amp;gt; (now() + '6 mons'::interval)))
                           Rows Removed by Columnar Filter: 114169
                           Rows Aggregated by Columnar Scan: 96751
                           Columnar cache search mode: native
                     -&amp;gt; Seq Scan on insurance_producers_licensed_in_iowa (cost=0.00..4.02 rows=1 width=9) (never executed)
                           Filter: ((loa_has_ah = 'Yes'::text) AND (expirydate &amp;gt; (now() + '6 mons'::interval)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are some other features, like adaptive autovacuum, automatic memory management, and the index advisor, all baked into AlloyDB. We recently published a lot of useful information in the &lt;a href="https://cloud.google.com/alloydb/omni/docs" rel="noopener noreferrer"&gt;AlloyDB Omni documentation&lt;/a&gt; about best practices and recommendations on how to use it. Try it, let us know what you think, and share your experience.&lt;/p&gt;




</description>
      <category>podman</category>
      <category>postgres</category>
      <category>mac</category>
      <category>alloydb</category>
    </item>
    <item>
      <title>AlloyDB Autoscaling is Easy</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Tue, 14 Jan 2025 14:06:21 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/alloydb-autoscaling-is-easy-nd1</link>
      <guid>https://forem.com/gleb_otochkin/alloydb-autoscaling-is-easy-nd1</guid>
      <description>&lt;p&gt;Can you imagine yourself buying something during a major sale in a store and not able to complete the purchase because the site is not responding? You click again and again but nothing happens. That is a frustrating experience. So, what could be a reason? Sometimes it is a frontend or a middleware problem but it also can be a database behind the application where CPU skyrockets to 100% because of the increased demand.&lt;/p&gt;

&lt;p&gt;In some cases, when you know when the load starts to grow, you can proactively scale up the database instance and scale it down when the activity subsides. But what do you do when the increased demand is not so predictable? In that case, one possible solution is to enable automation that scales the instance up and down based on defined criteria, such as CPU load. Let me show how to build such automation for AlloyDB instances based on monitoring metrics.&lt;/p&gt;

&lt;p&gt;Here is the main workflow for the process. We create monitoring alerts for high or low CPU utilization, which are sent to a Pub/Sub topic. Then a Cloud Function subscribed to the topic scales the instance up or down. Sounds simple, right? Let me show how to do it step by step.&lt;/p&gt;
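
&lt;p&gt;The decision logic of such a Cloud Function can be sketched as follows. This is a simplified illustration: the policy names and the message fields are assumptions for this example, and a real function would call the AlloyDB Admin API to actually resize the instance.&lt;/p&gt;

```python
# Hedged sketch of the scaling decision in a Pub/Sub-triggered function.
# The policy names and payload shape are assumptions for illustration;
# a real function would inspect the actual monitoring notification and
# call the AlloyDB Admin API to resize the instance.
import base64
import json

def pick_action(policy_name):
    """Map the alert policy name to a scaling direction."""
    if "scale-up" in policy_name:
        return "up"
    if "scale-down" in policy_name:
        return "down"
    return "ignore"

def handle_pubsub(event):
    """Decode a Pub/Sub message and decide which way to scale."""
    payload = json.loads(base64.b64decode(event["data"]))
    policy = payload["incident"]["policy_name"]
    return pick_action(policy)

# Simulated message, as the alerting notification channel would publish it.
msg = {"data": base64.b64encode(
    json.dumps({"incident": {"policy_name": "alloydb-scale-up"}}).encode())}
print(handle_pubsub(msg))  # up
```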

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ngfkn9cdb9z5xqmt8xi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4ngfkn9cdb9z5xqmt8xi.png" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s start with the Pub/Sub topic. To create a topic you go to the Pub/Sub -&amp;gt; Topics and push the “Create Topic” button on the top. It will open a dialog where you can create a brand new topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdj7oh8zm7qsaz527g0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcqdj7oh8zm7qsaz527g0.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The topic will serve as the connection point where our Cloud Function picks up the message and, depending on its content, scales an AlloyDB instance up or down.&lt;/p&gt;

&lt;p&gt;When the topic is created, we can create an alert. To do that, go to the Google Cloud console and choose Logging -&amp;gt; Alerting.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4vrn3ky3vc99dcjbotn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc4vrn3ky3vc99dcjbotn.png" width="800" height="704"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then create a policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakjh92aoyth6jeqrb0l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fakjh92aoyth6jeqrb0l7.png" width="800" height="383"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To build a policy based on CPU load, the “Mean CPU Utilization” metric is probably the best choice, since it lets us alert on average CPU utilization over a sliding window, preventing false alarms from accidental short CPU spikes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ow1nnry7qrd2lseycks.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ow1nnry7qrd2lseycks.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the metric I would use at least a 15-minute sliding window to avoid cyclical scaling up and down. When your instance is resized you might see a short spike in CPU utilization while all the services are starting up and the buffers are warming up. You will need to test and tune the exact size of your sliding window, and maybe other filters and options for the metric.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5anpwg1fztpg0dxk47gj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5anpwg1fztpg0dxk47gj.png" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we define the threshold for the alert. I put 85% as the value there. Depending on your usage pattern you can be more conservative or aggressive with the threshold.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mgrroljzpj7ivc5bhyf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mgrroljzpj7ivc5bhyf.png" width="800" height="410"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And now we come to the notification channels. Remember the Pub/Sub topic we created? Go to the input form for notification channels and push the “Manage Notification Channels” button, then choose “Add new” for Pub/Sub.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4tdwkb81ocphglum0c6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4tdwkb81ocphglum0c6.png" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In another tab open the Pub/Sub topic we have created and copy the topic name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l8v4s8pdosh0qo4e7dp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2l8v4s8pdosh0qo4e7dp.png" width="800" height="194"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we can put that topic name in the first tab where we add a new channel and name the notification channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0o3ufo7w7mmds0jrlgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0o3ufo7w7mmds0jrlgv.png" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we can now choose the created notification channel for our alerting policy.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu03363e4j3725dexso3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu03363e4j3725dexso3f.png" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final step is to name the alert policy, since the name is our marker for identifying what to do when we get the alert.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6aktoy6mif3ve70uq24.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi6aktoy6mif3ve70uq24.png" width="800" height="748"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The alert has been created, but to make it possible to publish messages to the &lt;em&gt;alloydb-scale-up&lt;/em&gt; topic we need to grant the &lt;em&gt;pubsub.publisher&lt;/em&gt; role to the internal service account used for notifications and alerts. Here is how to do it using the gcloud CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud pubsub topics add-iam-policy-binding alloydb-scaling --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-monitoring-notification.iam.gserviceaccount.com" --role="roles/pubsub.publisher"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
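The member flag in the command above expands to the well-known Monitoring notification service agent address, derived from the project number. As a quick illustration (the project number below is made up), the address is composed like this:

```shell
# Compose the Monitoring notification service agent address from the
# project number (the number below is made up for illustration).
PROJECT_NUMBER=123456789012
SA="service-${PROJECT_NUMBER}@gcp-sa-monitoring-notification.iam.gserviceaccount.com"
echo "$SA"
```

In the real command the project number is looked up on the fly with `gcloud projects describe`.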



&lt;p&gt;Before going forward and creating a Cloud Run function we need to create a service account and grant some roles to it. I am going to create an account alloydb-scale-sa and use that name in the following commands. This is how I do that in a Cloud Shell session.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create alloydb-scale-sa --project $PROJECT_ID
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:alloydb-scale-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/alloydb.admin"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can go forward and create a Cloud Run function. In the console, switch to Cloud Run Functions and click “Create Function”.&lt;/p&gt;

&lt;p&gt;&lt;a href="" class="article-body-image-wrapper"&gt;&lt;img&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There we need to provide the function name, choose Pub/Sub as a trigger and select our topic. Then we need to expand the runtime settings.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AZ50A7TX-AB3TLw0IsZeGbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F1%2AZ50A7TX-AB3TLw0IsZeGbw.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the runtime settings we need to select the service account alloydb-scale-sa we created earlier.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9davr0v9pq4k9r540aha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9davr0v9pq4k9r540aha.png" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the next screen we need to add our code, which is responsible for scaling the AlloyDB instance up and down. The cluster name, instance name and location are included in the alert message, and the name of the alert policy defines what kind of action to execute. I’ve put sample code &lt;a href="https://github.com/GoogleCloudPlatform/devrel-demos/blob/main/infrastructure/alloydb-scale-function/go-sample/alloydbscale.go" rel="noopener noreferrer"&gt;here&lt;/a&gt;. This is a simplified example; in production you will probably need to define your own parameters, conditions and error handling.&lt;/p&gt;
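The function's job is essentially to parse the JSON notification and pick out the policy name and instance details. As a rough, hypothetical sketch of that extraction in shell (the payload below is abbreviated and made up; check the real Cloud Monitoring notification schema before relying on field names):

```shell
# Abbreviated, made-up Cloud Monitoring notification payload for illustration.
MSG='{"incident":{"policy_name":"alloydb-scale-up","state":"open","resource":{"labels":{"cluster_id":"my-cluster","instance_id":"my-instance","location":"us-central1"}}}}'
# Extract the policy name: it tells us whether to scale up or scale down.
POLICY=$(printf '%s' "$MSG" | grep -o '"policy_name":"[^"]*"' | cut -d'"' -f4)
echo "$POLICY"   # prints alloydb-scale-up
```

The linked Go sample does the same kind of parsing with proper JSON decoding instead of text matching.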

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4f4ue7vg2gmeng0jfxa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4f4ue7vg2gmeng0jfxa.png" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After pushing the Deploy button we need to wait until the function is completely built and deployed.&lt;/p&gt;

&lt;p&gt;When the function is deployed we need to go back to the Pub/Sub topic and click on the &lt;em&gt;eventarc&lt;/em&gt; subscription.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vm4h2aui0w5441cmsyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vm4h2aui0w5441cmsyj.png" width="800" height="506"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to edit the subscription and change the service account to our alloydb-scale-sa account.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8prgloaxjcl5a7qqqfar.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8prgloaxjcl5a7qqqfar.png" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Save it by clicking the &lt;em&gt;Update&lt;/em&gt; button at the bottom.&lt;/p&gt;

&lt;p&gt;In theory you can use separate accounts for the function and its invocation, but I’ve used the same one here for simplicity.&lt;/p&gt;

&lt;p&gt;We also need to allow that service account, or any other account assigned to the Pub/Sub subscription, to invoke the function. Here is how you can do it for the alloydb-scale-sa account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID=$(gcloud config get-value project)
gcloud functions add-invoker-policy-binding alloydb-scale-fnc --region=us-central1 --member="serviceAccount:alloydb-scale-sa@$PROJECT_ID.iam.gserviceaccount.com"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have the alert set up, along with notifications, the subscription and the function that parses the notification and scales up our instance if the mean CPU load exceeds 85%. To test it out, I’ve built a client VM with &lt;em&gt;pgbench&lt;/em&gt; and run a TPC-B-like benchmark on my AlloyDB instance. You can read how to do that in detail in the public &lt;a href="https://services.google.com/fh/files/misc/alloydb_oltp_benchmarking_guide.pdf" rel="noopener noreferrer"&gt;guide&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Running &lt;em&gt;pgbench&lt;/em&gt; creates a CPU load of about 87–89%, which is sufficient to trigger the alert. After 15 minutes it creates an incident, which you can see in the picture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk5l47lfc97gmf4bci86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhk5l47lfc97gmf4bci86.png" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The alert is published to the Pub/Sub topic and the function parses the message, gets the information about the instance and scales up the instance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f7tkfllbayi3nia32eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8f7tkfllbayi3nia32eb.png" width="800" height="250"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And we can see the AlloyDB instance is getting updated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr24e2ndeaychw55q6rb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr24e2ndeaychw55q6rb3.png" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Scaling up works now, but how can we scale down automatically? The next step is to create another alert, named &lt;em&gt;“alloydb-scale-down”&lt;/em&gt;, for CPU utilization lower than 10% to scale the instance down. The steps are exactly the same as for the previous alert; the only difference is the trigger condition&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjuheaoyyy7l7q7ytif0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzjuheaoyyy7l7q7ytif0.png" width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the policy name&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jp8cz3hiswqjlepclpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7jp8cz3hiswqjlepclpt.png" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The policy creates a request to scale down the instance if the mean CPU utilization for the last 15 minutes is lower than 10%.&lt;/p&gt;

&lt;p&gt;We have created policies to scale our instance up and down based on average CPU utilization. But there are many options you can use along with them: filters and conditions on the alert, different triggers based on other monitored values, and different ways to increase or reduce the number of CPUs on the instances. Try it and let me know how it works.&lt;/p&gt;
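For example, one simple strategy for the last point (purely illustrative, not the logic of the linked sample function) is to double or halve the vCPU count within fixed bounds depending on which policy fired:

```shell
# Illustrative scaling decision: double or halve the vCPU count within
# [MIN_CPU, MAX_CPU] depending on which alert policy fired.
MIN_CPU=2
MAX_CPU=16
CURRENT_CPU=4
POLICY="alloydb-scale-up"
case "$POLICY" in
  alloydb-scale-up)   NEW_CPU=$((CURRENT_CPU * 2)) ;;
  alloydb-scale-down) NEW_CPU=$((CURRENT_CPU / 2)) ;;
esac
if [ "$NEW_CPU" -gt "$MAX_CPU" ]; then NEW_CPU=$MAX_CPU; fi
if [ "$NEW_CPU" -lt "$MIN_CPU" ]; then NEW_CPU=$MIN_CPU; fi
echo "$NEW_CPU"   # prints 8
```

The bounds keep a misbehaving alert from scaling the instance indefinitely in either direction; adjust them to the CPU shapes AlloyDB actually supports.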




</description>
      <category>googlecloudplatform</category>
      <category>scaling</category>
      <category>googlecloudfunctions</category>
      <category>data</category>
    </item>
    <item>
      <title>AlloyDB Omni and local models on GKE</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Wed, 18 Dec 2024 21:06:36 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/alloydb-omni-and-local-models-on-gke-23mh</link>
      <guid>https://forem.com/gleb_otochkin/alloydb-omni-and-local-models-on-gke-23mh</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1067rjfsthk7atq2fd6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1067rjfsthk7atq2fd6.png" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AlloyDB and Vertex AI are great cloud services providing tons of capabilities and options to serve as a main backend for development. But what if you need something different? Maybe something more local, deployed as a compact, self-contained setup where all communication between the different parts of the application stays as closed as possible? Or deployed where normal access to the service endpoints is unavailable? Can we do that and still use all the good stuff from AlloyDB, such as AI integration and improved vector search? Yes we can, and in this blog I will show how to deploy a local AI model and AlloyDB Omni to the same Kubernetes cluster and make them work together.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploying AlloyDB Omni
&lt;/h3&gt;

&lt;p&gt;For my deployment I am using Google GKE, and we start by creating a standard cluster. For most of the actions I am using Google Cloud Shell and the standard utilities that come with it, but you can of course use your own preferred environment. Here is the command to create a cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container clusters create ${CLUSTER_NAME} \
  --project=${PROJECT_ID} \
  --region=${LOCATION} \
  --workload-pool=${PROJECT_ID}.svc.id.goog \
  --release-channel=rapid \
  --machine-type=e2-standard-8 \
  --num-nodes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As soon as the cluster is deployed we can proceed with preparing it for AlloyDB Omni. You can read about all the requirements and the installation procedure in much more detail in the &lt;a href="https://cloud.google.com/alloydb/omni/docs/deploy-kubernetes" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One of the requirements is to install the &lt;a href="https://cert-manager.io/docs/installation/" rel="noopener noreferrer"&gt;&lt;em&gt;cert-manager&lt;/em&gt;&lt;/a&gt; service. Most of the actions on the cluster are done using native Kubernetes utilities like &lt;em&gt;kubectl&lt;/em&gt; and &lt;em&gt;helm&lt;/em&gt;, and to use these tools we need cluster credentials. In GKE we get them with a &lt;em&gt;gcloud&lt;/em&gt; command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud container clusters get-credentials ${CLUSTER_NAME} --region=${LOCATION}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can install the cert-manager service on our cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.16.2/cert-manager.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to get the Helm package for the latest AlloyDB Omni Kubernetes operator.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export GCS_BUCKET=alloydb-omni-operator
export HELM_PATH=$(gcloud storage cat gs://$GCS_BUCKET/latest)
export OPERATOR_VERSION="${HELM_PATH%%/*}"
gcloud storage cp gs://$GCS_BUCKET/$HELM_PATH ./ --recursive
helm install alloydbomni-operator alloydbomni-operator-${OPERATOR_VERSION}.tgz \
--create-namespace \
--namespace alloydb-omni-system \
--atomic \
--timeout 5m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the AlloyDB Omni operator is installed we can proceed with the deployment of our database cluster. We need to deploy it with the &lt;em&gt;googleMLExtension&lt;/em&gt; parameter enabled to be able to work with AI models. I also prefer to enable an internal load balancer for the database deployment. It creates an internal IP in the project VPC, and I can use a small VM with the &lt;em&gt;psql&lt;/em&gt; client installed to work with the databases, load data, etc. You can find more information about the load balancer in the &lt;a href="https://cloud.google.com/alloydb/omni/docs/configure-load-balancer-k8s" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. Here is my manifest to deploy an AlloyDB Omni cluster with the name &lt;em&gt;my-omni&lt;/em&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Secret
metadata:
  name: db-pw-my-omni
type: Opaque
data:
  my-omni: "VmVyeVN0cm9uZ1Bhc3N3b3Jk"
---
apiVersion: alloydbomni.dbadmin.goog/v1
kind: DBCluster
metadata:
  name: my-omni
spec:
  databaseVersion: "15.7.0"
  primarySpec:
    adminUser:
      passwordRef:
        name: db-pw-my-omni
    features:
      googleMLExtension:
        enabled: true
    resources:
      cpu: 1
      memory: 8Gi
      disks:
      - name: DataDisk
        size: 20Gi
        storageClass: standard
    dbLoadBalancerOptions:
      annotations:
        networking.gke.io/load-balancer-type: "internal"
  allowExternalIncomingTraffic: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save it as my-omni.yaml and then apply the configuration to the cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f my-omni.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By the way, have you noticed the value I’ve used for the password in the secret? Kubernetes accepts the values encoded in base64, and you can encode them using standard Linux utilities. Here is an example encoding the password “VeryStrongPassword”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo -n "VeryStrongPassword" | base64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
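You can verify the round trip: decoding the value that goes into the secret should give back the original password.

```shell
# Encode the password and check that decoding returns the original value.
ENCODED=$(printf '%s' "VeryStrongPassword" | base64)
echo "$ENCODED"                      # prints VmVyeVN0cm9uZ1Bhc3N3b3Jk
printf '%s' "$ENCODED" | base64 -d   # prints VeryStrongPassword
```

Note the `-n` in the original command (or `printf` here): a trailing newline would become part of the encoded password.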



&lt;p&gt;Speaking of Kubernetes secrets and passwords, I would rather use a more secure solution to store them. In GKE I prefer to use Google Cloud Secret Manager. You can read in detail how to implement it in the &lt;a href="https://cloud.google.com/secret-manager/docs/secret-manager-managed-csi-component#secretmanager-addon-enable-existing-gcloud" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;. It works really well, and it also helps to integrate AlloyDB Omni with AI models that require authorization such as tokens or keys.&lt;/p&gt;

&lt;p&gt;When the database cluster and internal load balancer are deployed we should see the external service for our Omni instance.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the output we should see a service of “&lt;em&gt;LoadBalancer”&lt;/em&gt; type with an external IP. We can use that IP to connect to our instance from a VM in the same VPC.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DB_CLUSTER_NAME=my-omni
export INSTANCE_IP=$(kubectl get service al-${DB_CLUSTER_NAME}-rw-elb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $INSTANCE_IP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Knowing your load balancer IP you can use it as an export variable (useful for automation) or put it directly in the command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export INSTANCE_IP=10.128.15.195
psql "host=${INSTANCE_IP} user=postgres"
# or simply
psql "host=10.128.15.195 user=postgres"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying a Model
&lt;/h3&gt;

&lt;p&gt;Now we need to deploy a local model to the same Kubernetes cluster. So far we have only one default pool (compute nodes for your apps) with e2-standard-8 nodes. It is enough for our AlloyDB Omni but not ideal for inference. To run a model we need a node with a graphics accelerator. For the test I’ve created a pool with an NVIDIA L4 accelerator. Here is the command.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container node-pools create gpupool \
  --accelerator type=nvidia-l4,count=1,gpu-driver-version=latest \
  --project=${PROJECT_ID} \
  --location=${LOCATION} \
  --node-locations=${LOCATION}-a \
  --cluster=${CLUSTER_NAME} \
  --machine-type=g2-standard-8 \
  --num-nodes=1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep the project quotas in mind when you create the pools. Not all accelerator types are available by default, and that may dictate the way you deploy the model.&lt;/p&gt;

&lt;p&gt;I was using Hugging Face to deploy the BGE Base v1.5 embedding model. Hugging Face provides full &lt;a href="https://huggingface.co/docs/google-cloud/en/examples/gke-tei-from-gcs-deployment" rel="noopener noreferrer"&gt;instructions&lt;/a&gt; and a deployment package to be used with GKE.&lt;/p&gt;

&lt;p&gt;We need the deployment manifest, and we can get it from the Hugging Face GitHub repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/huggingface/Google-Cloud-Containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you plan to reuse the model it makes sense to use a Google Cloud Storage (GCS) bucket to keep it between deployments, but in my case I am only testing it and skipping the bucket part. The GCS option is also included in the downloaded package.&lt;/p&gt;

&lt;p&gt;For deployment without GCS we need to review and modify the &lt;em&gt;Google-Cloud-Containers/examples/gke/tei-from-gcs-deployment/gpu-config/deployment.yaml&lt;/em&gt; file, replacing the &lt;em&gt;cloud.google.com/gke-accelerator&lt;/em&gt; value with our &lt;em&gt;nvidia-l4&lt;/em&gt;. We also need to define limits for the resources we request, or we may get an error.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi Google-Cloud-Containers/examples/gke/tei-deployment/gpu-config/deployment.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the corrected manifest:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: tei-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tei-server
  template:
    metadata:
      labels:
        app: tei-server
        hf.co/model: Snowflake--snowflake-arctic-embed-m
        hf.co/task: text-embeddings
    spec:
      containers:
        - name: tei-container
          image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204:latest
          resources:
            requests:
              nvidia.com/gpu: 1
            limits:
              nvidia.com/gpu: 1
          env:
            - name: MODEL_ID
              value: Snowflake/snowflake-arctic-embed-m
            - name: NUM_SHARD
              value: "1"
            - name: PORT
              value: "8080"
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
            - mountPath: /data
              name: data
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
        - name: data
          emptyDir: {}
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can continue by creating the namespace and the service account, and deploying the rest.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NAMESPACE=hf-gke-namespace
export SERVICE_ACCOUNT=hf-gke-service-account
kubectl create namespace $NAMESPACE
kubectl create serviceaccount $SERVICE_ACCOUNT --namespace $NAMESPACE
kubectl apply -f Google-Cloud-Containers/examples/gke/tei-deployment/gpu-config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we have a look at the created service we can see that by default it has only a cluster IP, which means it is available only inside the cluster. Nobody outside the cluster has access to the model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gleb@cloudshell:~/blog (test)$ kubectl get service tei-service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tei-service ClusterIP 34.118.225.12 &amp;lt;none&amp;gt; 8080/TCP 12m
gleb@cloudshell:~/blog (test)$ 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The service will be available for requests using endpoint URL &lt;em&gt;&lt;a href="http://34.118.225.12:8080/embed" rel="noopener noreferrer"&gt;http://34.118.225.12:8080/embed&lt;/a&gt;&lt;/em&gt; for the embeddings generation.&lt;/p&gt;
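If you want to check the endpoint before registering it in the database, you can post a small JSON request to it from a pod or VM inside the cluster's network. The payload shape below follows the Hugging Face Text Embeddings Inference API (the same "inputs"/"truncate" fields our transform function builds later); the actual call is left commented because the cluster IP is only reachable from inside the cluster.

```shell
# Build a TEI /embed request body; "inputs" carries the text to embed.
TEXT="What is AlloyDB Omni?"
PAYLOAD=$(printf '{"inputs": "%s", "truncate": true}' "$TEXT")
echo "$PAYLOAD"   # prints {"inputs": "What is AlloyDB Omni?", "truncate": true}
# Only reachable from inside the cluster network:
# curl -s -X POST http://34.118.225.12:8080/embed \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
```

A successful response is a JSON array of embedding vectors, one per input.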

&lt;h3&gt;
  
  
  Register Model in AlloyDB Omni
&lt;/h3&gt;

&lt;p&gt;Everything is ready to register the deployed model in AlloyDB Omni. We start by creating a demo database. In a psql session (remember our jump box VM?) connect as the postgres user and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;create database demo;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s connect to the new &lt;em&gt;“demo”&lt;/em&gt; database&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;psql "host=10.128.15.195 user=postgres dbname=demo"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And there we can register our new model using the google_ml procedures. Before registering an embedding model we need to create &lt;em&gt;transform&lt;/em&gt; functions, which are responsible for transforming the input and output to the expected values. Here are the functions I’ve prepared for our model.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- Input Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_input JSON;
  model_qualified_name TEXT;
BEGIN
  SELECT json_build_object('inputs', input_text, 'truncate', true)::JSON INTO transformed_input;
  RETURN transformed_input;
END;
$$;

-- Output Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_output REAL[];
BEGIN
  SELECT ARRAY(SELECT json_array_elements_text(response_json-&amp;gt;0)) INTO transformed_output;
  RETURN transformed_output;
END;
$$;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we register the new model with the name &lt;em&gt;bge-base-1.5&lt;/em&gt;. I used the HTTP endpoint described earlier, with the cluster service IP, and our transform functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL
  google_ml.create_model(
    model_id =&amp;gt; 'bge-base-1.5',
    model_request_url =&amp;gt; 'http://34.118.225.12:8080/embed',
    model_provider =&amp;gt; 'custom',
    model_type =&amp;gt; 'text_embedding',
    model_in_transform_fn =&amp;gt; 'tei_text_input_transform',
    model_out_transform_fn =&amp;gt; 'tei_text_output_transform');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tests
&lt;/h3&gt;

&lt;p&gt;Let’s test it and see how many dimensions a generated vector has. Here is the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;demo=# select array_dims(google_ml.embedding('bge-base-1.5','What is AlloyDB Omni?'));
 array_dims 
------------
 [1:768]
(1 row)

demo=# 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great! It works and shows that our embedding function returns a real array with 768 dimensions.&lt;/p&gt;

&lt;p&gt;I used a small dataset from one of the embedding &lt;a href="https://codelabs.developers.google.com/codelabs/alloydb-ai-embedding?hl=en#1" rel="noopener noreferrer"&gt;codelabs&lt;/a&gt; I created some time ago to generate embeddings and run a query.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;demo=# \timing
Timing is on.
demo=# SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (ce.embedding &amp;lt;=&amp;gt; google_ml.embedding('bge-base-1.5','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_embedding ce on
        ce.uniq_id=cp.uniq_id
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory&amp;gt;0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 10;
     product_name | description | sale_price | zip_code | distance       
-----------------------+----------------------------------------------------------------------------------+------------+----------+---------------------
 California Sycamore | This is a beautiful sycamore tree that can grow to be over 100 feet tall. It is | 300.00 | 93230 | 0.22753925487632942
 Toyon | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e | 10.00 | 93230 | 0.23497374266229387
 California Peppertree | This is a beautiful peppertree that can grow to be over 30 feet tall. It is an e | 25.00 | 93230 | 0.24215884459965364
 California Redwood | This is a beautiful redwood tree that can grow to be over 300 feet tall. It is a | 1000.00 | 93230 | 0.24564130578287147
 Cherry Tree | This is a beautiful cherry tree that will produce delicious cherries. It is an d | 75.00 | 93230 | 0.24846117929767153
 Fremont Cottonwood | This is a beautiful cottonwood tree that can grow to be over 100 feet tall. It i | 200.00 | 93230 | 0.2533482837690365
 Madrone | This is a beautiful madrona tree that can grow to be over 80 feet tall. It is an | 50.00 | 93230 | 0.25755536556243364
 Secateurs | These secateurs are perfect for pruning small branches and vines. | 15.00 | 93230 | 0.26093776589260964
 Sprinkler | This sprinkler is perfect for watering a large area of your garden. | 30.00 | 93230 | 0.26263969504592044
 Plant Pot | This is a stylish plant pot that will add a touch of elegance to your garden. | 20.00 | 93230 | 0.2639707045520192
(10 rows)

Time: 25.900 ms
demo=# 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response time was about 25 ms on average and relatively stable. The recall quality was also quite decent, returning a good selection of trees from the inventory.&lt;/p&gt;
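&lt;p&gt;For reference, the distance column in the output above is a cosine distance between the query embedding and each stored embedding (lower means closer). A minimal Python sketch of that computation, assuming cosine distance is the metric behind the vector distance operator used in the query:&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    # Cosine distance is 1 minus cosine similarity; this is an illustrative
    # helper, not AlloyDB code
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical vectors have distance 0; orthogonal vectors have distance 1
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```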

&lt;p&gt;You can try deploying AlloyDB Omni along with different AI models right now in GKE or in your local Kubernetes environment. The great thing about AlloyDB Omni is that it can be deployed anywhere you can run containers.&lt;/p&gt;

&lt;p&gt;In the next post I will compare performance and recall with other models and with full-text search. Stay tuned.&lt;/p&gt;




</description>
      <category>data</category>
      <category>ai</category>
      <category>googlecloudplatform</category>
      <category>alloydb</category>
    </item>
    <item>
      <title>Hugging face and Open AI Models in AlloyDB Omni</title>
      <dc:creator>Gleb Otochkin</dc:creator>
      <pubDate>Fri, 31 May 2024 16:11:32 +0000</pubDate>
      <link>https://forem.com/gleb_otochkin/hugging-face-and-open-ai-models-in-alloydb-omni-6j5</link>
      <guid>https://forem.com/gleb_otochkin/hugging-face-and-open-ai-models-in-alloydb-omni-6j5</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbprgkvw4i3l0mk1csp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhzbprgkvw4i3l0mk1csp.png" width="800" height="389"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my previous &lt;a href="https://medium.com/google-cloud/gemma-and-gemini-pro-ai-models-in-alloydb-omni-4faabe31713d" rel="noopener noreferrer"&gt;post&lt;/a&gt; I showed how to register some Vertex AI models in the AlloyDB Omni database and how to generate text or embeddings using just SQL. But what if you want to call an AI model outside of Google infrastructure? In that case you can still register the model using the credentials and endpoint provided by the third-party provider. In this post I will show you how to register a model hosted by OpenAI and one hosted on Hugging Face.&lt;/p&gt;

&lt;p&gt;A quick reminder that everything I am describing here is still in preview and some things may change later.&lt;/p&gt;

&lt;p&gt;We’ve already &lt;a href="https://medium.com/google-cloud/gemma-and-gemini-pro-ai-models-in-alloydb-omni-4faabe31713d" rel="noopener noreferrer"&gt;covered&lt;/a&gt; how to install and enable the AI integration in AlloyDB Omni in the previous post. Here we continue assuming that the database is created, the AI integration is enabled, and the latest version of the google_ml_integration extension is installed. And, as of now, even when we work with third-party models we still need the integration with Google Cloud, because we use Google Cloud Secret Manager to store credentials such as API tokens for third-party providers. That integration uses the same service account we created for the Vertex AI authentication, and the same service account is used to retrieve our secrets from the Google Secret Manager.&lt;/p&gt;

&lt;p&gt;We will go step by step and at first, we have to enable the Secret Manager API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gleb@cloudshell:~ (gleb-test-project)$ gcloud services enable secretmanager.googleapis.com
Operation "operations/acat.p2-870304593371-39f6ec47-ab61-45b5-9d42-ba128a5af575" finished successfully.
gleb@cloudshell:~ (gleb-test-project)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let us move to the OpenAI platform and generate an API key. We need to go to &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;https://platform.openai.com/api-keys&lt;/a&gt; and push the “&lt;em&gt;Create new secret key&lt;/em&gt;” button at the top right of the screen, put in a name for the key, and push the “&lt;em&gt;Create secret key&lt;/em&gt;” button:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b6buhdsxe4xh1mbomln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3b6buhdsxe4xh1mbomln.png" width="800" height="521"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will open another popup window with the key value. Copy it somewhere safe to use later, since it is shown only once, when you create the key.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4v43dxebtwivn3nn5rc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4v43dxebtwivn3nn5rc6.png" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Later you can change the key name and permissions but you will not be able to read the key value again.&lt;/p&gt;

&lt;p&gt;Then we can create a secret in the Google Secret Manager using either the Google Cloud Web Console or a gcloud command like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gleb@cloudshell:~ (gleb-test-project)$ gcloud secrets create openai-api-key 
Created secret [openai-api-key].
gleb@cloudshell:~ (gleb-test-project)$ echo -n "sk-omni-access-key-...........................8hP9" | gcloud secrets versions add openai-api-key --data-file=-
Created version [1] of the secret [openai-api-key].
gleb@cloudshell:~ (gleb-test-project)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then grant access to the secret to the service account used to set up the AI integration. We are still using the same vertex-ai-connect Google service account we’ve used for the Vertex AI integration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gleb@cloudshell:~ (gleb-test-project)$ gcloud secrets add-iam-policy-binding openai-api-key --member='serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com' --role='roles/secretmanager.secretAccessor'
Updated IAM policy for secret [openai-api-key].
bindings:
- members:
  - serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com
  role: roles/secretmanager.secretAccessor
etag: BwYZr3y9lZ4=
version: 1
gleb@cloudshell:~ (gleb-test-project)$
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we need to register the created secret using the &lt;em&gt;google_ml.create_sm_secret&lt;/em&gt; procedure, which defines the mapping between the google_ml secret_id and the path to our secret in the Google Secret Manager. In a psql session connected to the ai_demo database execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ai_demo=# CALL google_ml.create_sm_secret( 
    secret_id =&amp;gt; 'openai-api-key', 
    secret_path =&amp;gt; 'projects/gleb-test-project/secrets/openai-api-key/versions/1'
);
CALL
ai_demo=#
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
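&lt;p&gt;The secret_path argument follows the Secret Manager resource-name format. A tiny illustrative helper (the function name is my own, not part of google_ml) showing how that path is assembled:&lt;/p&gt;

```python
def sm_secret_path(project, secret_id, version="1"):
    # Resource-name format expected by the secret_path argument of
    # google_ml.create_sm_secret (hypothetical helper for illustration)
    return "projects/{}/secrets/{}/versions/{}".format(project, secret_id, version)

print(sm_secret_path("gleb-test-project", "openai-api-key"))
```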



&lt;p&gt;For our test we are going to use one of the latest OpenAI models, gpt-4o. The API documentation for the chat interface is published &lt;a href="https://platform.openai.com/docs/api-reference/chat/create" rel="noopener noreferrer"&gt;here&lt;/a&gt;. From the documentation we take all the required parameters and the input and output formats for our queries to the AI model.&lt;/p&gt;

&lt;p&gt;And here is the procedure call to register the OpenAI gpt-4o model using the endpoint from the OpenAI documentation, the previously registered secret, and the model name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL google_ml.create_model(
    model_id =&amp;gt; 'gpt-4o', 
    model_provider =&amp;gt; 'open_ai', 
    model_request_url =&amp;gt;'https://api.openai.com/v1/chat/completions', 
    model_type =&amp;gt; 'generic', 
    model_auth_type =&amp;gt; 'secret_manager', 
    model_auth_id =&amp;gt; 'openai-api-key', 
    model_qualified_name =&amp;gt; 'gpt-4o');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can run the google_ml.predict_row() function, supplying the model id and the request in JSON format according to the OpenAI API specification. We place a simple request asking the model what AlloyDB Omni is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select google_ml.predict_row('gpt-4o','{"model" : "gpt-4o", "messages" : [{"role": "user", "content": "What is AlloyDB Omni?"}]}')-&amp;gt;'choices'-&amp;gt;0-&amp;gt;'message'-&amp;gt;'content';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the result. It looks factually correct and provides a very decent high-level overview of what AlloyDB Omni is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"AlloyDB Omni is a version of AlloyDB that is designed to run in environments beyond Google Cloud Platform (GCP). With AlloyDB Omni, you can deploy and manage AlloyDB on various platforms including on-premises data centers, other cloud providers, and edge locations. This flexibility allows organizations to utilize AlloyDB's features and capabilities while leveraging existing infrastructure or taking advantage of multi-cloud strategies."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
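&lt;p&gt;The arrow operators in the query above are ordinary JSON traversal. A minimal Python sketch of the same request shape and response parsing; the response body here is illustrative, in the documented chat completions format:&lt;/p&gt;

```python
import json

# Request body in the same shape as the one passed to predict_row above
request = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is AlloyDB Omni?"}],
}

# A trimmed-down response in the documented chat completions shape;
# the content string is made up for illustration
response = json.loads(
    '{"choices": [{"message": {"role": "assistant", '
    '"content": "AlloyDB Omni is a downloadable edition of AlloyDB."}}]}'
)

# Equivalent of the SQL arrow path choices -> 0 -> message -> content
answer = response["choices"][0]["message"]["content"]
print(answer)
```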



&lt;p&gt;I want to give one more example for another provider — Hugging Face. To register a model from Hugging Face we have to create an API token there first. We open the &lt;a href="https://huggingface.co/settings/tokens" rel="noopener noreferrer"&gt;https://huggingface.co/settings/tokens&lt;/a&gt; page and push the “New token” button. It will open a pop-up window where we name the token and optionally define its scopes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym9r7rb6zqnotj6y4ij5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fym9r7rb6zqnotj6y4ij5.png" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After pushing the “Generate token” button we get the new token which can be copied and used to create the secret.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwi4v88gzzvvkc6nxhme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkwi4v88gzzvvkc6nxhme.png" width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The new token should be granted permissions to make calls to the Inference endpoints. You can always modify the permissions using the “Manage” button.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs04eo7cug56216iwgpt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs04eo7cug56216iwgpt.png" width="800" height="798"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The process of creating the secret is the same as the one we’ve used for the OpenAI token and requires a few commands. Here is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud secrets create hugginface-api-key
echo -n "hf_....................oofm" | gcloud secrets versions add hugginface-api-key --data-file=-
gcloud secrets add-iam-policy-binding hugginface-api-key --member='serviceAccount:vertex-ai-connect@gleb-test-project.iam.gserviceaccount.com' --role='roles/secretmanager.secretAccessor'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the command to execute in a psql session connected to the ai_demo database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL google_ml.create_sm_secret( 
    secret_id =&amp;gt; 'hugginface-api-key', 
    secret_path =&amp;gt; 'projects/gleb-test-project/secrets/hugginface-api-key/versions/1'
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can register one of the models served by the Hugging Face platform. Here I’ve decided to try Mistral-7B. For this model we can use the free serverless endpoints. To get the required parameters, such as the JSON payload format and the endpoint, we need to open the “&lt;em&gt;Inference API (serverless)&lt;/em&gt;” &lt;a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3?inference_api=true" rel="noopener noreferrer"&gt;page&lt;/a&gt; and click “&lt;em&gt;Curl&lt;/em&gt;” at the top of the pop-up window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4kweu06b5lvjh9bj0uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4kweu06b5lvjh9bj0uo.png" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we can register our model in the AlloyDB Omni using the provided information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CALL google_ml.create_model(
    model_id =&amp;gt; 'Mistral-7B-Instruct-v0.3', 
    model_provider =&amp;gt; 'custom', 
    model_request_url =&amp;gt;'https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3', 
    model_type =&amp;gt; 'generic', 
    model_auth_type =&amp;gt; 'secret_manager', 
    model_auth_id =&amp;gt; 'hugginface-api-key', 
    model_qualified_name =&amp;gt; 'Mistral-7B-Instruct-v0.3');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can ask exactly the same question using the google_ml.predict_row() function. Note that I parse the JSON output, taking the first element of the array and then the value of the “&lt;em&gt;generated_text&lt;/em&gt;” key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;select google_ml.predict_row('Mistral-7B-Instruct-v0.3','{"inputs" : "What is AlloyDB Omni?"}')-&amp;gt;0-&amp;gt;'generated_text';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is the response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"What is AlloyDB Omni? AlloyDB Omni is a fully managed and scalable database service for running dynamic and complex workloads across multiple regions with low-latency, high-throughput, and consistent performance. It\u2019s a global database offering that combines the high-performance and scalable characteristics of AlloyDB for PostgreSQL with the distributed and resilient capabilities of a global database. AlloyDB Omni is ideal for building and deploying applications that require low-latency and high"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
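&lt;p&gt;The Hugging Face serverless endpoint returns a JSON array rather than an object, which is why the query above takes element 0 before the “generated_text” key. A minimal Python sketch of the same parsing, with an illustrative response body:&lt;/p&gt;

```python
import json

# The serverless Inference API returns a JSON array of objects; the SQL path
# -> 0 -> 'generated_text' takes the first element's generated text.
# The response body here is made up for illustration.
response = json.loads('[{"generated_text": "What is AlloyDB Omni? It is ..."}]')
text = response[0]["generated_text"]
print(text)
```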



&lt;p&gt;The response is not entirely accurate and probably needs some grounding, but the wording itself is not too bad and could perhaps be improved with RAG or by tuning on the Google Cloud documentation. That, however, is out of scope for this post.&lt;/p&gt;

&lt;p&gt;This is the second post in the series about AI on AlloyDB Omni. It has covered registration of AI models hosted by third-party providers, using OpenAI and Hugging Face as examples. You can also watch on &lt;a href="https://www.youtube.com/watch?v=4fOmrlnaNko" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; how I tried it for the first time with OpenAI models. Hopefully this is not the last post in the series. I have some ideas for the next post, and if you have ideas or want some particular information about AlloyDB or AlloyDB Omni, please ping me here, on &lt;a href="https://twitter.com/sky_vst" rel="noopener noreferrer"&gt;X (Twitter)&lt;/a&gt;, or on &lt;a href="https://www.linkedin.com/in/glebotochkin/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;. Stay tuned and happy testing.&lt;/p&gt;




</description>
      <category>data</category>
      <category>postgres</category>
      <category>alloydb</category>
      <category>googlecloudplatform</category>
    </item>
  </channel>
</rss>
