<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rosemary Wang</title>
    <description>The latest articles on Forem by Rosemary Wang (@joatmon08).</description>
    <link>https://forem.com/joatmon08</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F448000%2Fa492540f-e7cd-4d45-81f4-1db01474677b.jpg</url>
      <title>Forem: Rosemary Wang</title>
      <link>https://forem.com/joatmon08</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/joatmon08"/>
    <language>en</language>
    <item>
      <title>Agent2Agent Protocol, IBM Vault, &amp; OAuth 2.0 On-Behalf-Of</title>
      <dc:creator>Rosemary Wang</dc:creator>
      <pubDate>Thu, 23 Apr 2026 19:54:13 +0000</pubDate>
      <link>https://forem.com/joatmon08/agent2agent-protocol-ibm-vault-oauth-20-on-behalf-of-1hba</link>
      <guid>https://forem.com/joatmon08/agent2agent-protocol-ibm-vault-oauth-20-on-behalf-of-1hba</guid>
      <description>&lt;p&gt;I wrote a blog on using &lt;a href="https://hashicorpengineering.substack.com/p/a2a-vault-oidc" rel="noopener noreferrer"&gt;AI agent authorization with Agent2Agent protocol and IBM Vault&lt;/a&gt; that focused on setting up Vault as an OIDC provider to authenticate and authorize requests from an Agent2Agent client to a server. While it works, the post missed something rather critical: identity delegation. Basically, if I am an end user, I want to delegate my Agent2Agent (A2A) client to act on my behalf to access the A2A server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08qpno3nwdacx6vum227.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F08qpno3nwdacx6vum227.png" alt="OAuth 2.0 Token Exchange with Vault" width="800" height="553"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It turns out a number of folks in the agent identity space (&lt;a href="https://learn.microsoft.com/en-us/entra/agent-id/agent-identities" rel="noopener noreferrer"&gt;Microsoft Entra Agent ID&lt;/a&gt;, &lt;a href="https://blog.christianposta.com/explaining-on-behalf-of-for-ai-agents/" rel="noopener noreferrer"&gt;Christian Posta&lt;/a&gt;) have been exploring and implementing &lt;a href="https://www.rfc-editor.org/rfc/rfc8693.html" rel="noopener noreferrer"&gt;RFC 8693: OAuth 2.0 Token Exchange&lt;/a&gt; as a way of facilitating and tracking identity delegation. At the time of this post, Vault did not have a secrets engine that implemented this specification, so I built one as a proof of concept for my own education. I ended up creating a Security Token Service (STS) with a custom Vault secrets engine that implements RFC 8693.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: get a subject token from Vault as OIDC provider
&lt;/h2&gt;

&lt;p&gt;The general workflow for delegating identity required me to read the specification a few times. First, the end user authenticates to the client agent using an OIDC provider to get a subject token. &lt;/p&gt;

&lt;p&gt;I set up Vault as an OIDC provider to support a &lt;code&gt;may-act&lt;/code&gt; OIDC scope. This scope attaches a &lt;code&gt;may_act&lt;/code&gt; claim to the &lt;code&gt;id_token&lt;/code&gt; with a list of client agents allowed to act on the user's behalf.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;locals&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;may_act_scope_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"may-act"&lt;/span&gt;
  &lt;span class="nx"&gt;may_act_claim&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;for&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;info&lt;/span&gt; &lt;span class="nx"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt; &lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;client_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sub&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vault_identity_entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"vault_identity_oidc_scope"&lt;/span&gt; &lt;span class="s2"&gt;"may_act"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;may_act_scope_name&lt;/span&gt;
  &lt;span class="nx"&gt;template&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOT&lt;/span&gt;&lt;span class="sh"&gt;
{
  "client_id": "${vault_identity_oidc_client.agent.client_id}",
  "may_act": ${local.may_act_claim}
}
&lt;/span&gt;&lt;span class="no"&gt;EOT
&lt;/span&gt;  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"May act claim that includes what agents can act on behalf of user"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;client_id&lt;/code&gt; and the &lt;code&gt;sub&lt;/code&gt; in the &lt;code&gt;may_act&lt;/code&gt; claim refer to the client agent that requests delegated access, not the end user. The combination of &lt;code&gt;client_id&lt;/code&gt; and &lt;code&gt;sub&lt;/code&gt; enables the custom Vault secrets engine to check that the client agent's Vault role (&lt;code&gt;client_id&lt;/code&gt;) and entity ID (&lt;code&gt;sub&lt;/code&gt;) may act on behalf of the end user. I decided both needed to be checked because Vault assigns a new entity to each role of every authentication method.&lt;/p&gt;
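&lt;p&gt;As an illustrative sketch (not the plugin's actual code), the check described above amounts to requiring that both fields of the actor's identity appear together in a single &lt;code&gt;may_act&lt;/code&gt; entry:&lt;/p&gt;

```python
# Sketch of the may_act check: the actor's client_id AND sub must both
# match the same entry in the subject token's may_act claim.

def actor_may_act(subject_claims, actor_claims):
    """Return True if the actor identity appears in one may_act entry."""
    for entry in subject_claims.get("may_act", []):
        if (entry.get("client_id") == actor_claims.get("client_id")
                and entry.get("sub") == actor_claims.get("sub")):
            return True
    return False

# Values taken from the example tokens in this post.
subject = {
    "sub": "50099deb-d0cf-911b-4310-64a173c542a6",
    "may_act": [{"client_id": "test-client",
                 "sub": "83b1d088-c7d5-b8a4-dd7b-99baca521f8d"}],
}
actor = {"client_id": "test-client",
         "sub": "83b1d088-c7d5-b8a4-dd7b-99baca521f8d"}
print(actor_may_act(subject, actor))  # True
```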

&lt;p&gt;With the correct scope in the OIDC request, Vault returns a subject token with a set of claims allowing certain entities to act on behalf of the user.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"at_hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"gBJNAqZ6z7Yz7UG-z69Leg"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sy0uWliApQrPApxpLp7gYVD0wAjvQNse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"c_hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PLTNZjVHMxhDWOLIeZ_sQA"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Sy0uWliApQrPApxpLp7gYVD0wAjvQNse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776796482&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776792882&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$VAULT_ADDR/v1/identity/oidc/provider/agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"may_act"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-client"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"83b1d088-c7d5-b8a4-dd7b-99baca521f8d"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"50099deb-d0cf-911b-4310-64a173c542a6"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that if you have multiple entities that can act on behalf of a user, you'd need to create different scopes for each. As long as the OIDC provider supports the various scopes with different &lt;code&gt;may_act&lt;/code&gt; claims, your end user can adjust which entities may act on their behalf.&lt;/p&gt;

&lt;p&gt;Beyond defining the scope, I set a few other configurations for Vault as an OIDC provider. The full code example is located on &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/terraform/vault/oidc.tf" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. You can use another identity provider as well, as long as it issues a subject token with the &lt;code&gt;may_act&lt;/code&gt; claim.&lt;/p&gt;
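&lt;p&gt;To make the scope concrete, here is a sketch of the authorization request that would yield the subject token. The &lt;code&gt;client_id&lt;/code&gt; comes from the example token above; the &lt;code&gt;redirect_uri&lt;/code&gt;, &lt;code&gt;state&lt;/code&gt;, and &lt;code&gt;nonce&lt;/code&gt; values are placeholders:&lt;/p&gt;

```python
# Sketch: build the authorization URL for Vault's OIDC provider,
# requesting the custom may-act scope alongside the standard openid scope.
from urllib.parse import urlencode

VAULT_ADDR = "https://vault.example.com:8200"  # placeholder address

params = {
    "response_type": "code",
    "client_id": "Sy0uWliApQrPApxpLp7gYVD0wAjvQNse",
    "redirect_uri": "https://localhost:8250/oidc/callback",  # placeholder
    "scope": "openid may-act",  # may-act attaches the may_act claim
    "state": "af0ifjsldkj",     # placeholder
    "nonce": "n-0S6_WzA2Mj",    # placeholder
}
authorize_url = (VAULT_ADDR
                 + "/v1/identity/oidc/provider/agent/authorize?"
                 + urlencode(params))
print(authorize_url)
```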

&lt;h2&gt;
  
  
  Step 2: get an actor token from Vault's identity secrets engine
&lt;/h2&gt;

&lt;p&gt;Next, the client agent needs to request an actor token with a &lt;code&gt;client_id&lt;/code&gt; and &lt;code&gt;sub&lt;/code&gt; identifying the agent. I set up the Vault &lt;a href="https://developer.hashicorp.com/vault/docs/secrets/identity" rel="noopener noreferrer"&gt;identity secrets engine&lt;/a&gt; to generate a JWT with the required claims.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-client"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-client"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776881849&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776795449&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$VAULT_ADDR/v1/identity/oidc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"namespace"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"root"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"helloworld:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"83b1d088-c7d5-b8a4-dd7b-99baca521f8d"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The identity secrets engine requires the token request to be tied to an entity, which makes it ideal for generating the actor token. The entity ID identifies the authentication method and role making the request. For example, the &lt;code&gt;sub&lt;/code&gt; claim contains an entity ID tied to several authentication methods and roles, including the Kubernetes and AppRole auth methods.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;vault &lt;span class="nb"&gt;read &lt;/span&gt;identity/entity/id/83b1d088-c7d5-b8a4-dd7b-99baca521f8d

Key                    Value
&lt;span class="nt"&gt;---&lt;/span&gt;                    &lt;span class="nt"&gt;-----&lt;/span&gt;
aliases                &lt;span class="o"&gt;[&lt;/span&gt;map[canonical_id:83b1d088-c7d5-b8a4-dd7b-99baca521f8d creation_time:2026-04-20T16:50:33.077473683Z custom_metadata:&amp;lt;nil&amp;gt; &lt;span class="nb"&gt;id&lt;/span&gt;:87db0c7d-032b-ad5c-c3fb-d9faee1686f7 last_update_time:2026-04-20T17:54:26.863503728Z &lt;span class="nb"&gt;local&lt;/span&gt;:false merged_from_canonical_ids:&amp;lt;nil&amp;gt; metadata:map[service_account_name:test-client service_account_namespace:default service_account_secret_name: service_account_uid:2505bc80-5765-4f18-9f60-b4877d860350] mount_accessor:auth_kubernetes_6cb5b3d7 mount_path:auth/kubernetes/ mount_type:kubernetes name:2505bc80-5765-4f18-9f60-b4877d860350] map[canonical_id:83b1d088-c7d5-b8a4-dd7b-99baca521f8d creation_time:2026-04-21T14:21:23.194441388Z custom_metadata:map[] &lt;span class="nb"&gt;id&lt;/span&gt;:8a3cd010-1a3e-1918-1459-f873767c8a46 last_update_time:2026-04-21T14:21:23.194441388Z &lt;span class="nb"&gt;local&lt;/span&gt;:false merged_from_canonical_ids:&amp;lt;nil&amp;gt; metadata:&amp;lt;nil&amp;gt; mount_accessor:auth_approle_7135b542 mount_path:auth/approle/ mount_type:approle name:test-client]]
name                   test-client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After some research and testing, it seems that the &lt;code&gt;scope&lt;/code&gt; claim does not matter so much for the actor token. For the full configuration to set up the identity secrets engine, review the &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/terraform/vault/identity-actor-token.tf" rel="noopener noreferrer"&gt;example code&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: request a delegated access token from Vault
&lt;/h2&gt;

&lt;p&gt;At this point, I realized I needed to create a &lt;a href="https://developer.hashicorp.com/vault/tutorials/custom-secrets-engine" rel="noopener noreferrer"&gt;custom secrets engine&lt;/a&gt; in Vault to support token exchange. I won't go into the specifics of developing the secrets engine, since most of it involved reading the spec and making sure it conformed to the right claims. The code for the plugin is on &lt;a href="https://github.com/joatmon08/vault-plugin-secrets-oauth-token-exchange" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This plugin is not officially supported; it is intended as a proof of concept, so use it with caution. Some important points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The subject token's signature gets verified against a subject token's JWKS endpoint (OIDC provider).&lt;/li&gt;
&lt;li&gt;The actor token's signature gets verified against the actor token's JWKS endpoint (identity secrets engine).&lt;/li&gt;
&lt;li&gt;The request for the delegated access token from Vault includes parameters for &lt;code&gt;scope&lt;/code&gt; and &lt;code&gt;aud&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
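&lt;p&gt;For reference, here is a minimal sketch of the RFC 8693 exchange request the client agent sends to the secrets engine. The token values are placeholders:&lt;/p&gt;

```python
# Sketch: form-encode the RFC 8693 token-exchange parameters.
from urllib.parse import urlencode

def build_token_exchange_request(subject_token, actor_token, audience, scope):
    """Build the form body for an RFC 8693 token-exchange request."""
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": subject_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:id_token",
        "actor_token": actor_token,
        "actor_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "audience": audience,
        "scope": scope,
    })

# Placeholder JWTs; audience and scope match this post's example.
body = build_token_exchange_request("eyJ-subject-token", "eyJ-actor-token",
                                    "helloworld-server", "helloworld:read")
```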

&lt;p&gt;This ensures the authenticity and integrity of the claims while keeping the implementation Vault-agnostic. While I could introspect the actor token directly against the identity secrets engine, I decided that a public JWKS endpoint was a better approach so I didn't have to pass a Vault token to the secrets engine.&lt;/p&gt;

&lt;p&gt;After validating and verifying the subject and actor tokens, the custom secrets engine generates an access token with an &lt;code&gt;act&lt;/code&gt; claim. The &lt;code&gt;act&lt;/code&gt; claim identifies the actor who requested access on behalf of the end user. The custom secrets engine appends &lt;code&gt;scope&lt;/code&gt; to audit the scope requested by each actor.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"act"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-client"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"helloworld:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"83b1d088-c7d5-b8a4-dd7b-99baca521f8d"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"helloworld-server"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"client_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-client"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776796510&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1776792910&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"$VAULT_ADDR/v1/sts"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"helloworld:read"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"50099deb-d0cf-911b-4310-64a173c542a6"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a second client agent requests access on behalf of the first client agent, the secrets engine generates an access token with a nested &lt;code&gt;act&lt;/code&gt; claim to denote the delegation chain. Use the delegated access token issued to the first client agent as the subject token for the second exchange. You also need an actor token for the second agent, since the second agent acts on behalf of the first client agent, which acts on behalf of the end user (confusing, I know). Based on RFC 8693, the custom secrets engine only evaluates the top-level actor against the &lt;code&gt;may_act&lt;/code&gt; claim. Nested &lt;code&gt;act&lt;/code&gt; claims are for audit purposes.&lt;/p&gt;
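&lt;p&gt;A small sketch of how a consumer might walk that nested &lt;code&gt;act&lt;/code&gt; claim to recover the delegation chain, outermost (current) actor first. This is illustrative and not part of the plugin:&lt;/p&gt;

```python
# Sketch: traverse nested act claims; each nested act identifies the
# prior actor in the delegation chain, per RFC 8693.

def delegation_chain(claims):
    """Return actor subs from a (possibly nested) act claim."""
    chain = []
    act = claims.get("act")
    while act:
        chain.append(act.get("sub"))
        act = act.get("act")  # nested act = prior actor
    return chain

# Hypothetical token: a second agent acting for test-client,
# which acts for the end user.
token = {"sub": "end-user",
         "act": {"sub": "second-agent", "act": {"sub": "test-client"}}}
print(delegation_chain(token))  # ['second-agent', 'test-client']
```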

&lt;p&gt;The custom secrets engine has to be registered with the Vault server. I won't dive too deeply into the registration workflow in this post. If you want to learn more, check out the &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/terraform/kubernetes/vault-plugin-loader.tf" rel="noopener noreferrer"&gt;Terraform configuration&lt;/a&gt; that downloads the plugin binaries to a &lt;code&gt;PersistentVolume&lt;/code&gt; on Kubernetes and the &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/scripts/vault-init.sh" rel="noopener noreferrer"&gt;script&lt;/a&gt; to register the binaries.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Update A2A agents
&lt;/h2&gt;

&lt;p&gt;The access token is what the client agent passes to the server agent. The server agent verifies the access token's signature against the custom secrets engine's JWKS endpoint, checks that the &lt;code&gt;aud&lt;/code&gt; claim matches the name of the server agent, and verifies that the issuer is the custom secrets engine.&lt;/p&gt;

&lt;p&gt;If the access token does not contain the correct &lt;code&gt;aud&lt;/code&gt; or the correct &lt;code&gt;scope&lt;/code&gt; claim, the server agent does not allow the client agent to access its skills. The &lt;a href="https://github.com/joatmon08/infrastructure-agent/tree/main/agents/helloworld" rel="noopener noreferrer"&gt;server agent&lt;/a&gt; does not have any direct dependencies on Vault. It uses the custom secrets engine's OpenID Connect configuration endpoint to get the JWKS endpoint for token verification.&lt;/p&gt;
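&lt;p&gt;The claim checks can be sketched as follows (signature verification against the JWKS endpoint is omitted here). The expected values come from this post's example tokens; the function name is illustrative:&lt;/p&gt;

```python
# Sketch: server-side claim checks after signature verification.
EXPECTED_AUD = "helloworld-server"
EXPECTED_SCOPE = "helloworld:read"
EXPECTED_ISSUER_SUFFIX = "/v1/sts"  # the custom secrets engine's mount

def authorize_request(claims):
    """Reject the request unless aud, scope, and issuer all match."""
    if claims.get("aud") != EXPECTED_AUD:
        return False
    if EXPECTED_SCOPE not in claims.get("scope", "").split():
        return False
    if not claims.get("iss", "").endswith(EXPECTED_ISSUER_SUFFIX):
        return False
    return True

claims = {"aud": "helloworld-server", "scope": "helloworld:read",
          "iss": "https://vault.example.com:8200/v1/sts"}
print(authorize_request(claims))  # True
```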

&lt;p&gt;The &lt;a href="https://github.com/joatmon08/infrastructure-agent/tree/main/agents/test-client" rel="noopener noreferrer"&gt;client agent&lt;/a&gt; does need access to Vault in order to get the subject and actor tokens. Rather than have the client agent access the Vault API directly, I used &lt;a href="https://developer.hashicorp.com/vault/docs/agent-and-proxy/agent" rel="noopener noreferrer"&gt;Vault Agent&lt;/a&gt; to retrieve the credentials needed to generate subject and actor tokens and write them to files for the client agent to use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;## omitted for clarity&lt;/span&gt;

    &lt;span class="nx"&gt;template&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;labels&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;app&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;test_client_name&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;annotations&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject"&lt;/span&gt;                              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/role"&lt;/span&gt;                                      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"test-client"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-token"&lt;/span&gt;                        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-run-as-same-user"&lt;/span&gt;                    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/tls-skip-verify"&lt;/span&gt;                           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"true"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-secret-client_secrets.json"&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"identity/oidc/client/agent"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-template-client_secrets.json"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;EOT&lt;/span&gt;&lt;span class="sh"&gt;
            {
            {{- with secret "identity/oidc/client/agent" }}
                "client_id": "{{ .Data.client_id }}",
                "client_secret": "{{ .Data.client_secret }}",
                "redirect_uris": {{ .Data.redirect_uris | toJSON }}
            {{- end }}
            }
&lt;/span&gt;&lt;span class="no"&gt;          EOT
&lt;/span&gt;          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-secret-oidc_provider.json"&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"identity/oidc/provider/agent/.well-known/openid-configuration"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-template-oidc_provider.json"&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;EOT&lt;/span&gt;&lt;span class="sh"&gt;
            {
            {{- with secret "identity/oidc/provider/agent/.well-known/openid-configuration" }}
                "authorization_endpoint": "{{ .Data.authorization_endpoint }}",
                "issuer": "{{ .Data.issuer }}",
                "token_endpoint": "{{ .Data.token_endpoint }}",
                "userinfo_endpoint": "{{ .Data.userinfo_endpoint }}"
            {{- end }}
            }
&lt;/span&gt;&lt;span class="no"&gt;          EOT
&lt;/span&gt;          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-secret-actor_token"&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"identity/oidc/token/test-client"&lt;/span&gt;
          &lt;span class="s2"&gt;"vault.hashicorp.com/agent-inject-template-actor_token"&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;EOT&lt;/span&gt;&lt;span class="sh"&gt;
            {{- with secret "identity/oidc/token/test-client" -}}
            {{ .Data.token }}
            {{- end }}
&lt;/span&gt;&lt;span class="no"&gt;          EOT
&lt;/span&gt;        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;test-client&lt;/code&gt; runs with a Kubernetes service account. I configured a Vault role for the Kubernetes auth method and an alias for the &lt;code&gt;test-client&lt;/code&gt; entity tied to the &lt;code&gt;test-client&lt;/code&gt; service account. This ensures that when the &lt;code&gt;test-client&lt;/code&gt; requests an actor token, it has an entity ID.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"vault_identity_entity_alias"&lt;/span&gt; &lt;span class="s2"&gt;"client_agents"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt;       &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;kubernetes_service_account_v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;uid&lt;/span&gt;
  &lt;span class="nx"&gt;mount_accessor&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vault_auth_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kubernetes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;accessor&lt;/span&gt;
  &lt;span class="nx"&gt;canonical_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vault_identity_entity&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"vault_kubernetes_auth_backend_role"&lt;/span&gt; &lt;span class="s2"&gt;"client_agents"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;for_each&lt;/span&gt;                         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;client_agents&lt;/span&gt;
  &lt;span class="nx"&gt;backend&lt;/span&gt;                          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vault_auth_backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;kubernetes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;
  &lt;span class="nx"&gt;role_name&lt;/span&gt;                        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;
  &lt;span class="nx"&gt;bound_service_account_names&lt;/span&gt;      &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;bound_service_account_namespaces&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_namespace&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;token_ttl&lt;/span&gt;                        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;
  &lt;span class="nx"&gt;token_policies&lt;/span&gt;                   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;vault_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;actor_token&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;vault_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent_oidc_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;vault_policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;oauth_exchange_token&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While you can write code in your A2A client agent to authenticate to Vault and get the credentials, I found it easier to use Vault Agent to write them to a file for the client agent to consume. When the credentials expire, Vault Agent will write new credentials to the file.&lt;/p&gt;
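
&lt;p&gt;As a rough sketch of what that looks like, a minimal Vault Agent configuration could use the Kubernetes auth method and a template stanza to render the actor token to a file. The role name, token endpoint, and file paths below are assumptions for illustration and would need to match your own deployment.&lt;/p&gt;

```hcl
# Hypothetical Vault Agent sketch. Vault Agent authenticates with the
# Kubernetes auth method, then renders the actor token to a file and
# re-renders it when the credential expires.
auto_auth {
  method "kubernetes" {
    mount_path = "auth/kubernetes"
    config = {
      role = "test-client" # assumed role name for the client agent
    }
  }
}

template {
  # assumed identity token role for the actor token
  contents    = "{{ with secret \"identity/oidc/token/test-client\" }}{{ .Data.token }}{{ end }}"
  destination = "/vault/secrets/actor-token"
}
```

The client agent then only needs to read the file, rather than implement a Vault authentication flow itself.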

&lt;h2&gt;
  
  
  End-to-end workflow
&lt;/h2&gt;

&lt;p&gt;To demonstrate the workflow, the &lt;code&gt;test-client&lt;/code&gt; includes a UI that has the end user log in and obtain the subject token from Vault as an OIDC provider.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn85w711k8nn56f2ih81t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn85w711k8nn56f2ih81t.png" alt="UI getting subject token from Vault as OIDC provider" width="800" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, the end user requests a delegated access token with a specific scope and subject to access the A2A server agent. The &lt;code&gt;test-client&lt;/code&gt; receives an access token from Vault's custom secrets engine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fic4lne85lxtm5pdtncnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fic4lne85lxtm5pdtncnx.png" alt="UI getting delegated access token from Vault custom secrets engine" width="800" height="743"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;test-client&lt;/code&gt; agent uses the access token to successfully request a message from &lt;code&gt;helloworld-server&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3dvemzbern2rzv9nib.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fir3dvemzbern2rzv9nib.png" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are two cases in which the client agent does not have sufficient permission to act on behalf of the end user when calling the server agent.&lt;/p&gt;

&lt;p&gt;First, if the client agent's actor token identity does not match the end user's subject token &lt;code&gt;may_act&lt;/code&gt; claim, the Vault custom secrets engine does not issue a delegated access token.&lt;/p&gt;
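
&lt;p&gt;For illustration, RFC 8693 defines the &lt;code&gt;may_act&lt;/code&gt; claim in the subject token to name the parties authorized to act on the subject's behalf. A subject token that permits the &lt;code&gt;test-client&lt;/code&gt; agent to act for the end user might carry a payload like the following sketch (the claim values are hypothetical):&lt;/p&gt;

```json
{
  "iss": "https://vault.example.com/v1/identity/oidc",
  "sub": "end-user",
  "aud": "helloworld-server",
  "may_act": {
    "sub": "test-client"
  }
}
```

If the actor token's identity does not match the &lt;code&gt;may_act.sub&lt;/code&gt; value, the exchange fails.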

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmon3aob2omu8mxvywxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpmon3aob2omu8mxvywxp.png" alt="Actor does not have permission to act on behalf of" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Second, if the &lt;code&gt;test-client&lt;/code&gt; uses an access token with insufficient scopes or with the wrong server agent as the subject, the server agent denies access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9tf7yb0veekedish0pc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn9tf7yb0veekedish0pc.png" alt="Client agent has incorrect scopes to access server agent" width="800" height="528"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a check, I reviewed the Vault audit logs to verify that it logged the end user's requests to the OIDC provider, the actor token requests from the client agent, and the delegated access token request from the client agent. The good news: it does! However, you have to tune the secrets engine to output the claims as non-HMAC keys. For example, I used the &lt;code&gt;vault secrets tune&lt;/code&gt; subcommand to make the audit entries easier to read.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;vault secrets tune &lt;span class="nt"&gt;-audit-non-hmac-request-keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;scope &lt;span class="nt"&gt;-audit-non-hmac-request-keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;subject &lt;span class="nt"&gt;-audit-non-hmac-request-keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;audience sts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By configuring Vault as an OIDC provider, the identity secrets engine for the actor token, and a custom token exchange secrets engine for delegation, you can audit and enforce at least some agent-to-agent communication.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Overall, this turned out to be far more challenging to implement than expected. It took quite a bit of reverse engineering the specification, reviewing the idea with other folks, arguing with my coding agent, and deploying Vault repeatedly.&lt;/p&gt;

&lt;p&gt;The custom secrets engine I created for token exchange arguably belongs in the identity secrets engine, since the two share the same general structure. For my purposes, I ended up developing it as a separate secrets engine so I wouldn't have to maintain a fork of the identity secrets engine plugin. I learned quite a bit about entity IDs and OAuth 2.0 in the process.&lt;/p&gt;

&lt;p&gt;I do see a few problems with the approach. An administrator has to configure &lt;code&gt;may_act&lt;/code&gt; claims for Vault entities and clients and assign (effectively) a role to every client agent. While this is something you can automate, I imagine it can get fairly complicated and challenging to maintain. It's also deterministic, which doesn't quite address the fact that agents are autonomous and might choose to act on others' behalf. As I am not comfortable letting an agent run amok with minimal supervision, I am fine with the administrative overhead.&lt;/p&gt;

&lt;p&gt;Another problem is where to enforce the scope of what the client agent can do with the server agent. This is probably where an AI gateway would help, especially as it can review the access tokens and identify what a client agent can do with an MCP server or server agent. At the very least, this workflow does enable some kind of authentication request tracking so you can audit if and when a client agent requested access to a server agent or MCP server. I'll try working on this another day, probably with &lt;a href="https://github.com/IBM/mcp-context-forge" rel="noopener noreferrer"&gt;ContextForge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the meantime, if you're interested in how this works, check out the &lt;a href="https://github.com/joatmon08/infrastructure-agent" rel="noopener noreferrer"&gt;demo repository&lt;/a&gt;, which deploys a Kubernetes cluster and all of the components and configuration for Vault.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vault</category>
      <category>security</category>
      <category>identity</category>
    </item>
    <item>
      <title>Using AI for Terraform: flows, prompts, and agents with LangFlow &amp; Docling</title>
      <dc:creator>Rosemary Wang</dc:creator>
      <pubDate>Mon, 02 Feb 2026 19:38:35 +0000</pubDate>
      <link>https://forem.com/joatmon08/using-ai-for-terraform-flows-prompts-and-agents-with-langflow-docling-3e53</link>
      <guid>https://forem.com/joatmon08/using-ai-for-terraform-flows-prompts-and-agents-with-langflow-docling-3e53</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/joatmon08/using-ai-for-terraform-running-a-locally-with-langflow-opensearch-ollama-5co6"&gt;part 1&lt;/a&gt;, I learned to deploy a local stack for running an AI agent. Originally, I thought I could generate "good" Terraform configuration based on all my content for secure and scalable infrastructure as code. &lt;/p&gt;

&lt;p&gt;Shortly after, I received a message encouraging me to think about the second edition of my book. I have procrastinated on this for a while now since it takes time to revise material and generate new examples. One of the biggest challenges with my book was writing examples. Initial feedback suggested I write everything in Python for greater accessibility and Google Cloud for lower cost, which I did. In retrospect, I should have just written everything in Terraform to run on AWS. &lt;/p&gt;

&lt;p&gt;As I reflected on this further, I realized something important. Isn't book writing the perfect use case for an AI agent? If I had an agent that knew my writing style to help me write new examples in Terraform for my book, maybe I could expedite the process of creating a second edition.&lt;/p&gt;

&lt;p&gt;With my regrets in mind, I decided to try to create a "book writing" agent that helps me generate examples to match my text. After all, I had the chapters of the book written. I wanted new examples to reframe some of the principles and practices. This sent me on a major exploration of prompts, agent instructions, and flows in LangFlow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Process PDFs with Docling
&lt;/h2&gt;

&lt;p&gt;I had editable drafts of the chapters, but only the final PDF version of the book had the proper code annotations and figures. I needed to process the PDF book chapters into text and images before chunking and storing them in my OpenSearch vector database. Enter &lt;a href="https://www.docling.ai/" rel="noopener noreferrer"&gt;Docling&lt;/a&gt;, a document processing tool for unstructured data.&lt;/p&gt;

&lt;p&gt;Fortunately, LangFlow has a &lt;a href="https://docs.langflow.org/bundles-docling" rel="noopener noreferrer"&gt;Docling component&lt;/a&gt; for processing a set of files and chunking them. You do have to install it before you run LangFlow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv pip &lt;span class="nb"&gt;install &lt;/span&gt;langflow[docling]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you start creating a flow in LangFlow, drag-and-drop the Docling component.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faiovmnzprdwp8uw4auj1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faiovmnzprdwp8uw4auj1.png" alt="Docling component in LangFlow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are a few attributes you need to consider with Docling:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pipeline - I opted for &lt;code&gt;standard&lt;/code&gt; just to process the text. If I wanted to also process the figures, I could select &lt;code&gt;vlm&lt;/code&gt; (&lt;a href="https://www.nvidia.com/en-us/glossary/vision-language-models/" rel="noopener noreferrer"&gt;Visual Language Model&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;OCR Engine - &lt;code&gt;None&lt;/code&gt; for now. I wanted to test if I had sufficient resources to run Docling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My laptop constrains the amount of memory Docling can use to process the documents, which is why I did not upload the entire book or use VLM or OCR.&lt;/p&gt;

&lt;p&gt;Next, I needed to chunk the text before storing it in my vector database. Rather than do fixed-size chunking, I decided to try &lt;a href="https://www.ibm.com/think/architectures/rag-cookbook/chunking" rel="noopener noreferrer"&gt;hybrid chunking&lt;/a&gt;, which combines fixed-size chunking with semantic chunking. This ensures that the various chapters of my book have chunks with proper context. After chunking three chapters of my book, I stored the chunks in the OpenSearch vector database using Granite embeddings hosted by Ollama.&lt;/p&gt;

&lt;p&gt;Besides the PDF chapters of the book, I had a few blogs on best practices for writing Terraform. These had some text and examples that I wanted to include as part of the agent's response. Using the URL component, I added the set of URLs for the blog posts and passed it to the vector database.&lt;/p&gt;

&lt;p&gt;Now that I had my expert-level content on infrastructure as code and Terraform practices in my vector database, I could use an agent to reference that context.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create the Terraform coding agent
&lt;/h2&gt;

&lt;p&gt;I started with what I thought was the easier agent to build - a coding agent that generates "good" Terraform. This agent needed to generate Terraform configuration with the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Proper resource and module declarations - no hallucinations please&lt;/li&gt;
&lt;li&gt;Correct formatting&lt;/li&gt;
&lt;li&gt;All variables and outputs defined&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to writing proper working Terraform, I wanted the agent to include a good example. I had a &lt;a href="https://github.com/joatmon08/hashicorp-stack-demoapp" rel="noopener noreferrer"&gt;demo repository&lt;/a&gt; that I constantly copied and pasted into other repositories, so I wanted the agent to reference that configuration when the prompt matched.&lt;/p&gt;

&lt;p&gt;With all these requirements, I realized I needed to use two MCP servers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/hashicorp/terraform-mcp-server" rel="noopener noreferrer"&gt;Terraform MCP server&lt;/a&gt; for the latest resource and module documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/github/github-mcp-server" rel="noopener noreferrer"&gt;GitHub MCP server&lt;/a&gt; for getting files in my reference repository&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I did not need all the tools available on these MCP servers. From an access control perspective, I used the following for each MCP server:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Terraform MCP server - &lt;code&gt;get_latest_module_version&lt;/code&gt;, &lt;code&gt;get_latest_provider_version&lt;/code&gt;, &lt;code&gt;get_module_details&lt;/code&gt;, &lt;code&gt;get_provider_details&lt;/code&gt;, &lt;code&gt;get_provider_capabilities&lt;/code&gt;, &lt;code&gt;search_modules&lt;/code&gt;, and &lt;code&gt;search_providers&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;GitHub MCP server - &lt;code&gt;get_file_contents&lt;/code&gt; and &lt;code&gt;search_code&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My agent could retrieve the latest Terraform modules and providers or search GitHub for reference code. I connected the GitHub and Terraform MCP servers with the URL component to my agent using LangFlow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4qgaaqu2dodylv6qsnr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff4qgaaqu2dodylv6qsnr.png" alt="Terraform coding agent with MCP servers in LangFlow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, the coding agent needed instructions. I started with the official &lt;a href="https://github.com/hashicorp/agent-skills" rel="noopener noreferrer"&gt;agent skills for Terraform&lt;/a&gt; and refined the instructions to better suit Granite and my use case. It took quite a bit of trial and error. The full set of instructions is in a &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/langflow/EXAMPLE_AGENT.md" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. I had two main observations.&lt;/p&gt;

&lt;p&gt;First, I had to be &lt;em&gt;very&lt;/em&gt; specific about which repository and commit the agent should reference for a "good" Terraform example. It turns out the &lt;a href="https://github.com/langflow-ai/langflow/issues/8059" rel="noopener noreferrer"&gt;MCP component in LangFlow cannot handle optional parameters at the time of this post&lt;/a&gt;, so I had to put the exact commit hash and branch the agent should reference in the agent instructions.&lt;/p&gt;

&lt;p&gt;Second, the prompt had to include the specific module and resource I wanted the example to include (e.g., Create an example with the &lt;code&gt;aws_opensearchserverless_collection&lt;/code&gt; resource.) If I did not include the exact module or resource, Granite would search for the wrong module or resource with the Terraform MCP server. After adjusting my expectations on the prompts, I moved onto the book writing agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create the book writing agent
&lt;/h2&gt;

&lt;p&gt;Why did I separate the book writing task into its own agent? I discovered that combining both Terraform generation and an explanation based on context from OpenSearch led to very poor results. The agent was tasked with doing too much ("generate the explanation AND the example"), which led to some garbled responses that made little sense.&lt;/p&gt;

&lt;p&gt;I decided to split the book writing into its own agent so it could properly draft a response that sounds like I wrote the paragraph, rather than just reiterating the Terraform configuration. This worked much better overall. I also moved the book writing agent first, so it could generate the explanation and the Terraform coding agent could adjust the example.&lt;/p&gt;

&lt;p&gt;I connected the output of the writer agent to the input of the coding agent. This ensures that the explanation includes references to the expected resources and examples from user input.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat8807xhaqvdywh7dwz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fat8807xhaqvdywh7dwz8.png" alt="End-to-end agent workflow in LangFlow" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The full set of instructions is in the demo &lt;a href="https://github.com/joatmon08/infrastructure-agent/blob/main/langflow/WRITER_AGENT.md" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. I wanted it to include the sources so I could check exactly where it found the information.&lt;/p&gt;

&lt;p&gt;With both agents, I am going to do more work to refine the prompts. They do a decent job of producing a semi-coherent explanation and example but I think I can improve them with more review.&lt;/p&gt;

&lt;h2&gt;
  
  
  The result
&lt;/h2&gt;

&lt;p&gt;I passed in a few prompts to test out my agents. While the results needed some edits, they turned out more usable than I expected. For example, I asked the following:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Explain the singleton pattern using the &lt;code&gt;aws_opensearchserverless_collection&lt;/code&gt; resource. Include ideas on when to refactor from a singleton to a composite module.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The agent returned a very lengthy explanation; I include a few excerpts below. The first paragraph wasn't completely incorrect, but I needed to remove the mention of Google project resources since I wanted to use AWS for the examples.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;strong&gt;singleton pattern&lt;/strong&gt; is commonly used in Terraform configurations to manage resources that should exist only once within an environment, such as Google project configurations. In the context of AWS OpenSearch Serverless collections, this pattern ensures that there is a single instance of the collection resource, which typically does not change frequently.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As for the example, it did use the resource I requested, but it did not use the right arguments. For example, &lt;code&gt;domain_id&lt;/code&gt; doesn't exist for the &lt;code&gt;aws_opensearchserverless_collection&lt;/code&gt; resource.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_opensearchserverless_collection"&lt;/span&gt; &lt;span class="s2"&gt;"main"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;collection_name&lt;/span&gt;
  &lt;span class="nx"&gt;domain_id&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_opensearch_serverless_domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;description&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Primary collection for ${var.environment}"&lt;/span&gt;

  &lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;merge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;local&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;common_tags&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;Environment&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;environment&lt;/span&gt;
      &lt;span class="nx"&gt;Project&lt;/span&gt;     &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;project_name&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
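
&lt;p&gt;For comparison, a corrected version of the example might look like the following sketch. I'm assuming the &lt;code&gt;type&lt;/code&gt; argument here; check the AWS provider documentation for the current argument list.&lt;/p&gt;

```hcl
# Corrected sketch: aws_opensearchserverless_collection has no domain_id
# argument. The arguments below are a best-effort correction, not
# verified against the latest AWS provider release.
resource "aws_opensearchserverless_collection" "main" {
  name        = var.collection_name
  description = "Primary collection for ${var.environment}"
  type        = "VECTORSEARCH"

  tags = merge(
    local.common_tags,
    {
      Environment = var.environment
      Project     = var.project_name
    }
  )
}
```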



&lt;p&gt;When I asked the agent to explain when to refactor from a singleton to composite module, it provided a correct but very generic explanation. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Refactoring from a singleton pattern to a composite module becomes necessary when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Resource Evolution&lt;/strong&gt;: Resources evolve beyond single instances, requiring multiple configurations or variations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared Configurations&lt;/strong&gt;: Multiple resources need shared configurations that are not strictly unique but require common parameters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Workflows&lt;/strong&gt;: The infrastructure management involves complex workflows where different components interact and share state.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;I asked the book writing agent to apply the other principles when possible, which it did try in later paragraphs. In general, the agent provided a good start for me to edit and iterate on the explanation. I would not use the response as-is, since it has some incorrect points and the explanation is far too generic, but it does write in the style and tone of voice that I use in my book.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I learned quite a bit about prompt engineering while trying to build a book writing agent for myself. In some situations, I had to be very specific about how and where an agent should refer to certain tools or data sources. In some of the first iterations, the agent kept bringing up other principles like simplicity, which I do not mention in my book. I had to ask the agent for the source of the principle, which was another book entirely.&lt;/p&gt;

&lt;p&gt;In general, the agent did improve over time. The more I asked of it and provided feedback, the better the responses it generated. However, I still wouldn't use the responses in the book without some editing. I could use the examples, for the most part, but I had to check them for correctness and clarity.&lt;/p&gt;

&lt;p&gt;Next, I plan on moving these components off my local machine into some cloud infrastructure.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>terraform</category>
      <category>langflow</category>
    </item>
    <item>
      <title>Using AI for Terraform: running locally with Langflow, OpenSearch, &amp; Ollama</title>
      <dc:creator>Rosemary Wang</dc:creator>
      <pubDate>Tue, 13 Jan 2026 14:53:29 +0000</pubDate>
      <link>https://forem.com/joatmon08/using-ai-for-terraform-running-a-locally-with-langflow-opensearch-ollama-5co6</link>
      <guid>https://forem.com/joatmon08/using-ai-for-terraform-running-a-locally-with-langflow-opensearch-ollama-5co6</guid>
      <description>&lt;p&gt;I'm a pragmatist at heart. While I don't fully believe in using AI for everything, I did find myself getting very frustrated with my copy and paste process for "good" Terraform configuration. I already wrote Terraform configuration that ran with many resources and was mostly secure by default anyway. Why did I have to go back two or three years to an example and then update it? Could I really use AI to write some new demo code?&lt;/p&gt;

&lt;p&gt;I realized I had a lot of content I could reference and get myself out of the copy-paste whirlpool. Most of the time, I looked up:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slides of old &lt;a href="https://joatmon08.github.io/03_speaking.html" rel="noopener noreferrer"&gt;talks&lt;/a&gt; with accurate diagrams&lt;/li&gt;
&lt;li&gt;Some old code from two or three specific repositories on &lt;a href="https://github.com/joatmon08" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;My &lt;a href="https://www.manning.com/books/infrastructure-as-code-patterns-and-practices" rel="noopener noreferrer"&gt;book&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Terraform &lt;a href="https://registry.terraform.io/browse/modules" rel="noopener noreferrer"&gt;modules&lt;/a&gt; in the registry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem? I know how to build infrastructure with Terraform but I know nothing about AI. So I decided to learn.&lt;/p&gt;

&lt;p&gt;When I started blogging and trying to learn technology for myself, I ran everything locally and avoided paying for resources. That meant using the free credits for most cloud or managed offerings and working within a resource-constrained system. For this series, I decided on the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://ollama.com/" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; for models&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.langflow.org/" rel="noopener noreferrer"&gt;Langflow&lt;/a&gt; for no-code/low-code agentic development&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://opensearch.org/" rel="noopener noreferrer"&gt;OpenSearch&lt;/a&gt; for vector search&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.docling.ai/" rel="noopener noreferrer"&gt;Docling&lt;/a&gt; to process my PDF documents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As for the model, I was willing to try some of the "open" models like &lt;a href="https://www.ibm.com/granite" rel="noopener noreferrer"&gt;Granite&lt;/a&gt; through Ollama. If they didn't work, I would try others.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building something to run models
&lt;/h2&gt;

&lt;p&gt;As a starting point, I ran everything in containers. If I needed more resources, I could move everything to a cloud deployment later. With a Docker Compose file, I deployed Ollama, Langflow, and OpenSearch.&lt;/p&gt;
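
&lt;p&gt;As a rough sketch, such a Compose file might look like the following. The image tags and ports are assumptions based on each project's defaults, not the exact file I used.&lt;/p&gt;

```yaml
# Hypothetical docker-compose.yml sketch for the local stack.
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"   # default Ollama API port

  langflow:
    image: langflowai/langflow:latest
    ports:
      - "7860:7860"     # default Langflow UI port
    depends_on:
      - ollama

  opensearch:
    image: opensearchproject/opensearch:latest
    environment:
      - discovery.type=single-node
    ports:
      - "9200:9200"     # default OpenSearch API port
```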

&lt;p&gt;Ollama runs models on your local machine. Since I get impatient waiting for Ollama to start and pull the models, I built a Docker container with Ollama and pre-pulled models and embeddings.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ollama/ollama:latest&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./init-ollama.sh /tmp/init-ollama.sh&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /tmp&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x init-ollama.sh &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; ./init-ollama.sh

&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 11434&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this example, I used &lt;code&gt;granite4:tiny-h&lt;/code&gt; since I am running it locally on my laptop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/usr/bin/env bash&lt;/span&gt;

ollama serve &amp;amp;
# wait for the Ollama server to accept connections before pulling models
until ollama list &gt; /dev/null 2&gt;&amp;amp;1; do sleep 1; done
ollama pull granite4:tiny-h
ollama pull granite-embedding:30m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deploying an agent toolchain
&lt;/h2&gt;

&lt;p&gt;I do not know how to write an AI agent. I also didn't feel like coding a whole agent toolchain just to write infrastructure for my purposes. Luckily, I found Langflow, which offers a no-code/low-code way to deploy AI agents and MCP servers. I created a Dockerfile for Langflow.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; langflowai/langflow:1.7.2&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libgl1 libglib2.0-0 &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; uv pip &lt;span class="nb"&gt;install &lt;/span&gt;langflow[docling]

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["python", "-m", "langflow", "run", "--host", "0.0.0.0", "--port", "7860"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Initially, I used the stock Langflow image without a custom Dockerfile. Unfortunately, the Docling component I wanted to use for processing PDF chapters of my book needed additional dependencies installed. I built them into my own Langflow image so I didn't have to run the install separately.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building a RAG stack for context
&lt;/h2&gt;

&lt;p&gt;It turns out that retrieval augmented generation (RAG) is an important part of getting my use case working. I have context that I want my agent to use, so that information needs to be processed and stored in a vector database.&lt;/p&gt;

&lt;p&gt;I chose OpenSearch because I had deployed it before and could run it locally. Unfortunately, it turns out that using OpenSearch as a vector database for Docling requires some additional configuration. OpenSearch supposedly creates an index automatically if it doesn't already exist, but it's unclear whether auto-creation covers a k-NN vector index as well as a simple one. I kept getting errors from Langflow that the index did not exist.&lt;/p&gt;

&lt;p&gt;As a workaround, I reverse engineered the index and manually called the OpenSearch API to create an empty index. At this point, I was tired of writing scripts and resorted to asking &lt;a href="https://www.ibm.com/products/bob" rel="noopener noreferrer"&gt;Project Bob&lt;/a&gt;, an AI software agent, for help. I think I asked it to generate a Dockerfile for OpenSearch with a step to create a vector index named "langflow" with an "ef_search" of 512 and a property named "chunk_embedding" of "knn_vector" type with 384 dimensions. It gave me a pretty good script in response, complete with the proper API call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="c"&gt;# Wait for OpenSearch to be ready&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Waiting for OpenSearch to start..."&lt;/span&gt;
&lt;span class="k"&gt;until &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://localhost:9200/_cluster/health &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
   &lt;/span&gt;&lt;span class="nb"&gt;sleep &lt;/span&gt;2
&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"OpenSearch is ready. Creating 'langflow' index..."&lt;/span&gt;

&lt;span class="c"&gt;# Create the langflow index with vector search configuration&lt;/span&gt;
curl &lt;span class="nt"&gt;-X&lt;/span&gt; PUT &lt;span class="s2"&gt;"http://localhost:9200/langflow"&lt;/span&gt; &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="nt"&gt;-d&lt;/span&gt;&lt;span class="s1"&gt;'
{
 "settings": {
   "index": {
     "knn": true,
     "knn.algo_param.ef_search": 512
   }
 },
 "mappings": {
   "properties": {
     "chunk_embedding": {
       "type": "knn_vector",
       "dimension": 384
     }
   }
 }
}
'&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Index 'langflow' created successfully!"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
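&lt;p&gt;With the index in place, a k-NN search against it looks roughly like this. The index name &lt;code&gt;langflow&lt;/code&gt; and field &lt;code&gt;chunk_embedding&lt;/code&gt; come from the script above; the all-zeros query vector is only a placeholder, since a real query would pass a 384-dimension embedding produced by the embedding model.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch of a k-NN search against the "langflow" index created above.
# The query vector must have 384 values to match the index dimension;
# here it is a placeholder of zeros rather than a real embedding.
OPENSEARCH_URL="${OPENSEARCH_URL:-http://localhost:9200}"

# Build a placeholder 384-dimension vector of zeros.
VECTOR=$(printf '0,%.0s' {1..383})0
QUERY='{"size": 3, "query": {"knn": {"chunk_embedding": {"vector": ['"$VECTOR"'], "k": 3}}}}'

if curl -s --max-time 2 "$OPENSEARCH_URL" > /dev/null; then
  curl -s -X POST "$OPENSEARCH_URL/langflow/_search" \
    -H 'Content-Type: application/json' -d "$QUERY"
else
  echo "OpenSearch is not reachable at $OPENSEARCH_URL"
fi
```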



&lt;p&gt;Next, Bob created a Dockerfile out of the script. Bob was a bit verbose, but the result did work with some modifications.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; opensearchproject/opensearch:3&lt;/span&gt;

&lt;span class="c"&gt;# Copy the initialization script&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; ./init-opensearch.sh /usr/share/opensearch/init-opensearch.sh&lt;/span&gt;

&lt;span class="c"&gt;# Create a wrapper script to run both OpenSearch and the init script&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'#!/bin/bash'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /usr/share/opensearch/entrypoint-wrapper.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'/usr/share/opensearch/opensearch-docker-entrypoint.sh opensearch &amp;amp;'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /usr/share/opensearch/entrypoint-wrapper.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'sleep 5'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /usr/share/opensearch/entrypoint-wrapper.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'/usr/share/opensearch/init-opensearch.sh'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /usr/share/opensearch/entrypoint-wrapper.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'wait'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /usr/share/opensearch/entrypoint-wrapper.sh &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;   &lt;span class="nb"&gt;chmod&lt;/span&gt; +x /usr/share/opensearch/entrypoint-wrapper.sh

&lt;span class="c"&gt;# Use the wrapper as the entrypoint&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/usr/share/opensearch/entrypoint-wrapper.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I said before, I am pragmatic about my use of AI for coding. I didn't want to use something like Bob just to speed things up, but I got tired and thought, "Why not?" I think the Dockerfile and script were generated in about two minutes, compared to the hour it took me to write and test the Ollama one. The result was functional, but I wouldn't use AI to generate anything I didn't have the confidence to verify or test myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it all together
&lt;/h2&gt;

&lt;p&gt;I created the set of containers using Docker Compose on my local machine, including the &lt;a href="https://developer.hashicorp.com/terraform/mcp-server" rel="noopener noreferrer"&gt;Terraform MCP server&lt;/a&gt;. By using the MCP server for the Terraform registry, I could search the public modules and providers available to expedite new examples and versions of modules I used before.&lt;/p&gt;
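&lt;p&gt;As a quick sanity check, the MCP handshake can also be exercised by hand. This sketch assumes the 8080 port mapping from the Compose file and the server's default &lt;code&gt;/mcp&lt;/code&gt; endpoint path for the streamable HTTP transport; in practice, an MCP client (such as a Langflow component) performs this initialize request for you.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Sketch: a JSON-RPC "initialize" request against the Terraform MCP server.
# Assumes the 8080 port mapping and the default /mcp endpoint path.
MCP_URL="${MCP_URL:-http://localhost:8080/mcp}"

INIT='{"jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": {"protocolVersion": "2025-03-26", "capabilities": {},
             "clientInfo": {"name": "smoke-test", "version": "0.0.1"}}}'

if curl -s --max-time 2 -o /dev/null "$MCP_URL"; then
  curl -s -X POST "$MCP_URL" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    -d "$INIT"
else
  echo "MCP server is not reachable at $MCP_URL"
fi
```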

&lt;p&gt;Each of the containers includes a set of environment variables to enable it to run locally. Some variables, like those for OpenSearch, disable the security plugin and skip the demo configuration for ease of use.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

 &lt;span class="na"&gt;terraform-mcp-server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;hashicorp/terraform-mcp-server:0.3.3&lt;/span&gt;
   &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform-mcp-server&lt;/span&gt;
   &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
   &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TRANSPORT_MODE=streamable-http'&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TRANSPORT_HOST=0.0.0.0'&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;TFE_TOKEN=${TFE_TOKEN}'&lt;/span&gt;

 &lt;span class="na"&gt;ollama&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfiles&lt;/span&gt;
     &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.ollama&lt;/span&gt;
   &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
   &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;11434:11434"&lt;/span&gt;
   &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama_data:/root/.ollama&lt;/span&gt;
   &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
   &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;OLLAMA_CONTEXT_LENGTH=131072'&lt;/span&gt;

 &lt;span class="na"&gt;langflow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfiles&lt;/span&gt;
     &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.langflow&lt;/span&gt;
   &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;langflow&lt;/span&gt;
   &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;7860:7860"&lt;/span&gt;
   &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LANGFLOW_HOST=0.0.0.0'&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LANGFLOW_OPEN_BROWSER=false'&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;LANGFLOW_WORKER_TIMEOUT=1800'&lt;/span&gt;
   &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;langflow_data:/app/langflow&lt;/span&gt;
   &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ollama&lt;/span&gt;
   &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

 &lt;span class="na"&gt;opensearch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
   &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfiles&lt;/span&gt;
     &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile.opensearch&lt;/span&gt;
   &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;opensearch&lt;/span&gt;
   &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cluster.name=opensearch-cluster&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;node.name=opensearch-node1&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;discovery.type=single-node&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bootstrap.memory_lock=true&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OPENSEARCH_JAVA_OPTS=-Xms512m&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-Xmx512m"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DISABLE_INSTALL_DEMO_CONFIG=true"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DISABLE_SECURITY_PLUGIN=true"&lt;/span&gt;
   &lt;span class="na"&gt;ulimits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="na"&gt;memlock&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;soft&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
       &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-1&lt;/span&gt;
     &lt;span class="na"&gt;nofile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
       &lt;span class="na"&gt;soft&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;65536&lt;/span&gt;
       &lt;span class="na"&gt;hard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;65536&lt;/span&gt;
   &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;opensearch_data:/usr/share/opensearch/data&lt;/span&gt;
   &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9200:9200"&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9600:9600"&lt;/span&gt;
   &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
 &lt;span class="na"&gt;ollama_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
 &lt;span class="na"&gt;langflow_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
 &lt;span class="na"&gt;opensearch_data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
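&lt;p&gt;To bring everything up, the standard Docker Compose workflow applies. The health checks below are illustrative: they assume the port mappings from the file above, and the &lt;code&gt;/health&lt;/code&gt; path for Langflow is an assumption you may need to adjust.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Illustrative workflow for this Compose file; assumes the port mappings above.
#   docker compose up -d --build   # build the custom images and start everything
#   docker compose ps              # confirm all four containers are running

check() {
  # Report whether a service answers on its mapped port.
  if curl -s --max-time 2 "$2" > /dev/null; then
    echo "$1: up"
  else
    echo "$1: not reachable"
  fi
}

check ollama     http://localhost:11434/api/tags
check langflow   http://localhost:7860/health
check opensearch http://localhost:9200/_cluster/health
```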



&lt;p&gt;Each component represents an important part of the AI stack, such as prompts, agents, context, and models. After the containers came up, I could access Langflow at &lt;a href="http://localhost:7860" rel="noopener noreferrer"&gt;http://localhost:7860&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm99e8sir7dcga2zxsj3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm99e8sir7dcga2zxsj3.png" alt="Langflow start page to create a first flow" width="800" height="412"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The remaining components I could access via their APIs or connect to a flow in Langflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;After much trial and error, I managed to figure out how to create a local stack to build out an AI agent to help me update my examples based on knowledge from my book, talks, and other code examples. As I explore more, I will add more tools and context to improve the agent (and maybe even build multiple agents).&lt;/p&gt;

&lt;p&gt;I realized I could probably use the AI agent built into my coding IDE to do most of this. Project Bob did end up helping me build this stack, and it made the process faster. The downside to using any AI agent built into my coding IDE was the overall cost. I quickly realized that I had to check my usage to ensure I didn't make too many requests.&lt;/p&gt;

&lt;p&gt;I was glad that I could run this locally. The small Granite model really helped: I only had to give Ollama a little more CPU and memory to run it. Running locally let me mitigate the cost of hosted LLMs and maybe achieve a similar result. I found the process of deploying each component valuable as a learning experience.&lt;/p&gt;

&lt;p&gt;Next, I plan on building a flow in Langflow to process all of my book chapters, slides, and code examples before passing them to an agent.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>terraform</category>
      <category>infrastructureascode</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
