Forem: Suhas Mallesh

Azure ML Pipelines + Azure DevOps: CI/CD for ML with Terraform 🔁

Suhas Mallesh — Sat, 25 Apr 2026 07:00:00 +0000

Manual ML retraining is a reliability risk. Azure ML Pipelines orchestrates the ML workflow while Azure DevOps automates testing, validation, and deployment on every code push. Here's how to build the full CI/CD stack with Terraform.

Through Series 5, we've built the workspace, deployed endpoints, and set up the feature store. The final piece is automation. Right now, retraining means a data scientist manually submits a job, checks the accuracy, and updates the endpoint. That's a bottleneck.

Azure ML Pipelines (SDK v2) orchestrates the ML workflow as reusable components connected into a DAG - preprocessing, training, evaluation, conditional registration. Azure DevOps provides the CI/CD layer: unit tests, pipeline submission, and gated deployment on every code merge. Terraform provisions everything. 🎯

🏗️ The CI/CD Architecture

Code push to Azure Repos / GitHub
    ↓
Azure DevOps Pipeline trigger fires
    ↓
Stage 1 (CI): lint → unit tests → validate components
    ↓
Stage 2 (CD): submit Azure ML Pipeline job
    ↓
Azure ML Pipeline: preprocess → train → evaluate → condition
    ↓
Pass: register model → manual approval gate → deploy endpoint
Fail: pipeline exits with error notification

Component	Role
Azure ML Pipeline	Reusable component DAG (the ML workflow)
Azure DevOps	CI/CD: test, validate, submit pipeline on code push
Schedule (SDK v2)	Recurring pipeline runs for continuous retraining
Model Registry	Version and approve trained models
Approval Gate	Human review before production deployment
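
The evaluate → condition step in the flow above can be as small as a script that reads the accuracy file written by training and decides whether registration should go ahead. A minimal sketch, assuming the file holds a single number; the function name `passes_gate`, the file name `accuracy.txt` and the 0.85 threshold are illustrative and not taken from the series:

```python
import tempfile
from pathlib import Path


def passes_gate(accuracy_file: str, threshold: float = 0.85) -> bool:
    """Return True when the accuracy stored in accuracy_file meets the threshold."""
    accuracy = float(Path(accuracy_file).read_text().strip())
    return accuracy >= threshold


# Demo with a throwaway file standing in for the training step's accuracy output.
with tempfile.TemporaryDirectory() as tmp:
    metric = Path(tmp) / "accuracy.txt"
    metric.write_text("0.91\n")
    print(passes_gate(str(metric)))        # True: 0.91 >= 0.85
    print(passes_gate(str(metric), 0.95))  # False: 0.91 < 0.95
```

In a real pipeline step, a False result would translate into a non-zero exit code so that the job fails and the notification path is taken.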

🔧 Terraform: CI/CD Infrastructure

Service Principal for DevOps

# devops/service_principal.tf

data "azuread_client_config" "current" {}

resource "azuread_application" "devops" {
  display_name = "${var.environment}-ml-devops-sp"
}

resource "azuread_service_principal" "devops" {
  client_id = azuread_application.devops.client_id
}

resource "azuread_service_principal_password" "devops" {
  service_principal_id = azuread_service_principal.devops.id
}

# DevOps SP needs Contributor on the ML workspace
resource "azurerm_role_assignment" "devops_ml" {
  scope                = azurerm_machine_learning_workspace.this.id
  role_definition_name = "Contributor"
  principal_id         = azuread_service_principal.devops.object_id
}

Storage for Pipeline Artifacts

# devops/pipeline_storage.tf

resource "azurerm_storage_container" "pipeline_artifacts" {
  name                  = "pipeline-artifacts"
  storage_account_id    = azurerm_storage_account.ml.id
  container_access_type = "private"
}

Azure DevOps Project and Service Connection (via azuredevops provider)

# devops/azuredevops.tf

terraform {
  required_providers {
    azuredevops = {
      source  = "microsoft/azuredevops"
      version = "~> 1.0"
    }
  }
}

resource "azuredevops_project" "ml" {
  name               = "${var.environment}-ml-platform"
  visibility         = "private"
  version_control    = "Git"
  work_item_template = "Agile"
}

# Assumes data "azurerm_client_config" "current" and data "azurerm_subscription" "current"
# are declared elsewhere in the stack.
resource "azuredevops_serviceendpoint_azurerm" "ml_workspace" {
  project_id            = azuredevops_project.ml.id
  service_endpoint_name = "azure-ml-connection"

  credentials {
    serviceprincipalid  = azuread_application.devops.client_id
    serviceprincipalkey = azuread_service_principal_password.devops.value
  }

  environment               = "AzureCloud"
  resource_group            = azurerm_resource_group.ml.name
  azurerm_spn_tenantid      = data.azurerm_client_config.current.tenant_id
  azurerm_subscription_id   = data.azurerm_client_config.current.subscription_id
  azurerm_subscription_name = data.azurerm_subscription.current.display_name
}

resource "azuredevops_build_definition" "ml_pipeline" {
  project_id = azuredevops_project.ml.id
  name       = "${var.environment}-ml-pipeline"

  ci_trigger {
    use_yaml = true
  }

  repository {
    repo_type   = "GitHub"
    repo_id     = "${var.github_owner}/${var.github_repo}"
    branch_name = var.deploy_branch
    yml_path    = "azure-devops/ml-pipeline.yml"
  }

  variable {
    name  = "ENVIRONMENT"
    value = var.environment
  }

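  variable {
    name  = "WORKSPACE_NAME"
    value = azurerm_machine_learning_workspace.this.name
  }

  # The YAML pipeline also reads these two, so define them here as well
  variable {
    name  = "RESOURCE_GROUP"
    value = azurerm_resource_group.ml.name
  }

  variable {
    name  = "SUBSCRIPTION_ID"
    value = data.azurerm_client_config.current.subscription_id
  }
}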

🔧 Azure DevOps Pipeline YAML

This file lives in your repo and runs on every push to the deploy branch:

# azure-devops/ml-pipeline.yml

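trigger:
  branches:
    include:
      - main   # keep in sync with var.deploy_branch ("develop" for dev)

# ENVIRONMENT and WORKSPACE_NAME are injected by the Terraform build definition.
# SUBSCRIPTION_ID and RESOURCE_GROUP must be defined the same way (or in a variable group).
# Do not redeclare them here: root-level YAML variables take precedence over
# definition variables, and a self-reference such as ENVIRONMENT: $(environment) never resolves.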

stages:
  - stage: CI
    displayName: "Test and Validate"
    jobs:
      - job: Test
        pool:
          vmImage: "ubuntu-latest"
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: "3.11"

          - script: pip install -r requirements.txt
            displayName: "Install dependencies"

          - script: python -m pytest pipelines/tests/ -v
            displayName: "Run unit tests"

          - script: python pipelines/validate_components.py
            displayName: "Validate component definitions"

  - stage: CD
    displayName: "Submit ML Pipeline"
    dependsOn: CI
    condition: succeeded()
    jobs:
      - job: SubmitPipeline
        pool:
          vmImage: "ubuntu-latest"
        steps:
          - task: AzureCLI@2
            displayName: "Submit Azure ML Pipeline"
            inputs:
              azureSubscription: "azure-ml-connection"
              scriptType: "bash"
              scriptLocation: "inlineScript"
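              inlineScript: |
                # The ml extension (CLI v2) may not be preinstalled on hosted agents
                az extension add --name ml --upgrade --yes
                az ml job create \
                  --file pipelines/training-pipeline.yml \
                  --workspace-name $(WORKSPACE_NAME) \
                  --resource-group $(RESOURCE_GROUP) \
                  --subscription $(SUBSCRIPTION_ID) \
                  --stream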

  - stage: Approval
    displayName: "Manual Approval Gate"
    dependsOn: CD
    condition: succeeded()
    jobs:
      - deployment: ApproveDeployment
        environment: "$(ENVIRONMENT)-ml-approval"
        strategy:
          runOnce:
            deploy:
              steps:
                # Deployment jobs do not check out the repo automatically
                - checkout: self
                - task: AzureCLI@2
                  displayName: "Deploy approved model to endpoint"
                  inputs:
                    azureSubscription: "azure-ml-connection"
                    scriptType: "bash"
                    scriptLocation: "inlineScript"
                    inlineScript: |
                      python scripts/deploy_approved_model.py \
                        --workspace $(WORKSPACE_NAME) \
                        --resource-group $(RESOURCE_GROUP) \
                        --endpoint-name $(ENVIRONMENT)-my-endpoint
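
The CI stage above runs pipelines/validate_components.py, which the post does not show. One check such a script could perform is that every `${{inputs.x}}` or `${{outputs.y}}` placeholder in a component command is actually declared. A minimal sketch under that assumption; the function name `undeclared_placeholders` is hypothetical and it works on plain strings and lists rather than SDK objects:

```python
import re

# Matches ${{inputs.name}} and ${{outputs.name}} placeholders in a command string.
PLACEHOLDER = re.compile(r"\$\{\{\s*(inputs|outputs)\.(\w+)\s*\}\}")


def undeclared_placeholders(command, inputs, outputs):
    """Return placeholders used in command that are not declared as inputs/outputs."""
    declared = {"inputs": set(inputs), "outputs": set(outputs)}
    return sorted(
        f"{kind}.{name}"
        for kind, name in PLACEHOLDER.findall(command)
        if name not in declared[kind]
    )


train_command = (
    "python train.py --data ${{inputs.data}} "
    "--model-output ${{outputs.model}} --accuracy-output ${{outputs.accuracy}}"
)
print(undeclared_placeholders(train_command, ["data"], ["model", "accuracy"]))  # []
print(undeclared_placeholders(train_command, ["data"], ["model"]))  # ['outputs.accuracy']
```

A validation script could exit non-zero whenever the returned list is non-empty, which fails the CI stage before any compute is spent.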

🐍 Azure ML Pipeline Definition (SDK v2)

# pipelines/training_pipeline.py

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import (
    CommandComponent,
    RecurrenceTrigger,
    JobSchedule,
)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="...",
)

# Define components
# code= uploads the folder that holds the scripts; "./src" is an assumed path
preprocess = CommandComponent(
    name="preprocess",
    code="./src",
    command="python preprocess.py --input ${{inputs.raw_data}} --output ${{outputs.processed_data}}",
    environment="azureml:sklearn-env:1",
    inputs={"raw_data": {"type": "uri_folder"}},
    outputs={"processed_data": {"type": "uri_folder"}},
)

train = CommandComponent(
    name="train",
    code="./src",
    command="python train.py --data ${{inputs.data}} --model-output ${{outputs.model}} --accuracy-output ${{outputs.accuracy}}",
    environment="azureml:sklearn-env:1",
    inputs={"data": {"type": "uri_folder"}},
    outputs={"model": {"type": "uri_folder"}, "accuracy": {"type": "uri_file"}},
)

@pipeline(name="training-pipeline", compute="cpu-cluster")
def training_pipeline(raw_data: Input(type="uri_folder")):
    preprocess_step = preprocess(raw_data=raw_data)
    train_step = train(data=preprocess_step.outputs.processed_data)
    return {"model": train_step.outputs.model}

# Submit pipeline
pipeline_job = training_pipeline(
    raw_data=Input(path="azureml://datastores/workspaceblobstore/paths/data/")
)
ml_client.jobs.create_or_update(pipeline_job, experiment_name="training-runs")

Scheduled Recurring Training (SDK v2)

from azure.ai.ml.entities import RecurrenceTrigger, JobSchedule

schedule = JobSchedule(
    name=f"{environment}-daily-training",
    trigger=RecurrenceTrigger(frequency="day", interval=1, start_time="2026-01-01T02:00:00"),
    create_job=pipeline_job,
)

ml_client.schedules.begin_create_or_update(schedule).result()

📐 Environment Configuration

# environments/dev.tfvars
environment    = "dev"
deploy_branch  = "develop"

# environments/prod.tfvars
environment    = "prod"
deploy_branch  = "main"

Approval gates in Azure DevOps are configured per environment in the UI: Pipelines → Environments → prod-ml-approval → Approvals and checks. Add team members as required approvers before any production deployment proceeds.

⚠️ Gotchas and Tips

Use SDK v2 only. SDK v1 was deprecated on March 31, 2025, and Microsoft ends support for it on June 30, 2026. All pipelines should use azure-ai-ml (SDK v2) and CLI v2.

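Service principal secrets need rotation. The secret created by azuread_service_principal_password has an expiry date, and pipelines start failing with authentication errors once it passes. Workload identity federation (OIDC) for the Azure DevOps service connection is a secretless alternative that needs no rotation.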

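AzureML Job Wait task for long-running jobs. Training jobs can take hours, and az ml job create --stream holds a build agent for the entire run. The AzureML Job Wait task (from the Machine Learning extension for Azure DevOps) runs in an agentless server job, so it can hold the pipeline until the ML job completes, before the approval stage, without occupying an agent.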

Component versioning. Register components in the Azure ML registry with versions. This ensures pipeline runs are reproducible - you know exactly which version of each component ran for any historical job.

Schedules live in the workspace, not Terraform. Azure ML job schedules are created via SDK v2 or CLI v2 and live in the workspace. They're not managed by Terraform directly. Include schedule creation in your DevOps pipeline's deploy stage.

⏭️ Series 5 Complete!

This is Post 4 of the Azure ML Pipelines & MLOps with Terraform series, and the final post of Series 5.

Post 1: Azure ML Workspace 🔬
Post 2: Azure ML Online Endpoints 🚀
Post 3: Azure ML Feature Store 🗃️
Post 4: Azure ML Pipelines + Azure DevOps (you are here) 🔁

Your ML workflow is automated. Azure DevOps tests and validates on every push. Azure ML Pipelines runs the DAG. Models that pass evaluation register automatically. Manual approval gates protect production. All provisioned with Terraform. 🔁

Thanks for following the full Series 5! Series 6 coming soon. 💬

Vertex AI Pipelines + Cloud Build: CI/CD for ML on GCP with Terraform 🔁

Suhas Mallesh — Fri, 24 Apr 2026 07:00:00 +0000

Manual ML retraining doesn't scale. Vertex AI Pipelines orchestrates your ML DAG while Cloud Build automates testing, compiling, and deploying updated pipelines on every code push. Here's how to wire it all together with Terraform.

Through Series 5, we've built the Workbench, deployed endpoints, and set up the Feature Store. The final piece is automation. Right now, retraining means a data scientist manually runs a notebook, checks metrics, and updates the endpoint. That's a bottleneck and a reliability risk.

GCP's ML CI/CD stack uses two services together: Vertex AI Pipelines orchestrates the ML workflow (preprocessing, training, evaluation, registration) as a managed DAG. Cloud Build provides the CI/CD layer that tests your pipeline code, compiles it, uploads it to GCS, and runs it on a schedule or trigger. Terraform provisions the infrastructure for both. 🎯

🏗️ The CI/CD Architecture

Code push to GitHub/Cloud Source Repos
    ↓
Cloud Build trigger fires
    ↓
Cloud Build: run tests → compile KFP pipeline → upload to GCS
    ↓
Cloud Scheduler: daily trigger → run Vertex AI Pipeline
    ↓
Pipeline DAG: preprocess → train → evaluate → condition
    ↓
Pass: register model → approve → deploy to endpoint
Fail: pipeline exits with error

Component	Role
Vertex AI Pipelines	Managed KFP pipeline execution (the ML DAG)
Cloud Build	CI/CD: test, compile, upload pipeline on code push
Cloud Scheduler	Trigger pipeline on a cron schedule
GCS	Store compiled pipeline specs (`.json`)
Vertex AI Model Registry	Version and approve trained models
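
To make the DAG idea concrete, the step order in the flow above can be derived from the dependencies alone. This is a conceptual sketch using Python's standard library, not how Vertex AI or KFP schedule work internally; the step names simply mirror the diagram:

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
dag = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}

# static_order() yields every step after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['preprocess', 'train', 'evaluate', 'register']
```

In KFP the same dependencies are inferred from data passing: a task that consumes another task's output runs after it.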

🔧 Terraform: CI/CD Infrastructure

APIs and Service Account

# pipelines/apis.tf

resource "google_project_service" "required" {
  for_each = toset([
    "aiplatform.googleapis.com",
    "cloudbuild.googleapis.com",
    "cloudscheduler.googleapis.com",
    "storage.googleapis.com",
    "artifactregistry.googleapis.com",
  ])
  project = var.project_id
  service = each.value
}

resource "google_service_account" "pipeline_runner" {
  account_id   = "${var.environment}-pipeline-runner"
  display_name = "Vertex AI Pipeline Runner"
  project      = var.project_id
}

resource "google_project_iam_member" "pipeline_roles" {
  for_each = toset([
    "roles/aiplatform.user",
    "roles/storage.objectAdmin",
    "roles/bigquery.dataEditor",
    "roles/bigquery.jobUser",
  ])
  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.pipeline_runner.email}"
}

GCS Bucket for Pipeline Artifacts

# pipelines/storage.tf

resource "google_storage_bucket" "pipeline_root" {
  name          = "${var.project_id}-${var.environment}-pipeline-root"
  location      = var.region
  force_destroy = var.environment != "prod"

  versioning {
    enabled = true
  }

  labels = {
    environment = var.environment
    managed_by  = "terraform"
  }
}

resource "google_storage_bucket" "pipeline_specs" {
  name          = "${var.project_id}-${var.environment}-pipeline-specs"
  location      = var.region
  force_destroy = var.environment != "prod"
}

Cloud Build Trigger

# pipelines/cloudbuild.tf

resource "google_cloudbuild_trigger" "pipeline_deploy" {
  name     = "${var.environment}-ml-pipeline-deploy"
  project  = var.project_id
  location = var.region

  github {
    owner = var.github_owner
    name  = var.github_repo
    push {
      branch = var.deploy_branch  # e.g. "main" for prod, "develop" for dev
    }
  }

  filename = "cloudbuild/pipeline-deploy.yaml"

  substitutions = {
    _ENVIRONMENT        = var.environment
    _PIPELINE_ROOT      = "gs://${google_storage_bucket.pipeline_root.name}"
    _PIPELINE_SPECS_GCS = "gs://${google_storage_bucket.pipeline_specs.name}/specs/"
    _REGION             = var.region
    _PROJECT_ID         = var.project_id
    _SA_EMAIL           = google_service_account.pipeline_runner.email
  }

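  # Dedicated build service account; google_service_account.cloudbuild_sa is defined outside this snippet
  service_account = google_service_account.cloudbuild_sa.id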
}

Cloud Scheduler: Run on Schedule

# pipelines/scheduler.tf

resource "google_cloud_scheduler_job" "pipeline_schedule" {
  name     = "${var.environment}-training-pipeline"
  region   = var.region
  project  = var.project_id
  schedule = var.pipeline_schedule   # e.g. "0 2 * * *"
  time_zone = "UTC"

  http_target {
    uri         = "https://${var.region}-aiplatform.googleapis.com/v1/projects/${var.project_id}/locations/${var.region}/pipelineJobs"
    http_method = "POST"

    body = base64encode(jsonencode({
      displayName = "${var.environment}-training-run"
      pipelineSpec = {}
      templateUri  = "gs://${google_storage_bucket.pipeline_specs.name}/specs/training-pipeline.json"
      runtimeConfig = {
        gcsOutputDirectory = "gs://${google_storage_bucket.pipeline_root.name}/runs/"
        parameterValues = {
          project_id   = var.project_id
          region       = var.region
          data_gcs_uri = var.training_data_uri
          model_name   = var.model_name
        }
      }
      serviceAccount = google_service_account.pipeline_runner.email
    }))

    oauth_token {
      service_account_email = google_service_account.pipeline_runner.email
    }
  }
}

🔧 Cloud Build Config (cloudbuild/pipeline-deploy.yaml)

This file lives in your repo and runs on every push to the deploy branch:

# cloudbuild/pipeline-deploy.yaml

steps:
  # Step 1: Install dependencies
  - name: "python:3.11"
    entrypoint: pip
    args: ["install", "-r", "requirements.txt", "--user"]

  # Step 2: Run unit tests on pipeline components
  - name: "python:3.11"
    entrypoint: python
    args: ["-m", "pytest", "pipelines/tests/", "-v"]

  # Step 3: Compile the Vertex AI Pipeline
  - name: "python:3.11"
    entrypoint: python
    args: ["pipelines/compile.py", "--output", "/workspace/training-pipeline.json"]
    env:
      - "PROJECT_ID=$PROJECT_ID"
      - "REGION=$_REGION"

  # Step 4: Upload compiled pipeline spec to GCS
  - name: "gcr.io/cloud-builders/gsutil"
    args: ["cp", "/workspace/training-pipeline.json", "${_PIPELINE_SPECS_GCS}training-pipeline.json"]

  # Step 5: (Optional) Run a quick end-to-end test on dev
  - name: "python:3.11"
    entrypoint: python
    args: ["pipelines/run.py", "--pipeline-spec", "${_PIPELINE_SPECS_GCS}training-pipeline.json"]
    env:
      - "ENVIRONMENT=$_ENVIRONMENT"
    id: "e2e-test"

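substitutions:
  _ENVIRONMENT: dev
  _PIPELINE_SPECS_GCS: gs://my-bucket/specs/
  _REGION: us-central1

# Required when the trigger runs as a user-specified service account:
# Cloud Build then needs an explicit logging destination.
options:
  logging: CLOUD_LOGGING_ONLY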

🐍 KFP Pipeline Definition

# pipelines/compile.py

import argparse

from kfp import compiler, dsl
from kfp.dsl import component


# Lightweight components run in their own containers, so every import they
# need must happen inside the function body and every package must be listed.
@component(
    base_image="python:3.11",
    packages_to_install=["scikit-learn", "pandas", "pyarrow", "gcsfs"],
)
def preprocess(data_uri: str, output_uri: str) -> str:
    import pandas as pd

    df = pd.read_parquet(data_uri)
    # ... preprocessing logic ...
    df.to_parquet(output_uri)
    return output_uri  # exposed downstream as preprocess_task.output


@component(base_image="python:3.11", packages_to_install=["scikit-learn"])
def train(data_uri: str, model_uri: str) -> float:
    # ... training logic: fit the model and write it to model_uri ...
    accuracy = 0.0  # placeholder: replace with the metric computed during training
    return accuracy


@component(base_image="python:3.11", packages_to_install=["google-cloud-aiplatform"])
def register_model(model_uri: str, accuracy: float, project: str, region: str, model_name: str) -> None:
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    aiplatform.Model.upload(
        display_name=model_name,
        artifact_uri=model_uri,
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
    )


@dsl.pipeline(name="training-pipeline")
def training_pipeline(
    project_id: str,
    region: str,
    data_gcs_uri: str,
    model_name: str,
    accuracy_threshold: float = 0.85,
):
    # "gs://pipeline-root/..." is a placeholder bucket; point these at your pipeline root
    preprocess_task = preprocess(data_uri=data_gcs_uri, output_uri="gs://pipeline-root/processed/")
    train_task = train(data_uri=preprocess_task.output, model_uri="gs://pipeline-root/model/")

    with dsl.If(train_task.output >= accuracy_threshold, name="AccuracyGate"):
        register_model(
            model_uri="gs://pipeline-root/model/",
            accuracy=train_task.output,
            project=project_id,
            region=region,
            model_name=model_name,
        )


if __name__ == "__main__":
    # Cloud Build passes --output /workspace/training-pipeline.json
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default="training-pipeline.json")
    args = parser.parse_args()
    compiler.Compiler().compile(pipeline_func=training_pipeline, package_path=args.output)

📐 Environment Configuration

# environments/dev.tfvars
environment        = "dev"
deploy_branch      = "develop"
pipeline_schedule  = "0 6 * * *"    # Daily at 6am UTC
model_name         = "my-model-dev"

# environments/prod.tfvars
environment        = "prod"
deploy_branch      = "main"
pipeline_schedule  = "0 2 * * *"    # Daily at 2am UTC
model_name         = "my-model"

⚠️ Gotchas and Tips

Two separate pipelines. Cloud Build is the CI/CD pipeline for your code. Vertex AI Pipelines is the ML orchestration DAG. They serve different purposes and run independently.

Compile on every push. The Cloud Build step compiles the KFP pipeline from Python code to JSON on every merge. This catches pipeline definition errors early and ensures GCS always has the latest spec.

Pipeline spec versioning. Upload compiled specs with a version suffix (commit hash or timestamp) alongside latest. This enables rollback to any previous pipeline version: training-pipeline-abc123.json.

Cloud Scheduler vs Eventarc. Cloud Scheduler runs pipelines on a fixed cron. For event-driven triggers (new data in GCS), use Eventarc to trigger a Cloud Function that submits the pipeline job.

Service account for Cloud Build. Give the Cloud Build trigger a dedicated service account with roles/aiplatform.user and roles/storage.objectAdmin. Avoid using the default Cloud Build SA which has overly broad permissions.
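
The pipeline spec versioning tip above comes down to uploading each compiled spec under two names: one pinned to the commit and one stable name that the scheduler reads. A minimal sketch of the naming step; the helper `spec_object_names` and the 7-character hash length are assumptions for illustration:

```python
from pathlib import PurePosixPath


def spec_object_names(filename, commit_sha, sha_length=7):
    """Return [versioned name, stable name] for a compiled pipeline spec."""
    path = PurePosixPath(filename)
    versioned = f"{path.stem}-{commit_sha[:sha_length]}{path.suffix}"
    return [versioned, path.name]


print(spec_object_names("training-pipeline.json", "abc123"))
# ['training-pipeline-abc123.json', 'training-pipeline.json']
```

In Cloud Build the built-in $SHORT_SHA substitution already holds the first seven characters of the commit hash, so the versioned name can also be produced directly in the upload step.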

⏭️ Series 5 Complete!

This is Post 4 of the GCP ML Pipelines & MLOps with Terraform series.

Post 1: Vertex AI Workbench 🔬
Post 2: Vertex AI Endpoints 🚀
Post 3: Vertex AI Feature Store 🗃️
Post 4: Vertex AI Pipelines + Cloud Build (you are here) 🔁

Your ML workflow is automated. Cloud Build tests and compiles your pipeline on every code push. Cloud Scheduler runs it on a cron. Vertex AI Pipelines executes the DAG. Models that pass evaluation register automatically. All provisioned with Terraform. 🔁

Found this helpful? Follow for the next series! 💬

Agentic AWS - Day 2: Amazon Bedrock AgentCore Runtime

Suhas Mallesh — Fri, 24 Apr 2026 07:00:00 +0000

Series: Agentic AWS | Post: 2 of 6 | Cloud: AWS

Why Agents Need Their Own Runtime

A Lambda function times out in 15 minutes. An EC2 instance charges you whether the agent is thinking or idle. An ECS task requires container orchestration expertise before you write a single line of agent logic.

AI agents have fundamentally different runtime requirements - they run for minutes to hours, maintain session context across tool calls, need isolated execution per user, and must scale from zero to many concurrent sessions without pre-provisioning.

AgentCore Runtime is a serverless execution environment purpose-built for exactly this workload. It hosts your agent code in ARM64 containers with up to 8-hour execution windows, full session isolation, built-in observability, and native support for both HTTP and the A2A (Agent-to-Agent) protocol. You bring the agent logic; Runtime handles everything else.

In Post 1 we built an AgentCore Gateway that exposes an order-status Lambda as an MCP tool. This post deploys the agent itself - the process that calls that gateway, reasons with Claude, and serves user requests - onto AgentCore Runtime via Terraform and container deployment.

Architecture

Client (curl / SDK)
        |
        | HTTPS + JWT auth
        v
AgentCore Runtime endpoint
        |  (session-isolated container per user)
        v
Agent container (Python + Strands SDK)
        |
        | MCP streamable HTTP + SigV4
        v
AgentCore Gateway  (from Post 1)
        |
        v
Lambda: order-status-tool

Each user session gets its own isolated container instance. Session state - conversation history, in-flight tool calls - lives in that container for the duration of the session. When the session idles past the timeout, the container is reaped and you stop paying.
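
The reaping behavior described above is governed by two limits that the Terraform variables later in this post expose: an idle timeout and a hard ceiling on session lifetime. The sketch below is only a conceptual model of how those two timers interact, not the service's implementation; the `Session` class, the `should_reap` function and the use of `>=` at the boundary are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Session:
    started_at: float     # seconds since an arbitrary epoch
    last_activity: float  # time of the most recent request


def should_reap(session, now, idle_timeout=1800, max_lifetime=28800):
    """True when the session has idled too long or reached its lifetime ceiling."""
    idle_for = now - session.last_activity
    alive_for = now - session.started_at
    return idle_for >= idle_timeout or alive_for >= max_lifetime


s = Session(started_at=0, last_activity=600)
print(should_reap(s, now=1200))  # False: idle for 600 s, under the 1800 s timeout
print(should_reap(s, now=2400))  # True: idle for 1800 s
```

The defaults mirror the 30-minute idle timeout and 8-hour maximum used in the prod settings of this post.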

Agent Code

The agent runs as a long-lived HTTP server inside the container. AgentCore Runtime routes requests to it via the /invocations endpoint.

# agent/main.py
import asyncio
import json
import os
from http.server import HTTPServer, BaseHTTPRequestHandler

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from mcp.client.streamable_http import streamablehttp_client
from strands import Agent
from strands.tools.mcp import MCPClient

GATEWAY_ENDPOINT = os.environ["GATEWAY_ENDPOINT"]
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
MODEL_ID = os.environ.get("MODEL_ID", "anthropic.claude-3-5-sonnet-20241022-v2:0")

session_creds = boto3.session.Session().get_credentials()


def signed_headers(url: str) -> dict:
    """SigV4 signed headers for AgentCore Gateway inbound IAM auth.

    Caveat: these headers are signed once, for an empty body. SigV4 covers the
    payload hash and a timestamp, so a production client should sign each
    request individually.
    """
    request = AWSRequest(method="POST", url=url)
    SigV4Auth(
        session_creds.get_frozen_credentials(), "bedrock-agentcore", AWS_REGION
    ).add_auth(request)
    return dict(request.headers)


def build_agent() -> Agent:
    """
    Connect to AgentCore Gateway, load available tools,
    and return a Strands Agent ready to handle requests.
    """
    headers = signed_headers(GATEWAY_ENDPOINT)
    mcp_client = MCPClient(
        lambda: streamablehttp_client(GATEWAY_ENDPOINT, headers=headers)
    )
    # Keep the MCP session open for the life of the container:
    # tools stop working once the client is closed.
    mcp_client.start()
    tools = mcp_client.list_tools_sync()

    return Agent(
        model=MODEL_ID,
        tools=tools,
        system_prompt=(
            "You are a helpful order support agent. "
            "Use your tools to look up order status and shipping details. "
            "Always confirm the order ID before making tool calls."
        ),
    )


# Build agent once at container startup - reused across requests in the session
agent = build_agent()


class AgentHandler(BaseHTTPRequestHandler):
    """
    AgentCore Runtime expects POST /invocations plus a GET /ping health check.
    Request body: {"prompt": "user message", "session_id": "..."}
    Response body: {"response": "agent reply"}
    """

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check polled by the runtime
        if self.path == "/ping":
            self._send_json(200, {"status": "Healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return

        try:
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length))
            prompt = body.get("prompt", "")
            result = asyncio.run(agent.invoke_async(prompt))
            # str(result) is the reply text, matching the documented response shape
            self._send_json(200, {"response": str(result)})
        except Exception as e:
            self._send_json(500, {"error": str(e)})

    def log_message(self, format, *args):
        # AgentCore Runtime captures stdout/stderr to CloudWatch
        print(f"[{self.address_string()}] {format % args}")


if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    print(f"AgentCore Runtime agent listening on port {port}")
    server = HTTPServer(("0.0.0.0", port), AgentHandler)
    server.serve_forever()

# agent/Dockerfile
FROM public.ecr.aws/amazonlinux/amazonlinux:2023-minimal

RUN dnf install -y python3.12 python3.12-pip && dnf clean all

WORKDIR /app

COPY requirements.txt .
RUN pip3.12 install --no-cache-dir -r requirements.txt

COPY main.py .

# AgentCore Runtime routes traffic to port 8080 by default
EXPOSE 8080

CMD ["python3.12", "main.py"]

# agent/requirements.txt
strands-agents>=0.1.0
mcp>=1.0.0
boto3>=1.35.0

Terraform Infrastructure

Variables

# variables.tf
variable "aws_region" {
  type = string
}

variable "environment" {
  type = string
}

variable "project_name" {
  type    = string
  default = "agentic-aws"
}

variable "gateway_endpoint" {
  description = "AgentCore Gateway MCP endpoint URL (output from Post 1 stack)"
  type        = string
}

variable "gateway_arn" {
  description = "AgentCore Gateway ARN for IAM policy (output from Post 1 stack)"
  type        = string
}

variable "idle_session_timeout_seconds" {
  description = "Seconds before an idle session container is reaped"
  type        = number
  default     = 1800
}

variable "max_session_lifetime_seconds" {
  description = "Hard ceiling on session duration (max 28800 = 8 hours)"
  type        = number
  default     = 7200
}

variable "container_cpu" {
  description = "vCPU units for the agent container (1024 = 1 vCPU)"
  type        = number
  default     = 1024
}

variable "container_memory_mb" {
  description = "Memory in MB for the agent container"
  type        = number
  default     = 2048
}

# dev.tfvars
aws_region                   = "us-east-1"
environment                  = "dev"
idle_session_timeout_seconds = 600    # 10 min - aggressive cleanup in dev
max_session_lifetime_seconds = 3600   # 1 hour ceiling in dev
container_cpu                = 512
container_memory_mb          = 1024

# prod.tfvars
aws_region                   = "us-east-1"
environment                  = "prod"
idle_session_timeout_seconds = 1800   # 30 min idle tolerance
max_session_lifetime_seconds = 28800  # Full 8-hour window
container_cpu                = 1024
container_memory_mb          = 2048

ECR Repository

# ecr.tf
resource "aws_ecr_repository" "agent" {
  name                 = "${var.project_name}-agent-${var.environment}"
  image_tag_mutability = "MUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }

  encryption_configuration {
    encryption_type = "AES256"
  }
}

resource "aws_ecr_lifecycle_policy" "agent" {
  repository = aws_ecr_repository.agent.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep last 10 images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = 10
      }
      action = { type = "expire" }
    }]
  })
}

output "ecr_repository_url" {
  value = aws_ecr_repository.agent.repository_url
}

IAM for Runtime

# iam.tf

# Execution role - assumed by AgentCore Runtime service
resource "aws_iam_role" "runtime_execution" {
  name = "${var.project_name}-runtime-exec-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "bedrock-agentcore.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "runtime_permissions" {
  name = "runtime-permissions"
  role = aws_iam_role.runtime_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid      = "BedrockModelAccess"
        Effect   = "Allow"
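        # Strands streams model responses by default, which needs the streaming action too
        Action   = ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"]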
        Resource = "arn:aws:bedrock:${var.aws_region}::foundation-model/*"
      },
      {
        Sid      = "AgentCoreGatewayAccess"
        Effect   = "Allow"
        Action   = "bedrock-agentcore:InvokeGateway"
        Resource = var.gateway_arn
      },
      {
        Sid    = "CloudWatchLogs"
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "arn:aws:logs:${var.aws_region}:*:log-group:/aws/bedrock-agentcore/*"
      },
      {
        Sid      = "ECRPull"
        Effect   = "Allow"
        Action   = [
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchGetImage",
          "ecr:GetAuthorizationToken"
        ]
        Resource = "*"
      }
    ]
  })
}

AgentCore Runtime Resource

# runtime.tf

resource "aws_bedrockagentcore_agent_runtime" "main" {
  name        = "${var.project_name}-runtime-${var.environment}"
  description = "Agentic AWS order support agent runtime"

  # Container image deployed to ECR
  runtime_artifact = {
    container_image = {
      uri = "${aws_ecr_repository.agent.repository_url}:latest"
    }
  }

  # IAM role the runtime assumes
  role_arn = aws_iam_role.runtime_execution.arn

  # Environment variables injected into every container instance
  environment_variables = {
    GATEWAY_ENDPOINT = var.gateway_endpoint
    AWS_REGION       = var.aws_region
    MODEL_ID         = "anthropic.claude-3-5-sonnet-20241022-v2:0"
  }

  # Session lifecycle controls
  session_idle_timeout_in_seconds = var.idle_session_timeout_seconds
  max_session_duration_in_seconds = var.max_session_lifetime_seconds

  # Protocol: HTTP for standard request/response, A2A for multi-agent
  server_protocol = "HTTP"

  # JWT authorizer - validates tokens before requests reach the container
  # Remove authorizer_configuration block for unauthenticated dev testing
  authorizer_configuration = {
    jwt = {
      discovery_url      = "https://cognito-idp.${var.aws_region}.amazonaws.com/${aws_cognito_user_pool.agents.id}/.well-known/openid-configuration"
      allowed_audience   = ["agentcore-runtime-${var.environment}"]
    }
  }

  # Resource limits per container instance
  compute_configuration = {
    cpu    = var.container_cpu
    memory = var.container_memory_mb
  }

  depends_on = [aws_iam_role_policy.runtime_permissions]
}

output "runtime_endpoint" {
  description = "HTTPS endpoint to invoke the agent"
  value       = aws_bedrockagentcore_agent_runtime.main.endpoint_url
}

output "runtime_arn" {
  value = aws_bedrockagentcore_agent_runtime.main.arn
}

Cognito for JWT Auth

# cognito.tf

resource "aws_cognito_user_pool" "agents" {
  name = "${var.project_name}-agents-${var.environment}"
}

resource "aws_cognito_user_pool_client" "runtime_client" {
  name         = "runtime-client-${var.environment}"
  user_pool_id = aws_cognito_user_pool.agents.id

  generate_secret                      = true
  allowed_oauth_flows                  = ["client_credentials"]
  allowed_oauth_flows_user_pool_client = true
  allowed_oauth_scopes                 = ["agentcore-runtime-${var.environment}/invoke"]

  explicit_auth_flows = ["ALLOW_USER_SRP_AUTH", "ALLOW_REFRESH_TOKEN_AUTH"]
}

resource "aws_cognito_user_pool_domain" "agents" {
  domain       = "${var.project_name}-agents-${var.environment}"
  user_pool_id = aws_cognito_user_pool.agents.id
}

output "cognito_token_url" {
  value = "https://${aws_cognito_user_pool_domain.agents.domain}.auth.${var.aws_region}.amazoncognito.com/oauth2/token"
}

output "cognito_client_id" {
  value = aws_cognito_user_pool_client.runtime_client.id
}

Build and Deploy

# 1. Terraform apply (provisions ECR + Runtime, outputs ECR URL)
terraform init
terraform apply -var-file=dev.tfvars

ECR_URL=$(terraform output -raw ecr_repository_url)
RUNTIME_ENDPOINT=$(terraform output -raw runtime_endpoint)

# 2. Build and push container image
# CRITICAL: target linux/arm64 - AgentCore Runtime is ARM64
# docker login expects the registry host, so strip the repository path from ECR_URL
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin "${ECR_URL%%/*}"

docker build \
  --platform linux/arm64 \
  -t $ECR_URL:latest \
  ./agent

docker push $ECR_URL:latest

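# 3. Get a JWT access token from Cognito (client credentials flow)
# Assumes CLIENT_SECRET holds the app client secret (it is not a Terraform output here)
CLIENT_ID=$(terraform output -raw cognito_client_id)
TOKEN=$(curl -s -X POST "$(terraform output -raw cognito_token_url)" \
  -u "$CLIENT_ID:$CLIENT_SECRET" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["access_token"])')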

# 4. Invoke the agent
curl -X POST $RUNTIME_ENDPOINT/invocations \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Where is order ORD-1234?", "session_id": "user-abc"}'

{
  "response": "Order ORD-1234 is currently in transit with FedEx. Tracking number 794601234567. Estimated delivery April 17, 2026."
}

How Session Isolation Works

When a request arrives with session_id: "user-abc", AgentCore Runtime routes it to the container instance bound to that session. If no instance exists yet, Runtime cold-starts one. Subsequent requests with the same session ID hit the same container - so the agent's in-memory conversation history persists across turns.

Two users with different session IDs get completely separate container instances. There is no shared memory, no shared state, no cross-contamination between sessions. This is the key property that makes AgentCore Runtime safe for multi-tenant production workloads without any application-level session management code.

When idle_session_timeout_seconds elapses with no requests, Runtime tears down the container. The next request for that session ID cold-starts a fresh instance. For stateful workflows that need memory to survive session teardown, Post 3 covers AgentCore Memory.
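
The routing behaviour described above can be sketched as a toy model. This is not how AgentCore Runtime is implemented; SessionRouter, Container, and the injectable clock are hypothetical names used only to illustrate session binding, cold starts, and idle teardown.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Container:
    """Stand-in for one isolated container instance (hypothetical)."""
    session_id: str
    history: list = field(default_factory=list)  # in-memory conversation state
    last_used: float = 0.0


class SessionRouter:
    """Toy model of session-to-container binding with an idle timeout."""

    def __init__(self, idle_timeout_seconds: float, clock=time.monotonic):
        self.idle_timeout = idle_timeout_seconds
        self.clock = clock
        self.containers = {}  # session_id -> Container
        self.cold_starts = 0

    def route(self, session_id: str, prompt: str) -> Container:
        now = self.clock()
        container = self.containers.get(session_id)
        # Tear down the instance if it sat idle past the timeout.
        if container is not None and now - container.last_used >= self.idle_timeout:
            del self.containers[session_id]
            container = None
        # Cold-start a fresh instance when none is bound to this session.
        if container is None:
            container = Container(session_id=session_id)
            self.containers[session_id] = container
            self.cold_starts += 1
        container.history.append(prompt)
        container.last_used = now
        return container
```

Two calls with the same session ID return the same Container and share its history. A different session ID, or the same ID after the timeout, gets a new instance with empty history.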

ARM64 Architecture - The Critical Gotcha

AgentCore Runtime runs on ARM64. An image built for x86 gives no warning at build time; the failure only shows up later, when native dependencies fail to import at runtime. Always build with --platform linux/arm64:

# Wrong - builds for your Mac or x86 CI runner
docker build -t my-agent .

# Correct - explicit ARM64 target
docker build --platform linux/arm64 -t my-agent .

If your CI pipeline runs on x86, add --platform linux/arm64 to every docker build command and ensure your base image has ARM64 variants available (the amazonlinux:2023-minimal image used above does).
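
One way to surface a wrong-architecture image early is a startup check in the agent's entrypoint. This is a minimal sketch, assuming the agent is a Python process; is_arm64 and check_architecture are hypothetical helper names, not part of any AgentCore SDK.

```python
import platform
import sys

# Names that platform.machine() commonly reports for 64-bit ARM CPUs.
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True when a platform.machine() string names a 64-bit ARM CPU."""
    return machine.lower() in ARM64_NAMES


def check_architecture() -> None:
    """Exit with a clear message when the process is not running as ARM64."""
    machine = platform.machine()
    if not is_arm64(machine):
        sys.exit(
            f"Expected an ARM64 image but running on {machine!r}; "
            "rebuild with --platform linux/arm64"
        )
```

Calling check_architecture() as the first line of the entrypoint turns a confusing import failure into an explicit message in the logs.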

Decision Framework

Scenario	Configuration	Notes
Dev / testing	No JWT authorizer, short idle timeout (10 min)	Saves cost, no token management overhead
Production	JWT authorizer (Cognito or OIDC), 30 min idle	Token validated before container is hit
Short workflows (< 30 min)	`max_session_lifetime_seconds = 1800`	Limit blast radius on runaway agents
Long-running research tasks	`max_session_lifetime_seconds = 28800`	Full 8-hour window
Multi-agent orchestration	`server_protocol = "A2A"`	Runtime acts as an A2A server; other agents can call it
VPC isolation required	Add `network_mode = "VPC"` + subnet/SG config	Traffic stays off public internet
Large ML deps (> 250MB)	Container deployment	ZIP limit is 250MB; containers support up to 1GB

Production Additions

VPC mode - Add network_mode = "VPC" with vpc_subnet_ids and vpc_security_group_ids to keep agent traffic inside your VPC. Combine with PrivateLink to reach AgentCore Gateway without public egress.

Observability - AgentCore Runtime emits token usage, session duration, latency, and error rates to CloudWatch automatically. No SDK instrumentation needed. For richer traces, add OpenTelemetry export to Datadog, Langfuse, or LangSmith from your agent code.

Secrets - Pass sensitive values (API keys, DB passwords) via AWS Secrets Manager, not environment variables. Environment variables are visible in the console. Fetch secrets at container startup with boto3.client("secretsmanager").

A2A protocol - Set server_protocol = "A2A" to expose the runtime as an Agent-to-Agent server. Other AgentCore Runtime agents can then call it as a sub-agent. Post 6 in this series builds a full multi-agent system on this capability.
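
The Secrets point above can be sketched as follows. This is a minimal example, assuming the secret is stored as a JSON string; load_secret and the in-process cache are hypothetical, while get_secret_value is the standard boto3 Secrets Manager call. The runtime execution role would also need secretsmanager:GetSecretValue on the secret.

```python
import json

# Cache so each secret is fetched once per container instance.
_SECRET_CACHE = {}


def load_secret(secret_id: str, client=None) -> dict:
    """Fetch a JSON secret from AWS Secrets Manager and cache it in memory."""
    if secret_id in _SECRET_CACHE:
        return _SECRET_CACHE[secret_id]
    if client is None:
        # Imported lazily so the helper can be unit-tested with a fake client.
        import boto3
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    _SECRET_CACHE[secret_id] = secret
    return secret
```

Call load_secret once at container startup; later calls with the same ID return the cached value without another API request.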

What's Next

Post 3 covers AgentCore Memory - persistent context that survives session teardown. Without it, every new session starts from zero. Memory adds short-term (within session), long-term (across sessions), and episodic (experience-based learning) storage, all managed, with no vector database to provision.

The Runtime you built here connects to AgentCore Memory with a single configuration addition - no changes to agent code required.

Key Takeaways

AgentCore Runtime provides session-isolated, serverless containers for agent workloads - up to 8-hour execution windows with no pre-provisioned infrastructure
Always build container images targeting linux/arm64 - Runtime is ARM64 and silent import errors will bite you on x86 builds
Idle session timeout and max lifetime are the two most important cost controls - set them aggressively in dev
JWT authorization (Cognito or any OIDC provider) sits in front of the container - your agent code handles no auth logic
server_protocol = "A2A" turns the runtime into a callable sub-agent for multi-agent orchestration patterns

Series: Agentic AWS | Next: Post 3 - AgentCore Memory

SageMaker Pipelines: CI/CD for ML with Terraform 🔁

Suhas Mallesh — Thu, 23 Apr 2026 07:00:00 +0000

Manual model retraining is a reliability risk. SageMaker Pipelines automates the full ML lifecycle - preprocessing, training, evaluation, conditional registration, and deployment. Here's how to build it with Terraform and the Pipelines SDK.

Through Series 5, we've built the workspace, deployed endpoints, and set up the feature store. The missing piece is automation. Right now, retraining means someone manually running a notebook, evaluating results, and updating the endpoint. That doesn't scale and it's a reliability risk.

SageMaker Pipelines brings CI/CD discipline to ML: preprocessing, training, evaluation, conditional model registration, and endpoint deployment run automatically on a schedule or triggered by new data. Each pipeline run is tracked, reproducible, and auditable. Terraform provisions the infrastructure; the Pipelines SDK defines the DAG. 🎯

🏗️ Pipeline Architecture

Trigger (EventBridge schedule or S3 event)
    ↓
ProcessingStep  →  preprocess raw data
    ↓
TrainingStep    →  train model on processed data
    ↓
ProcessingStep  →  evaluate model metrics
    ↓
ConditionStep   →  if accuracy >= threshold
    ↓                       ↓
RegisterModel         FailStep
    ↓
EventBridge     →  model approved → deploy to endpoint

Step	What It Does
ProcessingStep	Data preprocessing, feature engineering, evaluation
TrainingStep	Model training with SageMaker Training Jobs
ConditionStep	Gate on metric threshold before registering
ModelStep	Register model version in Model Registry
EventBridge	Trigger deployment on model approval

🔧 Terraform: Pipeline Infrastructure

IAM Role

# pipeline/iam.tf

resource "aws_iam_role" "pipeline_execution" {
  name = "${var.environment}-pipeline-execution"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachments_exclusive" "pipeline" {
  role_name = aws_iam_role.pipeline_execution.name
  policy_arns = [
    "arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
  ]
}

Model Package Group (Model Registry)

# pipeline/registry.tf

resource "aws_sagemaker_model_package_group" "this" {
  model_package_group_name        = "${var.environment}-${var.model_name}"
  model_package_group_description = "Model registry for ${var.model_name}"

  tags = {
    Environment = var.environment
    Model       = var.model_name
  }
}

The Pipeline

# pipeline/pipeline.tf

resource "aws_sagemaker_pipeline" "this" {
  pipeline_name         = "${var.environment}-${var.model_name}-pipeline"
  pipeline_display_name = "${var.environment}-${var.model_name}"
  role_arn              = aws_iam_role.pipeline_execution.arn

  pipeline_definition = templatefile(
    "${path.module}/pipeline_definition.json",
    {
      role_arn          = aws_iam_role.pipeline_execution.arn
      region            = var.region
      account_id        = data.aws_caller_identity.current.account_id
      model_group_name  = aws_sagemaker_model_package_group.this.model_package_group_name
      training_image    = var.training_image_uri
      processing_image  = var.processing_image_uri
      data_bucket       = var.data_bucket
      output_bucket     = var.output_bucket
      accuracy_threshold = var.accuracy_threshold
    }
  )

  tags = {
    Environment = var.environment
    Model       = var.model_name
  }
}

EventBridge: Scheduled Trigger

# pipeline/trigger.tf

resource "aws_scheduler_schedule" "pipeline_trigger" {
  name = "${var.environment}-${var.model_name}-pipeline-trigger"

  flexible_time_window {
    mode = "OFF"
  }

  schedule_expression = var.pipeline_schedule  # e.g. "cron(0 2 * * ? *)"

  target {
    arn      = "arn:aws:sagemaker:${var.region}:${data.aws_caller_identity.current.account_id}:pipeline/${aws_sagemaker_pipeline.this.pipeline_name}"
    role_arn = aws_iam_role.scheduler.arn

    sagemaker_pipeline_parameters {
      pipeline_parameter_list {
        name  = "InputDataUri"
        value = "s3://${var.data_bucket}/latest/"
      }
    }
  }
}

EventBridge: Auto-Deploy on Model Approval

# pipeline/deployment_trigger.tf

resource "aws_cloudwatch_event_rule" "model_approval" {
  name = "${var.environment}-model-approved"

  event_pattern = jsonencode({
    source      = ["aws.sagemaker"]
    detail-type = ["SageMaker Model Package State Change"]
    detail = {
      ModelPackageGroupName = [aws_sagemaker_model_package_group.this.model_package_group_name]
      ModelApprovalStatus   = ["Approved"]
    }
  })
}

resource "aws_cloudwatch_event_target" "deploy_lambda" {
  rule      = aws_cloudwatch_event_rule.model_approval.name
  target_id = "deploy-approved-model"
  arn       = aws_lambda_function.deploy_model.arn
}

When a model is approved in the registry, EventBridge triggers a Lambda that updates the SageMaker endpoint to the new model version.
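
A minimal sketch of what that Lambda might look like. The environment variable names (ENDPOINT_NAME, MODEL_EXECUTION_ROLE_ARN, INSTANCE_TYPE) and the helper deploy_approved_model are assumptions for illustration, as is reading the package ARN from detail.ModelPackageArn in the event. The three SageMaker calls are standard boto3 operations.

```python
import os
import time


def deploy_approved_model(event: dict, sagemaker=None) -> dict:
    """Point an existing endpoint at the newly approved model package."""
    if sagemaker is None:
        # Imported lazily so the function can be tested with a fake client.
        import boto3
        sagemaker = boto3.client("sagemaker")

    package_arn = event["detail"]["ModelPackageArn"]  # assumed event field
    endpoint_name = os.environ["ENDPOINT_NAME"]
    role_arn = os.environ["MODEL_EXECUTION_ROLE_ARN"]
    instance_type = os.environ.get("INSTANCE_TYPE", "ml.m5.xlarge")

    # A timestamp suffix keeps model and config names unique per deployment.
    suffix = str(int(time.time()))
    model_name = f"{endpoint_name}-{suffix}"
    config_name = f"{endpoint_name}-config-{suffix}"

    sagemaker.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,
        Containers=[{"ModelPackageName": package_arn}],
    )
    sagemaker.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    )
    # update_endpoint swaps the endpoint over to the new config.
    sagemaker.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    return {"endpoint": endpoint_name, "endpoint_config": config_name}


def lambda_handler(event, context):
    return deploy_approved_model(event)
```

EventBridge also needs permission to invoke the function, and the Lambda role needs the matching SageMaker permissions plus iam:PassRole for the model execution role.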

🐍 Pipeline Definition (Pipelines SDK)

Terraform stores the pipeline definition as a JSON file generated by the SDK:

# generate_pipeline.py

import json
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.model import Model
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.model_step import ModelStep
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

role = "ROLE_ARN"
# PipelineSession defers resource creation until the pipeline runs (required by ModelStep)
pipeline_session = PipelineSession()

# Pipeline parameters
input_data = ParameterString(name="InputDataUri", default_value="s3://bucket/data/")

# Step 1: Preprocessing
preprocessor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.large",
    instance_count=1,
)

step_process = ProcessingStep(
    name="Preprocess",
    processor=preprocessor,
    inputs=[...],
    outputs=[...],
    code="scripts/preprocess.py",
)

# Step 2: Training
estimator = Estimator(
    image_uri="TRAINING_IMAGE",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path="s3://output-bucket/models/",
)

step_train = TrainingStep(
    name="Train",
    estimator=estimator,
    # Wire the "train" output of the preprocessing step into the training channel
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri
        )
    },
)

# Step 3: Evaluation
# The property file exposes the evaluation report to the ConditionStep.
# output_name must match a ProcessingOutput declared in outputs below;
# "evaluation" and "evaluation.json" are assumed names.
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

step_eval = ProcessingStep(
    name="Evaluate",
    processor=preprocessor,
    inputs=[...],
    outputs=[...],
    code="scripts/evaluate.py",
    property_files=[evaluation_report],
)

# Step 4: Conditional registration
accuracy_condition = ConditionGreaterThanOrEqualTo(
    left=JsonGet(step_name=step_eval.name, property_file=evaluation_report, json_path="metrics.accuracy"),
    right=0.85,  # Threshold - override with var.accuracy_threshold
)

# Model built from the training artifacts (reusing the training image for inference is an assumption)
model = Model(
    image_uri="TRAINING_IMAGE",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=pipeline_session,
)

step_register = ModelStep(
    name="RegisterModel",
    step_args=model.register(
        content_types=["application/json"],
        response_types=["application/json"],
        inference_instances=["ml.m5.xlarge"],
        model_package_group_name="MODEL_GROUP_NAME",
        approval_status="PendingManualApproval",
    ),
)

# Fail the run explicitly when the metric gate is not met
step_fail = FailStep(
    name="AccuracyBelowThreshold",
    error_message="Model accuracy is below the registration threshold",
)

step_condition = ConditionStep(
    name="CheckAccuracy",
    conditions=[accuracy_condition],
    if_steps=[step_register],
    else_steps=[step_fail],
)

# Build pipeline
pipeline = Pipeline(
    name="PIPELINE_NAME",
    parameters=[input_data],
    steps=[step_process, step_train, step_eval, step_condition],
)

# Export definition for Terraform
with open("pipeline_definition.json", "w") as f:
    json.dump(json.loads(pipeline.definition()), f, indent=2)

Run this script to generate pipeline_definition.json, then reference it in the aws_sagemaker_pipeline Terraform resource.
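
The condition step reads metrics.accuracy from the evaluation report, so the evaluation script has to write JSON in that shape. Below is a minimal sketch of that part of an evaluation script. The helper names and the evaluation.json file name are assumptions; a real script would compute predictions from the trained model instead of taking them as arguments.

```python
import json
from pathlib import Path


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(1 for truth, pred in zip(y_true, y_pred) if truth == pred)
    return correct / len(y_true)


def write_report(y_true: list, y_pred: list, output_dir: str) -> dict:
    """Write the report in the shape the json_path 'metrics.accuracy' expects."""
    report = {"metrics": {"accuracy": accuracy(y_true, y_pred)}}
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "evaluation.json").write_text(json.dumps(report))
    return report
```

Inside a processing job, output_dir would be the local path of the processing output that the property file points at.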

📐 Environment Configuration

# environments/dev.tfvars
model_name           = "fraud-detector"
accuracy_threshold   = 0.80
pipeline_schedule    = "cron(0 6 * * ? *)"  # Daily at 06:00 UTC
training_image_uri   = "123456789012.dkr.ecr.us-east-1.amazonaws.com/training:dev"

# environments/prod.tfvars
model_name           = "fraud-detector"
accuracy_threshold   = 0.90
pipeline_schedule    = "cron(0 2 * * ? *)"  # Daily at 02:00 UTC
training_image_uri   = "123456789012.dkr.ecr.us-east-1.amazonaws.com/training:v2.1.0"

Higher accuracy thresholds in prod. A model that clears 80% in dev might need 90% before auto-registering in prod. The ConditionStep enforces this gate automatically.

🔧 The CI/CD Flow

1. EventBridge fires on schedule
        ↓
2. SageMaker Pipeline starts
        ↓
3. Preprocessing job runs
        ↓
4. Training job runs
        ↓
5. Evaluation job computes accuracy
        ↓
6a. accuracy >= threshold → RegisterModel (PendingManualApproval)
6b. accuracy < threshold → Pipeline fails with clear error
        ↓
7. Human reviews model in Model Registry
        ↓
8. Approved → EventBridge fires
        ↓
9. Lambda updates SageMaker Endpoint to new model

Manual approval at step 7 is optional. Set approval_status = "Approved" in the registration step for fully automated deployments.
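
The review at step 7 can happen in the console, but approval can also be scripted. A minimal sketch, assuming the reviewer wants to approve the newest pending version in a group; approve_latest_pending is a hypothetical helper built on the boto3 list_model_packages and update_model_package calls.

```python
def approve_latest_pending(group_name: str, note: str, sagemaker=None):
    """Approve the newest PendingManualApproval version; return its ARN or None."""
    if sagemaker is None:
        # Imported lazily so the function can be tested with a fake client.
        import boto3
        sagemaker = boto3.client("sagemaker")

    response = sagemaker.list_model_packages(
        ModelPackageGroupName=group_name,
        ModelApprovalStatus="PendingManualApproval",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    pending = response.get("ModelPackageSummaryList", [])
    if not pending:
        return None

    package_arn = pending[0]["ModelPackageArn"]
    # Flipping the status to Approved is what fires the EventBridge rule.
    sagemaker.update_model_package(
        ModelPackageArn=package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )
    return package_arn
```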

⚠️ Gotchas and Tips

Pipeline definition is JSON. The aws_sagemaker_pipeline resource takes a JSON string. Generate it with the SDK, store it as a file, and use templatefile() to inject Terraform variable values (role ARNs, bucket names, image URIs).

Each pipeline run is versioned. Every execution is logged with its inputs, outputs, and metrics. Use the SageMaker Studio Pipelines tab to inspect any historical run.

Container images must exist before terraform apply. The pipeline references training and processing container images. Build and push them to ECR before running Terraform.

The Model Registry is the gate. PendingManualApproval gives your team a human review step before deployment. Approved auto-deploys. Choose based on your risk tolerance per environment.

Lambda for deployment. The EventBridge-to-deployment pattern uses a Lambda to call the SageMaker API and update the endpoint. Keep the Lambda simple - a few boto3 calls that create a new endpoint config pointing at the approved model and then call update_endpoint.

⏭️ Series 5 Complete!

This is Post 4 of the ML Pipelines & MLOps with Terraform series.

Post 1: SageMaker Studio Domain 🔬
Post 2: SageMaker Endpoints 🚀
Post 3: SageMaker Feature Store 🗃️
Post 4: SageMaker Pipelines - CI/CD for ML (you are here) 🔁

Your ML workflow is automated. Scheduled retraining, metric-gated model registration, human approval gates, and automatic endpoint updates. From raw data to production, every step is tracked, reproducible, and auditable. 🔁

Found this helpful? Follow for the next series! 💬

Agentic AWS - Day 1: Amazon Bedrock AgentCore Gateway

Suhas Mallesh — Thu, 23 Apr 2026 07:00:00 +0000

Series: Agentic AWS | Post: 1 of 6 | Cloud: AWS

The Problem with DIY MCP Tool Servers

Every production AI agent needs tools - APIs, Lambda functions, internal services. Before AgentCore Gateway, connecting those tools to an agent meant writing your own MCP server, managing OAuth flows, handling protocol translation, building throttling, and wiring up observability. That is weeks of undifferentiated work before a single line of agent logic.

AgentCore Gateway eliminates that entirely. It is a fully managed MCP server that converts Lambda functions and OpenAPI specs into agent-ready tools - with built-in auth, routing, and semantic tool discovery - with zero code.

This post provisions an AgentCore Gateway via Terraform, registers a Lambda function as a tool target, and connects a Bedrock-powered agent to it using the MCP streamable HTTP transport.

Architecture

Bedrock Agent (Python + MCP client)
        |
        | streamable HTTP (MCP protocol)
        v
AgentCore Gateway  <-- IAM inbound auth
        |
        | IAM role assumption
        v
Lambda Target: order-status-tool
        |
        v
DynamoDB (mock order table)

The gateway handles inbound authentication (IAM or OAuth), routes MCP requests to the correct target, translates between MCP and the Lambda invocation protocol, and returns tool results back to the agent.

Terraform Infrastructure

Provider and Variables

# versions.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.80"
    }
  }
  required_version = ">= 1.6"
}

provider "aws" {
  region = var.aws_region
}

# variables.tf
variable "aws_region" {
  description = "AWS region for deployment"
  type        = string
}

variable "environment" {
  description = "Deployment environment (dev or prod)"
  type        = string
}

variable "project_name" {
  description = "Project prefix for resource naming"
  type        = string
  default     = "agentic-aws"
}

variable "semantic_search_enabled" {
  description = "Enable semantic tool search index on the gateway"
  type        = bool
  default     = false
}

variable "lambda_memory_mb" {
  description = "Lambda memory allocation in MB"
  type        = number
  default     = 256
}

# dev.tfvars
aws_region             = "us-east-1"
environment            = "dev"
semantic_search_enabled = false
lambda_memory_mb       = 256

# prod.tfvars
aws_region             = "us-east-1"
environment            = "prod"
semantic_search_enabled = true
lambda_memory_mb       = 512

Lambda Tool - Order Status

# lambda.tf

# IAM role for the Lambda function
resource "aws_iam_role" "order_tool_lambda" {
  name = "${var.project_name}-order-tool-lambda-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "lambda.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy_attachment" "lambda_basic" {
  role       = aws_iam_role.order_tool_lambda.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}

resource "aws_iam_role_policy" "lambda_dynamodb" {
  name = "dynamodb-read"
  role = aws_iam_role.order_tool_lambda.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:GetItem", "dynamodb:Query"]
      Resource = aws_dynamodb_table.orders.arn
    }]
  })
}

# Lambda function (zip from local source)
data "archive_file" "order_tool" {
  type        = "zip"
  source_dir  = "${path.module}/lambda/order_tool"
  output_path = "${path.module}/.build/order_tool.zip"
}

resource "aws_lambda_function" "order_tool" {
  function_name    = "${var.project_name}-order-tool-${var.environment}"
  role             = aws_iam_role.order_tool_lambda.arn
  filename         = data.archive_file.order_tool.output_path
  source_code_hash = data.archive_file.order_tool.output_base64sha256
  runtime          = "python3.12"
  handler          = "handler.lambda_handler"
  memory_size      = var.lambda_memory_mb
  timeout          = 30

  environment {
    variables = {
      ORDERS_TABLE = aws_dynamodb_table.orders.name
      ENVIRONMENT  = var.environment
    }
  }
}

# Mock orders table
resource "aws_dynamodb_table" "orders" {
  name         = "${var.project_name}-orders-${var.environment}"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "order_id"

  attribute {
    name = "order_id"
    type = "S"
  }
}

AgentCore Gateway and Target

# gateway.tf

# IAM role that AgentCore Gateway assumes to invoke Lambda
resource "aws_iam_role" "gateway_execution" {
  name = "${var.project_name}-gateway-exec-${var.environment}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Service = "bedrock-agentcore.amazonaws.com" }
      Action    = "sts:AssumeRole"
    }]
  })
}

resource "aws_iam_role_policy" "gateway_invoke_lambda" {
  name = "invoke-order-tool"
  role = aws_iam_role.gateway_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "lambda:InvokeFunction"
      Resource = aws_lambda_function.order_tool.arn
    }]
  })
}

# AgentCore Gateway
resource "aws_bedrock_agent_core_gateway" "main" {
  name        = "${var.project_name}-gateway-${var.environment}"
  description = "Agentic AWS - order management tool gateway"

  # Inbound auth: IAM - agents must sign requests with SigV4
  authorizer_configuration = {
    type = "AWS_IAM"
  }

  # Enable semantic tool search in prod for large tool sets
  search_type = var.semantic_search_enabled ? "SEMANTIC" : "LEXICAL"
}

# Lambda target - registers the order tool with the gateway
resource "aws_bedrock_agent_core_gateway_target" "order_tool" {
  gateway_id  = aws_bedrock_agent_core_gateway.main.id
  name        = "order-status-target"
  description = "Retrieves order status and shipment details"

  target_configuration = {
    lambda = {
      lambda_arn       = aws_lambda_function.order_tool.arn
      execution_role   = aws_iam_role.gateway_execution.arn
    }
  }
}

# IAM policy allowing the Bedrock agent caller to invoke the gateway
resource "aws_iam_policy" "invoke_gateway" {
  name        = "${var.project_name}-invoke-gateway-${var.environment}"
  description = "Allows agent runtime to call AgentCore Gateway"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "bedrock-agentcore:InvokeGateway"
      Resource = aws_bedrock_agent_core_gateway.main.arn
    }]
  })
}

# Outputs consumed by the Python agent
output "gateway_endpoint" {
  description = "MCP streamable HTTP endpoint for the gateway"
  value       = aws_bedrock_agent_core_gateway.main.endpoint_url
}

output "gateway_arn" {
  description = "Gateway ARN for IAM policy references"
  value       = aws_bedrock_agent_core_gateway.main.arn
}

Lambda Tool Implementation

# lambda/order_tool/handler.py
import json
import os
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ["ORDERS_TABLE"])


def get_order_status(order_id: str) -> dict:
    """Retrieve order status from DynamoDB."""
    response = table.get_item(Key={"order_id": order_id})
    item = response.get("Item")

    if not item:
        return {"error": f"Order {order_id} not found"}

    return {
        "order_id": item["order_id"],
        "status": item.get("status", "unknown"),
        "carrier": item.get("carrier"),
        "tracking_number": item.get("tracking_number"),
        "estimated_delivery": item.get("estimated_delivery"),
    }


def lambda_handler(event: dict, context) -> dict:
    """
    AgentCore Gateway invokes Lambda with a standard tool call structure.
    The 'tool_name' field identifies which tool to execute.
    'tool_input' contains the parameters.
    """
    tool_name = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    if tool_name == "get_order_status":
        order_id = tool_input.get("order_id")
        if not order_id:
            return {"error": "order_id is required"}
        result = get_order_status(order_id)
    else:
        result = {"error": f"Unknown tool: {tool_name}"}

    return {
        "tool_name": tool_name,
        "tool_result": result,
    }

The Lambda receives a normalized tool call envelope from AgentCore Gateway regardless of the upstream protocol. Your function does not need to understand MCP.
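
As more tools land on the same Lambda, the if/else chain in the handler can be swapped for a small registry. This is a sketch under the same envelope assumption as the handler above (tool_name and tool_input fields); the tool decorator and TOOLS dict are hypothetical, and the order lookup is stubbed so the example runs without DynamoDB.

```python
from typing import Callable, Dict

# Maps tool names to the functions that implement them.
TOOLS: Dict[str, Callable[[dict], dict]] = {}


def tool(name: str):
    """Register a function as the implementation of a named tool."""
    def register(fn: Callable[[dict], dict]):
        TOOLS[name] = fn
        return fn
    return register


@tool("get_order_status")
def get_order_status(tool_input: dict) -> dict:
    order_id = tool_input.get("order_id")
    if not order_id:
        return {"error": "order_id is required"}
    # Stubbed result; the real handler reads this from DynamoDB.
    return {"order_id": order_id, "status": "in_transit"}


def lambda_handler(event: dict, context=None) -> dict:
    """Dispatch the tool call envelope to the registered implementation."""
    tool_name = event.get("tool_name", "")
    handler = TOOLS.get(tool_name)
    if handler is None:
        result = {"error": f"Unknown tool: {tool_name}"}
    else:
        result = handler(event.get("tool_input", {}))
    return {"tool_name": tool_name, "tool_result": result}
```

Adding a tool then means adding one decorated function; the handler itself does not change.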

Python Agent - Connecting via MCP

# agent.py
import asyncio
import json
import os

import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

GATEWAY_ENDPOINT = os.environ["GATEWAY_ENDPOINT"]  # from Terraform output
AWS_REGION = os.environ.get("AWS_REGION", "us-east-1")
MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

bedrock = boto3.client("bedrock-runtime", region_name=AWS_REGION)
session_creds = boto3.session.Session().get_credentials().get_frozen_credentials()


def signed_headers(url: str, method: str = "POST") -> dict:
    """Generate SigV4 signed headers for AgentCore Gateway inbound IAM auth."""
    request = AWSRequest(method=method, url=url)
    SigV4Auth(session_creds, "bedrock-agentcore", AWS_REGION).add_auth(request)
    return dict(request.headers)


async def run_agent(user_query: str):
    """
    Connect to AgentCore Gateway, discover available tools,
    and run a Bedrock-powered agent loop.
    """
    headers = signed_headers(GATEWAY_ENDPOINT)

    async with streamablehttp_client(GATEWAY_ENDPOINT, headers=headers) as (read, write, _):
        async with ClientSession(read, write) as mcp_session:
            await mcp_session.initialize()

            # Discover tools registered on the gateway
            tools_response = await mcp_session.list_tools()
            tools = [
                {
                    "name": t.name,
                    "description": t.description,
                    "input_schema": t.inputSchema,
                }
                for t in tools_response.tools
            ]

            print(f"Discovered {len(tools)} tools: {[t['name'] for t in tools]}")

            messages = [{"role": "user", "content": user_query}]

            # Agentic loop
            while True:
                response = bedrock.invoke_model(
                    modelId=MODEL_ID,
                    contentType="application/json",
                    body=json.dumps({
                        "anthropic_version": "bedrock-2023-05-31",
                        "max_tokens": 1024,
                        "tools": tools,
                        "messages": messages,
                    }),
                )

                result = json.loads(response["body"].read())
                stop_reason = result.get("stop_reason")
                content = result.get("content", [])

                # Append assistant turn
                messages.append({"role": "assistant", "content": content})

                if stop_reason == "end_turn":
                    # Extract final text response
                    for block in content:
                        if block.get("type") == "text":
                            print(f"\nAgent: {block['text']}")
                    break

                if stop_reason == "tool_use":
                    tool_results = []
                    for block in content:
                        if block.get("type") != "tool_use":
                            continue

                        tool_name = block["name"]
                        tool_input = block["input"]
                        tool_use_id = block["id"]

                        print(f"  -> Calling tool: {tool_name}({tool_input})")

                        # Invoke tool via AgentCore Gateway MCP
                        mcp_result = await mcp_session.call_tool(tool_name, tool_input)
                        tool_output = mcp_result.content[0].text if mcp_result.content else ""

                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": tool_output,
                        })

                    messages.append({"role": "user", "content": tool_results})
                else:
                    break


if __name__ == "__main__":
    import sys
    query = sys.argv[1] if len(sys.argv) > 1 else "What is the status of order ORD-1234?"
    asyncio.run(run_agent(query))

Run it:

export GATEWAY_ENDPOINT=$(terraform output -raw gateway_endpoint)
python agent.py "Where is my order ORD-1234?"

Discovered 1 tools: ['get_order_status']
  -> Calling tool: get_order_status({'order_id': 'ORD-1234'})

Agent: Your order ORD-1234 is currently in transit with FedEx.
       Tracking number: 794601234567. Estimated delivery: April 17, 2026.

How AgentCore Gateway Handles the Heavy Lifting

When the Python agent calls mcp_session.call_tool("get_order_status", ...), the following happens inside AgentCore Gateway automatically:

Inbound auth - The gateway validates the SigV4 signature against IAM. No valid credentials, no access. In production you can switch this to OAuth (Cognito, Okta, Auth0) with no change to your Lambda code.

Protocol translation - The MCP tool call is converted to a normalized Lambda invocation envelope. Your Lambda never sees raw MCP.

Routing - The gateway resolves which target owns get_order_status and invokes it. As you add more targets, the gateway routes by tool name automatically.

Outbound auth - The gateway assumes the gateway_execution IAM role you configured and invokes Lambda with it. Your Lambda never handles credentials.

Semantic search (prod) - With semantic_search_enabled = true, the gateway builds a vector index of all tool descriptions. The agent can call search_tools("find shipping info") to dynamically discover the right tool rather than enumerating all of them - essential when you have dozens of tools.

Decision Framework

Scenario	Target Type	Notes
Custom business logic, no existing API	Lambda	Full control, any language
Existing REST API with OpenAPI spec	OpenAPI target	Zero code, spec-driven
Existing MCP server (third-party)	MCP server target	GA Oct 2025 - connect existing servers
Inbound auth: service-to-service	IAM (SigV4)	Simplest, no token management
Inbound auth: user-delegated flows	OAuth (Cognito/Okta)	3LO for user context
Small tool set (20 tools or fewer)	Lexical search	Faster, cheaper
Large tool set (> 20 tools)	Semantic search	Better accuracy, slight cost

Production Additions

A few things to layer in before real traffic:

VPC integration - AgentCore Gateway supports VPC and PrivateLink (GA Oct 2025). Add vpc_configuration to the gateway resource to keep traffic off the public internet.
CloudWatch observability - AgentCore Observability emits token usage, latency, and error rates to CloudWatch automatically. No configuration needed.
Policy guardrails - AgentCore Policy can intercept tool calls before they execute and evaluate them against Cedar rules. Useful for "never issue refunds over $500 without human approval" type controls. Post 4 in this series covers Identity; Policy integrates directly with the gateway.
Multiple targets - Add more aws_bedrock_agent_core_gateway_target resources to the same gateway. Each target can have its own auth config and tool set. One gateway endpoint, many services.

What's Next

Post 2 covers AgentCore Runtime - the serverless execution environment where you deploy the agent itself. Runtime adds 8-hour execution windows, session isolation, A2A protocol support, and built-in observability for the agent process, not just the tool calls.

The gateway you built here connects directly to an AgentCore Runtime-hosted agent with no changes to the gateway configuration.

Key Takeaways

AgentCore Gateway converts Lambda functions and OpenAPI specs into MCP-compliant tools with zero code changes to your functions
Inbound IAM auth (SigV4) is the right starting point; OAuth is a drop-in upgrade for user-delegated scenarios
Semantic tool search pays off at scale - enable it in prod when you have more than 20 tools
The gateway handles auth, routing, and protocol translation; your Lambda handles only business logic
One gateway manages multiple targets - you get a single MCP endpoint for your entire tool estate

Series: Agentic AWS | Next: Post 2 - AgentCore Runtime

Azure ML Feature Store with Terraform: Managed Feature Materialization for Training and Inference 🗃️

Suhas Mallesh — Fri, 17 Apr 2026 07:00:00 +0000

Azure ML Feature Store is a specialized workspace that manages feature engineering, offline materialization to storage, and online serving with Redis. Terraform provisions the infrastructure; the SDK defines the feature sets. Here's how to build it.

In the previous posts, we set up the ML workspace and deployed endpoints. Now we need consistent features feeding those endpoints. Training uses historical features from batch sources. Inference needs the latest values in real time. When these diverge, your model's accuracy degrades silently.

Azure ML Feature Store is implemented as a special type of Azure ML workspace (kind = "FeatureStore"). It manages feature transformation pipelines, materializes features to offline storage (ADLS/Blob) and an online store (Redis), and provides point-in-time feature retrieval for training. Terraform provisions the infrastructure; the SDK defines entities, feature sets, and materialization schedules. 🎯

🏗️ Feature Store Architecture

Component	What It Does
Feature Store	Specialized ML workspace with `kind = "FeatureStore"`
Entity	Logical key (e.g., customer_id, account_id) shared across feature sets
Feature Set	Collection of features with transformation code and source definition
Offline Store	ADLS/Blob storage for materialized historical features
Online Store	Redis cache for low-latency inference lookups
Materialization	Spark jobs that compute and sync features on a schedule

The key concept: feature sets include transformation code. Raw data goes in, computed features come out. The same transformation runs for both offline materialization (training) and online materialization (inference), eliminating training-serving skew.

🔧 Terraform: Provision Feature Store Infrastructure

Feature Store Workspace

# feature_store/workspace.tf

resource "azurerm_machine_learning_workspace" "feature_store" {
  name                = "${var.environment}-feature-store"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  application_insights_id = azurerm_application_insights.ml.id
  key_vault_id            = azurerm_key_vault.ml.id
  storage_account_id      = azurerm_storage_account.ml.id

  kind = "FeatureStore"

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

kind = "FeatureStore" is the critical setting. This creates a workspace optimized for feature management rather than general ML development.

Offline Materialization Store

# feature_store/offline_store.tf

resource "azurerm_storage_account" "offline_store" {
  name                     = "${var.environment}fsoffline${random_string.suffix.result}"
  location                 = azurerm_resource_group.ml.location
  resource_group_name      = azurerm_resource_group.ml.name
  account_tier             = "Standard"
  account_replication_type = var.storage_replication
  is_hns_enabled           = true   # ADLS Gen2

  tags = var.tags
}

resource "azurerm_storage_container" "features" {
  name                  = "features"
  storage_account_id    = azurerm_storage_account.offline_store.id
  container_access_type = "private"
}

is_hns_enabled = true enables ADLS Gen2 hierarchical namespace, which is required for efficient feature materialization with Parquet files.

Online Store (Redis Cache)

# feature_store/online_store.tf

resource "azurerm_redis_cache" "online_store" {
  count               = var.enable_online_store ? 1 : 0
  name                = "${var.environment}-fs-redis"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  capacity            = var.redis_capacity
  family              = var.redis_family
  sku_name            = var.redis_sku
  minimum_tls_version = "1.2"

  redis_configuration {
    maxmemory_policy = "allkeys-lru"
  }

  tags = var.tags
}

The online store is optional. Enable it when you need low-latency feature lookups during inference. Skip it in dev if you only need offline features for training.

Compute for Materialization

# feature_store/compute.tf

resource "azurerm_machine_learning_compute_cluster" "materialization" {
  name                          = "${var.environment}-materialization"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.feature_store.id
  location                      = azurerm_resource_group.ml.location
  vm_size                       = var.materialization_vm_size
  vm_priority                   = "LowPriority"

  identity {
    type = "SystemAssigned"
  }

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = var.materialization_max_nodes
    scale_down_nodes_after_idle_duration  = "PT5M"
  }

  tags = var.tags
}

Materialization jobs run as Spark pipelines on this compute cluster. min_node_count = 0 means you pay nothing when no materialization is running.

🐍 Define Entities and Feature Sets (SDK)

Terraform provisions infrastructure. The SDK defines the feature engineering logic:

Create an Entity

from azure.ai.ml import MLClient
from azure.ai.ml.entities import FeatureStoreEntity, DataColumn
from azure.identity import DefaultAzureCredential

fs_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="prod-feature-store",
)

account_entity = FeatureStoreEntity(
    name="account",
    version="1",
    index_columns=[DataColumn(name="accountID", type="string")],
    description="Account entity for transaction features",
)

fs_client.feature_store_entities.begin_create_or_update(account_entity).result()

Entities define shared join keys. Multiple feature sets can reference the same entity, ensuring consistent joins.

Define Feature Set with Transformation Code

Feature set specification (YAML):

# featuresets/transactions/spec/FeaturesetSpec.yaml
$schema: https://azuremlschemas.azureedge.net/latest/featureSetSpec.schema.json

source:
  type: parquet
  path: abfss://data@storage.dfs.core.windows.net/transactions/
  timestamp_column:
    name: timestamp

feature_transformation_code:
  path: ./transformation_code
  transformer_class: transaction_transform.TransactionFeatureTransformer

features:
  - name: transaction_count_7d
    type: integer
  - name: avg_transaction_amount_7d
    type: float
  - name: total_spend_3d
    type: float
  - name: max_transaction_amount
    type: float

index_columns:
  - name: accountID
    type: string

Transformation code (Spark):

# transformation_code/transaction_transform.py
from pyspark.ml import Transformer
from pyspark.sql import DataFrame
from pyspark.sql import functions as F
from pyspark.sql.window import Window

SECONDS_PER_DAY = 86400


# Subclassing Transformer exposes the standard .transform(df) entry point
class TransactionFeatureTransformer(Transformer):
    def _transform(self, raw_data: DataFrame) -> DataFrame:
        # rangeBetween with numeric bounds needs a numeric ordering column,
        # so order by the timestamp cast to epoch seconds
        ts_seconds = F.col("timestamp").cast("long")
        window_7d = Window.partitionBy("accountID").orderBy(ts_seconds).rangeBetween(-7 * SECONDS_PER_DAY, 0)
        window_3d = Window.partitionBy("accountID").orderBy(ts_seconds).rangeBetween(-3 * SECONDS_PER_DAY, 0)

        return raw_data.select(
            "accountID",
            "timestamp",
            F.count("*").over(window_7d).alias("transaction_count_7d"),
            F.avg("amount").over(window_7d).alias("avg_transaction_amount_7d"),
            F.sum("amount").over(window_3d).alias("total_spend_3d"),
            F.max("amount").over(window_7d).alias("max_transaction_amount"),
        )
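
To sanity-check what the rolling windows should produce, it can help to reproduce the window semantics in plain Python on a handful of rows. This is a minimal sketch, assuming timestamps are Unix seconds and that each window covers the previous N days up to and including the current row; `rolling_features` and the sample rows are hypothetical and not part of the feature store SDK.

```python
from typing import Dict, List

DAY = 86400  # seconds per day


def rolling_features(rows: List[Dict]) -> List[Dict]:
    """For each row, aggregate rows of the same account whose timestamp
    falls in [ts - window, ts], mirroring rangeBetween(-N * 86400, 0)."""
    features = []
    for row in rows:
        ts = row["timestamp"]
        same = [r for r in rows if r["accountID"] == row["accountID"]]
        last_7d = [r["amount"] for r in same if ts - 7 * DAY <= r["timestamp"] <= ts]
        last_3d = [r["amount"] for r in same if ts - 3 * DAY <= r["timestamp"] <= ts]
        features.append({
            "accountID": row["accountID"],
            "timestamp": ts,
            "transaction_count_7d": len(last_7d),
            "avg_transaction_amount_7d": sum(last_7d) / len(last_7d),
            "total_spend_3d": sum(last_3d),
            "max_transaction_amount": max(last_7d),
        })
    return features


rows = [
    {"accountID": "A", "timestamp": 0 * DAY, "amount": 10.0},
    {"accountID": "A", "timestamp": 2 * DAY, "amount": 20.0},
    {"accountID": "A", "timestamp": 5 * DAY, "amount": 30.0},
    {"accountID": "A", "timestamp": 9 * DAY, "amount": 40.0},
]
features = rolling_features(rows)
print(features[-1])
```

Comparing a few hand-computed rows like these against the Spark output is a cheap way to catch off-by-one mistakes in the window bounds before registering the feature set.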

Register and Materialize

from azure.ai.ml.entities import FeatureSet, FeatureSetSpecification

transaction_fset = FeatureSet(
    name="transactions",
    version="1",
    description="7-day and 3-day rolling transaction aggregations",
    entities=["azureml:account:1"],
    specification=FeatureSetSpecification(
        path="./featuresets/transactions/spec"
    ),
    tags={"data_type": "nonPII"},
)

fs_client.feature_sets.begin_create_or_update(transaction_fset).result()

Configure Materialization Schedule

from azure.ai.ml.entities import (
    MaterializationSettings,
    MaterializationComputeResource,
    RecurrenceTrigger,
)

materialization = MaterializationSettings(
    resource=MaterializationComputeResource(instance_type="Standard_E8s_v3"),
    schedule=RecurrenceTrigger(frequency="Hour", interval=6),
    offline_enabled=True,
    online_enabled=True,
)

fset = fs_client.feature_sets.get(name="transactions", version="1")
fset.materialization_settings = materialization
fs_client.feature_sets.begin_create_or_update(fset).result()

📐 Environment Configuration

# environments/dev.tfvars
environment              = "dev"
enable_online_store      = false        # No Redis in dev
storage_replication      = "LRS"
materialization_vm_size  = "Standard_E4s_v3"
materialization_max_nodes = 2

# environments/prod.tfvars
environment              = "prod"
enable_online_store      = true
redis_sku                = "Standard"
redis_capacity           = 1
redis_family             = "C"
storage_replication      = "GRS"
materialization_vm_size  = "Standard_E8s_v3"
materialization_max_nodes = 8

⚠️ Gotchas and Tips

Feature store is a workspace. It's implemented as kind = "FeatureStore" on azurerm_machine_learning_workspace. It needs the same dependencies (storage, Key Vault, App Insights) as a regular workspace.

Transformation code runs as Spark. Feature transformations execute on the materialization compute cluster using PySpark. Test your transformations locally with a Spark session before registering.

Entities enforce consistent joins. Define entities once (e.g., "account" with key "accountID") and reuse across feature sets. This prevents mismatched join keys between teams.

Materialization costs. Each scheduled run spins up the compute cluster, runs the Spark job, and writes to storage. LowPriority VMs reduce cost. min_node_count = 0 ensures you pay nothing between runs.

Redis cost for online store. Standard Redis starts at ~$40/month. Premium with replication is ~$200/month. Skip online store in dev unless you're testing real-time inference.

Feature set versioning. Feature sets are versioned. Changing the transformation logic? Create version "2". This maintains backward compatibility for models still using version "1".

⏭️ What's Next

This is Post 3 of the Azure ML Pipelines & MLOps with Terraform series.

Post 1: Azure ML Workspace 🔬
Post 2: Azure ML Online Endpoints 🚀
Post 3: Azure ML Feature Store (you are here) 🗃️
Post 4: Azure ML Pipelines + Azure DevOps

Your features have a home. ADLS for offline training, Redis for online inference, Spark transformations that run the same code for both. No training-serving skew. Versioned feature sets with scheduled materialization, all provisioned with Terraform. 🗃️

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! 💬

Vertex AI Feature Store with Terraform: BigQuery Offline + Bigtable Online Serving 🗃️

Suhas Mallesh — Thu, 16 Apr 2026 07:00:00 +0000

Feature Store on GCP uses BigQuery as the offline store and Bigtable for low-latency online serving. Feature groups register your data, feature views sync it to the online store. Here's how to provision the full stack with Terraform.

In the previous posts, we set up Workbench for development and deployed endpoints for inference. But the features feeding those models need a home. Training uses historical features from BigQuery. Inference needs the latest values at low latency. When these two sources diverge, you get training-serving skew.

Vertex AI Feature Store bridges this gap. BigQuery is the offline store - your features live in tables you already manage. Bigtable is the online store - an auto-scaling, low-latency serving layer that syncs from BigQuery on a schedule. You don't copy data to a separate system. Feature Store reads directly from BigQuery and syncs to Bigtable for serving. 🎯

🏗️ Feature Store Architecture

Component	What It Does
Feature Group	Registers a BigQuery table as a feature source
Feature	Individual column within a feature group
Feature Online Store	Bigtable instance for real-time serving
Feature View	Defines which features sync to the online store
Data Sync	Scheduled or continuous sync from BigQuery to Bigtable

The key insight: BigQuery is already your offline store. You don't move data. Feature Store registers your existing BigQuery tables, then syncs selected features to Bigtable for online serving.

🔧 Terraform: Create the Feature Online Store

APIs

# feature_store/apis.tf

resource "google_project_service" "required" {
  for_each = toset([
    "aiplatform.googleapis.com",
    "bigtable.googleapis.com",
    "bigtableadmin.googleapis.com",
    "bigquery.googleapis.com",
  ])
  project = var.project_id
  service = each.value
}

Feature Online Store (Bigtable-backed)

# feature_store/online_store.tf

resource "google_vertex_ai_feature_online_store" "this" {
  name     = "${var.environment}-feature-store"
  region   = var.region
  project  = var.project_id

  bigtable {
    auto_scaling {
      min_node_count         = var.bigtable_min_nodes
      max_node_count         = var.bigtable_max_nodes
      cpu_utilization_target = var.bigtable_cpu_target
    }
  }

  labels = {
    environment = var.environment
    managed_by  = "terraform"
  }
}

Bigtable autoscaling adjusts nodes based on CPU utilization. Set cpu_utilization_target to 50-60% for production workloads. The store scales up automatically during traffic spikes and scales down during quiet periods.

Feature Group (Register BigQuery Source)

# feature_store/feature_group.tf

resource "google_vertex_ai_feature_group" "customer_features" {
  name     = "${var.environment}-customer-features"
  region   = var.region
  project  = var.project_id

  big_query {
    big_query_source {
      input_uri = "bq://${var.project_id}.${var.dataset_id}.${var.customer_features_table}"
    }
    entity_id_columns = ["customer_id"]
  }

  labels = {
    domain = "customer"
  }
}

entity_id_columns defines the primary key for feature lookups. This is what you use to retrieve features for a specific customer during inference.

Register Individual Features

# feature_store/features.tf

resource "google_vertex_ai_feature_group_feature" "total_purchases" {
  name           = "total_purchases"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

resource "google_vertex_ai_feature_group_feature" "avg_order_value" {
  name           = "avg_order_value"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

resource "google_vertex_ai_feature_group_feature" "days_since_last_purchase" {
  name           = "days_since_last_purchase"
  region         = var.region
  feature_group  = google_vertex_ai_feature_group.customer_features.name
  project        = var.project_id
}

Each feature maps to a column in your BigQuery table. Registering features enables metadata tracking, drift monitoring, and controlled syncing to the online store.

Feature View (Sync to Online Store)

# feature_store/feature_view.tf

resource "google_vertex_ai_feature_online_store_featureview" "customer_view" {
  name                 = "${var.environment}-customer-view"
  region               = var.region
  feature_online_store = google_vertex_ai_feature_online_store.this.name
  project              = var.project_id

  sync_config {
    cron = var.sync_schedule
  }

  feature_registry_source {
    feature_groups {
      feature_group_id = google_vertex_ai_feature_group.customer_features.name
      feature_ids      = [
        google_vertex_ai_feature_group_feature.total_purchases.name,
        google_vertex_ai_feature_group_feature.avg_order_value.name,
        google_vertex_ai_feature_group_feature.days_since_last_purchase.name,
      ]
    }
  }
}

The feature view selects which features from which groups sync to the online store. The cron schedule controls how frequently BigQuery data is synced to Bigtable.

📐 BigQuery Source Table Structure

Your BigQuery table needs an entity ID column and a feature timestamp:

CREATE TABLE `project.ml_features.customer_features` (
  customer_id STRING NOT NULL,
  feature_timestamp TIMESTAMP NOT NULL,
  total_purchases INT64,
  avg_order_value FLOAT64,
  days_since_last_purchase INT64,
  account_age_days INT64,
  is_premium BOOL
);

Feature Store reads this table directly. The feature_timestamp column enables point-in-time queries for training. The online store always serves the latest snapshot.

🐍 Read Features (SDK)

Online Store (Real-Time Inference)

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

feature_online_store = aiplatform.FeatureOnlineStore("prod-feature-store")
feature_view = feature_online_store.get_feature_view("prod-customer-view")

# Fetch features for a specific customer
response = feature_view.fetch_feature_values(
    entity_ids=["cust-12345"],
)

for entity in response:
    print(entity.to_dict())
# {'customer_id': 'cust-12345', 'total_purchases': 47, 'avg_order_value': 89.5, ...}

Offline Store (Training via BigQuery)

from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT customer_id, total_purchases, avg_order_value, is_premium
FROM `project.ml_features.customer_features`
WHERE feature_timestamp BETWEEN '2025-01-01' AND '2025-12-31'
"""

training_df = client.query(query).to_dataframe()
print(f"Training data: {len(training_df)} rows")

No separate offline store to manage. Query BigQuery directly with standard SQL.

📐 Environment Configuration

# environments/dev.tfvars
environment          = "dev"
bigtable_min_nodes   = 1
bigtable_max_nodes   = 1
bigtable_cpu_target  = 80
sync_schedule        = "0 */6 * * *"    # Every 6 hours

# environments/prod.tfvars
environment          = "prod"
bigtable_min_nodes   = 1
bigtable_max_nodes   = 5
bigtable_cpu_target  = 50
sync_schedule        = "0 * * * *"      # Every hour

Sync frequency vs freshness: Hourly sync means online features can be up to 1 hour stale. For near-real-time features, use continuous data sync (requires Bigtable online serving and BigQuery source in specific regions).

⚠️ Gotchas and Tips

BigQuery is the source of truth. Unlike other feature stores where you ingest data into a proprietary system, Vertex AI Feature Store reads from BigQuery. Your existing ETL pipelines that write to BigQuery already feed the feature store.

Bigtable minimum cost. Even at 1 node, Bigtable costs roughly $0.65/hour (~$470/month). For dev environments, consider whether you need online serving at all, or if BigQuery direct queries suffice.

Optimized online serving is deprecated. As of May 2026, only Bigtable online serving is supported. Don't use optimized {} in new deployments. Migrate existing optimized stores to Bigtable.

Sync latency. Scheduled sync has an inherent delay based on your cron schedule. Continuous sync is near-real-time but only available in specific regions (us, eu, us-central1).

Feature monitoring. Register features through feature groups to enable drift detection and anomaly monitoring. Without registration, you lose this observability.

Bigtable serving latency. Expect ~30ms server-side latency at moderate load (~100 QPS). Client-side latency adds 5ms+. This is fast enough for most inference use cases but not sub-millisecond.
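
The Bigtable cost figure above is simple arithmetic, and the same calculation gives a ceiling for an autoscaled store. A minimal sketch, assuming the roughly $0.65 per node-hour quoted in this post (check current pricing for your region) and ignoring storage and network charges; `monthly_node_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_node_cost(nodes: int, hourly_rate_usd: float = 0.65) -> float:
    """Estimated monthly compute cost for a fixed number of Bigtable nodes."""
    return round(nodes * hourly_rate_usd * HOURS_PER_MONTH, 2)


dev_cost = monthly_node_cost(1)       # dev.tfvars: min = max = 1 node
prod_max_cost = monthly_node_cost(5)  # prod.tfvars: bigtable_max_nodes = 5
print(dev_cost, prod_max_cost)
```

The production number is a worst case: with autoscaling, the store only runs at bigtable_max_nodes while CPU stays above the target.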

⏭️ What's Next

This is Post 3 of the GCP ML Pipelines & MLOps with Terraform series.

Post 1: Vertex AI Workbench 🔬
Post 2: Vertex AI Endpoints - Deploy to Prod 🚀
Post 3: Vertex AI Feature Store (you are here) 🗃️
Post 4: Vertex AI Pipelines + Cloud Build

Your features have a home. BigQuery for offline training, Bigtable for online serving, automatic sync between them. No data duplication. No training-serving skew. Your existing BigQuery tables are the source of truth, all provisioned with Terraform. 🗃️


SageMaker Feature Store with Terraform: Centralized ML Features for Training and Inference 🗃️

Suhas Mallesh — Wed, 15 Apr 2026 07:00:00 +0000

Features used for training must match features used for inference, or your model breaks silently. SageMaker Feature Store keeps them in sync with online (real-time) and offline (historical) stores. Here's how to provision it with Terraform.

In the previous posts, we set up the workspace and deployed endpoints. But there's a critical gap: features. Every ML model needs consistent, reliable feature data for both training (batch, historical) and inference (real-time, latest values). When training features and serving features diverge, you get training-serving skew, and your model's accuracy degrades silently.

SageMaker Feature Store solves this with a dual-store architecture. The online store provides low-latency access to the latest feature values for real-time inference. The offline store keeps the full history in S3 (Parquet format) for training and batch inference. When you write a feature, both stores sync automatically. One source of truth. 🎯

🏗️ Feature Store Architecture

Component	What It Does
Feature Group	A collection of related features (like a table)
Online Store	Low-latency key-value store for real-time lookups
Offline Store	Historical data in S3 (Parquet) for training
Record Identifier	Primary key for feature lookups
Event Time	Timestamp for point-in-time correctness
Glue Data Catalog	Auto-created metadata catalog for Athena queries

The online store always holds the latest snapshot. The offline store is append-only, keeping every version of every record. This enables point-in-time queries for training: "What did this customer's features look like 30 days ago?"
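
The difference between the two stores is easy to see in a toy model. The sketch below is an in-memory illustration only, not the SageMaker API: `ToyFeatureStore` and its methods are hypothetical, and it assumes the online store keeps the record with the newest event time for each identifier.

```python
from typing import Dict, List, Optional


class ToyFeatureStore:
    def __init__(self) -> None:
        self.offline: List[Dict] = []      # append-only history
        self.online: Dict[str, Dict] = {}  # latest record per identifier

    def put_record(self, record: Dict) -> None:
        # Every write is kept in the offline history
        self.offline.append(record)
        # The online store only keeps the record with the newest event time
        current = self.online.get(record["customer_id"])
        if current is None or record["event_time"] >= current["event_time"]:
            self.online[record["customer_id"]] = record

    def as_of(self, customer_id: str, cutoff: float) -> Optional[Dict]:
        # Point-in-time lookup: newest record at or before the cutoff
        candidates = [
            r for r in self.offline
            if r["customer_id"] == customer_id and r["event_time"] <= cutoff
        ]
        return max(candidates, key=lambda r: r["event_time"]) if candidates else None


store = ToyFeatureStore()
store.put_record({"customer_id": "cust-12345", "event_time": 100.0, "total_purchases": 40})
store.put_record({"customer_id": "cust-12345", "event_time": 200.0, "total_purchases": 47})

print(store.online["cust-12345"]["total_purchases"])  # latest value, for inference
print(store.as_of("cust-12345", cutoff=150.0))        # historical value, for training
```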

🔧 Terraform: Create Feature Groups

IAM Role

# feature_store/iam.tf

resource "aws_iam_role" "feature_store" {
  name = "${var.environment}-feature-store"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "feature_store_access" {
  name = "feature-store-s3-glue"
  role = aws_iam_role.feature_store.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:GetBucketLocation"]
        Resource = [
          var.offline_store_bucket_arn,
          "${var.offline_store_bucket_arn}/*"
        ]
      },
      {
        Effect = "Allow"
        Action = [
          "glue:CreateTable", "glue:UpdateTable", "glue:GetTable",
          "glue:GetDatabase", "glue:CreateDatabase"
        ]
        Resource = "*"
      }
    ]
  })
}

Feature Group Definition

# feature_store/feature_groups.tf

resource "aws_sagemaker_feature_group" "customer_features" {
  feature_group_name             = "${var.environment}-customer-features"
  record_identifier_feature_name = "customer_id"
  event_time_feature_name        = "event_time"
  role_arn                       = aws_iam_role.feature_store.arn

  # Feature schema
  feature_definition {
    feature_name = "customer_id"
    feature_type = "String"
  }

  feature_definition {
    feature_name = "event_time"
    feature_type = "Fractional"
  }

  feature_definition {
    feature_name = "total_purchases"
    feature_type = "Integral"
  }

  feature_definition {
    feature_name = "avg_order_value"
    feature_type = "Fractional"
  }

  feature_definition {
    feature_name = "days_since_last_purchase"
    feature_type = "Integral"
  }

  feature_definition {
    feature_name = "account_age_days"
    feature_type = "Integral"
  }

  feature_definition {
    feature_name = "is_premium"
    feature_type = "Integral"
  }

  # Enable both online and offline stores
  online_store_config {
    enable_online_store = true

    security_config {
      kms_key_id = var.kms_key_arn
    }
  }

  offline_store_config {
    s3_storage_config {
      s3_uri     = "s3://${var.offline_store_bucket}/${var.environment}/feature-store"
      kms_key_id = var.kms_key_arn
    }

    table_format = var.offline_table_format  # "Glue" or "Iceberg"
  }

  tags = {
    Environment = var.environment
    Domain      = "customer"
  }
}

Three feature types: String, Fractional (float), and Integral (integer). Everything else maps to String.

table_format: Choose Glue (default, Hive-compatible) or Iceberg (better for upserts and time travel). Iceberg is recommended for production workloads that need ACID transactions.
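
Because there are only three feature types, mapping a sample record to a schema is mechanical. A minimal sketch, assuming plain Python values; `to_feature_type` and `infer_feature_definitions` are hypothetical helpers, and booleans are mapped to Integral to match the is_premium definition above.

```python
def to_feature_type(value) -> str:
    """Map a Python value to one of the three Feature Store types."""
    # bool is checked first because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "Integral"
    if isinstance(value, int):
        return "Integral"
    if isinstance(value, float):
        return "Fractional"
    return "String"  # everything else falls back to String


def infer_feature_definitions(sample: dict) -> list:
    """Build a feature definition list from one sample record."""
    return [
        {"FeatureName": name, "FeatureType": to_feature_type(value)}
        for name, value in sample.items()
    ]


sample = {
    "customer_id": "cust-12345",
    "avg_order_value": 89.5,
    "total_purchases": 47,
    "is_premium": True,
}
definitions = infer_feature_definitions(sample)
print(definitions)
```

Treat the inferred types as a starting point for review rather than a final schema, since the types cannot be changed after the feature group is created.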

Multiple Feature Groups

resource "aws_sagemaker_feature_group" "transaction_features" {
  feature_group_name             = "${var.environment}-transaction-features"
  record_identifier_feature_name = "transaction_id"
  event_time_feature_name        = "event_time"
  role_arn                       = aws_iam_role.feature_store.arn

  feature_definition {
    feature_name = "transaction_id"
    feature_type = "String"
  }

  feature_definition {
    feature_name = "event_time"
    feature_type = "Fractional"
  }

  feature_definition {
    feature_name = "amount"
    feature_type = "Fractional"
  }

  feature_definition {
    feature_name = "merchant_category"
    feature_type = "String"
  }

  feature_definition {
    feature_name = "is_international"
    feature_type = "Integral"
  }

  online_store_config {
    enable_online_store = true
  }

  offline_store_config {
    s3_storage_config {
      s3_uri = "s3://${var.offline_store_bucket}/${var.environment}/feature-store"
    }
    table_format = var.offline_table_format
  }
}

🐍 Ingest Features (SDK)

Terraform defines the schema. The SDK ingests the data:

import boto3
import time

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Write a single record (real-time)
featurestore_runtime.put_record(
    FeatureGroupName="prod-customer-features",
    Record=[
        {"FeatureName": "customer_id", "ValueAsString": "cust-12345"},
        {"FeatureName": "event_time", "ValueAsString": str(time.time())},
        {"FeatureName": "total_purchases", "ValueAsString": "47"},
        {"FeatureName": "avg_order_value", "ValueAsString": "89.50"},
        {"FeatureName": "days_since_last_purchase", "ValueAsString": "3"},
        {"FeatureName": "account_age_days", "ValueAsString": "730"},
        {"FeatureName": "is_premium", "ValueAsString": "1"},
    ],
)
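
Writing the Record list by hand gets tedious once there are many features. The helper below is a small sketch that converts a plain dict into that structure; `to_record` is a hypothetical name, and it assumes that features with no value can simply be left out of the record.

```python
def to_record(features: dict) -> list:
    """Convert a feature dict into the Record list expected by put_record."""
    record = []
    for name, value in features.items():
        if value is None:
            continue  # assumption: missing features are omitted from the record
        if isinstance(value, bool):
            value = int(value)  # Integral features carry booleans as 0 or 1
        record.append({"FeatureName": name, "ValueAsString": str(value)})
    return record


record = to_record({
    "customer_id": "cust-12345",
    "total_purchases": 47,
    "is_premium": True,
    "days_since_last_purchase": None,
})
print(record)
```

The result can be passed as the Record argument in the put_record call shown above.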

Read Features for Inference (Online Store)

# Real-time feature lookup (single-digit ms latency)
response = featurestore_runtime.get_record(
    FeatureGroupName="prod-customer-features",
    RecordIdentifierValueAsString="cust-12345",
)

features = {r["FeatureName"]: r["ValueAsString"] for r in response["Record"]}
print(features)
# {'customer_id': 'cust-12345', 'total_purchases': '47', ...}

Query Features for Training (Offline Store via Athena)

import boto3

athena = boto3.client("athena")

query = """
SELECT customer_id, total_purchases, avg_order_value, is_premium
FROM "sagemaker_featurestore"."prod-customer-features"
WHERE event_time <= 1700000000
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "sagemaker_featurestore"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)

The offline store is automatically cataloged in Glue. Query with Athena for point-in-time training datasets.
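
The Glue table name is typically generated by Feature Store and may not match the feature group name exactly, so it is safer to read it from the DescribeFeatureGroup response than to hard-code it. The sketch below builds a point-in-time query from such a response. `build_point_in_time_query` is a hypothetical helper, the sample response is hand-written for illustration, and the table name in it is made up.

```python
def build_point_in_time_query(describe_response: dict, cutoff_epoch: float, columns: list) -> str:
    """Build an Athena query returning the newest record per identifier
    at or before cutoff_epoch."""
    catalog = describe_response["OfflineStoreConfig"]["DataCatalogConfig"]
    table = f'"{catalog["Database"]}"."{catalog["TableName"]}"'
    record_id = describe_response["RecordIdentifierFeatureName"]
    event_time = describe_response["EventTimeFeatureName"]
    select_cols = ", ".join(columns)
    return (
        f"SELECT {select_cols} FROM ("
        f" SELECT *, ROW_NUMBER() OVER ("
        f"PARTITION BY {record_id} ORDER BY {event_time} DESC) AS rn"
        f" FROM {table} WHERE {event_time} <= {cutoff_epoch}"
        f") AS t WHERE rn = 1"
    )


# Hand-written stand-in for a describe_feature_group() response
fake_response = {
    "RecordIdentifierFeatureName": "customer_id",
    "EventTimeFeatureName": "event_time",
    "OfflineStoreConfig": {
        "DataCatalogConfig": {
            "Database": "sagemaker_featurestore",
            "TableName": "prod_customer_features_1700000000",  # hypothetical
        }
    },
}
query = build_point_in_time_query(
    fake_response, 1700000000, ["customer_id", "total_purchases"]
)
print(query)
```

In real use, the response would come from the SageMaker client's describe_feature_group call, and the resulting string would be passed as QueryString to start_query_execution.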

📐 Environment Configuration

# environments/dev.tfvars
environment           = "dev"
offline_table_format  = "Glue"     # Simpler for dev
kms_key_arn           = null        # No customer-managed KMS key in dev

# environments/prod.tfvars
environment           = "prod"
offline_table_format  = "Iceberg"  # ACID transactions, time travel
kms_key_arn           = "arn:aws:kms:us-east-1:123456789012:key/abc-123"

⚠️ Gotchas and Tips

Offline store has a ~15 minute delay. Data written via PutRecord appears in the online store immediately but takes up to 15 minutes to land in the offline store (S3). Don't rely on the offline store for near-real-time analytics.

Feature groups are mutable. You can add new features to an existing feature group using the UpdateFeatureGroup API. You cannot remove or rename existing features.

Online store costs. The online store charges per read and write unit. High-throughput inference with thousands of feature lookups per second adds up. Monitor costs and batch lookups where possible using BatchGetRecord.

Point-in-time correctness. Always use event_time for training queries. Querying without time filtering risks data leakage, where future data appears in your training set.

KMS encryption for both stores. The online and offline stores support separate KMS keys. In production, encrypt both. The Glue Data Catalog metadata is not encrypted by Feature Store; manage it separately.

Schema planning matters. Feature types (String, Fractional, Integral) cannot be changed after creation. Plan your schema carefully. Use String for anything you're unsure about, since it's the most flexible.
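
For the BatchGetRecord tip above, the main client-side work is splitting identifiers into batches. A minimal sketch, assuming a limit of 100 record identifiers per call (verify against the current service quotas); `batch_identifiers` is a hypothetical helper that only builds the request payloads and makes no AWS calls.

```python
def batch_identifiers(feature_group_name: str, record_ids: list, batch_size: int = 100) -> list:
    """Return one Identifiers payload per BatchGetRecord call."""
    return [
        [{
            "FeatureGroupName": feature_group_name,
            "RecordIdentifiersValueAsString": record_ids[i:i + batch_size],
        }]
        for i in range(0, len(record_ids), batch_size)
    ]


ids = [f"cust-{n}" for n in range(250)]
payloads = batch_identifiers("prod-customer-features", ids)
print(len(payloads))
```

Each payload can then be passed as the Identifiers argument of batch_get_record on the featurestore runtime client.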

⏭️ What's Next

This is Post 3 of the ML Pipelines & MLOps with Terraform series.

Post 1: SageMaker Studio Domain 🔬
Post 2: SageMaker Endpoints - Deploy to Prod 🚀
Post 3: SageMaker Feature Store (you are here) 🗃️
Post 4: SageMaker Pipelines - CI/CD for ML

Your features have a home. Online store for real-time inference, offline store for training, automatic sync between them. No more training-serving skew. No more duplicated feature pipelines. One source of truth, all in Terraform. 🗃️

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! 💬

Azure ML Online Endpoints: Deploy Your Model to Production with Terraform 🚀

Suhas Mallesh — Mon, 13 Apr 2026 07:00:00 +0000

Your model is trained. Now deploy it to a managed online endpoint with traffic splitting for canary rollouts, autoscaling, health probes, and data collection. Here's how to deploy Azure ML endpoints with Terraform using azapi.

In the previous post, we set up the Azure ML workspace - the hub for experiments, models, and compute. Now comes the production side: taking a trained model and deploying it to a managed online endpoint that your applications can call for real-time predictions.

Azure ML online endpoints involve two resources: an Endpoint (the stable HTTPS URL with auth and traffic routing) and one or more Deployments (the model + compute behind it). Since the azurerm provider doesn't have native online endpoint resources yet, we use azapi to provision them directly via the Azure API. 🎯

🏗️ The Two-Layer Architecture

Endpoint (HTTPS URL, auth mode, traffic split)
    ↓
Deployment(s) (model, instance type, scaling, probes)

Resource	What It Defines
Endpoint	Stable URL, auth mode (key/AAD), traffic routing
Deployment	Model reference, instance type, scale settings, health probes

The endpoint URL stays fixed. You deploy new model versions as new deployments, shift traffic gradually, and delete old deployments after validation.

🔧 Terraform: Create the Online Endpoint

The Endpoint

# endpoint/main.tf

resource "azapi_resource" "online_endpoint" {
  type      = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints@2025-06-01"
  name      = "${var.environment}-${var.model_name}"
  parent_id = azurerm_machine_learning_workspace.this.id
  location  = var.location

  identity {
    type = "SystemAssigned"
  }

  body = {
    properties = {
      authMode            = var.auth_mode
      publicNetworkAccess = var.public_network_access
      description         = "Production endpoint for ${var.model_name}"
      traffic = {
        (var.deployment_name) = 100
      }
    }
  }

  tags = {
    Environment = var.environment
    Model       = var.model_name
  }
}

authMode controls how clients authenticate: Key for API key auth (simpler), AADToken for Azure AD auth (more secure, no key rotation needed). Use AADToken in production.

The Deployment

# endpoint/deployment.tf

resource "azapi_resource" "deployment" {
  type      = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2025-06-01"
  name      = var.deployment_name
  parent_id = azapi_resource.online_endpoint.id
  location  = var.location

  identity {
    type = "SystemAssigned"
  }

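  body = {
    kind = "Managed"
    # Starting instance count for the deployment.
    sku = {
      name     = "Default"
      capacity = var.min_instances
    }
    properties = {
      endpointComputeType = "Managed" # discriminator for managed online deployments
      model               = var.model_uri
      environmentId       = var.environment_id
      instanceType        = var.instance_type
      appInsightsEnabled  = true

      # Managed deployments accept only the Default scale type. To scale between
      # var.min_instances and var.max_instances, attach an Azure Monitor
      # autoscale setting that targets this deployment.
      scaleSettings = {
        scaleType = "Default"
      }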

      livenessProbe = {
        initialDelay     = "PT10S"
        period           = "PT10S"
        timeout          = "PT2S"
        failureThreshold = 30
        successThreshold = 1
      }

      readinessProbe = {
        initialDelay     = "PT10S"
        period           = "PT10S"
        timeout          = "PT2S"
        failureThreshold = 30
        successThreshold = 1
      }

      requestSettings = {
        requestTimeout                   = "PT5S"
        maxConcurrentRequestsPerInstance = var.max_concurrent_requests
      }
    }
  }

  tags = {
    Environment = var.environment
    Model       = var.model_name
    Version     = var.model_version
  }
}

Key deployment properties:

model references the registered model in the workspace (e.g., azureml:fraud-detector:2)
environmentId references a curated or custom environment with your dependencies
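scaleSettings sets the scale type; for managed deployments the instance count comes from sku.capacity, and scaling between a minimum and a maximum is typically handled by an Azure Monitor autoscale setting that targets the deployment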
livenessProbe and readinessProbe configure health checks
requestSettings controls timeout and concurrency per instance

📐 Traffic Splitting for Canary Deployments

Deploy a new model version alongside the existing one, then gradually shift traffic:

# Endpoint with two deployments and traffic split
resource "azapi_resource" "online_endpoint" {
  type      = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints@2025-06-01"
  name      = "${var.environment}-${var.model_name}"
  parent_id = azurerm_machine_learning_workspace.this.id
  location  = var.location

  identity {
    type = "SystemAssigned"
  }

  body = {
    properties = {
      authMode = var.auth_mode
      traffic = {
        "blue"  = 90   # Current stable version
        "green" = 10   # New canary version
      }
    }
  }
}

# Blue deployment (current version)
resource "azapi_resource" "blue" {
  type      = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2025-06-01"
  name      = "blue"
  parent_id = azapi_resource.online_endpoint.id
  location  = var.location
  # ... deployment config for v1
}

# Green deployment (new version)
resource "azapi_resource" "green" {
  type      = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2025-06-01"
  name      = "green"
  parent_id = azapi_resource.online_endpoint.id
  location  = var.location
  # ... deployment config for v2
}

Canary rollout workflow: Deploy green with 10% traffic. Monitor error rates and latency. If healthy, update traffic to "green" = 100, then delete blue. If unhealthy, set "blue" = 100 to roll back instantly.
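The promote-or-roll-back decision in that workflow can be sketched as a small function. This is an illustration under stated assumptions: is_healthy, next_traffic, and the thresholds (1% errors, 500 ms p95) are hypothetical, and the returned map would still have to be written into the endpoint's traffic block and applied.

```python
def is_healthy(error_rate, p95_latency_ms, max_error_rate=0.01, max_latency_ms=500.0):
    """Hypothetical health check: both metrics must stay within their limits."""
    return error_rate <= max_error_rate and p95_latency_ms <= max_latency_ms


def next_traffic(healthy, stable="blue", canary="green"):
    """Return the endpoint traffic map to apply after the canary window."""
    if healthy:
        return {stable: 0, canary: 100}  # promote the canary
    return {stable: 100, canary: 0}      # roll back to the stable deployment


# Canary looks fine: promote.
print(next_traffic(is_healthy(error_rate=0.002, p95_latency_ms=120.0)))
# Canary error rate is too high: roll back.
print(next_traffic(is_healthy(error_rate=0.080, p95_latency_ms=120.0)))
```

Keeping the decision in one pure function makes it easy to unit test in the CI stage before any traffic is shifted.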

Mirror Traffic for Shadow Testing

Test a new deployment without affecting users:

body = {
  properties = {
    authMode = var.auth_mode
    traffic = {
      "blue" = 100      # All live traffic to blue
    }
    mirrorTraffic = {
      "green" = 10       # Copy 10% of requests to green (responses discarded)
    }
  }
}

Mirror traffic sends a copy of live requests to the green deployment, but responses are discarded. This lets you validate the new model's behavior under real traffic patterns without any user impact.

📐 Data Collection for Model Monitoring

Enable request/response logging to catch model drift:

body = {
  kind = "Managed"
  properties = {
    model              = var.model_uri
    instanceType       = var.instance_type
    appInsightsEnabled = true

    dataCollector = {
      collections = {
        model_inputs = {
          dataCollectionMode = "Enabled"
          dataId             = var.data_asset_id
          samplingRate       = var.sampling_rate
        }
        model_outputs = {
          dataCollectionMode = "Enabled"
          dataId             = var.data_asset_id
          samplingRate       = var.sampling_rate
        }
      }
      requestLogging = {
        captureHeaders = ["Content-Type", "x-request-id"]
      }
      rollingRate = "Hour"
    }
  }
}

Data collection captures model inputs and outputs to a registered data asset. Use it for drift detection, fairness monitoring, and retraining triggers.

📐 Environment Configuration

# environments/dev.tfvars
model_name              = "fraud-detector"
deployment_name         = "blue"
model_uri               = "azureml:fraud-detector:2"
instance_type           = "Standard_DS2_v2"
min_instances           = 1
max_instances           = 2
max_concurrent_requests = 5
auth_mode               = "Key"
public_network_access   = "Enabled"

# environments/prod.tfvars
model_name              = "fraud-detector"
deployment_name         = "blue"
model_uri               = "azureml:fraud-detector:2"
instance_type           = "Standard_DS3_v2"
min_instances           = 2
max_instances           = 8
max_concurrent_requests = 10
auth_mode               = "AADToken"
public_network_access   = "Disabled"

🧪 Invoke the Endpoint

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="...",
    resource_group_name="...",
    workspace_name="...",
)

result = ml_client.online_endpoints.invoke(
    endpoint_name="prod-fraud-detector",
    deployment_name="blue",
    request_file="sample_request.json",
)

print(result)

Or via REST:

curl -X POST \
  "https://prod-fraud-detector.eastus.inference.ml.azure.com/score" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"data": [[0.5, 1.2, 3.4, 0.8]]}'

⚠️ Gotchas and Tips

azapi is required. The azurerm provider doesn't have native azurerm_machine_learning_online_endpoint or azurerm_machine_learning_online_deployment resources. Use azapi_resource with the Microsoft.MachineLearningServices API.

Deployment creation takes 8-15 minutes. Provisioning VMs, pulling containers, loading models, and running health probes all take time. Plan for this in CI/CD pipelines.

Register models before deploying. The model field references a registered model in the workspace Model Registry (format: azureml:model-name:version). Register models via the SDK or CLI before running terraform apply.

Use AADToken in production. API keys work but require rotation. AAD token auth integrates with managed identity, eliminating key management entirely.

Scale to zero is not supported for managed endpoints. Unlike serverless compute, managed online endpoints require at least one instance running. If you need scale-to-zero, consider serverless endpoints (currently in preview).

Mirror traffic before canary. Mirror first (responses discarded, no user impact) to validate the model handles real request shapes correctly. Then switch to traffic splitting for a live canary test.
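Because deployment creation takes several minutes (see the timing tip above), a CI/CD step usually polls for completion with a timeout rather than assuming success. The sketch below is generic Python, not an Azure SDK call: wait_for_state is a hypothetical helper, the state names are assumptions modeled on ARM provisioning states, and get_state stands in for whatever status lookup your pipeline uses.

```python
import time


def wait_for_state(get_state, target="Succeeded", failed=("Failed", "Canceled"),
                   timeout_s=1800, interval_s=30,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target, a failed state, or time runs out."""
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == target:
            return state
        if state in failed:
            raise RuntimeError(f"deployment ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_s}s")
        sleep(interval_s)


# Stand-in for a real status call: yields a fixed sequence of states.
fake_states = iter(["Creating", "Creating", "Succeeded"])
result = wait_for_state(lambda: next(fake_states), sleep=lambda s: None)
print(result)  # Succeeded
```

Injecting sleep and clock keeps the helper testable without waiting in real time.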

⏭️ What's Next

This is Post 2 of the Azure ML Pipelines & MLOps with Terraform series.

Post 1: Azure ML Workspace 🔬
Post 2: Azure ML Online Endpoints (you are here) 🚀
Post 3: Azure ML Feature Store
Post 4: Azure ML Pipelines + Azure DevOps

Your model is in production. Managed online endpoint with autoscaling, blue/green traffic splitting, mirror traffic for shadow testing, and data collection for drift monitoring. From workspace to production, all in Terraform. 🚀

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! 💬

Vertex AI Endpoints: Deploy Your Model to Production with Terraform 🚀

Suhas Mallesh — Sun, 12 Apr 2026 07:00:00 +0000

Your model is trained. Now deploy it to a scalable endpoint with autoscaling, traffic splitting for canary rollouts, and request-response logging. Here's how to deploy Vertex AI endpoints with Terraform.

In the previous post, we set up the Vertex AI Workbench - the workspace where your team trains models. Now comes the production side: taking a trained model and deploying it to an endpoint that your applications can call for real-time predictions.

Vertex AI uses a three-step deployment: upload a Model to the Model Registry, create an Endpoint (the HTTPS URL), then deploy the model to the endpoint with compute resources. Terraform provisions the endpoint and configures autoscaling, traffic splitting, and logging. The SDK handles model upload and deployment. 🎯

🏗️ The Deployment Architecture

Model Registry (uploaded model + container)
    ↓
Endpoint (HTTPS URL, traffic routing)
    ↓
Deployed Model (machine type, replicas, autoscaling)

Component	What It Defines
Model	Container image + model artifacts in GCS
Endpoint	Stable HTTPS prediction URL
Deployed Model	Machine type, replica count, autoscaling
Traffic Split	Percentage routing between model versions

The endpoint URL stays stable across model versions. You update the deployed model behind it without changing your application code.

🔧 Terraform: Create the Endpoint

Endpoint Resource

# endpoint/main.tf

resource "google_vertex_ai_endpoint" "this" {
  name         = "${var.environment}-${var.model_name}-endpoint"
  display_name = "${var.environment}-${var.model_name}"
  description  = "Production endpoint for ${var.model_name}"
  location     = var.region
  project      = var.project_id

  labels = {
    environment = var.environment
    model       = var.model_name
    managed_by  = "terraform"
  }
}

Endpoint with Private Network (Production)

For production, deploy with a private endpoint inside your VPC:

resource "google_vertex_ai_endpoint" "private" {
  name         = "${var.environment}-${var.model_name}-endpoint"
  display_name = "${var.environment}-${var.model_name}"
  location     = var.region
  project      = var.project_id
  network      = "projects/${data.google_project.this.number}/global/networks/${var.vpc_network_name}"

  depends_on = [google_service_networking_connection.vertex_vpc]
}

resource "google_service_networking_connection" "vertex_vpc" {
  network                 = var.vpc_network_id
  service                 = "servicenetworking.googleapis.com"
  reserved_peering_ranges = [google_compute_global_address.vertex_range.name]
}

resource "google_compute_global_address" "vertex_range" {
  name          = "${var.environment}-vertex-range"
  purpose       = "VPC_PEERING"
  address_type  = "INTERNAL"
  prefix_length = 16
  network       = var.vpc_network_id
  project       = var.project_id
}

Prediction Logging to BigQuery

resource "google_bigquery_dataset" "predictions" {
  dataset_id = "${var.environment}_prediction_logs"
  location   = var.region
  project    = var.project_id
}

resource "google_vertex_ai_endpoint" "logged" {
  name         = "${var.environment}-${var.model_name}-endpoint"
  display_name = "${var.environment}-${var.model_name}"
  location     = var.region
  project      = var.project_id

  predict_request_response_logging_config {
    enabled       = var.enable_prediction_logging
    sampling_rate = var.logging_sample_rate

    bigquery_destination {
      output_uri = "bq://${var.project_id}.${google_bigquery_dataset.predictions.dataset_id}.request_response"
    }
  }
}

Log a sample of prediction requests and responses to BigQuery for model monitoring, drift detection, and debugging.

🐍 Model Upload and Deployment (SDK)

Terraform creates the endpoint. The Vertex AI SDK uploads the model and deploys it with compute resources:

# deploy.py

from google.cloud import aiplatform
import json

with open("deploy_config.json") as f:
    config = json.load(f)

aiplatform.init(
    project=config["project_id"],
    location=config["region"],
)

# Upload model to Model Registry
model = aiplatform.Model.upload(
    display_name=config["model_name"],
    artifact_uri=config["model_artifact_uri"],  # gs://bucket/model/
    serving_container_image_uri=config["serving_image"],
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
)

# Get the Terraform-created endpoint
endpoint = aiplatform.Endpoint(config["endpoint_resource_name"])

# Deploy model to endpoint
model.deploy(
    endpoint=endpoint,
    machine_type=config["machine_type"],
    min_replica_count=config["min_replicas"],
    max_replica_count=config["max_replicas"],
    traffic_percentage=100,
    deploy_request_timeout=1800,
)

print(f"Model deployed to: {endpoint.resource_name}")

Terraform Config Output for SDK

# endpoint/config.tf

resource "local_file" "deploy_config" {
  filename = "${path.module}/deploy_config.json"
  content = jsonencode({
    project_id             = var.project_id
    region                 = var.region
    model_name             = "${var.environment}-${var.model_name}"
    model_artifact_uri     = var.model_artifact_uri
    serving_image          = var.serving_container_image
    endpoint_resource_name = google_vertex_ai_endpoint.this.name
    machine_type           = var.machine_type
    min_replicas           = var.min_replicas
    max_replicas           = var.max_replicas
  })
}

📐 Traffic Splitting for Canary Deployments

Deploy a new model version alongside the existing one and gradually shift traffic:

# Canary: deploy new version with 10% traffic
new_model = aiplatform.Model.upload(
    display_name="fraud-detector-v3",
    artifact_uri="gs://ml-models/fraud-detector/v3/",
    serving_container_image_uri=config["serving_image"],
)

new_model.deploy(
    endpoint=endpoint,
    machine_type=config["machine_type"],
    min_replica_count=1,
    max_replica_count=4,
    traffic_percentage=10,  # 10% to new model
)

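# After validation, shift 100% to the new model.
# traffic_split is keyed by DeployedModel ID, not by Model ID.
new_deployed = next(
    dm for dm in endpoint.list_models()
    if dm.model.split("@")[0] == new_model.resource_name
)
endpoint.update(traffic_split={new_deployed.id: 100})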

The endpoint URL doesn't change. Your application sends predictions to the same endpoint while you shift traffic from the old model to the new one.
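A staged rollout usually moves through a few traffic percentages rather than jumping straight to 100. The helper below is a minimal sketch of building those maps; build_traffic_split and both IDs are hypothetical placeholders. The one firm constraint it encodes is that the split is keyed by DeployedModel ID and its values must add up to 100.

```python
def build_traffic_split(stable_id, canary_id, canary_percent):
    """Return a traffic split map whose values always sum to 100."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {stable_id: 100 - canary_percent, canary_id: canary_percent}


# Placeholder DeployedModel IDs (not real resources).
STABLE_ID = "1111111111"
CANARY_ID = "2222222222"

for pct in (10, 50, 100):
    split = build_traffic_split(STABLE_ID, CANARY_ID, pct)
    print(split)
```

Each map could then be passed as the traffic_split argument once the previous stage has been validated.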

📐 Model Garden Deployment (Terraform-Only)

For open models from Model Garden (Gemma, Llama, PaLiGemma), use the newer Terraform resource that handles everything in one step:

resource "google_vertex_ai_endpoint_with_model_garden_deployment" "gemma" {
  publisher_model_name = "publishers/google/models/gemma3@gemma-3-1b-it"
  location             = var.region

  model_config {
    accept_eula = true
  }

  deploy_config {
    dedicated_resources {
      machine_spec {
        machine_type      = "g2-standard-12"
        accelerator_type  = "NVIDIA_L4"
        accelerator_count = 1
      }
      min_replica_count = 1
    }
  }
}

One resource creates the endpoint, uploads the model, and deploys it. This works for Model Garden models only - custom models still need the endpoint + SDK pattern.

📐 Environment Configuration

# environments/dev.tfvars
model_name              = "fraud-detector"
machine_type            = "n1-standard-4"
min_replicas            = 1
max_replicas            = 2
enable_prediction_logging = false
logging_sample_rate     = 0.0

# environments/prod.tfvars
model_name              = "fraud-detector"
machine_type            = "n1-standard-8"
min_replicas            = 2
max_replicas            = 10
enable_prediction_logging = true
logging_sample_rate     = 0.1   # Log 10% of requests

🧪 Invoke the Endpoint

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

endpoint = aiplatform.Endpoint("ENDPOINT_ID")

prediction = endpoint.predict(
    instances=[{"features": [0.5, 1.2, 3.4, 0.8]}]
)

print(prediction.predictions)

⚠️ Gotchas and Tips

Endpoint creation is fast, deployment is slow. Creating an endpoint takes seconds. Deploying a model (provisioning VMs, loading containers, health checks) takes 10-20 minutes. Plan for this in your CI/CD pipeline.

Autoscaling uses replica count. Set min_replica_count to your baseline and max_replica_count to your peak. Vertex AI scales automatically on CPU utilization by default (and on GPU duty cycle when accelerators are attached); you can tune the target thresholds at deploy time.

GPU quota must be requested. NVIDIA L4, T4, A100 accelerators require quota increases. Request early in your project setup.

Model Registry keeps versions. Passing parent_model to Model.upload adds a new version under an existing model; without it, each upload creates a separate model entry. Older versions remain available for rollback: redeploy one to the endpoint and use traffic_split to send traffic back to it.

Prediction logging costs. BigQuery logging at 10% sampling rate is manageable. At 100%, costs scale with request volume. Use sampling for high-traffic endpoints.
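To see how logged volume scales with the sampling rate, a quick back-of-the-envelope calculation helps. The function below is a hypothetical helper and the 200 requests/second figure is an illustrative assumption; it counts logged rows only and makes no claim about BigQuery pricing.

```python
SECONDS_PER_DAY = 86_400


def logged_rows_per_day(requests_per_second, sampling_rate):
    """Estimate how many request/response rows are logged per day."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0.0 and 1.0")
    return round(requests_per_second * SECONDS_PER_DAY * sampling_rate)


# Assumed steady load of 200 requests/second.
print(logged_rows_per_day(200, 0.1))  # 1728000
print(logged_rows_per_day(200, 1.0))  # 17280000
```

At the same traffic level, moving from 10% to 100% sampling multiplies the stored rows by ten.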

⏭️ What's Next

This is Post 2 of the GCP ML Pipelines & MLOps with Terraform series.

Post 1: Vertex AI Workbench 🔬
Post 2: Vertex AI Endpoints - Deploy to Prod (you are here) 🚀
Post 3: Vertex AI Feature Store
Post 4: Vertex AI Pipelines + Cloud Build

Your model is in production. Stable endpoint URL, autoscaling replicas, traffic splitting for canary rollouts, and prediction logging to BigQuery. From training to production, all in Terraform and Python. 🚀

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! 💬

SageMaker Endpoints: Deploy Your Model to Production with Terraform 🚀

Suhas Mallesh — Sat, 11 Apr 2026 00:48:06 +0000

Training a model is half the battle. Deploying it to an auto-scaling endpoint with blue/green deployment and rollback is the other half. Here's how to deploy SageMaker real-time endpoints with Terraform.

In the previous post, we set up the SageMaker Studio domain - the workspace where your team trains models. Now comes the production side: taking a trained model and deploying it to a scalable HTTPS endpoint that your applications can call for real-time predictions.

SageMaker endpoints involve three Terraform resources: a Model (what to serve), an Endpoint Configuration (how to serve it), and an Endpoint (the live HTTPS URL). Add autoscaling and deployment policies on top, and you have a production-grade inference system. 🎯

🏗️ The Three-Layer Architecture

Model (container image + model artifacts in S3)
    ↓
Endpoint Configuration (instance type, count, variants)
    ↓
Endpoint (HTTPS URL, auto-scaling, blue/green deployment)

Resource	What It Defines
Model	Container image + S3 model artifacts + IAM role
Endpoint Config	Instance type, initial count, production variants
Endpoint	Live endpoint with deployment policy and autoscaling

Separating these layers means you can update the model without touching the endpoint config, or change instance types without retraining.

🔧 Terraform: Deploy to Production

Model Configuration

# variables.tf

variable "model_config" {
  description = "Model deployment configuration. Change to deploy new models."
  type = object({
    name           = string
    image_uri      = string  # ECR container image
    model_data_url = string  # S3 path to model.tar.gz
    instance_type  = string
    instance_count = number
  })
}

IAM Role for Model Execution

# inference/iam.tf

resource "aws_iam_role" "sagemaker_execution" {
  name = "${var.environment}-sagemaker-inference"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "sagemaker.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy" "model_access" {
  name = "model-s3-ecr-access"
  role = aws_iam_role.sagemaker_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["s3:GetObject"]
        Resource = "${var.model_bucket_arn}/*"
      },
      {
        Effect = "Allow"
        Action = [
          "ecr:GetAuthorizationToken",
          "ecr:BatchCheckLayerAvailability",
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer"
        ]
        Resource = "*"
      },
      {
        Effect   = "Allow"
        Action   = ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"]
        Resource = "*"
      }
    ]
  })
}

The Model

# inference/model.tf

resource "aws_sagemaker_model" "this" {
  name               = "${var.environment}-${var.model_config.name}"
  execution_role_arn = aws_iam_role.sagemaker_execution.arn

  primary_container {
    image          = var.model_config.image_uri
    model_data_url = var.model_config.model_data_url
    environment = {
      SAGEMAKER_PROGRAM = "inference.py"
    }
  }

  tags = {
    Environment = var.environment
    Model       = var.model_config.name
  }
}

image is your serving container - either an AWS Deep Learning Container or your custom container from ECR. model_data_url points to model.tar.gz in S3 containing your trained weights and inference code.

Endpoint Configuration

# inference/endpoint_config.tf

resource "aws_sagemaker_endpoint_configuration" "this" {
  name = "${var.environment}-${var.model_config.name}-config"

  production_variants {
    variant_name           = "primary"
    model_name             = aws_sagemaker_model.this.name
    initial_instance_count = var.model_config.instance_count
    instance_type          = var.model_config.instance_type
    initial_variant_weight = 1.0
  }

  tags = {
    Environment = var.environment
    Model       = var.model_config.name
  }

  lifecycle {
    create_before_destroy = true
  }
}

create_before_destroy = true is important. When you update the endpoint config (new model version, different instance type), Terraform creates the new config before deleting the old one. This prevents downtime during updates.

The Endpoint

# inference/endpoint.tf

resource "aws_sagemaker_endpoint" "this" {
  name                 = "${var.environment}-${var.model_config.name}"
  endpoint_config_name = aws_sagemaker_endpoint_configuration.this.name

  deployment_config {
    blue_green_update_policy {
      traffic_routing_configuration {
        type                     = "CANARY"
        canary_size {
          type  = "INSTANCE_COUNT"
          value = 1
        }
        wait_interval_in_seconds = 300
      }

      maximum_execution_timeout_in_seconds = 1800
      termination_wait_in_seconds          = 120
    }

    auto_rollback_configuration {
      alarms {
        alarm_name = aws_cloudwatch_metric_alarm.endpoint_errors.alarm_name
      }
    }
  }

  tags = {
    Environment = var.environment
    Model       = var.model_config.name
  }
}

Canary deployment: Routes traffic to 1 instance on the new fleet first, waits 5 minutes, then shifts remaining traffic. If the CloudWatch alarm fires during the canary phase, SageMaker automatically rolls back.

Autoscaling

# inference/autoscaling.tf

resource "aws_appautoscaling_target" "endpoint" {
  max_capacity       = var.autoscaling_max
  min_capacity       = var.model_config.instance_count
  resource_id        = "endpoint/${aws_sagemaker_endpoint.this.name}/variant/primary"
  scalable_dimension = "sagemaker:variant:DesiredInstanceCount"
  service_namespace  = "sagemaker"
}

resource "aws_appautoscaling_policy" "endpoint" {
  name               = "${var.environment}-endpoint-scaling"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.endpoint.resource_id
  scalable_dimension = aws_appautoscaling_target.endpoint.scalable_dimension
  service_namespace  = aws_appautoscaling_target.endpoint.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value = var.target_invocations_per_instance

    predefined_metric_specification {
      predefined_metric_type = "SageMakerVariantInvocationsPerInstance"
    }

    scale_in_cooldown  = 300
    scale_out_cooldown = 60
  }
}

SageMakerVariantInvocationsPerInstance scales on the average number of invocations per minute per instance. Set the target somewhat below the maximum your instance can handle (determined by load testing) so new instances come online before the existing ones saturate. The policy adds instances when traffic exceeds the target and removes them when traffic drops.
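A common way to derive the target from a load test is to take the peak requests per second one instance sustained, apply a safety factor, and convert to a per-minute figure, since the metric is measured per minute. The sketch below follows that formula; the function name is hypothetical, and the 20 RPS input and 0.5 safety factor are illustrative assumptions rather than measured values.

```python
def invocations_per_instance_target(max_rps_per_instance, safety_factor=0.5):
    """Convert a load-tested peak RPS into a per-minute target tracking value."""
    if max_rps_per_instance <= 0:
        raise ValueError("max_rps_per_instance must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    # The metric counts invocations per minute, hence the factor of 60.
    return max_rps_per_instance * safety_factor * 60


# Assumed load test result: one instance sustains 20 requests/second.
print(invocations_per_instance_target(20))  # 600.0
```

The result is what you would pass as target_invocations_per_instance in the tfvars file.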

Monitoring

# inference/monitoring.tf

resource "aws_cloudwatch_metric_alarm" "endpoint_errors" {
  alarm_name          = "${var.environment}-endpoint-5xx-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "Invocation5XXErrors"
  namespace           = "AWS/SageMaker"
  period              = 60
  statistic           = "Sum"
  threshold           = 5
  alarm_description   = "Endpoint returning 5xx errors"
  alarm_actions       = [var.sns_alert_topic_arn]

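  dimensions = {
    # Built from variables instead of aws_sagemaker_endpoint.this.name: the
    # endpoint's auto_rollback_configuration references this alarm, so pointing
    # back at the endpoint here would create a dependency cycle.
    EndpointName = "${var.environment}-${var.model_config.name}"
    VariantName  = "primary"
  }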
}

resource "aws_cloudwatch_metric_alarm" "endpoint_latency" {
  alarm_name          = "${var.environment}-endpoint-high-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "ModelLatency"
  namespace           = "AWS/SageMaker"
  period              = 60
  statistic           = "Average"
  threshold           = var.latency_threshold_ms * 1000  # microseconds
  alarm_description   = "Model inference latency too high"
  alarm_actions       = [var.sns_alert_topic_arn]

  dimensions = {
    EndpointName = aws_sagemaker_endpoint.this.name
    VariantName  = "primary"
  }
}

The error alarm also feeds into the blue/green deployment's auto_rollback_configuration. If 5xx errors spike during a deployment, SageMaker automatically rolls back to the previous fleet.
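To make the alarm settings concrete, here is a simplified model of how the 5xx alarm evaluates: it fires only when every one of the last evaluation_periods datapoints breaches the threshold. This is a sketch, not CloudWatch's full algorithm (it ignores missing-data handling), and alarm_state is a hypothetical helper.

```python
def alarm_state(datapoints, threshold=5, evaluation_periods=2):
    """Simplified alarm evaluation over per-period sums of 5xx errors."""
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    # GreaterThanThreshold: every recent datapoint must exceed the threshold.
    return "ALARM" if all(value > threshold for value in recent) else "OK"


print(alarm_state([0, 7, 9]))  # ALARM: the last two minutes both exceed 5
print(alarm_state([9, 7, 2]))  # OK: the most recent minute recovered
print(alarm_state([7]))        # INSUFFICIENT_DATA: only one datapoint so far
```

With period = 60 and evaluation_periods = 2, a single bad minute does not trigger a rollback; two consecutive bad minutes do.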

📐 Environment Configuration

# environments/dev.tfvars
model_config = {
  name           = "fraud-detector-v2"
  image_uri      = "123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-serving:latest"
  model_data_url = "s3://ml-models-dev/fraud-detector/v2/model.tar.gz"
  instance_type  = "ml.t2.medium"
  instance_count = 1
}
autoscaling_max = 2
target_invocations_per_instance = 100

# environments/prod.tfvars
model_config = {
  name           = "fraud-detector-v2"
  image_uri      = "123456789012.dkr.ecr.us-east-1.amazonaws.com/ml-serving:v2.1.0"
  model_data_url = "s3://ml-models-prod/fraud-detector/v2/model.tar.gz"
  instance_type  = "ml.c5.xlarge"
  instance_count = 2
}
autoscaling_max = 10
target_invocations_per_instance = 500

Deploying a new model version: Update model_data_url to the new S3 path, run terraform apply. The canary deployment creates a new fleet, validates it, and shifts traffic automatically.

🧪 Invoke the Endpoint

import boto3
import json

client = boto3.client("sagemaker-runtime")

response = client.invoke_endpoint(
    EndpointName="prod-fraud-detector-v2",
    ContentType="application/json",
    Body=json.dumps({"features": [0.5, 1.2, 3.4, 0.8]})
)

prediction = json.loads(response["Body"].read())
print(prediction)

⚠️ Gotchas and Tips

Load test before setting autoscaling targets. The target_invocations_per_instance should be based on actual load testing, not guesswork. Use SageMaker Inference Recommender to find the optimal instance type and max throughput.

Endpoint creation takes 5-10 minutes. SageMaker provisions instances, downloads the container, loads the model, and runs health checks. Plan for this in your deployment pipeline.

Use create_before_destroy on endpoint configs. Without it, Terraform deletes the old config before creating the new one, causing downtime.

Pin container image tags. Use versioned tags (v2.1.0) instead of latest in production. This ensures reproducible deployments and meaningful rollbacks.

Serverless inference for low traffic. If your endpoint gets fewer than ~100 requests/hour, consider a serverless_config block in the endpoint configuration's production_variants instead of instance-based hosting. It scales to zero and charges per request.

⏭️ What's Next

This is Post 2 of the ML Pipelines & MLOps with Terraform series.

Post 1: SageMaker Studio Domain 🔬
Post 2: SageMaker Endpoints - Deploy to Prod (you are here) 🚀
Post 3: SageMaker Feature Store
Post 4: SageMaker Pipelines - CI/CD for ML

Your model is in production. Real-time HTTPS endpoint, autoscaling based on traffic, canary deployments with automatic rollback, and monitoring that pages you before customers notice. From notebook to production, all in Terraform. 🚀

Found this helpful? Follow for the full ML Pipelines & MLOps with Terraform series! 💬

Azure ML Workspace with Terraform: Your ML Platform on Azure 🔬

Suhas Mallesh — Fri, 03 Apr 2026 07:00:00 +0000

The Azure Machine Learning workspace is the hub for all ML activities - experiments, models, endpoints, pipelines. It depends on four associated services (three required, one optional). Here's how to provision the entire platform with Terraform including compute instances and clusters.

In Series 1-3, we worked with managed AI services - AI Foundry for models, AI Search for RAG, Agent Service for orchestration. Series 5 shifts to custom ML - training your own models, deploying endpoints, managing features, and building CI/CD pipelines.

It starts with an Azure Machine Learning workspace. The workspace is the top-level resource for all ML activities: experiments, datasets, models, compute targets, endpoints, and pipelines live here. Unlike a standalone resource, the workspace depends on associated services that must exist first: Storage Account, Key Vault, and Application Insights are required, and Container Registry is optional but recommended. Terraform provisions the entire stack. 🎯

🏗️ Workspace Architecture

Component	What It Does
Workspace	Central hub for ML experiments, models, and pipelines
Storage Account	Default datastore for datasets, model artifacts, logs
Key Vault	Secrets, connection strings, API keys
Application Insights	Experiment tracking, endpoint monitoring
Container Registry	Custom training images, model deployment containers
Compute Instance	Per-user notebook/IDE (JupyterLab, VS Code)
Compute Cluster	Auto-scaling training cluster (CPU or GPU)

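Storage Account, Key Vault, and Application Insights must exist before the workspace; Container Registry is optional, but this setup provisions it up front. Terraform works out the dependency ordering automatically from the resource references.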

🔧 Terraform: The Full Workspace Setup

Dependent Services

# ml/dependencies.tf

resource "azurerm_storage_account" "ml" {
  name                            = "${var.environment}ml${random_string.suffix.result}"
  location                        = azurerm_resource_group.ml.location
  resource_group_name             = azurerm_resource_group.ml.name
  account_tier                    = "Standard"
  account_replication_type        = var.storage_replication
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false

  tags = var.tags
}

resource "azurerm_key_vault" "ml" {
  name                       = "${var.environment}ml${random_string.suffix.result}kv"
  location                   = azurerm_resource_group.ml.location
  resource_group_name        = azurerm_resource_group.ml.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "standard"
  purge_protection_enabled   = true
  enable_rbac_authorization  = true

  tags = var.tags
}

resource "azurerm_application_insights" "ml" {
  name                = "${var.environment}-ml-insights"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  application_type    = "web"

  tags = var.tags
}

resource "azurerm_container_registry" "ml" {
  name                = "${var.environment}ml${random_string.suffix.result}acr"
  location            = azurerm_resource_group.ml.location
  resource_group_name = azurerm_resource_group.ml.name
  sku                 = var.acr_sku
  admin_enabled       = false

  tags = var.tags
}

Container Registry is optional but recommended. Without it, the workspace uses Azure-managed image building. With it, you control custom training images and model serving containers. Set admin_enabled = false and use managed identity instead.
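The blocks above reference `azurerm_resource_group.ml`, `random_string.suffix`, and `data.azurerm_client_config.current`, none of which are shown in this post. A minimal sketch of those supporting pieces might look like the following. The file name, the resource group naming, and `var.location` are assumptions for illustration, not part of the original setup.

```hcl
# ml/main.tf (hypothetical file name)

# Supplies the tenant_id used by the Key Vault
data "azurerm_client_config" "current" {}

resource "azurerm_resource_group" "ml" {
  name     = "${var.environment}-ml-rg" # assumed naming convention
  location = var.location               # assumed variable, e.g. "eastus"
  tags     = var.tags
}

# Lowercase alphanumeric suffix: storage account and ACR names
# do not allow hyphens or uppercase characters
resource "random_string" "suffix" {
  length  = 6
  lower   = true
  upper   = false
  numeric = true
  special = false
}
```

Because the suffix is concatenated straight into the storage account and ACR names, `var.environment` itself needs to be short, lowercase, and free of hyphens (for example `dev` or `prod`).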

The Workspace

# ml/workspace.tf

resource "azurerm_machine_learning_workspace" "this" {
  name                          = "${var.environment}-ml-workspace"
  location                      = azurerm_resource_group.ml.location
  resource_group_name           = azurerm_resource_group.ml.name
  application_insights_id       = azurerm_application_insights.ml.id
  key_vault_id                  = azurerm_key_vault.ml.id
  storage_account_id            = azurerm_storage_account.ml.id
  container_registry_id         = azurerm_container_registry.ml.id
  public_network_access_enabled = var.public_network_access

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

The workspace creates a system-assigned managed identity that accesses the dependent services. No keys or connection strings to manage.

Compute Instance (Per-User IDE)

# ml/compute_instance.tf

resource "azurerm_machine_learning_compute_instance" "this" {
  for_each = var.compute_instances

  name                          = each.key
  machine_learning_workspace_id = azurerm_machine_learning_workspace.this.id
  location                      = azurerm_resource_group.ml.location
  virtual_machine_size          = each.value.vm_size
  node_public_ip_enabled        = var.public_network_access

  identity {
    type = "SystemAssigned"
  }

  tags = merge(var.tags, {
    Team = each.value.team
    User = each.key
  })
}

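Each data scientist gets their own compute instance with JupyterLab and VS Code access. Instances stop and start independently, and compute is billed only while an instance is running. Attached storage such as the OS disk still accrues charges while an instance is stopped.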

Compute Cluster (Training)

# ml/compute_cluster.tf

resource "azurerm_machine_learning_compute_cluster" "training" {
  name                          = "${var.environment}-training"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.this.id
  location                      = azurerm_resource_group.ml.location
  vm_size                       = var.training_vm_size
  vm_priority                   = var.training_vm_priority

  identity {
    type = "SystemAssigned"
  }

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = var.training_max_nodes
    scale_down_nodes_after_idle_duration = "PT${var.scale_down_minutes}M"
  }

  tags = var.tags
}

min_node_count = 0 is the cost control key. The cluster scales to zero when no training jobs are running. You pay nothing for idle compute. scale_down_nodes_after_idle_duration controls how quickly nodes are released after a job finishes.

vm_priority = "LowPriority" saves up to 80% on training costs. Low-priority VMs can be evicted, so use them for fault-tolerant training jobs with checkpointing.

📐 Environment Configuration

# environments/dev.tfvars
environment            = "dev"
public_network_access  = true
storage_replication    = "LRS"
acr_sku                = "Basic"
training_vm_size       = "Standard_DS3_v2"
training_vm_priority   = "Dedicated"
training_max_nodes     = 2
scale_down_minutes     = 15

compute_instances = {
  "ds-dev-1" = {
    vm_size = "Standard_DS3_v2"
    team    = "ml-team"
  }
}

# environments/prod.tfvars
environment            = "prod"
public_network_access  = false
storage_replication    = "GRS"
acr_sku                = "Standard"
training_vm_size       = "Standard_NC6s_v3"   # GPU
training_vm_priority   = "LowPriority"
training_max_nodes     = 8
scale_down_minutes     = 30

compute_instances = {
  "ds-lead" = {
    vm_size = "Standard_DS4_v2"
    team    = "ml-team"
  }
  "ds-engineer-1" = {
    vm_size = "Standard_DS3_v2"
    team    = "ml-team"
  }
  "ds-engineer-2" = {
    vm_size = "Standard_DS3_v2"
    team    = "ml-team"
  }
}

Dev: Public access, LRS storage, small dedicated cluster for quick iteration.
Prod: Private access, GRS storage, GPU cluster with low-priority VMs for cost-efficient training.
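The tfvars files above assume matching variable declarations, which this post does not show. A sketch of what they could look like, with validation on the values that are easy to mistype. The file name, descriptions, defaults, and validation rules here are assumptions.

```hcl
# ml/variables.tf (hypothetical file name)

variable "environment" {
  description = "Short environment name; it is embedded in storage and ACR names"
  type        = string

  validation {
    condition     = can(regex("^[a-z0-9]{2,8}$", var.environment))
    error_message = "environment must be 2-8 lowercase alphanumeric characters."
  }
}

variable "public_network_access" {
  type    = bool
  default = false
}

variable "storage_replication" {
  type    = string
  default = "LRS"

  validation {
    condition     = contains(["LRS", "ZRS", "GRS", "RAGRS", "GZRS", "RAGZRS"], var.storage_replication)
    error_message = "storage_replication must be a valid storage account replication type."
  }
}

variable "acr_sku" {
  type    = string
  default = "Basic"
}

variable "training_vm_size" {
  type = string
}

variable "training_vm_priority" {
  type    = string
  default = "Dedicated"

  validation {
    condition     = contains(["Dedicated", "LowPriority"], var.training_vm_priority)
    error_message = "training_vm_priority must be Dedicated or LowPriority."
  }
}

variable "training_max_nodes" {
  type    = number
  default = 2
}

variable "scale_down_minutes" {
  description = "Idle minutes before cluster nodes are released (rendered as PT<n>M)"
  type        = number
  default     = 15
}

# Keyed by compute instance name, matching the for_each in compute_instance.tf
variable "compute_instances" {
  type = map(object({
    vm_size = string
    team    = string
  }))
  default = {}
}

variable "tags" {
  type    = map(string)
  default = {}
}
```

With declarations like these in place, each environment is selected at plan time, for example `terraform plan -var-file=environments/dev.tfvars`.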

🔧 RBAC for Team Access

# ml/rbac.tf

# Data scientists can run experiments but cannot create compute or change the workspace
resource "azurerm_role_assignment" "ds_contributor" {
  for_each = var.data_scientist_principals

  scope                = azurerm_machine_learning_workspace.this.id
  role_definition_name = "AzureML Data Scientist"
  principal_id         = each.value
}

# ML engineers get full workspace access
resource "azurerm_role_assignment" "mle_contributor" {
  for_each = var.ml_engineer_principals

  scope                = azurerm_machine_learning_workspace.this.id
  role_definition_name = "Contributor"
  principal_id         = each.value
}

Use AzureML Data Scientist for data scientists who need to run experiments but shouldn't modify workspace settings or create compute. AzureML Compute Operator, by contrast, only covers managing compute resources, so it is not enough on its own for submitting jobs and registering models. Use Contributor for ML engineers who manage the full lifecycle.
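The role assignments iterate over two maps that are not declared in this post. One plausible shape, assuming each map goes from a friendly label to a Microsoft Entra ID object ID. The descriptions and the sample values in the comment are placeholders.

```hcl
# ml/variables.tf (sketch; variable shapes are an assumption)

variable "data_scientist_principals" {
  description = "Label => Entra ID object ID for each data scientist (user or group)"
  type        = map(string)
  default     = {}
}

variable "ml_engineer_principals" {
  description = "Label => Entra ID object ID for each ML engineer (user or group)"
  type        = map(string)
  default     = {}
}

# Example tfvars entry (placeholder object ID):
# data_scientist_principals = {
#   "ds-team-group" = "00000000-0000-0000-0000-000000000000"
# }
```

Keying the map by a stable label rather than by the object ID keeps the `for_each` addresses readable in plans, and assigning roles to groups rather than to individual users means team changes don't require a Terraform run.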

🔧 Security Hardening

Control	Dev	Prod
Network access	Public	Private (managed VNet)
Storage	LRS, TLS 1.2	GRS, TLS 1.2, no public blob
Key Vault	RBAC auth	RBAC auth, purge protection
Container Registry	Basic, no admin	Standard, managed identity
Compute	Public IP	No public IP, subnet attached
Identity	System-assigned	System-assigned + RBAC

For production, set public_network_access_enabled = false on the workspace and use managed virtual network isolation. All compute instances and clusters run inside a managed VNet with outbound rules controlled by the workspace.
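As a sketch of what that could look like in Terraform, the workspace resource from ml/workspace.tf can carry a `managed_network` block. This assumes an azurerm provider version that supports the block, and the choice of isolation modes per environment is an assumption rather than the post's tested configuration. It replaces the earlier definition instead of being added alongside it.

```hcl
# ml/workspace.tf (variant with managed network isolation)

resource "azurerm_machine_learning_workspace" "this" {
  name                          = "${var.environment}-ml-workspace"
  location                      = azurerm_resource_group.ml.location
  resource_group_name           = azurerm_resource_group.ml.name
  application_insights_id       = azurerm_application_insights.ml.id
  key_vault_id                  = azurerm_key_vault.ml.id
  storage_account_id            = azurerm_storage_account.ml.id
  container_registry_id         = azurerm_container_registry.ml.id
  public_network_access_enabled = var.public_network_access

  # Assumption: no isolation when public access is on (dev),
  # approved outbound only when it is off (prod)
  managed_network {
    isolation_mode = var.public_network_access ? "Disabled" : "AllowOnlyApprovedOutbound"
  }

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}
```

`AllowInternetOutbound` is a looser alternative if training jobs need to pull packages from public indexes. With approved-outbound-only, those destinations have to be allowed explicitly through workspace outbound rules.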

⚠️ Gotchas and Tips

Globally unique names required. Storage account and ACR names must be globally unique. Use random_string suffixes to avoid conflicts across environments.

Container Registry costs. Basic SKU runs roughly $5/month and Standard roughly $20/month; check current pricing for your region. If you're not building custom training images yet, you can skip the ACR initially and add it later.

Compute instance auto-stop. Unlike clusters, compute instances don't auto-stop by default. Set up an idle shutdown schedule through the workspace settings or Azure Policy to prevent overnight charges.

Workspace deletion is complex. Deleting a workspace doesn't automatically delete dependent resources (storage, KV, ACR). Terraform destroys everything in the right order because it tracks the dependencies through the resource references, but manual deletion requires cleaning up each resource individually.

SDK v1 is deprecated. Azure ML SDK v1 was deprecated on March 31, 2025, with support ending on June 30, 2026. Use SDK v2 (azure-ai-ml) for all new development. Terraform provisions the infrastructure; SDK v2 handles experiments and models.

⏭️ What's Next

This is Post 1 of the Azure ML Pipelines & MLOps with Terraform series.

Post 1: Azure ML Workspace (you are here) 🔬
Post 2: Azure ML Online Endpoints - Deploy to Prod
Post 3: Azure ML Feature Store
Post 4: Azure ML Pipelines + Azure DevOps

Your ML platform is provisioned. Workspace, storage, key vault, ACR, compute instances for notebooks, auto-scaling clusters for training - all in Terraform. The foundation for model development, deployment, and production ML pipelines. 🔬