<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Harish Aravindan</title>
    <description>The latest articles on Forem by Harish Aravindan (@harisharavindan).</description>
    <link>https://forem.com/harisharavindan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F60752%2F331ec464-cbee-4654-a928-51753bf97bab.jpeg</url>
      <title>Forem: Harish Aravindan</title>
      <link>https://forem.com/harisharavindan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harisharavindan"/>
    <language>en</language>
    <item>
      <title>Your AI Agent Will Lie to You in Production — Here's How to Catch It Before It Ships</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Wed, 18 Mar 2026 04:33:09 +0000</pubDate>
      <link>https://forem.com/harisharavindan/your-ai-agent-will-lie-to-you-in-production-heres-how-to-catch-it-before-it-ships-l77</link>
      <guid>https://forem.com/harisharavindan/your-ai-agent-will-lie-to-you-in-production-heres-how-to-catch-it-before-it-ships-l77</guid>
      <description>&lt;p&gt;You deploy an AI agent. It passes your manual tests. It looks good in the demo.&lt;/p&gt;

&lt;p&gt;Three weeks later, someone edits the system prompt to make the output "cleaner." The agent starts behaving differently on edge cases. No error. No alert. Just subtly wrong output — until someone notices.&lt;/p&gt;

&lt;p&gt;This post is about the CI/CD and prompt regression setup that prevents this. Everything here is practical and works today on AWS.&lt;/p&gt;




&lt;h2&gt;The Problem With AI Agents in CI/CD&lt;/h2&gt;

&lt;p&gt;Traditional software has a clear contract: given input X, function F returns output Y. Tests verify Y. If Y changes, the test fails, the build breaks, you investigate.&lt;/p&gt;

&lt;p&gt;LLM-based agents break this model. The "function" is a language model. The same input can produce slightly different outputs on every run. And the failure mode isn't an exception — it's a plausible-looking wrong answer.&lt;/p&gt;

&lt;p&gt;Three things make this worse in serverless AI pipelines:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Prompts aren't versioned like code.&lt;/strong&gt; Engineers edit them as a string in a Python file or, worse, in a config file outside version control. Nobody reviews a prompt change the way they'd review a code change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Retries mask failures.&lt;/strong&gt; Lambda retries on error. Your retry logic retries on low-confidence responses. By the time a bad output surfaces, it's hard to trace it back to the prompt change that caused it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Silent degradation.&lt;/strong&gt; A classification agent that's 95% accurate and drops to 80% accurate won't throw an error. It'll just be wrong more often. You'll find out from downstream effects, not logs.&lt;/p&gt;
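&lt;p&gt;To make the non-determinism concrete: asserting on the raw model output breaks, while asserting on the structured field you care about survives rephrasing. The sketch below simulates two runs of a hypothetical &lt;code&gt;classify&lt;/code&gt; call; it's an illustration, not the real agent.&lt;/p&gt;

```python
# Two simulated runs of the same input: the wording shifts, the
# structured field does not. classify() is a hypothetical stand-in
# for a real agent call.
import json

def classify(text, run):
    phrasing = ["Risk is MEDIUM.", "Assessed risk level: MEDIUM."]
    return json.dumps({"risk_level": "MEDIUM", "summary": phrasing[run]})

out_a = json.loads(classify("claim text", 0))
out_b = json.loads(classify("claim text", 1))

assert out_a != out_b                              # raw outputs differ per run
assert out_a["risk_level"] == out_b["risk_level"]  # the decision field is stable
```

&lt;p&gt;This is why the test suite in this post asserts on structured fields and a confidence floor, never on the full output text.&lt;/p&gt;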




&lt;h2&gt;The Fix: A Prompt Regression Test Suite&lt;/h2&gt;

&lt;p&gt;The idea is simple. Lock a set of golden fixtures — known inputs with known correct outputs. Run your agent against them on every deploy. Fail the build if accuracy drops below a threshold.&lt;/p&gt;

&lt;p&gt;Here's the full setup.&lt;/p&gt;




&lt;h2&gt;Step 1: Golden Fixture Format&lt;/h2&gt;

&lt;p&gt;Each fixture is a JSON file in &lt;code&gt;tests/fixtures/&lt;/code&gt;. Structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"document_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fixture_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"document_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Policy holder: Jane Smith. Coverage: accidental damage. Item: MacBook Pro 16-inch. Purchase date: 2023-08-15. Claim date: 2025-11-03. Damage description: Screen cracked after drop."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-tenant"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"risk_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MEDIUM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reminder_eligible"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"confidence_min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.70&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Keep 20–30 fixtures. Cover your edge cases: borderline risk levels, ambiguous descriptions, missing fields, very old claims. These are the documents your agent is most likely to get wrong.&lt;/p&gt;

&lt;p&gt;Never auto-generate fixtures. Write them manually. The point is that a human has decided what the correct output is.&lt;/p&gt;
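&lt;p&gt;Hand-written fixtures drift too: a typo in a field name silently skips a check. A small validator (a sketch; the &lt;code&gt;VALID_RISK&lt;/code&gt; label set is an assumption, adjust it to your classifier) can run as its own test to catch malformed fixtures before they reach CI:&lt;/p&gt;

```python
# Sanity-check fixtures against the schema in Step 1 before they reach CI.
# VALID_RISK is an assumed label set; adjust it to match your classifier.
import glob
import json

REQUIRED_INPUT = {"document_text", "tenant_id"}
VALID_RISK = {"LOW", "MEDIUM", "HIGH"}

def validate_fixture(fixture):
    errors = []
    if not REQUIRED_INPUT.issubset(fixture.get("input", {})):
        errors.append("missing input fields")
    expected = fixture.get("expected", {})
    if expected.get("risk_level") not in VALID_RISK:
        errors.append("unknown risk_level")
    conf = expected.get("confidence_min")
    if conf is not None and (conf > 1.0 or 0.0 > conf):
        errors.append("confidence_min out of range")
    return errors

def validate_all(fixture_dir="tests/fixtures"):
    # Returns {path: [errors]} for every malformed fixture file.
    problems = {}
    for path in glob.glob(fixture_dir + "/*.json"):
        with open(path) as f:
            errors = validate_fixture(json.load(f))
        if errors:
            problems[path] = errors
    return problems
```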


&lt;h2&gt;Step 2: The Test Runner&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/test_regression.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.classifier&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_classifier&lt;/span&gt;

&lt;span class="n"&gt;FIXTURE_DIR&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/fixtures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;   &lt;span class="c1"&gt;# Fail the build if accuracy drops below this
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;FIXTURE_DIR&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/*.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fixture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_classifier_regression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;document_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expected risk_level=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below minimum &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_overall_accuracy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Separate test: fail the whole suite if aggregate accuracy drops below
    MIN_ACCURACY, giving one clear signal when several fixtures regress at once.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;passed&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;document_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below threshold &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Passed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; fixtures.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Run locally with &lt;code&gt;pytest tests/test_regression.py -v&lt;/code&gt;. You'll see per-fixture pass/fail and the aggregate accuracy check.&lt;/p&gt;
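&lt;p&gt;One caveat: a borderline fixture can flip between runs even with a fixed prompt, which makes the build flaky. A cheap mitigation (a sketch; &lt;code&gt;classify_once&lt;/code&gt; is a hypothetical wrapper around a single agent call) is to run each fixture a few times and take the majority label:&lt;/p&gt;

```python
# Majority vote over repeated runs, to damp run-to-run flips on
# borderline fixtures. classify_once is a hypothetical stand-in for a
# single agent call returning a dict with a "risk_level" key.
from collections import Counter

def classify_majority(classify_once, document_text, tenant_id, runs=3):
    votes = Counter()
    last = {}
    for _ in range(runs):
        result = classify_once(document_text=document_text, tenant_id=tenant_id)
        votes[result["risk_level"]] += 1
        last[result["risk_level"]] = result   # keep latest result per label
    winner, count = votes.most_common(1)[0]
    # Return the winning label's most recent full result, plus vote stats.
    out = dict(last[winner])
    out["vote_share"] = count / runs
    return out
```

&lt;p&gt;Three runs triples your Bedrock cost per build, so reserve this for fixtures you've seen flip.&lt;/p&gt;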


&lt;h2&gt;Step 3: GitHub Actions Pipeline&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/deploy.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyAI CI/CD&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AWS_REGION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;ap-south-1&lt;/span&gt;
  &lt;span class="na"&gt;ECR_REGISTRY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;${{ secrets.ECR_REGISTRY }}&lt;/span&gt;
  &lt;span class="na"&gt;ECR_REPOSITORY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyai-pipeline&lt;/span&gt;
  &lt;span class="na"&gt;LAMBDA_FUNCTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyai-processor&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;regression-tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prompt Regression Tests&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Python&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.12'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install -r requirements.txt&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials (for Bedrock)&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE_ARN }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;${{ env.AWS_REGION }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run prompt regression tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest tests/test_regression.py -v --tb=short&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;

  &lt;span class="na"&gt;build-and-deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build → ECR → Lambda&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regression-tests&lt;/span&gt;        &lt;span class="c1"&gt;# Only runs if tests pass&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.ref == 'refs/heads/main'&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE_ARN }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;${{ env.AWS_REGION }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Log in to ECR&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login-ecr&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/amazon-ecr-login@v2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push Docker image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;IMAGE_TAG=$(git rev-parse --short HEAD)&lt;/span&gt;
          &lt;span class="s"&gt;docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .&lt;/span&gt;
          &lt;span class="s"&gt;docker push    $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG&lt;/span&gt;
          &lt;span class="s"&gt;echo "IMAGE_TAG=$IMAGE_TAG" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Lambda&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;aws lambda update-function-code \&lt;/span&gt;
            &lt;span class="s"&gt;--function-name $LAMBDA_FUNCTION \&lt;/span&gt;
            &lt;span class="s"&gt;--image-uri     $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \&lt;/span&gt;
            &lt;span class="s"&gt;--region        $AWS_REGION&lt;/span&gt;

          &lt;span class="s"&gt;aws lambda wait function-updated \&lt;/span&gt;
            &lt;span class="s"&gt;--function-name $LAMBDA_FUNCTION&lt;/span&gt;

          &lt;span class="s"&gt;echo "Deployed image $IMAGE_TAG to $LAMBDA_FUNCTION"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;needs: regression-tests&lt;/code&gt; — deploy job won't start if tests fail&lt;/li&gt;
&lt;li&gt;OIDC role assumption (no long-lived keys in secrets)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lambda wait function-updated&lt;/code&gt; — ensures the function is actually updated before the job completes&lt;/li&gt;
&lt;/ul&gt;
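&lt;p&gt;One small addition worth making: surface the accuracy number on the run page itself, not just in the pytest log. &lt;code&gt;GITHUB_STEP_SUMMARY&lt;/code&gt; is a standard Actions environment variable pointing at a markdown file the runner renders; the helper below is a sketch you could call from the aggregate test:&lt;/p&gt;

```python
# Append the aggregate accuracy to the GitHub Actions job summary so it
# shows on the run page. GITHUB_STEP_SUMMARY is set by the Actions
# runner; when it's absent (e.g. local runs) the helper is a no-op.
import os

def report_accuracy(passed, total):
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return
    accuracy = passed / total
    with open(summary_path, "a") as f:
        f.write(f"Prompt regression accuracy: {accuracy:.0%} ({passed}/{total} fixtures)\n")
```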


&lt;h2&gt;Step 4: IAM OIDC Setup for GitHub Actions (No Long-Lived Keys)&lt;/h2&gt;

&lt;p&gt;The cleanest way to give GitHub Actions access to AWS is OIDC: GitHub issues a short-lived token for each workflow run, and AWS exchanges it for temporary credentials scoped to your repo that expire when the job ends.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# infra/oidc.tf&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_openid_connect_provider"&lt;/span&gt; &lt;span class="s2"&gt;"github"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://token.actions.githubusercontent.com"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github-actions-warrantyai"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Federated&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_openid_connect_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringLike&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"repo:YOUR_ORG/warrantyai:ref:refs/heads/main"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github-actions-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:BatchCheckLayerAvailability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"ecr:PutImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:InitiateLayerUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"ecr:UploadLayerPart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:CompleteLayerUpload"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lambda:UpdateFunctionCode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"lambda:GetFunction"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Replace &lt;code&gt;YOUR_ORG/warrantyai&lt;/code&gt; with your actual GitHub org and repo name. The &lt;code&gt;StringLike&lt;/code&gt; condition locks the role to your main branch; PRs can run the regression test job but never receive deploy permissions.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Catches (and What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It catches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt edits that shift classification behaviour on known edge cases&lt;/li&gt;
&lt;li&gt;Model version changes that affect output structure&lt;/li&gt;
&lt;li&gt;Output parser changes that break field extraction&lt;/li&gt;
&lt;li&gt;Accidental removal of instructions that were doing real work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;It doesn't catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brand-new edge cases you haven't added to fixtures yet&lt;/li&gt;
&lt;li&gt;Latency regressions (add a separate latency benchmark for this)&lt;/li&gt;
&lt;li&gt;Cost regressions from prompt bloat (add token counting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fixture set is a living document. Every time a production bug surfaces from a new edge case, add a fixture for it. The test suite gets more valuable over time, not less.&lt;/p&gt;
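A tiny regression runner keeps that living fixture set honest in CI. A minimal sketch, where <code>run_agent</code> is a stand-in for the real agent call and the fixture schema (an input string paired with an expected label) is hypothetical:

```python
import json

# Hypothetical fixture format: each entry pairs an input with the
# classification the agent is expected to produce.
FIXTURES = [
    {"input": "Fridge compressor failed after 11 months", "expected": "covered"},
    {"input": "Water damage from a burst pipe", "expected": "not_covered"},
]

def run_agent(text):
    # Placeholder for the real agent call (e.g. a Bedrock invocation).
    return "covered" if "compressor" in text else "not_covered"

def run_regression(fixtures):
    """Return the fixtures whose output drifted from the baseline."""
    failures = []
    for fx in fixtures:
        actual = run_agent(fx["input"])
        if actual != fx["expected"]:
            failures.append(
                {"input": fx["input"], "expected": fx["expected"], "actual": actual}
            )
    return failures

if __name__ == "__main__":
    # Prints [] when nothing has drifted; CI fails the build otherwise.
    print(json.dumps(run_regression(FIXTURES), indent=2))
```

Wiring this into the pipeline is one step: exit non-zero when the failure list is non-empty, and every prompt edit gets checked against the baseline automatically.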


&lt;h2&gt;
  
  
  The One Thing Worth Knowing
&lt;/h2&gt;

&lt;p&gt;The first time you run this on an existing project, it will probably fail. Not because your agent is bad — but because you'll discover that your "obvious" classifications aren't as consistent as you thought.&lt;/p&gt;

&lt;p&gt;That's the test suite doing its job. Fix the fixtures (or fix the agent), and you now have a baseline. Every future change is measured against that baseline.&lt;/p&gt;

&lt;p&gt;That's the whole point.&lt;/p&gt;


&lt;h2&gt;
  
  
  How This Fits Into WarrantyAI
&lt;/h2&gt;

&lt;p&gt;WarrantyAI is the project I'm building to learn and practice production AI systems end to end.&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_mlops-awsbedrock-aiplatformengineer-activity-7416095685768429568-Gb_2?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQF1Z8f42_isow%2Ffeedshare-shrink_800%2FB56ZutDsi8HcAg-%2F0%2F1768134994582%3Fe%3D2147483647%26v%3Dbeta%26t%3DhsNSlku-Gf_bElvW-5o1Q3PSobfiptb7JQnnrewBkGA" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_mlops-awsbedrock-aiplatformengineer-activity-7416095685768429568-Gb_2?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" rel="noopener noreferrer" class="c-link"&gt;
            Building WarrantyAI: AI Platform Engineer's 2026 North-Star Goal | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            🚀 𝗙𝗿𝗼𝗺 𝗗𝗲𝘃𝗢𝗽𝘀 𝘁𝗼 𝗠𝗟𝗢𝗽𝘀: 𝗪𝗲𝗲𝗸 𝟭 𝗼𝗳 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 / 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 "𝗪𝗮𝗿𝗿𝗮𝗻𝘁𝘆 𝗔𝗜"
After years of managing cloud infrastructure and DevOps pipelines, I’ve officially started my transition into AI Platform Engineering.
My north-star goal for 2026 is to build WarrantyAI: a production-grade, "warranty-aware" system that helps homeowners and office managers assess appliance health, identify "fine-print" gotchas, and optimize repair costs using Generative AI.

𝗧𝗵𝗲 𝗬𝗲𝗮𝗿-𝗟𝗼𝗻𝗴 𝗩𝗶𝘀𝗶𝗼𝗻: 𝗪𝗵𝘆 𝗮𝗻 𝗔𝗜 𝗣𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿
An AI Platform Engineer doesn't just "write prompts." They build the scalable "plumbing" that allows AI models to interact with real-world data securely and efficiently. My 12-month roadmap focuses on:
𝗠𝗟𝗢𝗽𝘀: Automating the lifecycle of models (training, deployment, monitoring).
𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Using Vector Databases to give AI a "long-term memory."
𝗖𝗹𝗼𝘂𝗱-𝗡𝗮𝘁𝗶𝘃𝗲 𝗔𝗜: Leveraging AWS resources like Bedrock and S3 Lakehouses for cost-effective scale.

𝗪𝗲𝗲𝗸 𝟭 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴: 𝗕𝗿𝗲𝗮𝗸𝗶𝗻𝗴 𝘁𝗵𝗲 "𝗕𝗹𝗮𝗰𝗸 𝗕𝗼𝘅"
This week, I focused on Retrieval-Augmented Generation (RAG). Instead of just asking an AI what a general warranty looks like, I fed it a specific 5-page LG Refrigerator Warranty PDF and asked it to find the hidden costs.
GitHub Repo: https://lnkd.in/g-hkGJ6M

𝗖𝗼𝗿𝗲 𝗖𝗼𝗻𝗰𝗲𝗽𝘁𝘀:
𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 (𝗧𝗵𝗲 "𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗼𝗿"): Using Amazon Titan Text Embeddings v2 to convert human text into mathematical vectors.
𝗩𝗲𝗰𝘁𝗼𝗿 𝗜𝗻𝗱𝗲𝘅𝗶𝗻𝗴 (𝗧𝗵𝗲 "𝗟𝗶𝗯𝗿𝗮𝗿𝘆"): Implementing FAISS to store these vectors so the AI can search by "meaning" rather than keywords.
𝗠𝗲𝘀𝘀𝗮𝗴𝗲𝘀 𝗔𝗣𝗜 (𝗧𝗵𝗲 "𝗖𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻"): Transitioning from legacy string-based prompts to the modern, structured messages format required by models like Ministral-3-8b.

𝗖𝗼𝗻𝗰𝗹𝘂𝘀𝗶𝗼𝗻 &amp;amp; 𝗡𝗲𝘅𝘁 𝗦𝘁𝗲𝗽𝘀
Week 1 taught me that AI Engineering is 20% prompting and 80% data engineering and infrastructure. By mastering the native APIs and vector logic, I’ve built a foundation that isn't just "chatty"—it's accurate.

𝗡𝗲𝘅𝘁 𝘄𝗲𝗲𝗸: I’ll be moving these vectors into a persistent S3 Lakehouse using Apache Iceberg to ensure our data remains organized and queryable as we scale to hundreds of appliances.
#MLOps #AWSBedrock #AIPlatformEngineer #WarrantyAI #GenerativeAI #DevOpsToAI #ApacheIceberg
https://lnkd.in/g-hkGJ6M
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>aws</category>
      <category>ai</category>
      <category>testing</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Your Bedrock Bill Is a Ticking Clock — Here's How to Stop It</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 12 Mar 2026 05:56:54 +0000</pubDate>
      <link>https://forem.com/harisharavindan/your-bedrock-bill-is-a-ticking-clock-heres-how-to-stop-it-1c0i</link>
      <guid>https://forem.com/harisharavindan/your-bedrock-bill-is-a-ticking-clock-heres-how-to-stop-it-1c0i</guid>
      <description>&lt;p&gt;You deploy a Lambda that calls Bedrock. It works beautifully in testing.&lt;/p&gt;

&lt;p&gt;Then someone runs a batch job, a retry loop goes wrong, or traffic spikes, and your AWS bill at the end of the month looks like a phone number.&lt;/p&gt;

&lt;p&gt;Bedrock has no built-in spend cap. No circuit breaker. No "stop after $X." It will happily invoke your model ten thousand times before you notice anything is wrong.&lt;/p&gt;

&lt;p&gt;This post is about the patterns that prevent that, applied specifically to serverless AI workloads on AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Bedrock Cost Blowups Happen
&lt;/h2&gt;

&lt;p&gt;Bedrock charges per input token and output token. The pricing varies by model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (per 1K tokens)&lt;/th&gt;
&lt;th&gt;Output (per 1K tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~$0.00025&lt;/td&gt;
&lt;td&gt;~$0.00125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;~$0.015&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus&lt;/td&gt;
&lt;td&gt;~$0.015&lt;/td&gt;
&lt;td&gt;~$0.075&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Haiku looks cheap and it is, until you're running it at scale with large prompts. A 2,000-token prompt + 500-token response at the Haiku pricing above is about $0.0011 per call. At 100,000 calls per day that's roughly $112/day, or about $3,400/month. From a single Lambda function.&lt;/p&gt;
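That arithmetic generalizes into a small pre-deployment calculator. The prices below are the approximate per-1K figures from the table, so treat the output as a planning estimate, not billing truth:

```python
# Approximate per-1K-token prices from the table above.
PRICES = {
    "haiku":  {"input": 0.00025, "output": 0.00125},
    "sonnet": {"input": 0.003,   "output": 0.015},
    "opus":   {"input": 0.015,   "output": 0.075},
}

def cost_per_call(model, input_tokens, output_tokens):
    """Estimated USD cost of one invocation at the table prices."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

def monthly_cost(model, input_tokens, output_tokens, calls_per_day, days=30):
    """Project a month of spend from a single call profile."""
    return cost_per_call(model, input_tokens, output_tokens) * calls_per_day * days
```

Running `monthly_cost("haiku", 2000, 500, 100_000)` makes the "cheap model, expensive bill" point concrete before any code ships.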

&lt;p&gt;The three failure modes that turn a reasonable bill into a bad one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Unbounded retry loops.&lt;/strong&gt; Lambda retries failed asynchronous invocations twice by default. If your Bedrock call fails and you don't handle it, Lambda replays the whole invocation, tripling your token spend on every failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Prompt size creep.&lt;/strong&gt; You add context, history, or document content to your prompt over time. Input tokens grow. You don't notice because the latency stays roughly the same, but the cost per call has doubled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No model fallback logic.&lt;/strong&gt; You default to Sonnet for everything because it performs better. You never switch to Haiku for the 80% of calls where Haiku would have been fine.&lt;/p&gt;
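For failure mode 1, the simplest guard is an explicit attempt cap around the Bedrock call. A stdlib-only sketch (the exception handling and backoff values are illustrative; in production you'd catch the specific botocore throttling errors and also set the Lambda's async retry count to zero):

```python
import time

class RetryBudgetExceeded(Exception):
    """Raised when the hard attempt cap is exhausted."""

def invoke_with_cap(call, max_attempts=2, base_delay=0.5):
    """Run `call()` at most `max_attempts` times with exponential backoff.

    A hard cap keeps a failing Bedrock call from silently multiplying
    token spend through automatic retries.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:  # illustrative; narrow this in real code
            last_err = err
            if attempt + 1 == max_attempts:
                break
            time.sleep(base_delay * (2 ** attempt))
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_err
```

The key property is that the worst-case token spend per request is bounded and known in advance, instead of being whatever the retry machinery decides.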




&lt;h2&gt;
  
  
  Pattern 1: Model Tiering (Use the Cheapest Model That's Good Enough)
&lt;/h2&gt;

&lt;p&gt;The most impactful cost control you can add. Route calls to the cheapest model that can handle the task, with automatic escalation when confidence is low.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;HAIKU&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SONNET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_with_tiering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;require_confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Always try Haiku first.
    If confidence score &amp;lt; threshold, escalate to Sonnet.
    Returns: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: str, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: str, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: bool}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;haiku_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

After your response, on a new line write exactly:
CONFIDENCE: &amp;lt;score between 0.0 and 1.0&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;haiku_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;haiku_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;haiku_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;require_confidence&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;clean_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;haiku_response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;haiku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Escalate to Sonnet
&lt;/span&gt;    &lt;span class="n"&gt;sonnet_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SONNET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;sonnet_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONFIDENCE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;  &lt;span class="c1"&gt;# Assume high confidence if parsing fails
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONFIDENCE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, Haiku handles the majority of straightforward tasks when you route by confidence. The cost difference between Haiku and Sonnet is roughly 12x per call at the table prices above, so even a 70/30 split produces significant savings at scale.&lt;/p&gt;
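The economics of that split are easy to sanity-check. Using illustrative per-call costs for a 2,000-in / 500-out call derived from the pricing table (these are assumptions, not measured numbers):

```python
def blended_cost(cheap_cost, premium_cost, cheap_share):
    """Average per-call cost when `cheap_share` of traffic stays on the
    cheap model and the rest escalates to the premium one."""
    return cheap_share * cheap_cost + (1 - cheap_share) * premium_cost

# Illustrative per-call costs for a 2,000-in / 500-out call at the
# table prices above: Haiku ~$0.0011, Sonnet ~$0.0135.
HAIKU_CALL, SONNET_CALL = 0.001125, 0.0135

savings = 1 - blended_cost(HAIKU_CALL, SONNET_CALL, 0.7) / SONNET_CALL
# Under these assumptions a 70/30 split cuts spend by roughly 64%
# versus routing everything to Sonnet.
```

The escalation rate is the lever: track it in CloudWatch, because a threshold tweak that pushes escalations from 30% to 50% quietly erases most of the savings.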




&lt;h2&gt;
  
  
  Pattern 2: Token Counting Before You Invoke
&lt;/h2&gt;

&lt;p&gt;Bedrock charges for tokens you send, not just tokens you receive. A prompt that accidentally includes a full document when it only needed a summary can cost 10x more than intended.&lt;/p&gt;

&lt;p&gt;Count your tokens before invoking. If the prompt is above a threshold, truncate or summarize first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Rough estimate: ~4 characters per token for English text.
    Use this as a pre-flight check, not for billing accuracy.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;


&lt;span class="n"&gt;MAX_INPUT_TOKENS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;   &lt;span class="c1"&gt;# Your cost-control threshold
&lt;/span&gt;&lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;   &lt;span class="c1"&gt;# Bedrock model limit buffer
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;full_prompt&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Truncate context, keep prompt intact
&lt;/span&gt;        &lt;span class="n"&gt;max_context_chars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_context_chars&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;... [truncated]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;full_prompt&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;estimated_toks&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_INPUT_TOKENS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Log a warning — this call is more expensive than expected
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[COST WARNING] Large prompt: ~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_toks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens estimated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;invoke_with_tiering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches the most common cause of unexpected cost spikes: context that grew over time without anyone noticing.&lt;/p&gt;
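&lt;p&gt;If you don't already have an &lt;code&gt;estimate_tokens&lt;/code&gt; helper, a minimal sketch based on the common ~4-characters-per-token heuristic is enough for pre-flight checks. This is a rough approximation (actual counts vary by model and tokenizer), but it errs usefully for cost control:&lt;/p&gt;

```python
# Hypothetical sketch of the estimate_tokens helper used above.
# Assumes the rough ~4 characters-per-token heuristic for English text;
# real token counts vary by model and tokenizer.
def estimate_tokens(text: str) -> int:
    """Cheap, tokenizer-free token estimate for pre-flight cost checks."""
    return max(1, len(text) // 4)
```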




&lt;h2&gt;
  
  
  Pattern 3: Lambda-Level Rate Limiting with DynamoDB
&lt;/h2&gt;

&lt;p&gt;Bedrock has service-level quotas, but they're per-account, not per-function. If you have multiple Lambda functions all calling Bedrock, one runaway function can exhaust your quota and spike your bill before the others even notice.&lt;/p&gt;

&lt;p&gt;Add a lightweight rate limiter using DynamoDB atomic counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;dynamodb&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rate_table&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-rate-limits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;   &lt;span class="c1"&gt;# Per function, per minute window
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns True if call is allowed, False if rate limit exceeded.
    Uses DynamoDB atomic increment + TTL for automatic window reset.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;minute_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rate_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;minute_key&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET call_count = if_not_exists(call_count, :zero) + :one, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expiry_ttl = :ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:zero&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 2-minute TTL, auto-cleanup
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;ReturnValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATED_NEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limited_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;check_rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limit exceeded for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Bedrock calls/minute.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;safe_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The DynamoDB TTL means the counter auto-resets every window. No cron, no cleanup Lambda. Cost for this table at moderate usage is under $1/month.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: CloudWatch Alarm on Bedrock Invocation Spend
&lt;/h2&gt;

&lt;p&gt;All three patterns above are reactive at the code level. You also need a proactive alert before the bill hits.&lt;/p&gt;

&lt;p&gt;Bedrock publishes &lt;code&gt;InvocationCount&lt;/code&gt; and &lt;code&gt;InputTokenCount&lt;/code&gt; metrics to CloudWatch. Set an alarm on invocation count as a leading indicator — it's more reliable than waiting for billing alerts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Terraform — alert when Bedrock invocations exceed threshold&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_metric_alarm"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_invocation_spike"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;alarm_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-invocation-spike"&lt;/span&gt;
  &lt;span class="nx"&gt;comparison_operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GreaterThanThreshold"&lt;/span&gt;
  &lt;span class="nx"&gt;evaluation_periods&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="nx"&gt;metric_name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"InvocationCount"&lt;/span&gt;
  &lt;span class="nx"&gt;namespace&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AWS/Bedrock"&lt;/span&gt;
  &lt;span class="nx"&gt;period&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;          &lt;span class="c1"&gt;# 5-minute window&lt;/span&gt;
  &lt;span class="nx"&gt;statistic&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Sum"&lt;/span&gt;
  &lt;span class="nx"&gt;threshold&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;          &lt;span class="c1"&gt;# Adjust to your expected volume&lt;/span&gt;
  &lt;span class="nx"&gt;alarm_description&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Bedrock invocations unusually high — check for runaway loops"&lt;/span&gt;

  &lt;span class="nx"&gt;dimensions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;ModelId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"anthropic.claude-haiku-4-5-20251001"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;alarm_actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sns_alert_topic_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the threshold at roughly 2x your expected peak volume. The alarm fires before cost becomes a problem, not after.&lt;/p&gt;
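&lt;p&gt;To pick that threshold, pull your recent per-5-minute &lt;code&gt;InvocationCount&lt;/code&gt; sums (for example with boto3's CloudWatch &lt;code&gt;get_metric_statistics&lt;/code&gt;) and double the observed peak. A small helper; the floor value here is an arbitrary choice for quiet workloads:&lt;/p&gt;

```python
# Given recent per-period InvocationCount sums (e.g. fetched from CloudWatch
# via boto3), suggest an alarm threshold at roughly 2x the observed peak.
# The floor keeps quiet workloads from getting a near-zero threshold.
def suggest_alarm_threshold(period_sums: list, floor: int = 100) -> int:
    peak = max(period_sums) if period_sums else 0
    return max(floor, 2 * peak)
```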




&lt;h2&gt;
  
  
  Pattern 5: Disable Lambda Retries for Bedrock Callers
&lt;/h2&gt;

&lt;p&gt;This one is often overlooked. By default, Lambda retries asynchronous invocations twice on failure. If your Bedrock call times out or returns a throttling error, Lambda will invoke your function two more times automatically, tripling the tokens consumed by that failure.&lt;/p&gt;

&lt;p&gt;For Bedrock-calling Lambdas, set maximum retries to zero:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_event_source_mapping"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_processor"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# ... your S3/SQS trigger config&lt;/span&gt;
  &lt;span class="nx"&gt;bisect_batch_on_function_error&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function_event_invoke_config"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_caller"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bedrock_processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function_name&lt;/span&gt;

  &lt;span class="nx"&gt;maximum_retry_attempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;# No automatic retries for Bedrock callers&lt;/span&gt;

  &lt;span class="nx"&gt;destination_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;on_failure&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_sqs_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bedrock_dlq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;   &lt;span class="c1"&gt;# Failed events go to DLQ&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Handle retries explicitly in your code with backoff logic, so you control when and how many times a Bedrock call is retried, rather than relying on Lambda's default behaviour.&lt;/p&gt;
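&lt;p&gt;A minimal sketch of what that explicit retry logic can look like. The bare &lt;code&gt;Exception&lt;/code&gt; catch is for illustration only; in real code you would catch botocore's throttling and timeout errors specifically:&lt;/p&gt;

```python
import random
import time

# Explicit retry wrapper to replace Lambda's automatic retries.
# Narrow the `except Exception` to throttling errors in real code;
# the injectable `sleep` makes the backoff unit-testable.
def invoke_with_backoff(call, max_attempts: int = 3, base_delay: float = 1.0,
                        sleep=time.sleep):
    last_error = None
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:   # illustration: catch throttling errors only
            last_error = exc
            # Exponential backoff with jitter: 1s, 2s, 4s plus up to 0.5s noise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise last_error
```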




&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;A production-ready Bedrock caller in a serverless AI pipeline needs all five layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request
  → rate_limited_invoke()        # Pattern 3: per-function rate limit
      → safe_invoke()            # Pattern 2: token count pre-flight
          → invoke_with_tiering()  # Pattern 1: Haiku first, Sonnet on escalation
              → CloudWatch alarm   # Pattern 4: spike detection
  Lambda retry = 0               # Pattern 5: no automatic retry blowup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of these are complex individually. The value is in having all five in place before you hit production traffic, not after the bill arrives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Reference: What This Saves
&lt;/h2&gt;

&lt;p&gt;Assuming a pipeline processing 10,000 documents/day with an average 1,500 input tokens and 400 output tokens per call:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Model mix&lt;/th&gt;
&lt;th&gt;Daily cost&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All Sonnet, no controls&lt;/td&gt;
&lt;td&gt;100% Sonnet&lt;/td&gt;
&lt;td&gt;~$210&lt;/td&gt;
&lt;td&gt;~$6,300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered (80% Haiku / 20% Sonnet)&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$35&lt;/td&gt;
&lt;td&gt;~$1,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered + token control (avg 10% reduction)&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$31&lt;/td&gt;
&lt;td&gt;~$945&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tiering alone is an 83% cost reduction. Token control and rate limiting are the safety net that keeps the tiering from being undone by a bad day.&lt;/p&gt;
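&lt;p&gt;If your traffic or token sizes differ, the same arithmetic is easy to re-run. A back-of-envelope helper follows; the per-million-token prices you pass in are your own inputs, so check current Bedrock pricing rather than trusting any hardcoded numbers:&lt;/p&gt;

```python
# Back-of-envelope cost model for a tiered pipeline. Prices are caller-
# supplied USD per million tokens; none are hardcoded here on purpose.
def daily_cost(calls: int, in_toks: int, out_toks: int,
               in_price: float, out_price: float) -> float:
    """Daily USD cost for `calls` invocations of a single model."""
    per_call = (in_toks * in_price + out_toks * out_price) / 1_000_000
    return calls * per_call

def tiered_daily_cost(calls: int, in_toks: int, out_toks: int,
                      cheap_prices: tuple, strong_prices: tuple,
                      cheap_share: float = 0.8) -> float:
    """Blend a cheap model and a strong model at the given traffic split."""
    cheap_calls = int(calls * cheap_share)
    cheap = daily_cost(cheap_calls, in_toks, out_toks, *cheap_prices)
    strong = daily_cost(calls - cheap_calls, in_toks, out_toks, *strong_prices)
    return cheap + strong
```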




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;These five patterns are cheap to add and expensive to skip. The DynamoDB rate limiter costs under $1/month. The CloudWatch alarm is free under AWS free tier limits. The model tiering requires no infrastructure changes at all.&lt;/p&gt;

&lt;p&gt;Put them in place before you hit production traffic, not after the bill arrives.&lt;/p&gt;

&lt;p&gt;If you're running Bedrock in production and have hit a cost gotcha not covered here, drop it in the comments; it would be good to build out this list further.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>claude</category>
      <category>ai</category>
    </item>
    <item>
      <title>DynamoDB as a State Machine: How I Stopped Paying for Redundant Lambda Executions</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 08 Mar 2026 15:17:15 +0000</pubDate>
      <link>https://forem.com/harisharavindan/dynamodb-as-a-state-machine-how-i-stopped-paying-for-redundant-lambda-executions-30cc</link>
      <guid>https://forem.com/harisharavindan/dynamodb-as-a-state-machine-how-i-stopped-paying-for-redundant-lambda-executions-30cc</guid>
      <description>&lt;p&gt;&lt;em&gt;Part of my warrantyAI build series — building an AI-powered warranty management system on AWS, one week at a time.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_bedrock-aws-serverless-activity-7436337218853462016--IRP?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAACLhSMABNoio6_wUp-BQYf-oen15z_L_GKg" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQHVSLK3VC4LKg%2Ffeedshare-shrink_800%2FB56ZzMtS47HsAc-%2F0%2F1772960952325%3Fe%3D2147483647%26v%3Dbeta%26t%3D3CzV9mQp1M7Gg5ZspT-3gbiLqu0bOsgy8fCNhrJUj7I" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_bedrock-aws-serverless-activity-7436337218853462016--IRP?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAACLhSMABNoio6_wUp-BQYf-oen15z_L_GKg" rel="noopener noreferrer" class="c-link"&gt;
            Building HITL into AI pipelines for high-risk decisions | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            𝗜 𝗽𝗮𝘂𝘀𝗲𝗱 𝗮𝗻 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁 𝗺𝗶𝗱-𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝘁𝗵𝗶𝘀 𝘄𝗲𝗲𝗸 𝗮𝗻𝗱 𝗹𝗲𝘁 𝗮 𝗵𝘂𝗺𝗮𝗻 𝗱𝗲𝗰𝗶𝗱𝗲 𝘄𝗵𝗮𝘁 𝗵𝗮𝗽𝗽𝗲𝗻𝘀 𝗻𝗲𝘅𝘁

𝗪𝗲𝗲𝗸 𝟵 𝗼𝗳 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘄𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗔𝗜 - and this one changed how I think about AI pipelines.

𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺: Week 8's pipeline classified warranties and sent reminders automatically. Fine for low and medium risk. But for high-risk documents — expired warranties, missing serial numbers, suspicious policy terms — no one should be auto-sending anything without a human in the loop.

And honestly? The AI world is finally catching up to this thinking.

OpenAI, Anthropic, and Google have all started baking 𝗛𝘂𝗺𝗮𝗻-𝗶𝗻-𝘁𝗵𝗲-𝗟𝗼𝗼𝗽 (𝗛𝗜𝗧𝗟) patterns into their agent frameworks — not as an afterthought, but as a core design primitive. The reason is simple: LLMs are probabilistic. They're very good at pattern recognition across millions of documents. They're not good at knowing when they're wrong. A confident wrong answer from a classifier in a warranty pipeline doesn't just fail silently — it sends a notification to a real customer.

𝗛𝗜𝗧𝗟 𝗶𝘀 𝘁𝗵𝗲 𝗰𝗶𝗿𝗰𝘂𝗶𝘁 𝗯𝗿𝗲𝗮𝗸𝗲𝗿. You let the AI handle the 90% it's genuinely better at — reading documents, extracting structure, classifying risk — and you bring humans in precisely at the 10% where consequences matter. That's not a limitation of the AI. That's good system design.

So I built it. Here's the new flow:
📄 𝗥𝗲𝗮𝗱𝗲𝗿 → 🔍 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗲𝗿 → 🛑 𝗛𝗜𝗧𝗟 𝗔𝗴𝗲𝗻𝘁 → 📨 𝗥𝗲𝗺𝗶𝗻𝗱𝗲𝗿

The HITL agent does three things when risk = HIGH:
1. Serialises the full pipeline state to DynamoDB
2. Sends an SNS email to the reviewer with ✅ Approve and ❌ Reject links
3. Raises NodeInterrupt — LangGraph pauses the graph completely

The pipeline just... stops. Waits.

When the reviewer clicks Approve, a second Lambda fires, reads the DynamoDB state, and re-invokes the pipeline — but only the Reminder agent. Everything before that already ran.

When they click Reject, an SNS notification goes to the tenant. No reminder. Full audit trail in S3.

For medium and low risk? HITL is skipped entirely. Zero delay.

𝗪𝗵𝗮𝘁 𝗜 𝗹𝗲𝗮𝗿𝗻𝗲𝗱 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗶𝘀:
— LangGraph's NodeInterrupt is surprisingly clean. One raise, graph pauses.
— DynamoDB as a state checkpoint is more reliable than I expected (7-day TTL, on-demand billing)
— The hardest part wasn't the code. It was deciding what "high risk" actually means.

The best AI systems in production today aren't the ones running fully autonomously. They're the ones that know exactly when to stop and ask.

Stack: LangGraph + AWS Bedrock + DynamoDB + SNS + API Gateway + Lambda
Repo: https://lnkd.in/gYrC3wEW 

Week 10: CI/CD for the whole pipeline + prompt regression tests.

#bedrock #aws #serverless #langchain #aiengineering #building #ai #agents
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;/h2&gt;

&lt;p&gt;Most people reach for DynamoDB when they need a fast key-value store. I did too.&lt;/p&gt;

&lt;p&gt;Then I started using it as a state machine — and accidentally cut the redundant Lambda execution cost out of my AI pipeline entirely.&lt;/p&gt;

&lt;p&gt;Here's the pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: AI Pipelines That Don't Know When to Stop
&lt;/h2&gt;

&lt;p&gt;In Week 8 of building warrantyAI, I had a 3-agent LangGraph pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reader → Classifier → Reminder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every document that came in ran the full pipeline. Reader extracted text with Textract. Classifier invoked Bedrock (Claude Haiku, fallback to Sonnet). Reminder generated a notification and published to SNS.&lt;/p&gt;

&lt;p&gt;That's fine when every document should proceed to the end. But in Week 9, I added a human review step for high-risk warranties. The pipeline needed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pause after classification&lt;/li&gt;
&lt;li&gt;Wait for a human decision (could be hours, could be days)&lt;/li&gt;
&lt;li&gt;Resume from exactly where it stopped — not re-run everything&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The naive approach would be to re-invoke the full pipeline on resume. Reader runs again. Textract runs again. Classifier calls Bedrock again. You pay for all of it twice.&lt;/p&gt;

&lt;p&gt;With DynamoDB as the state checkpoint, the resumed execution runs &lt;strong&gt;only the Reminder agent&lt;/strong&gt;. Everything before it is already stored.&lt;/p&gt;
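&lt;p&gt;One way to make that concrete: derive the remaining agents from completion flags in the checkpointed state, instead of always starting at the top of the graph. The agent names and state keys below mirror the pipeline but are illustrative:&lt;/p&gt;

```python
# Hypothetical resume routing: compute the pipeline suffix that still
# needs to run from completion flags in the checkpoint. Agent names and
# the "*_done" keys are illustrative, not the post's actual schema.
PIPELINE = ["reader", "classifier", "reminder"]

def agents_to_run(checkpoint: dict) -> list:
    """Return the agents that have not completed yet, in order."""
    for i, agent in enumerate(PIPELINE):
        if not checkpoint.get(f"{agent}_done"):
            return PIPELINE[i:]
    return []
```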




&lt;h2&gt;
  
  
  The Pattern: Checkpoint State, Not Just Data
&lt;/h2&gt;

&lt;p&gt;The key mental shift: DynamoDB isn't storing the &lt;em&gt;result&lt;/em&gt; of your pipeline. It's storing the &lt;em&gt;entire state&lt;/em&gt; of your pipeline at the moment it paused.&lt;/p&gt;

&lt;p&gt;Here's the DynamoDB schema I use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;document_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PK&lt;/td&gt;
&lt;td&gt;Unique per document&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SK&lt;/td&gt;
&lt;td&gt;Always &lt;code&gt;"REVIEW"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pending_review&lt;/code&gt; / &lt;code&gt;approved&lt;/code&gt; / &lt;code&gt;rejected&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;warranty_state&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Full pipeline state as JSON string&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;created_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;ISO 8601 timestamp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ttl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;Unix epoch — auto-expires after 7 days&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;warranty_state&lt;/code&gt; field holds everything: raw extracted text, classification result, risk level, model used, guardrail flags, audit log. The entire &lt;code&gt;WarrantyState&lt;/code&gt; TypedDict serialised as a JSON string.&lt;/p&gt;

&lt;p&gt;When the pipeline resumes, it deserialises that field and picks up exactly where it left off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_to_dynamodb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WarrantyState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HITL_TABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ttl&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 7-day auto-expiry
&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warranty_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;            &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And on resume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_review_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HITL_TABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;item&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Already actioned: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;

&lt;span class="c1"&gt;# In resume Lambda:
&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_review_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;warranty_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warranty_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# full state restored
&lt;/span&gt;
&lt;span class="c1"&gt;# Run only the Reminder agent — Reader and Classifier already ran
&lt;/span&gt;&lt;span class="n"&gt;reminder_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reminder_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;warranty_state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Textract. No Bedrock classification call. Just the Reminder.&lt;/p&gt;
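&lt;p&gt;What &lt;code&gt;run_from_reminder&lt;/code&gt; might look like — a minimal sketch, assuming each agent is a plain function that takes the state dict and returns a partial update. The &lt;code&gt;reminder_agent&lt;/code&gt; stub here stands in for the real one, which calls Bedrock and SNS:&lt;/p&gt;

```python
# Minimal sketch — agent functions are stand-ins for the real ones.

def reminder_agent(state: dict) -> dict:
    # Real version: Bedrock Haiku generates the message, SNS publishes
    # it for medium/high risk. Stub keeps the same contract.
    risk = state.get("risk_level", "low")
    return {"reminder_sent": risk in ("medium", "high")}

def run_from_reminder(state: dict) -> dict:
    """Resume the pipeline at the Reminder node only.

    Reader and Classifier outputs are already in `state` (restored
    from DynamoDB), so neither Textract nor the Bedrock
    classification call runs again.
    """
    state.update(reminder_agent(state))
    return state
```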




&lt;h2&gt;
  
  
  The Full HITL Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S3 Upload → Lambda trigger
     │
     ▼
Reader Agent      (Textract + Bedrock Haiku structuring)
     │
     ▼
Classifier Agent  (Bedrock Haiku → Sonnet fallback if confidence &amp;lt; 0.7)
     │
     ▼
HITL Agent ──── risk != "high" ──────────────────────────┐
     │                                                     │
     │ risk == "high"                                      │
     ▼                                                     │
Write full state to DynamoDB                              │
     │                                                     │
     ▼                                                     │
SNS email to reviewer                                      │
(approve/reject links)                                     │
     │                                                     │
     ▼                                                     │
NodeInterrupt — graph pauses                              │
                                                           │
     Reviewer clicks link                                  │
     → API Gateway                                         │
     → resume Lambda                                       │
          │                                                │
          ├── APPROVE → run_from_reminder(state)           │
          └── REJECT  → SNS to tenant, stop               │
                                                           │
                                                    Reminder Agent
                                                           │
                                                    SNS to tenant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For medium and low risk documents, the HITL node is skipped entirely — the graph flows straight through to Reminder with no pause, no DynamoDB write, no cost.&lt;/p&gt;
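&lt;p&gt;In LangGraph terms, that skip is just a conditional edge. A hypothetical routing function — the exact wiring depends on how the graph is assembled, but the decision itself is this small:&lt;/p&gt;

```python
# Hypothetical routing function for the HITL branch. In LangGraph this
# would be registered via add_conditional_edges; shown standalone here.

def route_after_classify(state: dict) -> str:
    """High-risk documents pause at HITL; everything else flows
    straight to the Reminder node — no pause, no DynamoDB write."""
    if state.get("risk_level") == "high":
        return "hitl"
    return "reminder"
```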




&lt;h2&gt;
  
  
  Why DynamoDB Over Other Options
&lt;/h2&gt;

&lt;p&gt;When I was designing this, I considered three approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQS with visibility timeout&lt;/strong&gt; — the visibility timeout maxes out at 12 hours. Not enough for a human review that might sit overnight. And you can't easily look a message up by document_id.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 as state store&lt;/strong&gt; — works, but you're polling or using S3 notifications to detect resume. Awkward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt; — point lookups by document_id, TTL handles cleanup automatically, on-demand billing means you pay per read/write not per hour, and the Streams feature gives you a path to event-driven resume if you want it later.&lt;/p&gt;

&lt;p&gt;The on-demand billing matters more than it sounds. A warranty pipeline doesn't process documents at a steady rate. Some days 500 documents, some days 5. With provisioned capacity you're paying for peak all the time. With on-demand you pay for actual usage.&lt;/p&gt;

&lt;p&gt;At my current volume, the DynamoDB cost for the HITL table is under $0.50/month.&lt;/p&gt;
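&lt;p&gt;That figure is easy to sanity-check. A back-of-envelope sketch — the per-million request prices below are typical on-demand rates, not quoted for any specific region, so treat them as assumptions:&lt;/p&gt;

```python
# Rough monthly request cost for the HITL table. Prices are assumed
# (typical on-demand rates; they vary by region) — check your own.
WRITE_PRICE_PER_MILLION = 1.25   # USD per million write request units (assumed)
READ_PRICE_PER_MILLION = 0.25    # USD per million read request units (assumed)

def monthly_hitl_cost(docs_per_day: float, high_risk_ratio: float) -> float:
    """One write on pause, plus one read and one write on resume —
    and only high-risk documents ever touch the table."""
    monthly_high_risk = docs_per_day * high_risk_ratio * 30
    writes = monthly_high_risk * 2
    reads = monthly_high_risk * 1
    return (writes * WRITE_PRICE_PER_MILLION
            + reads * READ_PRICE_PER_MILLION) / 1_000_000
```

&lt;p&gt;Even 500 documents/day with 20% flagged high-risk lands well under a cent per month in request charges. Storage is the larger line item, and TTL keeps that bounded.&lt;/p&gt;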




&lt;h2&gt;
  
  
  The TTL Trick
&lt;/h2&gt;

&lt;p&gt;This is the part I underestimated when I first built this.&lt;/p&gt;

&lt;p&gt;Every review record gets a &lt;code&gt;ttl&lt;/code&gt; field set 7 days from creation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DynamoDB's TTL feature automatically deletes expired items — no cron, no cleanup Lambda, no cost. Unactioned reviews just disappear. This matters because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stale review records don't accumulate&lt;/li&gt;
&lt;li&gt;Storage costs stay flat regardless of volume&lt;/li&gt;
&lt;li&gt;You don't need to build a cleanup process&lt;/li&gt;
&lt;/ul&gt;
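&lt;p&gt;TTL is opt-in per table: you enable it once and point DynamoDB at the attribute holding the epoch-seconds expiry. A sketch of that setup — the table name is assumed, and the one-time &lt;code&gt;update_time_to_live&lt;/code&gt; call is left as a comment so the snippet runs offline:&lt;/p&gt;

```python
import time

TTL_ATTRIBUTE = "ttl"  # must hold epoch seconds, not an ISO string

def ttl_epoch(days_from_now: int) -> int:
    """Epoch-seconds expiry in the format DynamoDB's TTL sweeper reads."""
    return int(time.time()) + days_from_now * 86400

# One-time table setup (needs AWS credentials; table name assumed):
# boto3.client("dynamodb").update_time_to_live(
#     TableName="hitl-reviews",
#     TimeToLiveSpecification={"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
# )
```

&lt;p&gt;The common pitfall: the TTL attribute must be a Number in epoch seconds. Write an ISO string instead and DynamoDB silently ignores it — the item never expires.&lt;/p&gt;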

&lt;p&gt;The one thing to know: TTL deletion isn't instant. DynamoDB typically cleans up within 48 hours of expiry. If you need exact expiry (e.g. the approve link should stop working at exactly 7 days), enforce it in your Lambda:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Already actioned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Also check TTL manually if you need hard expiry
&lt;/span&gt;&lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review expired&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Here's what changed between Week 8 (no HITL) and Week 9 (HITL with DynamoDB state):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Week 8&lt;/th&gt;
&lt;th&gt;Week 9&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-risk doc: Bedrock calls&lt;/td&gt;
&lt;td&gt;2 (classify + reminder gen)&lt;/td&gt;
&lt;td&gt;1 (reminder only, on approve)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-risk doc: Textract&lt;/td&gt;
&lt;td&gt;Yes, every run&lt;/td&gt;
&lt;td&gt;Once, state stored&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redundant re-processing&lt;/td&gt;
&lt;td&gt;On every retry&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State cleanup&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Automatic via TTL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB cost&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;&amp;lt;$0.50/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Bedrock saving is the real one. Claude Haiku is cheap (~$0.0004/call) but Sonnet fallback is ~$0.006/call. If a high-risk document triggered the Sonnet fallback and you re-ran the pipeline on resume, you'd pay for Sonnet twice. With DynamoDB state, classification runs once and the result is stored.&lt;/p&gt;

&lt;p&gt;At low volume this is pennies. At scale — thousands of documents per day with a meaningful percentage flagged as high-risk — it adds up quickly.&lt;/p&gt;
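&lt;p&gt;To put numbers on "adds up quickly", here's the arithmetic using the per-call figures above. The volume, high-risk ratio, and Sonnet-fallback ratio are illustrative assumptions, not measured values:&lt;/p&gt;

```python
# Cost avoided per month by storing classification results instead of
# re-running Reader + Classifier on resume. Per-call prices from the
# text above (~$0.006 Sonnet, ~$0.0004 Haiku); inputs are assumptions.
SONNET_CALL = 0.006
HAIKU_CALL = 0.0004

def monthly_resaving(docs_per_day: float, high_risk_ratio: float,
                     sonnet_fallback_ratio: float) -> float:
    resumed = docs_per_day * high_risk_ratio * 30
    # every resumed doc would have repeated one Haiku classify call,
    # and a fraction of them the Sonnet fallback on top
    return resumed * (HAIKU_CALL + sonnet_fallback_ratio * SONNET_CALL)
```

&lt;p&gt;At 2,000 docs/day with 15% high-risk and 15% of those falling back to Sonnet, that's roughly $11.70/month avoided on LLM calls alone — before counting the Textract re-reads, which cost more than the LLM calls at that volume.&lt;/p&gt;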




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Week 10 adds CI/CD to the pipeline — GitHub Actions deploying to Lambda via ECR, with prompt regression tests so a bad Bedrock prompt doesn't silently break classification in production.&lt;/p&gt;

&lt;p&gt;The DynamoDB state pattern from this week sets that up nicely: because state is checkpointed, regression tests can inject a known state at any node in the graph and assert the output without running the full pipeline.&lt;/p&gt;
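&lt;p&gt;A taste of what one of those regression tests could look like — checking the Classifier's JSON contract against a recorded ("golden") response, since a prompt edit usually breaks the output shape before anything else. The field names match this pipeline; the helper itself is hypothetical:&lt;/p&gt;

```python
import json

# The fields the Classifier prompt promises to return.
REQUIRED_FIELDS = {"category", "expiry_date", "risk_level", "confidence"}

def check_classifier_contract(raw_model_output: str) -> dict:
    """Assert the classifier's JSON contract — the thing a prompt edit
    silently breaks. Run against a live or recorded model response."""
    parsed = json.loads(raw_model_output)
    missing = REQUIRED_FIELDS - parsed.keys()
    assert not missing, f"prompt regression: missing {missing}"
    assert parsed["risk_level"] in ("low", "medium", "high")
    return parsed

# Regression check against a recorded golden response:
golden = ('{"category": "appliance", "expiry_date": "2027-01-31", '
          '"risk_level": "high", "confidence": 0.92}')
record = check_classifier_contract(golden)
```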




</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>dynamodb</category>
    </item>
    <item>
      <title>Serverless Bedrock: How I invoke Claude from Lambda in warrantyAI</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Tue, 03 Mar 2026 17:21:42 +0000</pubDate>
      <link>https://forem.com/harisharavindan/serverless-bedrock-how-i-invoke-claude-from-lambda-in-warrantyai-3hk</link>
      <guid>https://forem.com/harisharavindan/serverless-bedrock-how-i-invoke-claude-from-lambda-in-warrantyai-3hk</guid>
      <description>&lt;p&gt;Every week I ship a new piece of warrantyAI — an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.&lt;/p&gt;

&lt;p&gt;Before the agents could do anything, I needed one thing to work cleanly: &lt;strong&gt;invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.&lt;/strong&gt;&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_aiplatformengineering-langgraph-awsbedrock-activity-7433883183760408576-EuL5?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQF-WtgegAgotw%2Ffeedshare-shrink_800%2FB56Zyp1XLHIYAg-%2F0%2F1772375864528%3Fe%3D2147483647%26v%3Dbeta%26t%3DfaYx2MoVW5LfC_6lpkSVln8YCv0TP-7RdKChvLhDJrA" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_aiplatformengineering-langgraph-awsbedrock-activity-7433883183760408576-EuL5?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" rel="noopener noreferrer" class="c-link"&gt;
            Building warrantyAI on AWS with AI-powered pipeline | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            𝗪𝗲𝗲𝗸 𝟴 𝗼𝗳 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘄𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗔𝗜

👉 𝗙𝗼𝗿 𝘁𝗵𝗼𝘀𝗲 𝘀𝗲𝗲𝗶𝗻𝗴 𝘁𝗵𝗶𝘀 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘁𝗶𝗺𝗲
I'm a Senior Cloud Engineer building an AI-powered warranty management system on AWS — from scratch, one week at a time.
No shortcuts. Real architecture. Real cost numbers.

𝗧𝗵𝗶𝘀 𝘄𝗲𝗲𝗸: 𝗜 𝘄𝗶𝗿𝗲𝗱 𝟯 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝘁𝗼𝗴𝗲𝘁𝗵𝗲𝗿 𝘂𝘀𝗶𝗻𝗴 𝗟𝗮𝗻𝗴𝗚𝗿𝗮𝗽𝗵
The problem warrantyAI solves:
Most people lose track of their warranties.
Appliances expire. Repairs get denied. Money is wasted.
warrantyAI reads the document, classifies the risk, and reminds you before it’s too late.

This week I built the core pipeline that makes that happen.
Reader → Classifier → Reminder

📄 𝗥𝗲𝗮𝗱𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Customer uploads a warranty PDF to S3.
Textract pulls the raw text.
Bedrock Haiku structures it into named fields —
product, brand, expiry date, serial number.

🔍 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Takes those fields and classifies the warranty.
Haiku first — fast and cheap.
If confidence drops below 70%, automatically retries with Sonnet.
GovernanceShield guardrail (built in Week 7) runs on every invocation.
Outputs: category, expiry date, risk level.

🔔 𝗥𝗲𝗺𝗶𝗻𝗱𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Reads risk level from shared state.
Generates a human-readable notification via Haiku.
Publishes to SNS — but only for medium and high risk.
Low risk? Message generated, not sent.
Deliberate FinOps decision. SNS isn’t free at scale.

Repo : https://lnkd.in/gsndTpQV

𝗪𝗵𝗮𝘁 𝗵𝗲𝗹𝗱 𝘁𝗵𝗶𝘀 𝘁𝗼𝗴𝗲𝘁𝗵𝗲𝗿: 𝗪𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗦𝘁𝗮𝘁𝗲

One typed Python dict shared across all 3 agents.
No message queues between agents.
No shared database mid-pipeline.
Each agent reads from it, writes back a partial update.
LangGraph handles the sequencing.

What connected cleanly from previous weeks:
✔ Week 7 GovernanceShield guardrail — one import, plugged straight in
✔ Per-agent IAM roles already existed — zero new permissions needed
✔ S3 audit bucket already live — all 3 agents write to it
Building incrementally pays off.

What’s your multi-agent orchestration framework of choice right now?

#AIPlatformEngineering #LangGraph #AWSBedrock #warrantyAI #Serverless #AI
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Here's exactly how I did it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why serverless + Bedrock is the right combo
&lt;/h2&gt;

&lt;p&gt;Bedrock's &lt;code&gt;invoke_model&lt;/code&gt; API is synchronous and stateless. It takes a request, returns a response. That's exactly what Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay per invocation, per token.&lt;/p&gt;

&lt;p&gt;For warrantyAI's workload — sporadic document uploads, not a real-time chat product — this matters. My entire system runs under $1.30/day.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup: IAM first, always
&lt;/h2&gt;

&lt;p&gt;Before any code, the Lambda execution role needs this policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModelWithResponseStream"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scope it to specific model ARNs. Not &lt;code&gt;*&lt;/code&gt;. Ever.&lt;/p&gt;




&lt;h2&gt;
  
  
  The invoke wrapper
&lt;/h2&gt;

&lt;p&gt;This is the core function I reuse across all 3 agents in warrantyAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;HAIKU&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SONNET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Stateless, reusable, testable in isolation.&lt;/p&gt;
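&lt;p&gt;"Testable in isolation" cashes out like this: inject a fake client that mimics the Bedrock response shape (a dict with a file-like &lt;code&gt;body&lt;/code&gt;), and the parsing logic is verifiable with zero AWS calls. A sketch — this variant takes the client as a parameter instead of the module-level global:&lt;/p&gt;

```python
import io
import json

class FakeBedrock:
    """Mimics bedrock-runtime's invoke_model response shape:
    a dict with a file-like 'body' holding Anthropic-format JSON."""
    def invoke_model(self, **kwargs):
        payload = {"content": [{"text": '  {"risk_level": "high"}  '}]}
        return {"body": io.BytesIO(json.dumps(payload).encode())}

def invoke_bedrock(prompt, client,
                   model_id="anthropic.claude-haiku-4-5-20251001",
                   max_tokens=512):
    # Same request/response handling as the wrapper above, with the
    # client injected so tests never touch AWS.
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()
```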




&lt;h2&gt;
  
  
  Haiku-first, Sonnet fallback
&lt;/h2&gt;

&lt;p&gt;Haiku is fast and cheap. Sonnet is accurate and expensive. In warrantyAI's Classifier agent, I try Haiku first. If it returns low confidence, I retry with Sonnet automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_warranty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_classify_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Attempt 1: Haiku
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Fallback: Sonnet if confidence &amp;lt; 0.7
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SONNET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;haiku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three things that will burn you
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The &lt;code&gt;body&lt;/code&gt; is a StreamingBody, not a string.&lt;/strong&gt;&lt;br&gt;
Always call &lt;code&gt;.read()&lt;/code&gt; before &lt;code&gt;json.loads()&lt;/code&gt;. Forget this once and you'll spend 20 minutes confused.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong
&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Right
&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Payload size limits on Lambda.&lt;/strong&gt;&lt;br&gt;
Lambda has a 6MB synchronous response payload limit. Bedrock responses are usually tiny, but if you're passing large documents in your prompt, chunk them first. I cap prompts at 4,000 characters in the Reader agent.&lt;/p&gt;
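&lt;p&gt;A minimal sketch of where that cap sits (the 4,000-character budget and the function name are illustrations of the idea, not the actual warrantyAI code):&lt;/p&gt;

```python
# Illustrative sketch: keep prompts under a fixed character budget before
# invoking Bedrock, so payload limits never become a surprise.
MAX_PROMPT_CHARS = 4000  # assumed budget, matching the cap described above

def cap_prompt(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Truncate a prompt to the character budget, keeping the beginning."""
    return prompt[:limit]
```

&lt;p&gt;Real chunking would split the document into pieces rather than truncate; this only shows where the guard belongs.&lt;/p&gt;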

&lt;p&gt;&lt;strong&gt;3. Bedrock is regional.&lt;/strong&gt;&lt;br&gt;
Not all models are available in all regions. &lt;code&gt;ap-south-1&lt;/code&gt; (Mumbai) supports Haiku and Sonnet. If you get a &lt;code&gt;ResourceNotFoundException&lt;/code&gt;, check model availability in your region before debugging your code.&lt;/p&gt;
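&lt;p&gt;A quick way to check availability, assuming credentials are configured (the filter helper is mine, but &lt;code&gt;list_foundation_models&lt;/code&gt; is the real Bedrock control-plane call):&lt;/p&gt;

```python
# List Bedrock foundation models in a region and filter by keyword.
def matching_model_ids(summaries, keyword):
    """Return model IDs from list_foundation_models output containing keyword."""
    return [s["modelId"] for s in summaries if keyword.lower() in s["modelId"].lower()]

if __name__ == "__main__":
    import boto3  # assumption: credentials and bedrock:ListFoundationModels access
    bedrock = boto3.client("bedrock", region_name="ap-south-1")
    summaries = bedrock.list_foundation_models()["modelSummaries"]
    print(matching_model_ids(summaries, "claude"))
```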




&lt;h2&gt;
  
  
  Cost reality check
&lt;/h2&gt;

&lt;p&gt;For warrantyAI's workload (roughly 50 documents/day):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Avg tokens/call&lt;/th&gt;
&lt;th&gt;Cost/call&lt;/th&gt;
&lt;th&gt;Daily cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;~$0.0004&lt;/td&gt;
&lt;td&gt;~$0.017&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet (15% of calls)&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;~$0.006&lt;/td&gt;
&lt;td&gt;~$0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total Bedrock cost: under $0.025/day for this workload.&lt;br&gt;
The rest of my $1.30/day budget goes to Textract, SNS, and S3.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph — three agents, one shared state dict, no message queues.&lt;/p&gt;

&lt;p&gt;Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is part of the Serverless Meets AI series — practical AWS patterns from building warrantyAI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Serverless Endpoint Monitoring - check uptime of your app</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 18 Jan 2024 03:12:25 +0000</pubDate>
      <link>https://forem.com/harisharavindan/serverless-endpoint-monitoring-using-monitor-your-uptime-of-your-app-40pn</link>
      <guid>https://forem.com/harisharavindan/serverless-endpoint-monitoring-using-monitor-your-uptime-of-your-app-40pn</guid>
      <description>&lt;p&gt;Do you need to monitor your application endpoints and have a dashboard to check the detail - all this in a serverless way on AWS. &lt;br&gt;
Let's see how it's done.&lt;/p&gt;

&lt;p&gt;Solution Design&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqbhemik1babx92b2sgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqbhemik1babx92b2sgl.png" alt="Ping Service Architecture" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This setup creates an S3 website that takes in the information required for monitoring.&lt;/li&gt;
&lt;li&gt;The backend contains a Lambda function that creates an EventBridge schedule (cron) and a CloudWatch alarm.&lt;/li&gt;
&lt;li&gt;The EventBridge schedule invokes a Lambda function that checks the endpoint and updates CloudWatch with a custom metric of the response status code.&lt;/li&gt;
&lt;/ul&gt;
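&lt;p&gt;To make the moving parts concrete, here is a minimal sketch of what the endpoint-check Lambda does (the names are illustrative, not the repo's exact code): probe the URL, then shape a CloudWatch custom-metric datum from the status code.&lt;/p&gt;

```python
# Probe an endpoint and build a CloudWatch custom-metric datum from the result.
import urllib.request
import urllib.error

def probe(url, timeout=5):
    """Return the HTTP status code for a GET on url (0 on network error)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0

def metric_datum(service_name, status_code):
    """Shape one datum for cloudwatch.put_metric_data."""
    return {
        "MetricName": "StatusCode",
        "Dimensions": [{"Name": "Service", "Value": service_name}],
        "Value": float(status_code),
    }
```

&lt;p&gt;The datum would then be sent with &lt;code&gt;put_metric_data&lt;/code&gt; on a CloudWatch client, with your chosen namespace.&lt;/p&gt;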

&lt;p&gt;Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS knowledge of how to create Lambda functions, IAM roles, and S3 websites&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 1 - Setup the Lambda Function
&lt;/h3&gt;

&lt;p&gt;Clone the code from GitHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://github.com/uptownaravi/ping_service.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the Lambda function checkEndpoint, which checks the endpoint and updates the CloudWatch metric.&lt;br&gt;
Use the file checkEndpoint.py from the app folder, and for this Lambda's role use the IAM policy document in the file checkEndpointPolicy.json.&lt;/p&gt;

&lt;p&gt;Replace the URL, region name, and account number in the code.&lt;/p&gt;

&lt;p&gt;Create the Lambda function addPingEndpoint using the file addPingEndpoint.py in the app folder and the related IAM policy in addPingEndpointPolicy.json.&lt;/p&gt;

&lt;p&gt;This Lambda creates the EventBridge schedule with a payload describing the service to check; the payload is shown in the target section once the schedule is created. This is the connecting part that tells the checkEndpoint Lambda which URL to check and which CloudWatch metric to update.&lt;/p&gt;

&lt;p&gt;Enable a Function URL for the addPingEndpoint function, as we need it for the S3 website. Allow the accept and content-type headers.&lt;br&gt;
CORS also needs to be enabled - see the steps after the S3 bucket setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 - S3 Website Creation
&lt;/h3&gt;

&lt;p&gt;Next, create the S3 website using the code in the web folder.&lt;br&gt;
In the file app.html, update the function-url to the Lambda Function URL created in the previous step.&lt;br&gt;
Steps to create a website in S3: &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: for CORS, in the addPingEndpoint Lambda Function URL settings, add the S3 website as the allowed origin, like http://&amp;lt;bucket&amp;gt;.s3-website.&amp;lt;region&amp;gt;.amazonaws.com&lt;br&gt;
This makes sure the Function URL can be used from the S3 website.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After deploying the website and enabling static hosting,&lt;br&gt;
the page below should be visible when you visit http://&amp;lt;bucket&amp;gt;.s3-website.&amp;lt;region&amp;gt;.amazonaws.com&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46eqdrcjoxkawns4vhoc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46eqdrcjoxkawns4vhoc.png" alt="webpage with details" width="607" height="932"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 - Test the solution
&lt;/h3&gt;

&lt;p&gt;Enter the details in the webpage:&lt;br&gt;
Service Name: name of the service to monitor&lt;br&gt;
Endpoint: URL to check&lt;br&gt;
Details: description of the service&lt;br&gt;
Cron: cron expression for the check schedule (AWS documentation: &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;After submitting, we get this message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeqxpmvxie9a9cqm9yvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeqxpmvxie9a9cqm9yvh.png" alt="success message after form submission" width="522" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check your EventBridge Schedules and the CloudWatch Alarm/Metric dashboards to see the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8fzs8awq82xmt5ddhrq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8fzs8awq82xmt5ddhrq.png" alt="Event Bridge Schedule" width="692" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gahdwap0lhh5uuhwqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gahdwap0lhh5uuhwqi.png" alt="metric namespace" width="358" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch metric showing the custom status-code metric for the AnalyticsApp service whose details we added in the webpage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l8ooey6c8fbahm6vqnq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l8ooey6c8fbahm6vqnq.png" alt="metric graph" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And in Alarms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oq3l9h9du6l8dr9clfs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oq3l9h9du6l8dr9clfs.png" alt="cloudwatch alarm" width="800" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can be done for many services, and all of them will be available as custom metrics in their namespace.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: there are no actions added to this alarm.&lt;br&gt;
Add SNS alerts if required, to get notified when your endpoints go down.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>AWS Lambda gets Python 3.11 runtime</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 29 Jul 2023 04:58:28 +0000</pubDate>
      <link>https://forem.com/harisharavindan/aws-lambda-gets-python-311-runtime-1a14</link>
      <guid>https://forem.com/harisharavindan/aws-lambda-gets-python-311-runtime-1a14</guid>
      <description>&lt;p&gt;AWS released python 3.11 runtime &lt;a href="https://aws.amazon.com/blogs/compute/python-3-11-runtime-now-available-in-aws-lambda/" rel="noopener noreferrer"&gt;official blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are changes to the default sys.path that can be important when migrating to this new runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/var/task/: User Function
/opt/python/lib/pythonX.Y/site-packages/: User Layer
/opt/python/: User Layer
/var/lang/lib/pythonX.Y/site-packages/: Pre-installed modules and default pip &lt;span class="nb"&gt;install &lt;/span&gt;location
/var/runtime/: No pre-installed modules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The /var/lang/lib/pythonX.Y/site-packages/ path has been moved up in precedence, so it is now searched before /var/runtime/.&lt;/p&gt;
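&lt;p&gt;If you want to verify where a module actually resolves from under the new precedence, a quick check inside a handler (or any Python 3.11 shell) works:&lt;/p&gt;

```python
# Show the import search order and where a given module is loaded from.
import importlib.util
import sys

print(sys.path)  # search order, highest precedence first

spec = importlib.util.find_spec("json")  # swap in any layered package
print(spec.origin)  # the file path the module will be imported from
```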

&lt;p&gt;Apart from this, Python 3.11 brings a host of changes, such as the new tomllib module for TOML file parsing.&lt;br&gt;
Release notes: &lt;a href="https://docs.python.org/3.11/whatsnew/3.11.html" rel="noopener noreferrer"&gt;https://docs.python.org/3.11/whatsnew/3.11.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To control minor version updates for Lambda functions, check the&lt;br&gt;
runtime settings options under Code source --&amp;gt; Edit runtime management configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o6xrx1n0nnzbgulo4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o6xrx1n0nnzbgulo4b.png" alt="runtime setting" width="800" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphko8d7i4sspi9qpqobb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphko8d7i4sspi9qpqobb.png" alt="edit runtime configuration" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;details of each option &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html#runtime-management-controls" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html#runtime-management-controls&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The initial Python 3.11 was released in October 2022, and the current version 3.11.4 was released in June 2023. A quick sys.version check on the Lambda console shows the details: 3.11.4 (main, Jul 10 2023, 22:05:45) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>python</category>
    </item>
    <item>
      <title>AWS EKS Deployment with Helm Chart using Codebuild and CodePipeline</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 16 Jul 2023 01:46:30 +0000</pubDate>
      <link>https://forem.com/harisharavindan/aws-eks-deployment-with-helm-chart-using-codebuild-and-codepipeline-379a</link>
      <guid>https://forem.com/harisharavindan/aws-eks-deployment-with-helm-chart-using-codebuild-and-codepipeline-379a</guid>
      <description>&lt;h3&gt;
  
  
  what is it about
&lt;/h3&gt;

&lt;p&gt;Creating a deployment pipeline that installs a Helm release in an EKS cluster. We will see how to create a workflow that takes the Helm chart from CodeCommit --&amp;gt; lints the chart --&amp;gt; packages and uploads it to S3 --&amp;gt; dry run --&amp;gt; approval --&amp;gt; deploys to EKS.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Clone the Repo for the helper files &lt;a href="https://github.com/uptownaravi/EKS_Deployment.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/EKS_Deployment.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 1 - IAM Roles and aws-auth configmap
&lt;/h4&gt;

&lt;p&gt;Create a role to access EKS using the file eks-deploy-role.json, and add a trust relationship for this role with eks-deploy-role-trust-relation.json.&lt;/p&gt;

&lt;p&gt;Add this role name in the aws-auth ConfigMap, then create a Kubernetes Role and RoleBinding for it. Make sure the username matches in the aws-auth ConfigMap and the RoleBinding.&lt;br&gt;
Also be careful when you edit the ConfigMap, as access to the cluster is based on it.&lt;/p&gt;

&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then create the CodeBuild service role with the file codebuild-pyapp-service-role.json.&lt;br&gt;
CodeBuild needs access to CodeCommit, to S3 for publishing the Helm chart, to the EKS API, and to CloudWatch Logs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The CodeBuild service role should be able to assume the eks-deploy-role, so make sure the trust relationship allows that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 2 - CodeBuild projects
&lt;/h4&gt;

&lt;p&gt;Two CodeBuild projects are required.&lt;/p&gt;

&lt;p&gt;The first lints the chart, uploads it to S3, and performs a dry run of the install. Use the file buildspec_prepare.yaml to create this CodeBuild project.&lt;/p&gt;

&lt;p&gt;We lint the chart, package it, upload it to S3 (using the helm s3 plugin), and perform a dry run.&lt;/p&gt;

&lt;p&gt;helm s3 plugin reference &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/set-up-a-helm-v3-chart-repository-in-amazon-s3.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/set-up-a-helm-v3-chart-repository-in-amazon-s3.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second project performs the actual deployment with helm install/upgrade. Use the file buildspec_deploy.yaml to create it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Replace the account numbers and other variables as required. Add the path to the Helm chart if it's in a different folder.&lt;/p&gt;

&lt;p&gt;Most of the steps in the buildspec file, like installing the tools and plugins, can be baked into a Docker image and used during prepare/deploy. The idea here is to show how the process works, so those commands are listed individually.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 3 - Pipeline
&lt;/h4&gt;

&lt;p&gt;Create a CodePipeline with four stages.&lt;/p&gt;

&lt;p&gt;The source stage is the Git repo where the Helm chart is available.&lt;/p&gt;

&lt;p&gt;The second stage is the CodeBuild prepare project, which runs the validation and dry run.&lt;/p&gt;

&lt;p&gt;The third stage is a manual approval, so we can check the output of helm lint and the dry run.&lt;/p&gt;

&lt;p&gt;The fourth stage is the CodeBuild deploy project, which runs helm install/upgrade.&lt;/p&gt;

&lt;p&gt;Please comment your feedback.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>eks</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Pull Request Validation for AWS CodeCommit using Lambda and CodeBuild</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 08 Jul 2023 17:25:45 +0000</pubDate>
      <link>https://forem.com/harisharavindan/pull-request-validation-for-aws-codecommit-using-lambda-and-codebuild-4dcg</link>
      <guid>https://forem.com/harisharavindan/pull-request-validation-for-aws-codecommit-using-lambda-and-codebuild-4dcg</guid>
      <description>&lt;h3&gt;
  
  
  what is it about
&lt;/h3&gt;

&lt;p&gt;Need to lint a Dockerfile or run a CI test/check when a pull request is raised? Let's see how to build a solution for this on AWS CodeCommit, using CodeBuild and Lambda to perform the check when a PR is raised or updated.&lt;/p&gt;

&lt;h3&gt;
  
  
  overview of what we are building
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvpnis51d2pvkvos4w5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvpnis51d2pvkvos4w5p.png" alt="ci for python Dockerfile Lint" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will take a sample solution here that performs a hadolint check on a Dockerfile when a PR is raised in CodeCommit.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1 - Use the code from GitHub repo to create code build and lambda functions.
&lt;/h4&gt;

&lt;p&gt;Clone the repo:&lt;br&gt;
&lt;a href="https://github.com/uptownaravi/aws_codecommit_pr_validate.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws_codecommit_pr_validate.git&lt;/a&gt;&lt;br&gt;
Note: add the required region, account number, and resource names in the files.&lt;/p&gt;

&lt;p&gt;Use the file buildspec.yaml to create the CodeBuild project,&lt;br&gt;
and refer to the policy file codebuild_role.json for the essential permissions required (CodeBuild needs access to CodeCommit to clone the repo and comment on the PR).&lt;/p&gt;

&lt;p&gt;Create the Lambda function using lambda_function.py; for the required policy, check the file lambda_iam_role.json (permission to start the CodeBuild project).&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2 - Create the EventBridge rule that connects all the parts together
&lt;/h4&gt;

&lt;p&gt;Create an EventBridge rule that triggers when a PR state change occurs.&lt;/p&gt;

&lt;p&gt;A sample event pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"aws.codecommit"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail-type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"CodeCommit Pull Request State Change"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:codecommit:&amp;lt; region &amp;gt;:&amp;lt; account number &amp;gt;:&amp;lt; 
 repository name&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the target as the lambda function which was created in the above step.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3 - Raise a PR and check if Dockerfile is linted and comments are being added
&lt;/h4&gt;

&lt;p&gt;Once you raise a PR in CodeCommit, the EventBridge rule reacts to it. The Lambda function runs, collects the information required for the CI test/check, and starts the CodeBuild project with that information as override parameters and environment values.&lt;br&gt;
CodeBuild then runs hadolint on the Dockerfile and adds the result to the PR in CodeCommit through AWS CLI commands.&lt;/p&gt;
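&lt;p&gt;A sketch of that Lambda-to-CodeBuild handoff (the variable names are illustrative; &lt;code&gt;start_build&lt;/code&gt; and its &lt;code&gt;environmentVariablesOverride&lt;/code&gt; parameter are the real boto3 API):&lt;/p&gt;

```python
# Pass PR details to the build as environment-variable overrides.
def build_env_overrides(pr_id, source_commit, destination_commit, repo_name):
    """Shape environmentVariablesOverride for codebuild.start_build."""
    pairs = {
        "PULL_REQUEST_ID": pr_id,
        "SOURCE_COMMIT": source_commit,
        "DESTINATION_COMMIT": destination_commit,
        "REPOSITORY_NAME": repo_name,
    }
    return [{"name": k, "value": v, "type": "PLAINTEXT"} for k, v in pairs.items()]

if __name__ == "__main__":
    import boto3  # assumption: credentials and a CodeBuild project named pr-validate
    codebuild = boto3.client("codebuild")
    codebuild.start_build(
        projectName="pr-validate",
        environmentVariablesOverride=build_env_overrides("42", "abc123", "def456", "my-repo"),
    )
```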

&lt;p&gt;Comment made in the PR after the check:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr793f0k0viax7et8xl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr793f0k0viax7et8xl6.png" alt="PR with updated comment" width="698" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4 - Customize for further CI test for pull requests
&lt;/h4&gt;

&lt;p&gt;We saw one sample of a Dockerfile check; this solution can be extended to add different types of tests or checks on the creation/update of pull requests.&lt;/p&gt;

&lt;p&gt;Thank you for reading. Please comment if any suggestions to improve.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>docker</category>
      <category>devops</category>
    </item>
    <item>
      <title>GitHub Action for Commit Message Validation</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Wed, 26 Apr 2023 12:57:07 +0000</pubDate>
      <link>https://forem.com/harisharavindan/github-action-for-commit-message-validation-5b36</link>
      <guid>https://forem.com/harisharavindan/github-action-for-commit-message-validation-5b36</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;Have you been in a situation where a commit message does not convey what the code is intended for?&lt;br&gt;
Well, we can have a validation for that at the repository itself.&lt;/p&gt;

&lt;p&gt;Using GitHub Actions, we can validate that commit messages have relevant details, like the story number for which the code is being added, and so on.&lt;/p&gt;
&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;I have published a GitHub Action that helps with this validation: &lt;a href="https://github.com/marketplace/actions/commit-meessage-check" rel="noopener noreferrer"&gt;https://github.com/marketplace/actions/commit-meessage-check&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use this in your workflow as a step, with the regex for your required validation.&lt;/p&gt;

&lt;p&gt;Here is a sample that checks whether the message has a Jira story ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hello_world_job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commit-message-validation&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;foo&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;uptownaravi/verify-commit-message-action@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(?i)jira-[0-9]{3,}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1h8s1dzf1bsdt95oo547.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1h8s1dzf1bsdt95oo547.png" alt="commit message success" width="586" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If not, the job fails with exit code 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98zb0chwy9n1gg4h67h7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98zb0chwy9n1gg4h67h7.png" alt="commit check failure" width="586" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The validation happens in the Python file &lt;a href="https://github.com/uptownaravi/verify-commit-message-action/blob/main/commitcheck.py" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/verify-commit-message-action/blob/main/commitcheck.py&lt;/a&gt;&lt;/p&gt;
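&lt;p&gt;The core of that check boils down to a regex search; a minimal sketch of the same idea (not the exact commitcheck.py code):&lt;/p&gt;

```python
# Validate a commit message against a required pattern; exit 1 on failure.
import re
import sys

def commit_message_ok(message, pattern="(?i)jira-[0-9]{3,}"):
    """True if the commit message matches the required pattern."""
    return re.search(pattern, message) is not None

if __name__ == "__main__":
    # exit code 1 on failure, mirroring the action's behaviour
    sys.exit(0 if commit_message_ok(" ".join(sys.argv[1:])) else 1)
```

&lt;p&gt;With the sample regex, "JIRA-123 add login form" passes and "fix typo" fails.&lt;/p&gt;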

</description>
      <category>github</category>
      <category>actions</category>
      <category>devops</category>
    </item>
    <item>
      <title>Clean up unused aws ebs volumes with lambda function</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 20 Apr 2023 15:01:49 +0000</pubDate>
      <link>https://forem.com/harisharavindan/clean-up-unused-aws-ebs-volumes-with-lambda-function-bli</link>
      <guid>https://forem.com/harisharavindan/clean-up-unused-aws-ebs-volumes-with-lambda-function-bli</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;I recently came across unused EBS volumes that were inflating our AWS bill; they were left over from testing and development. To automate the cleanup, I wrote the Lambda function below, which scans for unattached volumes, tags them for deletion and sends an email notification. Tagged volumes are then removed after a day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgww9dx5bs2ymrfmyi30k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgww9dx5bs2ymrfmyi30k.png" alt="solution overview" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scan for unused EBS volumes with status "available"&lt;/li&gt;
&lt;li&gt;Tag those volumes for deletion&lt;/li&gt;
&lt;li&gt;Add the list to DynamoDB, so we can check back the next day&lt;/li&gt;
&lt;li&gt;Send an email notification listing the volumes&lt;/li&gt;
&lt;li&gt;The user removes the deletion tag from any volume that is still required&lt;/li&gt;
&lt;li&gt;If the delete tag is still present the next day, the volume is deleted&lt;/li&gt;
&lt;li&gt;Send an email summary&lt;/li&gt;
&lt;/ol&gt;
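&lt;p&gt;The scan-and-tag steps can be sketched with boto3; the function and tag names below are illustrative, and the actual logic lives in cleanupebs.py in the repo:&lt;/p&gt;

```python
# Sketch of the scan-and-tag step (illustrative; see cleanupebs.py in the repo)
DELETE_TAG = {"Key": "cleanup", "Value": "delete"}  # hypothetical tag name

def unattached_volume_ids(describe_volumes_page):
    """Pull the volume IDs out of one describe_volumes response page."""
    return [v["VolumeId"] for v in describe_volumes_page["Volumes"]]

def scan_and_tag():
    import boto3  # AWS SDK; needs credentials, so imported only when run

    ec2 = boto3.client("ec2")
    to_tag = []
    # status 'available' means the volume is not attached to any instance
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        to_tag.extend(unattached_volume_ids(page))
    if to_tag:
        ec2.create_tags(Resources=to_tag, Tags=[DELETE_TAG])
    return to_tag
```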

&lt;h3&gt;
  
  
  Deploying the solution
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;clone the repository &lt;a href="https://github.com/uptownaravi/aws-ebs-cleanup.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws-ebs-cleanup.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We need a Lambda function, a DynamoDB table, an SNS topic (with an email subscription) and IAM roles set up to run this.&lt;/p&gt;

&lt;p&gt;First, let's create the IAM role using the file iam.json. Edit the account numbers and resource names as required. The file has three different inline policies, which allow the Lambda function to access EBS, DynamoDB and SNS.&lt;/p&gt;
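&lt;p&gt;For reference, the EBS policy would look roughly like the fragment below (an illustrative subset of actions; iam.json in the repo is the source of truth):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:CreateTags",
        "ec2:DeleteVolume"
      ],
      "Resource": "*"
    }
  ]
}
```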

&lt;p&gt;Create the DynamoDB table and the SNS topic (with an email subscribed to that topic to receive the cleanup summary).&lt;/p&gt;

&lt;p&gt;Then create the Lambda function using the file cleanupebs.py.&lt;br&gt;
Use the execution role created in the first step.&lt;/p&gt;

&lt;p&gt;Change the table name and SNS topic ARN&lt;br&gt;
&lt;a href="https://github.com/uptownaravi/aws-ebs-cleanup/blob/main/cleanupebs.py#L9-L10" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws-ebs-cleanup/blob/main/cleanupebs.py#L9-L10&lt;/a&gt;&lt;br&gt;
to the ones created in the second step.&lt;/p&gt;

&lt;p&gt;That's it. Try a test run to check that EBS volumes in the available state get tagged, and check your email for the summary.&lt;/p&gt;
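&lt;p&gt;The next-day pass then only needs to check whether the tag survived. A sketch with an illustrative tag name (the exact tag is defined in cleanupebs.py):&lt;/p&gt;

```python
def still_marked_for_deletion(volume, key="cleanup", value="delete"):
    """True if the volume still carries the deletion tag a day later,
    i.e. nobody removed the tag to keep the volume."""
    tags = volume.get("Tags", [])
    return any(t["Key"] == key and t["Value"] == value for t in tags)

def delete_tagged_volumes():
    import boto3  # needs AWS credentials, so imported only when run

    ec2 = boto3.client("ec2")
    deleted = []
    resp = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    for vol in resp["Volumes"]:
        if still_marked_for_deletion(vol):
            ec2.delete_volume(VolumeId=vol["VolumeId"])
            deleted.append(vol["VolumeId"])
    return deleted  # list of IDs for the email summary
```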

&lt;h3&gt;
  
  
  Adding a periodic trigger to the Lambda function
&lt;/h3&gt;

&lt;p&gt;Add a cron job using EventBridge Scheduler so that the function can be run every day at a specific time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F437sq0qaq0gg2vdmgo8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F437sq0qaq0gg2vdmgo8x.png" alt="event bridge scheduler" width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Create schedule, give it a name, and for the schedule pattern:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuj6kb2j2169mice6qgaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuj6kb2j2169mice6qgaq.png" alt="schedule pattern " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here I have added the cron expression (0 10 ? * MON-FRI *), which runs at 10 AM, Monday to Friday.&lt;/p&gt;

&lt;p&gt;Adjust the cron as required (I have set the flexible time window to Off) and click Next.&lt;/p&gt;

&lt;p&gt;In Target details, select AWS Lambda Invoke and, in the Invoke section, choose the function we created earlier. No input needs to be passed to the Lambda function.&lt;/p&gt;

&lt;p&gt;Click Next to review the configuration options, click Next again to review all the inputs, and create the schedule.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypi9co9s821wc8jwi8z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypi9co9s821wc8jwi8z7.png" alt="creating a schedule" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The email summary looks like the one below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsxmszov08kf5ea37thm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsxmszov08kf5ea37thm.png" alt="email summary" width="458" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please share your comments on this solution and what could be improved.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>python</category>
      <category>ebs</category>
    </item>
    <item>
      <title>Pull Request notification on Slack using AWS Lambda</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 04 Feb 2023 06:43:21 +0000</pubDate>
      <link>https://forem.com/harisharavindan/pull-request-notification-on-slack-using-aws-lambda-4mjo</link>
      <guid>https://forem.com/harisharavindan/pull-request-notification-on-slack-using-aws-lambda-4mjo</guid>
      <description>&lt;p&gt;Pull request management can become hectic while working across multiple repositories. Asking for approvals individually is also a long process. How about getting notified through a common channel to save time for reviewers and developers.&lt;/p&gt;

&lt;p&gt;This post details on how to create a notification system on slack for github pull requests.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Requirements&lt;br&gt;
an AWS account, a GitHub repository, and a Slack channel (with permission to create a Slack app and webhook)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1 : Creating a slack app and webhook in a channel  &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slack has a good tutorial on this; please follow it:
&lt;a href="https://api.slack.com/tutorials/slack-apps-hello-world" rel="noopener noreferrer"&gt;https://api.slack.com/tutorials/slack-apps-hello-world&lt;/a&gt;
Once that is done, Slack gives you a webhook URL similar to
&lt;a href="https://hooks.slack.com/services/aldnfaksndksakd/aljdfkajndkjasn/adfasdfakjdfnaksdfkajldakdnkasndlakjd" rel="noopener noreferrer"&gt;https://hooks.slack.com/services/aldnfaksndksakd/aljdfkajndkjasn/adfasdfakjdfnaksdfkajldakdnkasndlakjd&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can test it with the curl command given in the same tutorial. Note down the webhook URL; it will be used later.&lt;/p&gt;
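&lt;p&gt;If you prefer Python over curl, the same test can be done with the standard library (the URL below is a placeholder; paste in your own webhook):&lt;/p&gt;

```python
import json
import urllib.request

def build_slack_request(webhook_url, text):
    """Build the POST request a Slack incoming webhook expects:
    a JSON body of the form {"text": "..."}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # placeholder URL; use the webhook you noted down from Slack
    req = build_slack_request(
        "https://hooks.slack.com/services/XXX/YYY/ZZZ", "Hello, World!"
    )
    print(urllib.request.urlopen(req).status)
```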

&lt;h3&gt;
  
  
  Step 2 : Creating Lambda and API gateway for processing the pull request event  &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We need to package the code required for the notifications.&lt;br&gt;
Clone or download the repo &lt;a href="https://github.com/uptownaravi/pullRequestSlack" rel="noopener noreferrer"&gt;github.com/uptownaravi/pullRequestSlack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains lambda_function.py (the logic) and requirements.txt (the dependency list).&lt;/p&gt;

&lt;p&gt;We need to install the dependencies and zip the files.&lt;br&gt;
Navigate to the cloned repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install --target 'path of your current directory' -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done, you will see the folder structure below: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdyivqjfibpn1rdz0hxwe.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdyivqjfibpn1rdz0hxwe.JPG" alt="folderZip" width="268" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then select all the files there except README.md and requirements.txt and zip them (on Windows, select Send to --&amp;gt; Compressed (zipped) folder).&lt;/p&gt;

&lt;p&gt;Give the zip a name, then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log into the AWS console and navigate to Lambda&lt;/li&gt;
&lt;li&gt;Click Create function

&lt;ul&gt;
&lt;li&gt;select Author from scratch (it should be the default)&lt;/li&gt;
&lt;li&gt;give the function a name&lt;/li&gt;
&lt;li&gt;select Python 3.8&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fidy8e86sndamgl157fnb.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fidy8e86sndamgl157fnb.JPG" alt="Create Function" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click on Actions --&amp;gt; Upload a .zip file&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwznncj8pm1tqddptapeb.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwznncj8pm1tqddptapeb.JPG" alt="function upload" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the upload, the code should load as seen below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fei9ydxr9kxpk11ppzkn8.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fei9ydxr9kxpk11ppzkn8.JPG" alt="lambda Code" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to add the Slack webhook URL we got earlier as an&lt;br&gt;
environment variable in the Lambda function.&lt;/p&gt;

&lt;p&gt;Scroll down on the Lambda screen and click Manage environment variables --&amp;gt; Add environment variable.&lt;br&gt;
Fill in the details as in the screenshot below, using the webhook URL you got from Slack. Make sure the key is slackNotification&lt;br&gt;
(this key is used in the Lambda code to read the value).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmzd4iggvyli74nqpcncj.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmzd4iggvyli74nqpcncj.JPG" alt="env" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: we can encrypt the value in transit using KMS keys,&lt;br&gt;
        but I wanted to keep this blog simple.&lt;br&gt;
        Use this link if required: &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-encryption" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-encryption&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Creating an API Gateway and attaching the Lambda
&lt;/h4&gt;

&lt;p&gt;Navigate to API Gateway in the AWS console and click Create API,&lt;br&gt;
then select HTTP API --&amp;gt; click Build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0m82rp9x7ppp5fo01v6l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0m82rp9x7ppp5fo01v6l.JPG" alt="apig http" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then fill in the details as shown in the image below.&lt;br&gt;
We need to use Lambda as the integration and select the Lambda we created in the last step (make sure you are using the correct region).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk6cvuhjhbt4d60ti364i.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk6cvuhjhbt4d60ti364i.JPG" alt="integration" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Next and add the routes as shown below,&lt;br&gt;
using the POST method here so that GitHub can post to this API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0pjy0njy0rpqehy3jb3c.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0pjy0njy0rpqehy3jb3c.JPG" alt="route" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moving on, add a stage name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnt7m3hqln1p1c7y98ah0.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnt7m3hqln1p1c7y98ah0.JPG" alt="stage" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, review and click Create.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2wen7ifofevik2udg78l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2wen7ifofevik2udg78l.JPG" alt="apig created" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auto-deployment is already enabled for this API.&lt;/p&gt;

&lt;p&gt;So go to Stages in the Deploy section and copy the invoke URL;&lt;br&gt;
this will be used in the GitHub repository as the webhook to POST to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F77lypk4t2s9wchs9eof4.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F77lypk4t2s9wchs9eof4.JPG" alt="deploy" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 : Adding the api gateway url to github
&lt;/h3&gt;

&lt;p&gt;Navigate to your GitHub repository --&amp;gt; click Settings --&amp;gt; then Webhooks.&lt;/p&gt;

&lt;p&gt;Add your API Gateway URL in the format &lt;strong&gt;URL/pull&lt;/strong&gt; in the Payload URL section.&lt;br&gt;
Example: &lt;a href="https://sample.execute-api.ap-south-1.amazonaws.com/dev/pull" rel="noopener noreferrer"&gt;https://sample.execute-api.ap-south-1.amazonaws.com/dev/pull&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(/pull is the route in API Gateway; make sure the URL ends with dev/pull so that the route is hit by the POST call.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnojlh1seyih9ule74l1f.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnojlh1seyih9ule74l1f.JPG" alt="webhook" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Let me select individual events&lt;/strong&gt;,&lt;br&gt;
then scroll down, select the Pull requests option, and uncheck Pushes. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ra0d2pmuie451dte319.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ra0d2pmuie451dte319.JPG" alt="pullRequests" width="796" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click Add webhook.&lt;/p&gt;

&lt;p&gt;That's it. Create a pull request in that repository and it should trigger a notification in the Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv4l7pesfcjy9bziqyrru.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv4l7pesfcjy9bziqyrru.JPG" alt="botChat" width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can modify the Lambda code to handle the various pull request events.&lt;/p&gt;
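&lt;p&gt;For example, branching on the event type could look like this (the field names come from GitHub's pull_request webhook payload; the action set and message format are illustrative):&lt;/p&gt;

```python
import json

# pull_request webhook actions we choose to notify about (illustrative set)
INTERESTING_ACTIONS = {"opened", "reopened", "closed", "review_requested"}

def summarize_pr_event(body):
    """Turn a GitHub pull_request webhook body into a Slack message text,
    or None for actions we want to ignore."""
    payload = json.loads(body)
    action = payload.get("action")
    if action not in INTERESTING_ACTIONS:
        return None
    pr = payload.get("pull_request", {})
    user = pr.get("user", {}).get("login", "?")
    return "PR {} by {}: {}".format(action, user, pr.get("html_url", ""))
```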

</description>
      <category>git</category>
      <category>aws</category>
      <category>serverless</category>
      <category>slack</category>
    </item>
    <item>
      <title>helm chart for fastAPI</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 29 Jan 2023 18:45:54 +0000</pubDate>
      <link>https://forem.com/harisharavindan/helm-chart-for-fastapi-2ej1</link>
      <guid>https://forem.com/harisharavindan/helm-chart-for-fastapi-2ej1</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;Packaging a FastAPI app as a Docker image and deploying it as a Helm chart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building docker image
&lt;/h3&gt;

&lt;p&gt;Clone the repository&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://github.com/uptownaravi/LearnfastAPI.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change directory into FastAPI_HelmChart/.&lt;br&gt;
In the Dockerfile we have an Ubuntu base image, on which we build layers to install the application and its dependencies.&lt;/p&gt;

&lt;p&gt;I am using my sample application from a previous blog:&lt;br&gt;
&lt;a href="https://dev.to/harisharavindan/learning-fastapi-with-a-sample-python-library-5f2n"&gt;https://dev.to/harisharavindan/learning-fastapi-with-a-sample-python-library-5f2n&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; fast:v2 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The built Docker image is stored in GitHub Packages (ref &lt;a href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry" rel="noopener noreferrer"&gt;https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ghcr.io/uptownaravi/fast:v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Any required changes can be made to the Dockerfile and a new image built from it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Creating a Helm chart
&lt;/h3&gt;

&lt;p&gt;I have created a basic Helm chart that uses the image built from the Dockerfile above. &lt;/p&gt;

&lt;p&gt;Change directory within the cloned repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;FastAPI_HelmChart/helmChart-fastAPI/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explore the chart templates and values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we have set the docker image to ghcr.io/uptownaravi/fast:v2&lt;/li&gt;
&lt;li&gt;health check is at localhost:80/health &lt;/li&gt;
&lt;li&gt;service is exposed at 80 as we had set that in the dockerfile&lt;/li&gt;
&lt;/ul&gt;
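&lt;p&gt;Those settings map to entries in values.yaml roughly like the fragment below (illustrative key names; the chart in the repo is the reference):&lt;/p&gt;

```yaml
# illustrative values.yaml fragment for the chart
image:
  repository: ghcr.io/uptownaravi/fast
  tag: v2
service:
  type: ClusterIP
  port: 80          # matches the port exposed in the Dockerfile
livenessProbe:
  httpGet:
    path: /health   # the health check endpoint mentioned above
    port: 80
```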

&lt;p&gt;Do a dry run to check what will be installed.&lt;br&gt;
Make sure you are in the directory FastAPI_HelmChart/helmChart-fastAPI/, where Chart.yaml is located.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;fastencode &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dropping the --dry-run flag and running the same command installs the Helm release fastencode, which creates the deployment, service and related resources.&lt;/p&gt;

&lt;p&gt;Check the details of the installed chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get all &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'app.kubernetes.io/instance=fastencode'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can port-forward the service to check the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward svc/fastapi 8080:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forwards port 80 of the service to local port 8080.&lt;br&gt;
Check the FastAPI UI at &lt;a href="http://localhost:8080/docs" rel="noopener noreferrer"&gt;http://localhost:8080/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To uninstall the release:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm uninstall fastencode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>helm</category>
      <category>k8s</category>
      <category>kubernetes</category>
      <category>fastapi</category>
    </item>
  </channel>
</rss>
