<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Harish Aravindan</title>
    <description>The latest articles on Forem by Harish Aravindan (@harisharavindan).</description>
    <link>https://forem.com/harisharavindan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F60752%2F331ec464-cbee-4654-a928-51753bf97bab.jpeg</url>
      <title>Forem: Harish Aravindan</title>
      <link>https://forem.com/harisharavindan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/harisharavindan"/>
    <language>en</language>
    <item>
      <title>Your AI Agent Will Lie to You in Production — Here's How to Catch It Before It Ships</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Wed, 18 Mar 2026 04:33:09 +0000</pubDate>
      <link>https://forem.com/harisharavindan/your-ai-agent-will-lie-to-you-in-production-heres-how-to-catch-it-before-it-ships-l77</link>
      <guid>https://forem.com/harisharavindan/your-ai-agent-will-lie-to-you-in-production-heres-how-to-catch-it-before-it-ships-l77</guid>
      <description>&lt;p&gt;You deploy an AI agent. It passes your manual tests. It looks good in the demo.&lt;/p&gt;

&lt;p&gt;Three weeks later, someone edits the system prompt to make the output "cleaner." The agent starts behaving differently on edge cases. No error. No alert. Just subtly wrong output — until someone notices.&lt;/p&gt;

&lt;p&gt;This post is about the CI/CD and prompt regression setup that prevents this. Everything here is practical and works today on AWS.&lt;/p&gt;




&lt;h2&gt;The Problem With AI Agents in CI/CD&lt;/h2&gt;

&lt;p&gt;Traditional software has a clear contract: given input X, function F returns output Y. Tests verify Y. If Y changes, the test fails, the build breaks, you investigate.&lt;/p&gt;

&lt;p&gt;LLM-based agents break this model. The "function" is a language model. The same input can produce slightly different outputs on every run. And the failure mode isn't an exception — it's a plausible-looking wrong answer.&lt;/p&gt;

&lt;p&gt;Three things make this worse in serverless AI pipelines:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Prompts aren't versioned like code.&lt;/strong&gt; Engineers edit them as a string in a Python file or, worse, in a config file outside version control. Nobody reviews a prompt change the way they'd review a code change.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Retries mask failures.&lt;/strong&gt; Lambda retries on error. Your retry logic retries on low-confidence responses. By the time a bad output surfaces, it's hard to trace it back to the prompt change that caused it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Silent degradation.&lt;/strong&gt; A classification agent that's 95% accurate and drops to 80% accurate won't throw an error. It'll just be wrong more often. You'll find out from downstream effects, not logs.&lt;/p&gt;
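&lt;p&gt;To make the non-determinism concrete: asserting on the raw model output breaks, while asserting on the structured field you care about survives rephrasing. The sketch below simulates two runs of a hypothetical &lt;code&gt;classify&lt;/code&gt; call; it's an illustration, not the real agent.&lt;/p&gt;

```python
# Two simulated runs of the same input: the wording shifts, the
# structured field does not. classify() is a hypothetical stand-in
# for a real agent call.
import json

def classify(text, run):
    phrasing = ["Risk is MEDIUM.", "Assessed risk level: MEDIUM."]
    return json.dumps({"risk_level": "MEDIUM", "summary": phrasing[run]})

out_a = json.loads(classify("claim text", 0))
out_b = json.loads(classify("claim text", 1))

assert out_a != out_b                              # raw outputs differ per run
assert out_a["risk_level"] == out_b["risk_level"]  # the decision field is stable
```

&lt;p&gt;This is why the test suite in this post asserts on structured fields and a confidence floor, never on the full output text.&lt;/p&gt;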




&lt;h2&gt;The Fix: A Prompt Regression Test Suite&lt;/h2&gt;

&lt;p&gt;The idea is simple. Lock a set of golden fixtures — known inputs with known correct outputs. Run your agent against them on every deploy. Fail the build if accuracy drops below a threshold.&lt;/p&gt;

&lt;p&gt;Here's the full setup.&lt;/p&gt;




&lt;h2&gt;Step 1: Golden Fixture Format&lt;/h2&gt;

&lt;p&gt;Each fixture is a JSON file in &lt;code&gt;tests/fixtures/&lt;/code&gt;. Structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"document_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"fixture_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"document_text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Policy holder: Jane Smith. Coverage: accidental damage. Item: MacBook Pro 16-inch. Purchase date: 2023-08-15. Claim date: 2025-11-03. Damage description: Screen cracked after drop."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tenant_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test-tenant"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"expected"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"risk_level"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"MEDIUM"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reminder_eligible"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"confidence_min"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.70&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Keep 20–30 fixtures. Cover your edge cases: borderline risk levels, ambiguous descriptions, missing fields, very old claims. These are the documents your agent is most likely to get wrong.&lt;/p&gt;

&lt;p&gt;Never auto-generate fixtures. Write them manually. The point is that a human has decided what the correct output is.&lt;/p&gt;
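&lt;p&gt;Hand-written fixtures drift too: a typo in a field name silently skips a check. A small validator (a sketch; the &lt;code&gt;VALID_RISK&lt;/code&gt; label set is an assumption, adjust it to your classifier) can run as its own test to catch malformed fixtures before they reach CI:&lt;/p&gt;

```python
# Sanity-check fixtures against the schema in Step 1 before they reach CI.
# VALID_RISK is an assumed label set; adjust it to match your classifier.
import glob
import json

REQUIRED_INPUT = {"document_text", "tenant_id"}
VALID_RISK = {"LOW", "MEDIUM", "HIGH"}

def validate_fixture(fixture):
    errors = []
    if not REQUIRED_INPUT.issubset(fixture.get("input", {})):
        errors.append("missing input fields")
    expected = fixture.get("expected", {})
    if expected.get("risk_level") not in VALID_RISK:
        errors.append("unknown risk_level")
    conf = expected.get("confidence_min")
    if conf is not None and (conf > 1.0 or 0.0 > conf):
        errors.append("confidence_min out of range")
    return errors

def validate_all(fixture_dir="tests/fixtures"):
    # Returns {path: [errors]} for every malformed fixture file.
    problems = {}
    for path in glob.glob(fixture_dir + "/*.json"):
        with open(path) as f:
            errors = validate_fixture(json.load(f))
        if errors:
            problems[path] = errors
    return problems
```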


&lt;h2&gt;Step 2: The Test Runner&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# tests/test_regression.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agents.classifier&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;run_classifier&lt;/span&gt;

&lt;span class="n"&gt;FIXTURE_DIR&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tests/fixtures&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.90&lt;/span&gt;   &lt;span class="c1"&gt;# Fail the build if accuracy drops below this
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;paths&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;glob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;FIXTURE_DIR&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/*.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.parametrize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fixture&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_classifier_regression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;document_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Expected risk_level=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;got &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Confidence &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below minimum &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confidence_min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_overall_accuracy&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Separate test: fail the whole suite if aggregate accuracy drops below
    MIN_ACCURACY, giving one clear signal when several fixtures regress at once.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;fixtures&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;load_fixtures&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;passed&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_classifier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;document_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;tenant_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;fixture&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;risk_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

    &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;passed&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="n"&gt;accuracy&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Accuracy &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;accuracy&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; below threshold &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MIN_ACCURACY&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Passed &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;passed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fixtures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; fixtures.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Run locally with &lt;code&gt;pytest tests/test_regression.py -v&lt;/code&gt;. You'll see per-fixture pass/fail and the aggregate accuracy check.&lt;/p&gt;
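&lt;p&gt;One caveat: a borderline fixture can flip between runs even with a fixed prompt, which makes the build flaky. A cheap mitigation (a sketch; &lt;code&gt;classify_once&lt;/code&gt; is a hypothetical wrapper around a single agent call) is to run each fixture a few times and take the majority label:&lt;/p&gt;

```python
# Majority vote over repeated runs, to damp run-to-run flips on
# borderline fixtures. classify_once is a hypothetical stand-in for a
# single agent call returning a dict with a "risk_level" key.
from collections import Counter

def classify_majority(classify_once, document_text, tenant_id, runs=3):
    votes = Counter()
    last = {}
    for _ in range(runs):
        result = classify_once(document_text=document_text, tenant_id=tenant_id)
        votes[result["risk_level"]] += 1
        last[result["risk_level"]] = result   # keep latest result per label
    winner, count = votes.most_common(1)[0]
    # Return the winning label's most recent full result, plus vote stats.
    out = dict(last[winner])
    out["vote_share"] = count / runs
    return out
```

&lt;p&gt;Three runs triples your Bedrock cost per build, so reserve this for fixtures you've seen flip.&lt;/p&gt;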


&lt;h2&gt;Step 3: GitHub Actions Pipeline&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/deploy.yml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyAI CI/CD&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;AWS_REGION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;ap-south-1&lt;/span&gt;
  &lt;span class="na"&gt;ECR_REGISTRY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;   &lt;span class="s"&gt;${{ secrets.ECR_REGISTRY }}&lt;/span&gt;
  &lt;span class="na"&gt;ECR_REPOSITORY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyai-pipeline&lt;/span&gt;
  &lt;span class="na"&gt;LAMBDA_FUNCTION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;warrantyai-processor&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;regression-tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Prompt Regression Tests&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Set up Python&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/setup-python@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;python-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.12'&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install dependencies&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pip install -r requirements.txt&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials (for Bedrock)&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE_ARN }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;${{ env.AWS_REGION }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Run prompt regression tests&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pytest tests/test_regression.py -v --tb=short&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;BEDROCK_MODEL_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;

  &lt;span class="na"&gt;build-and-deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build → ECR → Lambda&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;regression-tests&lt;/span&gt;        &lt;span class="c1"&gt;# Only runs if tests pass&lt;/span&gt;
    &lt;span class="na"&gt;if&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;github.ref == 'refs/heads/main'&lt;/span&gt;

    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Configure AWS credentials&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/configure-aws-credentials@v4&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;role-to-assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.AWS_ROLE_ARN }}&lt;/span&gt;
          &lt;span class="na"&gt;aws-region&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;     &lt;span class="s"&gt;${{ env.AWS_REGION }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Log in to ECR&lt;/span&gt;
        &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;login-ecr&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;aws-actions/amazon-ecr-login@v2&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build and push Docker image&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;IMAGE_TAG=$(git rev-parse --short HEAD)&lt;/span&gt;
          &lt;span class="s"&gt;docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .&lt;/span&gt;
          &lt;span class="s"&gt;docker push    $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG&lt;/span&gt;
          &lt;span class="s"&gt;echo "IMAGE_TAG=$IMAGE_TAG" &amp;gt;&amp;gt; $GITHUB_ENV&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy to Lambda&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;aws lambda update-function-code \&lt;/span&gt;
            &lt;span class="s"&gt;--function-name $LAMBDA_FUNCTION \&lt;/span&gt;
            &lt;span class="s"&gt;--image-uri     $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG \&lt;/span&gt;
            &lt;span class="s"&gt;--region        $AWS_REGION&lt;/span&gt;

          &lt;span class="s"&gt;aws lambda wait function-updated \&lt;/span&gt;
            &lt;span class="s"&gt;--function-name $LAMBDA_FUNCTION&lt;/span&gt;

          &lt;span class="s"&gt;echo "Deployed image $IMAGE_TAG to $LAMBDA_FUNCTION"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Key decisions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;needs: regression-tests&lt;/code&gt; — deploy job won't start if tests fail&lt;/li&gt;
&lt;li&gt;OIDC role assumption (no long-lived keys in secrets)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lambda wait function-updated&lt;/code&gt; — ensures the function is actually updated before the job completes&lt;/li&gt;
&lt;/ul&gt;
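&lt;p&gt;One small addition worth making: surface the accuracy number on the run page itself, not just in the pytest log. &lt;code&gt;GITHUB_STEP_SUMMARY&lt;/code&gt; is a standard Actions environment variable pointing at a markdown file the runner renders; the helper below is a sketch you could call from the aggregate test:&lt;/p&gt;

```python
# Append the aggregate accuracy to the GitHub Actions job summary so it
# shows on the run page. GITHUB_STEP_SUMMARY is set by the Actions
# runner; when it's absent (e.g. local runs) the helper is a no-op.
import os

def report_accuracy(passed, total):
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return
    accuracy = passed / total
    with open(summary_path, "a") as f:
        f.write(f"Prompt regression accuracy: {accuracy:.0%} ({passed}/{total} fixtures)\n")
```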


&lt;h2&gt;Step 4: IAM OIDC Setup for GitHub Actions (No Long-Lived Keys)&lt;/h2&gt;

&lt;p&gt;The cleanest way to give GitHub Actions access to AWS is OIDC: GitHub issues a short-lived token for each workflow run, and AWS exchanges it for temporary credentials scoped to your repo that expire when the job ends.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# infra/oidc.tf&lt;/span&gt;

&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_openid_connect_provider"&lt;/span&gt; &lt;span class="s2"&gt;"github"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"https://token.actions.githubusercontent.com"&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github-actions-warrantyai"&lt;/span&gt;

  &lt;span class="nx"&gt;assume_role_policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;
      &lt;span class="nx"&gt;Effect&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
      &lt;span class="nx"&gt;Principal&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Federated&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;aws_iam_openid_connect_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nx"&gt;Action&lt;/span&gt;    &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"sts:AssumeRoleWithWebIdentity"&lt;/span&gt;
      &lt;span class="nx"&gt;Condition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;StringLike&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="s2"&gt;"token.actions.githubusercontent.com:sub"&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"repo:YOUR_ORG/warrantyai:ref:refs/heads/main"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_iam_role_policy"&lt;/span&gt; &lt;span class="s2"&gt;"github_actions_policy"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"github-actions-policy"&lt;/span&gt;
  &lt;span class="nx"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_iam_role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;github_actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;

  &lt;span class="nx"&gt;policy&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;jsonencode&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;Version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"2012-10-17"&lt;/span&gt;
    &lt;span class="nx"&gt;Statement&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"ecr:GetAuthorizationToken"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:BatchCheckLayerAvailability"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"ecr:PutImage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:InitiateLayerUpload"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s2"&gt;"ecr:UploadLayerPart"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"ecr:CompleteLayerUpload"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"lambda:UpdateFunctionCode"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"lambda:GetFunction"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pipeline_processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;Effect&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Allow"&lt;/span&gt;
        &lt;span class="nx"&gt;Action&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nx"&gt;Resource&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Replace &lt;code&gt;YOUR_ORG/warrantyai&lt;/code&gt; with your actual GitHub org and repo name. The &lt;code&gt;StringLike&lt;/code&gt; condition locks the role to your main branch; PRs can run the regression test job but never receive deploy permissions.&lt;/p&gt;


&lt;h2&gt;
  
  
  What This Catches (and What It Doesn't)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;It catches:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prompt edits that shift classification behaviour on known edge cases&lt;/li&gt;
&lt;li&gt;Model version changes that affect output structure&lt;/li&gt;
&lt;li&gt;Output parser changes that break field extraction&lt;/li&gt;
&lt;li&gt;Accidental removal of instructions that were doing real work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;It doesn't catch:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Brand-new edge cases you haven't added to fixtures yet&lt;/li&gt;
&lt;li&gt;Latency regressions (add a separate latency benchmark for this)&lt;/li&gt;
&lt;li&gt;Cost regressions from prompt bloat (add token counting)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The fixture set is a living document. Every time a production bug surfaces from a new edge case, add a fixture for it. The test suite gets more valuable over time, not less.&lt;/p&gt;
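A tiny regression runner keeps that living fixture set honest in CI. A minimal sketch, where <code>run_agent</code> is a stand-in for the real agent call and the fixture schema (an input string paired with an expected label) is hypothetical:

```python
import json

# Hypothetical fixture format: each entry pairs an input with the
# classification the agent is expected to produce.
FIXTURES = [
    {"input": "Fridge compressor failed after 11 months", "expected": "covered"},
    {"input": "Water damage from a burst pipe", "expected": "not_covered"},
]

def run_agent(text):
    # Placeholder for the real agent call (e.g. a Bedrock invocation).
    return "covered" if "compressor" in text else "not_covered"

def run_regression(fixtures):
    """Return the fixtures whose output drifted from the baseline."""
    failures = []
    for fx in fixtures:
        actual = run_agent(fx["input"])
        if actual != fx["expected"]:
            failures.append(
                {"input": fx["input"], "expected": fx["expected"], "actual": actual}
            )
    return failures

if __name__ == "__main__":
    # Prints [] when nothing has drifted; CI fails the build otherwise.
    print(json.dumps(run_regression(FIXTURES), indent=2))
```

Wiring this into the pipeline is one step: exit non-zero when the failure list is non-empty, and every prompt edit gets checked against the baseline automatically.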


&lt;h2&gt;
  
  
  The One Thing Worth Knowing
&lt;/h2&gt;

&lt;p&gt;The first time you run this on an existing project, it will probably fail. Not because your agent is bad — but because you'll discover that your "obvious" classifications aren't as consistent as you thought.&lt;/p&gt;

&lt;p&gt;That's the test suite doing its job. Fix the fixtures (or fix the agent), and you now have a baseline. Every future change is measured against that baseline.&lt;/p&gt;

&lt;p&gt;That's the whole point.&lt;/p&gt;


&lt;h2&gt;
  
  
  How This Fits Into WarrantyAI
&lt;/h2&gt;

&lt;p&gt;WarrantyAI is the project I'm building to learn and practice production AI systems end to end.&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_mlops-awsbedrock-aiplatformengineer-activity-7416095685768429568-Gb_2?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQF1Z8f42_isow%2Ffeedshare-shrink_800%2FB56ZutDsi8HcAg-%2F0%2F1768134994582%3Fe%3D2147483647%26v%3Dbeta%26t%3DhsNSlku-Gf_bElvW-5o1Q3PSobfiptb7JQnnrewBkGA" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_mlops-awsbedrock-aiplatformengineer-activity-7416095685768429568-Gb_2?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" rel="noopener noreferrer" class="c-link"&gt;
            Building WarrantyAI: AI Platform Engineer's 2026 North-Star Goal | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            🚀 𝗙𝗿𝗼𝗺 𝗗𝗲𝘃𝗢𝗽𝘀 𝘁𝗼 𝗠𝗟𝗢𝗽𝘀: 𝗪𝗲𝗲𝗸 𝟭 𝗼𝗳 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 / 𝗕𝘂𝗶𝗹𝗱𝗶𝗻𝗴 "𝗪𝗮𝗿𝗿𝗮𝗻𝘁𝘆 𝗔𝗜"
After years of managing cloud infrastructure and DevOps pipelines, I’ve officially started my transition into AI Platform Engineering.
My north-star goal for 2026 is to build WarrantyAI: a production-grade, "warranty-aware" system that helps homeowners and office managers assess appliance health, identify "fine-print" gotchas, and optimize repair costs using Generative AI.

𝗧𝗵𝗲 𝗬𝗲𝗮𝗿-𝗟𝗼𝗻𝗴 𝗩𝗶𝘀𝗶𝗼𝗻: 𝗪𝗵𝘆 𝗮𝗻 𝗔𝗜 𝗣𝗹𝗮𝘁𝗳𝗼𝗿𝗺 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿
An AI Platform Engineer doesn't just "write prompts." They build the scalable "plumbing" that allows AI models to interact with real-world data securely and efficiently. My 12-month roadmap focuses on:
𝗠𝗟𝗢𝗽𝘀: Automating the lifecycle of models (training, deployment, monitoring).
𝗦𝗲𝗺𝗮𝗻𝘁𝗶𝗰 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲: Using Vector Databases to give AI a "long-term memory."
𝗖𝗹𝗼𝘂𝗱-𝗡𝗮𝘁𝗶𝘃𝗲 𝗔𝗜: Leveraging AWS resources like Bedrock and S3 Lakehouses for cost-effective scale.

𝗪𝗲𝗲𝗸 𝟭 𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴: 𝗕𝗿𝗲𝗮𝗸𝗶𝗻𝗴 𝘁𝗵𝗲 "𝗕𝗹𝗮𝗰𝗸 𝗕𝗼𝘅"
This week, I focused on Retrieval-Augmented Generation (RAG). Instead of just asking an AI what a general warranty looks like, I fed it a specific 5-page LG Refrigerator Warranty PDF and asked it to find the hidden costs.
GitHub Repo: https://lnkd.in/g-hkGJ6M

𝗖𝗼𝗿𝗲 𝗖𝗼𝗻𝗰𝗲𝗽𝘁𝘀:
𝗘𝗺𝗯𝗲𝗱𝗱𝗶𝗻𝗴𝘀 (𝗧𝗵𝗲 "𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗼𝗿"): Using Amazon Titan Text Embeddings v2 to convert human text into mathematical vectors.
𝗩𝗲𝗰𝘁𝗼𝗿 𝗜𝗻𝗱𝗲𝘅𝗶𝗻𝗴 (𝗧𝗵𝗲 "𝗟𝗶𝗯𝗿𝗮𝗿𝘆"): Implementing FAISS to store these vectors so the AI can search by "meaning" rather than keywords.
𝗠𝗲𝘀𝘀𝗮𝗴𝗲𝘀 𝗔𝗣𝗜 (𝗧𝗵𝗲 "𝗖𝗼𝗻𝘃𝗲𝗿𝘀𝗮𝘁𝗶𝗼𝗻"): Transitioning from legacy string-based prompts to the modern, structured messages format required by models like Ministral-3-8b.

𝗖𝗼𝗻𝗰𝗹𝘂𝘀𝗶𝗼𝗻 &amp;amp; 𝗡𝗲𝘅𝘁 𝗦𝘁𝗲𝗽𝘀
Week 1 taught me that AI Engineering is 20% prompting and 80% data engineering and infrastructure. By mastering the native APIs and vector logic, I’ve built a foundation that isn't just "chatty"—it's accurate.

𝗡𝗲𝘅𝘁 𝘄𝗲𝗲𝗸: I’ll be moving these vectors into a persistent S3 Lakehouse using Apache Iceberg to ensure our data remains organized and queryable as we scale to hundreds of appliances.
#MLOps #AWSBedrock #AIPlatformEngineer #WarrantyAI #GenerativeAI #DevOpsToAI #ApacheIceberg
https://lnkd.in/g-hkGJ6M
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;





</description>
      <category>aws</category>
      <category>ai</category>
      <category>testing</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Your Bedrock Bill Is a Ticking Clock — Here's How to Stop It</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 12 Mar 2026 05:56:54 +0000</pubDate>
      <link>https://forem.com/harisharavindan/your-bedrock-bill-is-a-ticking-clock-heres-how-to-stop-it-1c0i</link>
      <guid>https://forem.com/harisharavindan/your-bedrock-bill-is-a-ticking-clock-heres-how-to-stop-it-1c0i</guid>
      <description>&lt;p&gt;You deploy a Lambda that calls Bedrock. It works beautifully in testing.&lt;/p&gt;

&lt;p&gt;Then someone runs a batch job, a retry loop goes wrong, or traffic spikes, and your AWS bill at the end of the month looks like a phone number.&lt;/p&gt;

&lt;p&gt;Bedrock has no built-in spend cap. No circuit breaker. No "stop after $X." It will happily invoke your model ten thousand times before you notice anything is wrong.&lt;/p&gt;

&lt;p&gt;This post is about the patterns that prevent that, applied specifically to serverless AI workloads on AWS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Bedrock Cost Blowups Happen
&lt;/h2&gt;

&lt;p&gt;Bedrock charges per input token and output token. The pricing varies by model:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (per 1K tokens)&lt;/th&gt;
&lt;th&gt;Output (per 1K tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude Haiku&lt;/td&gt;
&lt;td&gt;~$0.00025&lt;/td&gt;
&lt;td&gt;~$0.00125&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Sonnet&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;~$0.015&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Claude Opus&lt;/td&gt;
&lt;td&gt;~$0.015&lt;/td&gt;
&lt;td&gt;~$0.075&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Haiku looks cheap and it is, until you're running it at scale with large prompts. A 2,000-token prompt + 500-token response at the Haiku pricing above is about $0.0011 per call. At 100,000 calls per day that's roughly $112/day, or about $3,400/month. From a single Lambda function.&lt;/p&gt;
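That arithmetic generalizes into a small pre-deployment calculator. The prices below are the approximate per-1K figures from the table, so treat the output as a planning estimate, not billing truth:

```python
# Approximate per-1K-token prices from the table above.
PRICES = {
    "haiku":  {"input": 0.00025, "output": 0.00125},
    "sonnet": {"input": 0.003,   "output": 0.015},
    "opus":   {"input": 0.015,   "output": 0.075},
}

def cost_per_call(model, input_tokens, output_tokens):
    """Estimated USD cost of one invocation at the table prices."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

def monthly_cost(model, input_tokens, output_tokens, calls_per_day, days=30):
    """Project a month of spend from a single call profile."""
    return cost_per_call(model, input_tokens, output_tokens) * calls_per_day * days
```

Running `monthly_cost("haiku", 2000, 500, 100_000)` makes the "cheap model, expensive bill" point concrete before any code ships.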

&lt;p&gt;The three failure modes that turn a reasonable bill into a bad one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Unbounded retry loops.&lt;/strong&gt; Lambda retries failed asynchronous invocations twice by default. If your Bedrock call fails and you don't handle it, Lambda replays the whole invocation, tripling your token spend on every failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Prompt size creep.&lt;/strong&gt; You add context, history, or document content to your prompt over time. Input tokens grow. You don't notice because the latency stays roughly the same, but the cost per call has doubled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. No model fallback logic.&lt;/strong&gt; You default to Sonnet for everything because it performs better. You never switch to Haiku for the 80% of calls where Haiku would have been fine.&lt;/p&gt;
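For failure mode 1, the simplest guard is an explicit attempt cap around the Bedrock call. A stdlib-only sketch (the exception handling and backoff values are illustrative; in production you'd catch the specific botocore throttling errors and also set the Lambda's async retry count to zero):

```python
import time

class RetryBudgetExceeded(Exception):
    """Raised when the hard attempt cap is exhausted."""

def invoke_with_cap(call, max_attempts=2, base_delay=0.5):
    """Run `call()` at most `max_attempts` times with exponential backoff.

    A hard cap keeps a failing Bedrock call from silently multiplying
    token spend through automatic retries.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as err:  # illustrative; narrow this in real code
            last_err = err
            if attempt + 1 == max_attempts:
                break
            time.sleep(base_delay * (2 ** attempt))
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_err
```

The key property is that the worst-case token spend per request is bounded and known in advance, instead of being whatever the retry machinery decides.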




&lt;h2&gt;
  
  
  Pattern 1: Model Tiering (Use the Cheapest Model That's Good Enough)
&lt;/h2&gt;

&lt;p&gt;The most impactful cost control you can add. Route calls to the cheapest model that can handle the task, with automatic escalation when confidence is low.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;HAIKU&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SONNET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_with_tiering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;require_confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Always try Haiku first.
    If confidence score &amp;lt; threshold, escalate to Sonnet.
    Returns: {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: str, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: str, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: bool}
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;haiku_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;

After your response, on a new line write exactly:
CONFIDENCE: &amp;lt;score between 0.0 and 1.0&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;haiku_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;haiku_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;confidence&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;haiku_response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;require_confidence&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="nf"&gt;clean_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;haiku_response&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;haiku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Escalate to Sonnet
&lt;/span&gt;    &lt;span class="n"&gt;sonnet_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SONNET&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;sonnet_response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;escalated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_confidence&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;reversed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONFIDENCE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;  &lt;span class="c1"&gt;# Assume high confidence if parsing fails
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CONFIDENCE:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, Haiku handles the majority of straightforward tasks when you route by confidence. The cost difference between Haiku and Sonnet is roughly 12x per call at the table prices above, so even a 70/30 split produces significant savings at scale.&lt;/p&gt;
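The economics of that split are easy to sanity-check. Using illustrative per-call costs for a 2,000-in / 500-out call derived from the pricing table (these are assumptions, not measured numbers):

```python
def blended_cost(cheap_cost, premium_cost, cheap_share):
    """Average per-call cost when `cheap_share` of traffic stays on the
    cheap model and the rest escalates to the premium one."""
    return cheap_share * cheap_cost + (1 - cheap_share) * premium_cost

# Illustrative per-call costs for a 2,000-in / 500-out call at the
# table prices above: Haiku ~$0.0011, Sonnet ~$0.0135.
HAIKU_CALL, SONNET_CALL = 0.001125, 0.0135

savings = 1 - blended_cost(HAIKU_CALL, SONNET_CALL, 0.7) / SONNET_CALL
# Under these assumptions a 70/30 split cuts spend by roughly 64%
# versus routing everything to Sonnet.
```

The escalation rate is the lever: track it in CloudWatch, because a threshold tweak that pushes escalations from 30% to 50% quietly erases most of the savings.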




&lt;h2&gt;
  
  
  Pattern 2: Token Counting Before You Invoke
&lt;/h2&gt;

&lt;p&gt;Bedrock charges for tokens you send, not just tokens you receive. A prompt that accidentally includes a full document when it only needed a summary can cost 10x more than intended.&lt;/p&gt;

&lt;p&gt;Count your tokens before invoking. If the prompt is above a threshold, truncate or summarize first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Rough estimate: ~4 characters per token for English text.
    Use this as a pre-flight check, not for billing accuracy.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;


&lt;span class="n"&gt;MAX_INPUT_TOKENS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1500&lt;/span&gt;   &lt;span class="c1"&gt;# Your cost-control threshold
&lt;/span&gt;&lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4000&lt;/span&gt;   &lt;span class="c1"&gt;# Bedrock model limit buffer
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;safe_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;full_prompt&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;
    &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Truncate context, keep prompt intact
&lt;/span&gt;        &lt;span class="n"&gt;max_context_chars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HARD_MAX_TOKENS&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;           &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="n"&gt;max_context_chars&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;... [truncated]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;full_prompt&lt;/span&gt;       &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;estimated_toks&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;estimated_toks&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;MAX_INPUT_TOKENS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Log a warning — this call is more expensive than expected
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[COST WARNING] Large prompt: ~&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;estimated_toks&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tokens estimated&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;invoke_with_tiering&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;full_prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This catches the most common cause of unexpected cost spikes: context that grew over time without anyone noticing.&lt;/p&gt;
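&lt;p&gt;If you don't already have an &lt;code&gt;estimate_tokens&lt;/code&gt; helper, a minimal sketch based on the common ~4-characters-per-token heuristic is enough for pre-flight checks. This is a rough approximation (actual counts vary by model and tokenizer), but it errs usefully for cost control:&lt;/p&gt;

```python
# Hypothetical sketch of the estimate_tokens helper used above.
# Assumes the rough ~4 characters-per-token heuristic for English text;
# real token counts vary by model and tokenizer.
def estimate_tokens(text: str) -> int:
    """Cheap, tokenizer-free token estimate for pre-flight cost checks."""
    return max(1, len(text) // 4)
```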




&lt;h2&gt;
  
  
  Pattern 3: Lambda-Level Rate Limiting with DynamoDB
&lt;/h2&gt;

&lt;p&gt;Bedrock has service-level quotas, but they're per-account, not per-function. If you have multiple Lambda functions all calling Bedrock, one runaway function can exhaust your quota and spike your bill before the others even notice.&lt;/p&gt;

&lt;p&gt;Add a lightweight rate limiter using DynamoDB atomic counters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;dynamodb&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;rate_table&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-rate-limits&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;   &lt;span class="c1"&gt;# Per function, per minute window
&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check_rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Returns True if call is allowed, False if rate limit exceeded.
    Uses DynamoDB atomic increment + TTL for automatic window reset.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;minute_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rate_table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rate_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;minute_key&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SET call_count = if_not_exists(call_count, :zero) + :one, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;expiry_ttl = :ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:zero&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# 2-minute TTL, auto-cleanup
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;ReturnValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATED_NEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;call_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rate_limited_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;check_rate_limit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Rate limit exceeded for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;function_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Max &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;MAX_CALLS_PER_MINUTE&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; Bedrock calls/minute.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;safe_invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The DynamoDB TTL means the counter auto-resets every window. No cron, no cleanup Lambda. Cost for this table at moderate usage is under $1/month.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pattern 4: CloudWatch Alarm on Bedrock Invocation Spend
&lt;/h2&gt;

&lt;p&gt;All three patterns above are reactive at the code level. You also need a proactive alert before the bill hits.&lt;/p&gt;

&lt;p&gt;Bedrock publishes &lt;code&gt;InvocationCount&lt;/code&gt; and &lt;code&gt;InputTokenCount&lt;/code&gt; metrics to CloudWatch. Set an alarm on invocation count as a leading indicator — it's more reliable than waiting for billing alerts.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Terraform — alert when Bedrock invocations exceed threshold&lt;/span&gt;
&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_cloudwatch_metric_alarm"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_invocation_spike"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;alarm_name&lt;/span&gt;          &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"bedrock-invocation-spike"&lt;/span&gt;
  &lt;span class="nx"&gt;comparison_operator&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"GreaterThanThreshold"&lt;/span&gt;
  &lt;span class="nx"&gt;evaluation_periods&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;
  &lt;span class="nx"&gt;metric_name&lt;/span&gt;         &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"InvocationCount"&lt;/span&gt;
  &lt;span class="nx"&gt;namespace&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"AWS/Bedrock"&lt;/span&gt;
  &lt;span class="nx"&gt;period&lt;/span&gt;              &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;          &lt;span class="c1"&gt;# 5-minute window&lt;/span&gt;
  &lt;span class="nx"&gt;statistic&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Sum"&lt;/span&gt;
  &lt;span class="nx"&gt;threshold&lt;/span&gt;           &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;          &lt;span class="c1"&gt;# Adjust to your expected volume&lt;/span&gt;
  &lt;span class="nx"&gt;alarm_description&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"Bedrock invocations unusually high — check for runaway loops"&lt;/span&gt;

  &lt;span class="nx"&gt;dimensions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;ModelId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"anthropic.claude-haiku-4-5-20251001"&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;alarm_actions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sns_alert_topic_arn&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set the threshold at roughly 2x your expected peak volume. The alarm fires before cost becomes a problem, not after.&lt;/p&gt;
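&lt;p&gt;To pick that threshold, pull your recent per-5-minute &lt;code&gt;InvocationCount&lt;/code&gt; sums (for example with boto3's CloudWatch &lt;code&gt;get_metric_statistics&lt;/code&gt;) and double the observed peak. A small helper; the floor value here is an arbitrary choice for quiet workloads:&lt;/p&gt;

```python
# Given recent per-period InvocationCount sums (e.g. fetched from CloudWatch
# via boto3), suggest an alarm threshold at roughly 2x the observed peak.
# The floor keeps quiet workloads from getting a near-zero threshold.
def suggest_alarm_threshold(period_sums: list, floor: int = 100) -> int:
    peak = max(period_sums) if period_sums else 0
    return max(floor, 2 * peak)
```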




&lt;h2&gt;
  
  
  Pattern 5: Disable Lambda Retries for Bedrock Callers
&lt;/h2&gt;

&lt;p&gt;This one is often overlooked. By default, Lambda retries asynchronous invocations twice on failure. If your Bedrock call times out or returns a throttling error, Lambda will invoke your function two more times automatically, tripling the tokens consumed by that failure.&lt;/p&gt;

&lt;p&gt;For Bedrock-calling Lambdas, set maximum retries to zero:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_event_source_mapping"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_processor"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;# ... your S3/SQS trigger config&lt;/span&gt;
  &lt;span class="nx"&gt;bisect_batch_on_function_error&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nx"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"aws_lambda_function_event_invoke_config"&lt;/span&gt; &lt;span class="s2"&gt;"bedrock_caller"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;function_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_lambda_function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bedrock_processor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;function_name&lt;/span&gt;

  &lt;span class="nx"&gt;maximum_retry_attempts&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;   &lt;span class="c1"&gt;# No automatic retries for Bedrock callers&lt;/span&gt;

  &lt;span class="nx"&gt;destination_config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;on_failure&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;destination&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;aws_sqs_queue&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bedrock_dlq&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;arn&lt;/span&gt;   &lt;span class="c1"&gt;# Failed events go to DLQ&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Handle retries explicitly in your code with backoff logic, so you control when and how many times a Bedrock call is retried, rather than relying on Lambda's default behaviour.&lt;/p&gt;
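&lt;p&gt;A minimal sketch of what that explicit retry logic can look like. The bare &lt;code&gt;Exception&lt;/code&gt; catch is for illustration only; in real code you would catch botocore's throttling and timeout errors specifically:&lt;/p&gt;

```python
import random
import time

# Explicit retry wrapper to replace Lambda's automatic retries.
# Narrow the `except Exception` to throttling errors in real code;
# the injectable `sleep` makes the backoff unit-testable.
def invoke_with_backoff(call, max_attempts: int = 3, base_delay: float = 1.0,
                        sleep=time.sleep):
    last_error = None
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:   # illustration: catch throttling errors only
            last_error = exc
            # Exponential backoff with jitter: 1s, 2s, 4s plus up to 0.5s noise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
    raise last_error
```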




&lt;h2&gt;
  
  
  Putting It Together
&lt;/h2&gt;

&lt;p&gt;A production-ready Bedrock caller in a serverless AI pipeline needs all five layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request
  → rate_limited_invoke()        # Pattern 3: per-function rate limit
      → safe_invoke()            # Pattern 2: token count pre-flight
          → invoke_with_tiering()  # Pattern 1: Haiku first, Sonnet on escalation
              → CloudWatch alarm   # Pattern 4: spike detection
  Lambda retry = 0               # Pattern 5: no automatic retry blowup
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of these are complex individually. The value is in having all five in place before you hit production traffic, not after the bill arrives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Reference: What This Saves
&lt;/h2&gt;

&lt;p&gt;Assuming a pipeline processing 10,000 documents/day with an average 1,500 input tokens and 400 output tokens per call:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setup&lt;/th&gt;
&lt;th&gt;Model mix&lt;/th&gt;
&lt;th&gt;Daily cost&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All Sonnet, no controls&lt;/td&gt;
&lt;td&gt;100% Sonnet&lt;/td&gt;
&lt;td&gt;~$210&lt;/td&gt;
&lt;td&gt;~$6,300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered (80% Haiku / 20% Sonnet)&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$35&lt;/td&gt;
&lt;td&gt;~$1,050&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tiered + token control (avg 10% reduction)&lt;/td&gt;
&lt;td&gt;Mixed&lt;/td&gt;
&lt;td&gt;~$31&lt;/td&gt;
&lt;td&gt;~$945&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tiering alone is an 83% cost reduction. Token control and rate limiting are the safety net that keeps the tiering from being undone by a bad day.&lt;/p&gt;
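&lt;p&gt;If your traffic or token sizes differ, the same arithmetic is easy to re-run. A back-of-envelope helper follows; the per-million-token prices you pass in are your own inputs, so check current Bedrock pricing rather than trusting any hardcoded numbers:&lt;/p&gt;

```python
# Back-of-envelope cost model for a tiered pipeline. Prices are caller-
# supplied USD per million tokens; none are hardcoded here on purpose.
def daily_cost(calls: int, in_toks: int, out_toks: int,
               in_price: float, out_price: float) -> float:
    """Daily USD cost for `calls` invocations of a single model."""
    per_call = (in_toks * in_price + out_toks * out_price) / 1_000_000
    return calls * per_call

def tiered_daily_cost(calls: int, in_toks: int, out_toks: int,
                      cheap_prices: tuple, strong_prices: tuple,
                      cheap_share: float = 0.8) -> float:
    """Blend a cheap model and a strong model at the given traffic split."""
    cheap_calls = int(calls * cheap_share)
    cheap = daily_cost(cheap_calls, in_toks, out_toks, *cheap_prices)
    strong = daily_cost(calls - cheap_calls, in_toks, out_toks, *strong_prices)
    return cheap + strong
```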




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;These five patterns are cheap to add and expensive to skip. The DynamoDB rate limiter costs under $1/month. The CloudWatch alarm is free under AWS free tier limits. The model tiering requires no infrastructure changes at all.&lt;/p&gt;

&lt;p&gt;Put them in place before you hit production traffic, not after the bill arrives.&lt;/p&gt;

&lt;p&gt;If you're running Bedrock in production and have hit a cost gotcha not covered here, drop it in the comments; it would be good to build out this list further.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>claude</category>
      <category>ai</category>
    </item>
    <item>
      <title>DynamoDB as a State Machine: How I Stopped Paying for Redundant Lambda Executions</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 08 Mar 2026 15:17:15 +0000</pubDate>
      <link>https://forem.com/harisharavindan/dynamodb-as-a-state-machine-how-i-stopped-paying-for-redundant-lambda-executions-30cc</link>
      <guid>https://forem.com/harisharavindan/dynamodb-as-a-state-machine-how-i-stopped-paying-for-redundant-lambda-executions-30cc</guid>
      <description>&lt;p&gt;&lt;em&gt;Part of my warrantyAI build series — building an AI-powered warranty management system on AWS, one week at a time.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  

&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_bedrock-aws-serverless-activity-7436337218853462016--IRP?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAACLhSMABNoio6_wUp-BQYf-oen15z_L_GKg" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQHVSLK3VC4LKg%2Ffeedshare-shrink_800%2FB56ZzMtS47HsAc-%2F0%2F1772960952325%3Fe%3D2147483647%26v%3Dbeta%26t%3D3CzV9mQp1M7Gg5ZspT-3gbiLqu0bOsgy8fCNhrJUj7I" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_bedrock-aws-serverless-activity-7436337218853462016--IRP?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAACLhSMABNoio6_wUp-BQYf-oen15z_L_GKg" rel="noopener noreferrer" class="c-link"&gt;
            Building HITL into AI pipelines for high-risk decisions | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            𝗜 𝗽𝗮𝘂𝘀𝗲𝗱 𝗮𝗻 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁 𝗺𝗶𝗱-𝘄𝗼𝗿𝗸𝗳𝗹𝗼𝘄 𝘁𝗵𝗶𝘀 𝘄𝗲𝗲𝗸 𝗮𝗻𝗱 𝗹𝗲𝘁 𝗮 𝗵𝘂𝗺𝗮𝗻 𝗱𝗲𝗰𝗶𝗱𝗲 𝘄𝗵𝗮𝘁 𝗵𝗮𝗽𝗽𝗲𝗻𝘀 𝗻𝗲𝘅𝘁

𝗪𝗲𝗲𝗸 𝟵 𝗼𝗳 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘄𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗔𝗜 - and this one changed how I think about AI pipelines.

𝗧𝗵𝗲 𝗽𝗿𝗼𝗯𝗹𝗲𝗺: Week 8's pipeline classified warranties and sent reminders automatically. Fine for low and medium risk. But for high-risk documents — expired warranties, missing serial numbers, suspicious policy terms — no one should be auto-sending anything without a human in the loop.

And honestly? The AI world is finally catching up to this thinking.

OpenAI, Anthropic, and Google have all started baking 𝗛𝘂𝗺𝗮𝗻-𝗶𝗻-𝘁𝗵𝗲-𝗟𝗼𝗼𝗽 (𝗛𝗜𝗧𝗟) patterns into their agent frameworks — not as an afterthought, but as a core design primitive. The reason is simple: LLMs are probabilistic. They're very good at pattern recognition across millions of documents. They're not good at knowing when they're wrong. A confident wrong answer from a classifier in a warranty pipeline doesn't just fail silently — it sends a notification to a real customer.

𝗛𝗜𝗧𝗟 𝗶𝘀 𝘁𝗵𝗲 𝗰𝗶𝗿𝗰𝘂𝗶𝘁 𝗯𝗿𝗲𝗮𝗸𝗲𝗿. You let the AI handle the 90% it's genuinely better at — reading documents, extracting structure, classifying risk — and you bring humans in precisely at the 10% where consequences matter. That's not a limitation of the AI. That's good system design.

So I built it. Here's the new flow:
📄 𝗥𝗲𝗮𝗱𝗲𝗿 → 🔍 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗲𝗿 → 🛑 𝗛𝗜𝗧𝗟 𝗔𝗴𝗲𝗻𝘁 → 📨 𝗥𝗲𝗺𝗶𝗻𝗱𝗲𝗿

The HITL agent does three things when risk = HIGH:
1. Serialises the full pipeline state to DynamoDB
2. Sends an SNS email to the reviewer with ✅ Approve and ❌ Reject links
3. Raises NodeInterrupt — LangGraph pauses the graph completely

The pipeline just... stops. Waits.

When the reviewer clicks Approve, a second Lambda fires, reads the DynamoDB state, and re-invokes the pipeline — but only the Reminder agent. Everything before that already ran.

When they click Reject, an SNS notification goes to the tenant. No reminder. Full audit trail in S3.

For medium and low risk? HITL is skipped entirely. Zero delay.

𝗪𝗵𝗮𝘁 𝗜 𝗹𝗲𝗮𝗿𝗻𝗲𝗱 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘁𝗵𝗶𝘀:
— LangGraph's NodeInterrupt is surprisingly clean. One raise, graph pauses.
— DynamoDB as a state checkpoint is more reliable than I expected (7-day TTL, on-demand billing)
— The hardest part wasn't the code. It was deciding what "high risk" actually means.

The best AI systems in production today aren't the ones running fully autonomously. They're the ones that know exactly when to stop and ask.

Stack: LangGraph + AWS Bedrock + DynamoDB + SNS + API Gateway + Lambda
Repo: https://lnkd.in/gYrC3wEW 

Week 10: CI/CD for the whole pipeline + prompt regression tests.

#bedrock #aws #serverless #langchain #aiengineering #building #ai #agents
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;



&lt;/h2&gt;

&lt;p&gt;Most people reach for DynamoDB when they need a fast key-value store. I did too.&lt;/p&gt;

&lt;p&gt;Then I started using it as a state machine — and accidentally cut the redundant Lambda execution cost out of my AI pipeline entirely.&lt;/p&gt;

&lt;p&gt;Here's the pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: AI Pipelines That Don't Know When to Stop
&lt;/h2&gt;

&lt;p&gt;In Week 8 of building warrantyAI, I had a 3-agent LangGraph pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Reader → Classifier → Reminder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every document that came in ran the full pipeline. Reader extracted text with Textract. Classifier invoked Bedrock (Claude Haiku, fallback to Sonnet). Reminder generated a notification and published to SNS.&lt;/p&gt;

&lt;p&gt;That's fine when every document should proceed to the end. But in Week 9, I added a human review step for high-risk warranties. The pipeline needed to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Pause after classification&lt;/li&gt;
&lt;li&gt;Wait for a human decision (could be hours, could be days)&lt;/li&gt;
&lt;li&gt;Resume from exactly where it stopped — not re-run everything&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The naive approach would be to re-invoke the full pipeline on resume. Reader runs again. Textract runs again. Classifier calls Bedrock again. You pay for all of it twice.&lt;/p&gt;

&lt;p&gt;With DynamoDB as the state checkpoint, the resumed execution runs &lt;strong&gt;only the Reminder agent&lt;/strong&gt;. Everything before it is already stored.&lt;/p&gt;
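&lt;p&gt;One way to make that concrete: derive the remaining agents from completion flags in the checkpointed state, instead of always starting at the top of the graph. The agent names and state keys below mirror the pipeline but are illustrative:&lt;/p&gt;

```python
# Hypothetical resume routing: compute the pipeline suffix that still
# needs to run from completion flags in the checkpoint. Agent names and
# the "*_done" keys are illustrative, not the post's actual schema.
PIPELINE = ["reader", "classifier", "reminder"]

def agents_to_run(checkpoint: dict) -> list:
    """Return the agents that have not completed yet, in order."""
    for i, agent in enumerate(PIPELINE):
        if not checkpoint.get(f"{agent}_done"):
            return PIPELINE[i:]
    return []
```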




&lt;h2&gt;
  
  
  The Pattern: Checkpoint State, Not Just Data
&lt;/h2&gt;

&lt;p&gt;The key mental shift: DynamoDB isn't storing the &lt;em&gt;result&lt;/em&gt; of your pipeline. It's storing the &lt;em&gt;entire state&lt;/em&gt; of your pipeline at the moment it paused.&lt;/p&gt;

&lt;p&gt;Here's the DynamoDB schema I use:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;document_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;PK&lt;/td&gt;
&lt;td&gt;Unique per document&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;sk&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;SK&lt;/td&gt;
&lt;td&gt;Always &lt;code&gt;"REVIEW"&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;status&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;pending_review&lt;/code&gt; / &lt;code&gt;approved&lt;/code&gt; / &lt;code&gt;rejected&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;warranty_state&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;Full pipeline state as JSON string&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;created_at&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;S&lt;/td&gt;
&lt;td&gt;ISO 8601 timestamp&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ttl&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;N&lt;/td&gt;
&lt;td&gt;Unix epoch — auto-expires after 7 days&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;warranty_state&lt;/code&gt; field holds everything: raw extracted text, classification result, risk level, model used, guardrail flags, audit log. The entire &lt;code&gt;WarrantyState&lt;/code&gt; TypedDict serialised as a JSON string.&lt;/p&gt;

&lt;p&gt;When the pipeline resumes, it deserialises that field and picks up exactly where it left off.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;write_to_dynamodb&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WarrantyState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HITL_TABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ttl&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# 7-day auto-expiry
&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;put_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Item&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;             &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tenant_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warranty_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ttl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;            &lt;span class="n"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And on resume:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_review_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;table&lt;/span&gt;    &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;HITL_TABLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;document_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;REVIEW&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;item&lt;/span&gt;     &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Already actioned: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;

&lt;span class="c1"&gt;# In resume Lambda:
&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;         &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_review_record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;warranty_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;warranty_state&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;  &lt;span class="c1"&gt;# full state restored
&lt;/span&gt;
&lt;span class="c1"&gt;# Run only the Reminder agent — Reader and Classifier already ran
&lt;/span&gt;&lt;span class="n"&gt;reminder_update&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;reminder_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;warranty_state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No Textract. No Bedrock classification call. Just the Reminder.&lt;/p&gt;
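&lt;p&gt;What &lt;code&gt;run_from_reminder&lt;/code&gt; might look like — a minimal sketch, assuming each agent is a plain function that takes the state dict and returns a partial update. The &lt;code&gt;reminder_agent&lt;/code&gt; stub here stands in for the real one, which calls Bedrock and SNS:&lt;/p&gt;

```python
# Minimal sketch — agent functions are stand-ins for the real ones.

def reminder_agent(state: dict) -> dict:
    # Real version: Bedrock Haiku generates the message, SNS publishes
    # it for medium/high risk. Stub keeps the same contract.
    risk = state.get("risk_level", "low")
    return {"reminder_sent": risk in ("medium", "high")}

def run_from_reminder(state: dict) -> dict:
    """Resume the pipeline at the Reminder node only.

    Reader and Classifier outputs are already in `state` (restored
    from DynamoDB), so neither Textract nor the Bedrock
    classification call runs again.
    """
    state.update(reminder_agent(state))
    return state
```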




&lt;h2&gt;
  
  
  The Full HITL Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;S3 Upload → Lambda trigger
     │
     ▼
Reader Agent      (Textract + Bedrock Haiku structuring)
     │
     ▼
Classifier Agent  (Bedrock Haiku → Sonnet fallback if confidence &amp;lt; 0.7)
     │
     ▼
HITL Agent ──── risk != "high" ──────────────────────────┐
     │                                                     │
     │ risk == "high"                                      │
     ▼                                                     │
Write full state to DynamoDB                              │
     │                                                     │
     ▼                                                     │
SNS email to reviewer                                      │
(approve/reject links)                                     │
     │                                                     │
     ▼                                                     │
NodeInterrupt — graph pauses                              │
                                                           │
     Reviewer clicks link                                  │
     → API Gateway                                         │
     → resume Lambda                                       │
          │                                                │
          ├── APPROVE → run_from_reminder(state)           │
          └── REJECT  → SNS to tenant, stop               │
                                                           │
                                                    Reminder Agent
                                                           │
                                                    SNS to tenant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For medium and low risk documents, the HITL node is skipped entirely — the graph flows straight through to Reminder with no pause, no DynamoDB write, no cost.&lt;/p&gt;
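&lt;p&gt;In LangGraph terms, that skip is just a conditional edge. A hypothetical routing function — the exact wiring depends on how the graph is assembled, but the decision itself is this small:&lt;/p&gt;

```python
# Hypothetical routing function for the HITL branch. In LangGraph this
# would be registered via add_conditional_edges; shown standalone here.

def route_after_classify(state: dict) -> str:
    """High-risk documents pause at HITL; everything else flows
    straight to the Reminder node — no pause, no DynamoDB write."""
    if state.get("risk_level") == "high":
        return "hitl"
    return "reminder"
```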




&lt;h2&gt;
  
  
  Why DynamoDB Over Other Options
&lt;/h2&gt;

&lt;p&gt;When I was designing this, I considered three approaches:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;SQS with visibility timeout&lt;/strong&gt; — the visibility timeout maxes out at 12 hours. Not enough for a human review that might sit overnight. And you can't easily look a message up by document_id.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;S3 as state store&lt;/strong&gt; — works, but you're polling or using S3 notifications to detect resume. Awkward.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DynamoDB&lt;/strong&gt; — point lookups by document_id, TTL handles cleanup automatically, on-demand billing means you pay per read/write not per hour, and the Streams feature gives you a path to event-driven resume if you want it later.&lt;/p&gt;

&lt;p&gt;The on-demand billing matters more than it sounds. A warranty pipeline doesn't process documents at a steady rate. Some days 500 documents, some days 5. With provisioned capacity you're paying for peak all the time. With on-demand you pay for actual usage.&lt;/p&gt;

&lt;p&gt;At my current volume, the DynamoDB cost for the HITL table is under $0.50/month.&lt;/p&gt;
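&lt;p&gt;That figure is easy to sanity-check. A back-of-envelope sketch — the per-million request prices below are typical on-demand rates, not quoted for any specific region, so treat them as assumptions:&lt;/p&gt;

```python
# Rough monthly request cost for the HITL table. Prices are assumed
# (typical on-demand rates; they vary by region) — check your own.
WRITE_PRICE_PER_MILLION = 1.25   # USD per million write request units (assumed)
READ_PRICE_PER_MILLION = 0.25    # USD per million read request units (assumed)

def monthly_hitl_cost(docs_per_day: float, high_risk_ratio: float) -> float:
    """One write on pause, plus one read and one write on resume —
    and only high-risk documents ever touch the table."""
    monthly_high_risk = docs_per_day * high_risk_ratio * 30
    writes = monthly_high_risk * 2
    reads = monthly_high_risk * 1
    return (writes * WRITE_PRICE_PER_MILLION
            + reads * READ_PRICE_PER_MILLION) / 1_000_000
```

&lt;p&gt;Even 500 documents/day with 20% flagged high-risk lands well under a cent per month in request charges. Storage is the larger line item, and TTL keeps that bounded.&lt;/p&gt;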




&lt;h2&gt;
  
  
  The TTL Trick
&lt;/h2&gt;

&lt;p&gt;This is the part I underestimated when I first built this.&lt;/p&gt;

&lt;p&gt;Every review record gets a &lt;code&gt;ttl&lt;/code&gt; field set 7 days from creation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ttl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;DynamoDB's TTL feature automatically deletes expired items — no cron, no cleanup Lambda, no cost. Unactioned reviews just disappear. This matters because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stale review records don't accumulate&lt;/li&gt;
&lt;li&gt;Storage costs stay flat regardless of volume&lt;/li&gt;
&lt;li&gt;You don't need to build a cleanup process&lt;/li&gt;
&lt;/ul&gt;
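&lt;p&gt;TTL is opt-in per table: you enable it once and point DynamoDB at the attribute holding the epoch-seconds expiry. A sketch of that setup — the table name is assumed, and the one-time &lt;code&gt;update_time_to_live&lt;/code&gt; call is left as a comment so the snippet runs offline:&lt;/p&gt;

```python
import time

TTL_ATTRIBUTE = "ttl"  # must hold epoch seconds, not an ISO string

def ttl_epoch(days_from_now: int) -> int:
    """Epoch-seconds expiry in the format DynamoDB's TTL sweeper reads."""
    return int(time.time()) + days_from_now * 86400

# One-time table setup (needs AWS credentials; table name assumed):
# boto3.client("dynamodb").update_time_to_live(
#     TableName="hitl-reviews",
#     TimeToLiveSpecification={"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
# )
```

&lt;p&gt;The common pitfall: the TTL attribute must be a Number in epoch seconds. Write an ISO string instead and DynamoDB silently ignores it — the item never expires.&lt;/p&gt;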

&lt;p&gt;The one thing to know: TTL deletion isn't instant. DynamoDB typically cleans up within 48 hours of expiry. If you need exact expiry (e.g. the approve link should stop working at exactly 7 days), enforce it in your Lambda:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pending_review&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Already actioned&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Also check TTL manually if you need hard expiry
&lt;/span&gt;&lt;span class="n"&gt;created&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromisoformat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;created_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;created&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;days&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Review expired&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Cost Comparison
&lt;/h2&gt;

&lt;p&gt;Here's what changed between Week 8 (no HITL) and Week 9 (HITL with DynamoDB state):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Week 8&lt;/th&gt;
&lt;th&gt;Week 9&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High-risk doc: Bedrock calls&lt;/td&gt;
&lt;td&gt;2 (classify + reminder gen)&lt;/td&gt;
&lt;td&gt;1 (reminder only, on approve)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;High-risk doc: Textract&lt;/td&gt;
&lt;td&gt;Yes, every run&lt;/td&gt;
&lt;td&gt;Once, state stored&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redundant re-processing&lt;/td&gt;
&lt;td&gt;On every retry&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State cleanup&lt;/td&gt;
&lt;td&gt;Manual&lt;/td&gt;
&lt;td&gt;Automatic via TTL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DynamoDB cost&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;&amp;lt;$0.50/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Bedrock saving is the real one. Claude Haiku is cheap (~$0.0004/call) but Sonnet fallback is ~$0.006/call. If a high-risk document triggered the Sonnet fallback and you re-ran the pipeline on resume, you'd pay for Sonnet twice. With DynamoDB state, classification runs once and the result is stored.&lt;/p&gt;

&lt;p&gt;At low volume this is pennies. At scale — thousands of documents per day with a meaningful percentage flagged as high-risk — it adds up quickly.&lt;/p&gt;
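&lt;p&gt;To put numbers on "adds up quickly", here's the arithmetic using the per-call figures above. The volume, high-risk ratio, and Sonnet-fallback ratio are illustrative assumptions, not measured values:&lt;/p&gt;

```python
# Cost avoided per month by storing classification results instead of
# re-running Reader + Classifier on resume. Per-call prices from the
# text above (~$0.006 Sonnet, ~$0.0004 Haiku); inputs are assumptions.
SONNET_CALL = 0.006
HAIKU_CALL = 0.0004

def monthly_resaving(docs_per_day: float, high_risk_ratio: float,
                     sonnet_fallback_ratio: float) -> float:
    resumed = docs_per_day * high_risk_ratio * 30
    # every resumed doc would have repeated one Haiku classify call,
    # and a fraction of them the Sonnet fallback on top
    return resumed * (HAIKU_CALL + sonnet_fallback_ratio * SONNET_CALL)
```

&lt;p&gt;At 2,000 docs/day with 15% high-risk and 15% of those falling back to Sonnet, that's roughly $11.70/month avoided on LLM calls alone — before counting the Textract re-reads, which cost more than the LLM calls at that volume.&lt;/p&gt;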




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;Week 10 adds CI/CD to the pipeline — GitHub Actions deploying to Lambda via ECR, with prompt regression tests so a bad Bedrock prompt doesn't silently break classification in production.&lt;/p&gt;

&lt;p&gt;The DynamoDB state pattern from this week sets that up nicely: because state is checkpointed, regression tests can inject a known state at any node in the graph and assert the output without running the full pipeline.&lt;/p&gt;
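&lt;p&gt;A taste of what one of those regression tests could look like — checking the Classifier's JSON contract against a recorded ("golden") response, since a prompt edit usually breaks the output shape before anything else. The field names match this pipeline; the helper itself is hypothetical:&lt;/p&gt;

```python
import json

# The fields the Classifier prompt promises to return.
REQUIRED_FIELDS = {"category", "expiry_date", "risk_level", "confidence"}

def check_classifier_contract(raw_model_output: str) -> dict:
    """Assert the classifier's JSON contract — the thing a prompt edit
    silently breaks. Run against a live or recorded model response."""
    parsed = json.loads(raw_model_output)
    missing = REQUIRED_FIELDS - parsed.keys()
    assert not missing, f"prompt regression: missing {missing}"
    assert parsed["risk_level"] in ("low", "medium", "high")
    return parsed

# Regression check against a recorded golden response:
golden = ('{"category": "appliance", "expiry_date": "2027-01-31", '
          '"risk_level": "high", "confidence": 0.92}')
record = check_classifier_contract(golden)
```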




</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>dynamodb</category>
    </item>
    <item>
      <title>Serverless Bedrock: How I invoke Claude from Lambda in warrantyAI</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Tue, 03 Mar 2026 17:21:42 +0000</pubDate>
      <link>https://forem.com/harisharavindan/serverless-bedrock-how-i-invoke-claude-from-lambda-in-warrantyai-3hk</link>
      <guid>https://forem.com/harisharavindan/serverless-bedrock-how-i-invoke-claude-from-lambda-in-warrantyai-3hk</guid>
      <description>&lt;p&gt;Every week I ship a new piece of warrantyAI — an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.&lt;/p&gt;

&lt;p&gt;Before the agents could do anything, I needed one thing to work cleanly: &lt;strong&gt;invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.&lt;/strong&gt;&lt;br&gt;


&lt;/p&gt;
&lt;div class="crayons-card c-embed text-styles text-styles--secondary"&gt;
    &lt;div class="c-embed__content"&gt;
        &lt;div class="c-embed__cover"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_aiplatformengineering-langgraph-awsbedrock-activity-7433883183760408576-EuL5?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" class="c-link align-middle" rel="noopener noreferrer"&gt;
            &lt;img alt="" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.licdn.com%2Fdms%2Fimage%2Fv2%2FD5622AQF-WtgegAgotw%2Ffeedshare-shrink_800%2FB56Zyp1XLHIYAg-%2F0%2F1772375864528%3Fe%3D2147483647%26v%3Dbeta%26t%3DfaYx2MoVW5LfC_6lpkSVln8YCv0TP-7RdKChvLhDJrA" height="auto" class="m-0"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="c-embed__body"&gt;
        &lt;h2 class="fs-xl lh-tight"&gt;
          &lt;a href="https://www.linkedin.com/posts/harish-aravindan_aiplatformengineering-langgraph-awsbedrock-activity-7433883183760408576-EuL5?utm_source=share&amp;amp;amp%3Butm_medium=member_desktop&amp;amp;amp%3Brcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo" rel="noopener noreferrer" class="c-link"&gt;
            Building warrantyAI on AWS with AI-powered pipeline | Harish Aravindan posted on the topic | LinkedIn
          &lt;/a&gt;
        &lt;/h2&gt;
          &lt;p class="truncate-at-3"&gt;
            𝗪𝗲𝗲𝗸 𝟴 𝗼𝗳 𝗯𝘂𝗶𝗹𝗱𝗶𝗻𝗴 𝘄𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗔𝗜

👉 𝗙𝗼𝗿 𝘁𝗵𝗼𝘀𝗲 𝘀𝗲𝗲𝗶𝗻𝗴 𝘁𝗵𝗶𝘀 𝗳𝗼𝗿 𝘁𝗵𝗲 𝗳𝗶𝗿𝘀𝘁 𝘁𝗶𝗺𝗲
I'm a Senior Cloud Engineer building an AI-powered warranty management system on AWS — from scratch, one week at a time.
No shortcuts. Real architecture. Real cost numbers.

𝗧𝗵𝗶𝘀 𝘄𝗲𝗲𝗸: 𝗜 𝘄𝗶𝗿𝗲𝗱 𝟯 𝗔𝗜 𝗮𝗴𝗲𝗻𝘁𝘀 𝘁𝗼𝗴𝗲𝘁𝗵𝗲𝗿 𝘂𝘀𝗶𝗻𝗴 𝗟𝗮𝗻𝗴𝗚𝗿𝗮𝗽𝗵
The problem warrantyAI solves:
Most people lose track of their warranties.
Appliances expire. Repairs get denied. Money is wasted.
warrantyAI reads the document, classifies the risk, and reminds you before it’s too late.

This week I built the core pipeline that makes that happen.
Reader → Classifier → Reminder

📄 𝗥𝗲𝗮𝗱𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Customer uploads a warranty PDF to S3.
Textract pulls the raw text.
Bedrock Haiku structures it into named fields —
product, brand, expiry date, serial number.

🔍 𝗖𝗹𝗮𝘀𝘀𝗶𝗳𝗶𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Takes those fields and classifies the warranty.
Haiku first — fast and cheap.
If confidence drops below 70%, automatically retries with Sonnet.
GovernanceShield guardrail (built in Week 7) runs on every invocation.
Outputs: category, expiry date, risk level.

🔔 𝗥𝗲𝗺𝗶𝗻𝗱𝗲𝗿 𝗔𝗴𝗲𝗻𝘁
Reads risk level from shared state.
Generates a human-readable notification via Haiku.
Publishes to SNS — but only for medium and high risk.
Low risk? Message generated, not sent.
Deliberate FinOps decision. SNS isn’t free at scale.

Repo : https://lnkd.in/gsndTpQV

𝗪𝗵𝗮𝘁 𝗵𝗲𝗹𝗱 𝘁𝗵𝗶𝘀 𝘁𝗼𝗴𝗲𝘁𝗵𝗲𝗿: 𝗪𝗮𝗿𝗿𝗮𝗻𝘁𝘆𝗦𝘁𝗮𝘁𝗲

One typed Python dict shared across all 3 agents.
No message queues between agents.
No shared database mid-pipeline.
Each agent reads from it, writes back a partial update.
LangGraph handles the sequencing.

What connected cleanly from previous weeks:
✔ Week 7 GovernanceShield guardrail — one import, plugged straight in
✔ Per-agent IAM roles already existed — zero new permissions needed
✔ S3 audit bucket already live — all 3 agents write to it
Building incrementally pays off.

What’s your multi-agent orchestration framework of choice right now?

#AIPlatformEngineering #LangGraph #AWSBedrock #warrantyAI #Serverless #AI
          &lt;/p&gt;
        &lt;div class="color-secondary fs-s flex items-center"&gt;
            &lt;img alt="favicon" class="c-embed__favicon m-0 mr-2 radius-0" src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fstatic.licdn.com%2Faero-v1%2Fsc%2Fh%2Fal2o9zrvru7aqj8e1x2rzsrca"&gt;
          linkedin.com
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;




&lt;p&gt;Here's exactly how I did it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why serverless + Bedrock is the right combo
&lt;/h2&gt;

&lt;p&gt;Bedrock's &lt;code&gt;invoke_model&lt;/code&gt; API is synchronous and stateless. It takes a request, returns a response. That's exactly what Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay per invocation, per token.&lt;/p&gt;

&lt;p&gt;For warrantyAI's workload — sporadic document uploads, not a real-time chat product — this matters. My entire system runs under $1.30/day.&lt;/p&gt;




&lt;h2&gt;
  
  
  The setup: IAM first, always
&lt;/h2&gt;

&lt;p&gt;Before any code, the Lambda execution role needs this policy:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Effect"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Allow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModel"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"bedrock:InvokeModelWithResponseStream"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"Resource"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scope it to specific model ARNs. Not &lt;code&gt;*&lt;/code&gt;. Ever.&lt;/p&gt;




&lt;h2&gt;
  
  
  The invoke wrapper
&lt;/h2&gt;

&lt;p&gt;This is the core function I reuse across all 3 agents in warrantyAI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-runtime&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ap-south-1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;HAIKU&lt;/span&gt;  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-haiku-4-5-20251001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;SONNET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic.claude-sonnet-4-6&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Stateless, reusable, testable in isolation.&lt;/p&gt;
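&lt;p&gt;"Testable in isolation" cashes out like this: inject a fake client that mimics the Bedrock response shape (a dict with a file-like &lt;code&gt;body&lt;/code&gt;), and the parsing logic is verifiable with zero AWS calls. A sketch — this variant takes the client as a parameter instead of the module-level global:&lt;/p&gt;

```python
import io
import json

class FakeBedrock:
    """Mimics bedrock-runtime's invoke_model response shape:
    a dict with a file-like 'body' holding Anthropic-format JSON."""
    def invoke_model(self, **kwargs):
        payload = {"content": [{"text": '  {"risk_level": "high"}  '}]}
        return {"body": io.BytesIO(json.dumps(payload).encode())}

def invoke_bedrock(prompt, client,
                   model_id="anthropic.claude-haiku-4-5-20251001",
                   max_tokens=512):
    # Same request/response handling as the wrapper above, with the
    # client injected so tests never touch AWS.
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()
```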




&lt;h2&gt;
  
  
  Haiku-first, Sonnet fallback
&lt;/h2&gt;

&lt;p&gt;Haiku is fast and cheap. Sonnet is accurate and expensive. In warrantyAI's Classifier agent, I try Haiku first. If it returns low confidence, I retry with Sonnet automatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify_warranty&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_classify_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;structured_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Attempt 1: Haiku
&lt;/span&gt;    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;HAIKU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Fallback: Sonnet if confidence &amp;lt; 0.7
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;confidence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;invoke_bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;SONNET&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sonnet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model_used&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;haiku&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;parsed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three things that will burn you
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The &lt;code&gt;body&lt;/code&gt; is a StreamingBody, not a string.&lt;/strong&gt;&lt;br&gt;
Always call &lt;code&gt;.read()&lt;/code&gt; before &lt;code&gt;json.loads()&lt;/code&gt;. Forget this once and you'll spend 20 minutes confused.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong
&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Right
&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Payload size limits on Lambda.&lt;/strong&gt;&lt;br&gt;
Lambda has a 6MB synchronous response payload limit. Bedrock responses are usually tiny, but if you're passing large documents in your prompt, chunk them first. I cap prompts at 4,000 characters in the Reader agent.&lt;/p&gt;
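&lt;p&gt;A minimal sketch of where that cap sits (the 4,000-character budget and the function name are illustrations of the idea, not the actual warrantyAI code):&lt;/p&gt;

```python
# Illustrative sketch: keep prompts under a fixed character budget before
# invoking Bedrock, so payload limits never become a surprise.
MAX_PROMPT_CHARS = 4000  # assumed budget, matching the cap described above

def cap_prompt(prompt: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Truncate a prompt to the character budget, keeping the beginning."""
    return prompt[:limit]
```

&lt;p&gt;Real chunking would split the document into pieces rather than truncate; this only shows where the guard belongs.&lt;/p&gt;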

&lt;p&gt;&lt;strong&gt;3. Bedrock is regional.&lt;/strong&gt;&lt;br&gt;
Not all models are available in all regions. &lt;code&gt;ap-south-1&lt;/code&gt; (Mumbai) supports Haiku and Sonnet. If you get a &lt;code&gt;ResourceNotFoundException&lt;/code&gt;, check model availability in your region before debugging your code.&lt;/p&gt;
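&lt;p&gt;A quick way to check availability, assuming credentials are configured (the filter helper is mine, but &lt;code&gt;list_foundation_models&lt;/code&gt; is the real Bedrock control-plane call):&lt;/p&gt;

```python
# List Bedrock foundation models in a region and filter by keyword.
def matching_model_ids(summaries, keyword):
    """Return model IDs from list_foundation_models output containing keyword."""
    return [s["modelId"] for s in summaries if keyword.lower() in s["modelId"].lower()]

if __name__ == "__main__":
    import boto3  # assumption: credentials and bedrock:ListFoundationModels access
    bedrock = boto3.client("bedrock", region_name="ap-south-1")
    summaries = bedrock.list_foundation_models()["modelSummaries"]
    print(matching_model_ids(summaries, "claude"))
```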




&lt;h2&gt;
  
  
  Cost reality check
&lt;/h2&gt;

&lt;p&gt;For warrantyAI's workload (roughly 50 documents/day):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Avg tokens/call&lt;/th&gt;
&lt;th&gt;Cost/call&lt;/th&gt;
&lt;th&gt;Daily cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Haiku&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;~$0.0004&lt;/td&gt;
&lt;td&gt;~$0.017&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sonnet (15% of calls)&lt;/td&gt;
&lt;td&gt;~800&lt;/td&gt;
&lt;td&gt;~$0.006&lt;/td&gt;
&lt;td&gt;~$0.005&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Total Bedrock cost: under $0.025/day for this workload.&lt;br&gt;
The rest of my $1.30/day budget goes to Textract, SNS, and S3.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph — three agents, one shared state dict, no message queues.&lt;/p&gt;

&lt;p&gt;Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is part of the Serverless Meets AI series — practical AWS patterns from building warrantyAI.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Serverless Endpoint Monitoring - check uptime of your app</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 18 Jan 2024 03:12:25 +0000</pubDate>
      <link>https://forem.com/harisharavindan/serverless-endpoint-monitoring-using-monitor-your-uptime-of-your-app-40pn</link>
      <guid>https://forem.com/harisharavindan/serverless-endpoint-monitoring-using-monitor-your-uptime-of-your-app-40pn</guid>
      <description>&lt;p&gt;Do you need to monitor your application endpoints and have a dashboard to check the detail - all this in a serverless way on AWS. &lt;br&gt;
Let's see how it's done.&lt;/p&gt;

&lt;p&gt;Solution Design&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqbhemik1babx92b2sgl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqbhemik1babx92b2sgl.png" alt="Ping Service Architecture" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This setup creates an S3 website that takes in the information required for monitoring.&lt;/li&gt;
&lt;li&gt;The backend contains a Lambda function that creates an EventBridge schedule (cron) and a CloudWatch alarm.&lt;/li&gt;
&lt;li&gt;The EventBridge schedule invokes a Lambda function that checks the endpoint and updates CloudWatch with a custom metric of the response status code.&lt;/li&gt;
&lt;/ul&gt;
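&lt;p&gt;To make the moving parts concrete, here is a minimal sketch of what the endpoint-check Lambda does (the names are illustrative, not the repo's exact code): probe the URL, then shape a CloudWatch custom-metric datum from the status code.&lt;/p&gt;

```python
# Probe an endpoint and build a CloudWatch custom-metric datum from the result.
import urllib.request
import urllib.error

def probe(url, timeout=5):
    """Return the HTTP status code for a GET on url (0 on network error)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0

def metric_datum(service_name, status_code):
    """Shape one datum for cloudwatch.put_metric_data."""
    return {
        "MetricName": "StatusCode",
        "Dimensions": [{"Name": "Service", "Value": service_name}],
        "Value": float(status_code),
    }
```

&lt;p&gt;The datum would then be sent with &lt;code&gt;put_metric_data&lt;/code&gt; on a CloudWatch client, with your chosen namespace.&lt;/p&gt;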

&lt;p&gt;Requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS knowledge of how to create Lambda functions, IAM roles, and S3 websites&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Step 1 - Setup the Lambda Function
&lt;/h3&gt;

&lt;p&gt;Clone the code from GitHub:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://github.com/uptownaravi/ping_service.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create the Lambda function checkEndpoint, which checks the endpoint and updates the CloudWatch metric.&lt;br&gt;
Use the file checkEndpoint.py from the app folder, and for this Lambda's role use the IAM policy document in the file checkEndpointPolicy.json.&lt;/p&gt;

&lt;p&gt;Replace the URL, region name, and account number in the code.&lt;/p&gt;

&lt;p&gt;Create the Lambda function addPingEndpoint using the file addPingEndpoint.py in the app folder and the related IAM policy in addPingEndpointPolicy.json.&lt;/p&gt;

&lt;p&gt;This Lambda creates the EventBridge schedule with a payload describing the service to check; the payload is shown in the target section once the schedule is created. This is the connecting part that tells the checkEndpoint Lambda which URL to check and which CloudWatch metric to update.&lt;/p&gt;

&lt;p&gt;Enable a Function URL for the addPingEndpoint function, as we need it for the S3 website. Allow the accept and content-type headers.&lt;br&gt;
CORS also needs to be enabled - see the steps after the S3 bucket setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2 - S3 Website Creation
&lt;/h3&gt;

&lt;p&gt;Next, create the S3 website using the code in the web folder.&lt;br&gt;
In the file app.html, update the function-url to the Lambda Function URL created in the previous step.&lt;br&gt;
Steps to create a website in S3: &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: for CORS, in the addPingEndpoint Lambda Function URL settings, add the S3 website as the allowed origin, like http://&amp;lt;bucket&amp;gt;.s3-website.&amp;lt;region&amp;gt;.amazonaws.com&lt;br&gt;
This makes sure the Function URL can be used from the S3 website.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After deploying the website and enabling static hosting,&lt;br&gt;
the page below should be visible when you visit http://&amp;lt;bucket&amp;gt;.s3-website.&amp;lt;region&amp;gt;.amazonaws.com&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46eqdrcjoxkawns4vhoc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F46eqdrcjoxkawns4vhoc.png" alt="webpage with details" width="607" height="932"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 - Test the solution
&lt;/h3&gt;

&lt;p&gt;Enter the details in the webpage:&lt;br&gt;
Service Name: name of the service to monitor&lt;br&gt;
Endpoint: URL to check&lt;br&gt;
Details: description of the service&lt;br&gt;
Cron: cron expression for the check schedule (AWS documentation: &lt;a href="https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-cron-expressions.html&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;After submitting, we get this message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeqxpmvxie9a9cqm9yvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgeqxpmvxie9a9cqm9yvh.png" alt="success message after form submission" width="522" height="270"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check your EventBridge Schedules and the CloudWatch Alarm/Metric dashboards to see the results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8fzs8awq82xmt5ddhrq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq8fzs8awq82xmt5ddhrq.png" alt="Event Bridge Schedule" width="692" height="214"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gahdwap0lhh5uuhwqi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa6gahdwap0lhh5uuhwqi.png" alt="metric namespace" width="358" height="121"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;CloudWatch metric showing the custom status-code metric for the AnalyticsApp service whose details we added in the webpage:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l8ooey6c8fbahm6vqnq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8l8ooey6c8fbahm6vqnq.png" alt="metric graph" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And in Alarms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oq3l9h9du6l8dr9clfs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7oq3l9h9du6l8dr9clfs.png" alt="cloudwatch alarm" width="800" height="120"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This can be done for many services, and all of them will be available as custom metrics in their namespace.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: there are no actions added to this alarm.&lt;br&gt;
Add SNS alerts if required, to get notified when your endpoints go down.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>AWS Lambda gets Python 3.11 runtime</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 29 Jul 2023 04:58:28 +0000</pubDate>
      <link>https://forem.com/harisharavindan/aws-lambda-gets-python-311-runtime-1a14</link>
      <guid>https://forem.com/harisharavindan/aws-lambda-gets-python-311-runtime-1a14</guid>
      <description>&lt;p&gt;AWS released python 3.11 runtime &lt;a href="https://aws.amazon.com/blogs/compute/python-3-11-runtime-now-available-in-aws-lambda/" rel="noopener noreferrer"&gt;official blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There are changes to the default sys.path that can be important when migrating to this new runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/var/task/: User Function
/opt/python/lib/pythonX.Y/site-packages/: User Layer
/opt/python/: User Layer
/var/lang/lib/pythonX.Y/site-packages/: Pre-installed modules and default pip &lt;span class="nb"&gt;install &lt;/span&gt;location
/var/runtime/: No pre-installed modules
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The /var/lang/lib/pythonX.Y/site-packages/ path has been moved up in precedence, so it is now searched before /var/runtime/.&lt;/p&gt;
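&lt;p&gt;If you want to verify where a module actually resolves from under the new precedence, a quick check inside a handler (or any Python 3.11 shell) works:&lt;/p&gt;

```python
# Show the import search order and where a given module is loaded from.
import importlib.util
import sys

print(sys.path)  # search order, highest precedence first

spec = importlib.util.find_spec("json")  # swap in any layered package
print(spec.origin)  # the file path the module will be imported from
```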

&lt;p&gt;Apart from this, Python 3.11 brings a host of changes, such as the new tomllib module for TOML file parsing.&lt;br&gt;
Release notes: &lt;a href="https://docs.python.org/3.11/whatsnew/3.11.html" rel="noopener noreferrer"&gt;https://docs.python.org/3.11/whatsnew/3.11.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To control minor version updates for Lambda functions, check the&lt;br&gt;
runtime settings options under Code source --&amp;gt; Edit runtime management configuration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o6xrx1n0nnzbgulo4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa1o6xrx1n0nnzbgulo4b.png" alt="runtime setting" width="800" height="103"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphko8d7i4sspi9qpqobb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphko8d7i4sspi9qpqobb.png" alt="edit runtime configuration" width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;details of each option &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html#runtime-management-controls" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html#runtime-management-controls&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The initial Python 3.11 was released in October 2022, and the current version 3.11.4 was released in June 2023. A quick sys.version check on the Lambda console shows the details: 3.11.4 (main, Jul 10 2023, 22:05:45) [GCC 7.3.1 20180712 (Red Hat 7.3.1-15)]&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>python</category>
    </item>
    <item>
      <title>AWS EKS Deployment with Helm Chart using Codebuild and CodePipeline</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 16 Jul 2023 01:46:30 +0000</pubDate>
      <link>https://forem.com/harisharavindan/aws-eks-deployment-with-helm-chart-using-codebuild-and-codepipeline-379a</link>
      <guid>https://forem.com/harisharavindan/aws-eks-deployment-with-helm-chart-using-codebuild-and-codepipeline-379a</guid>
      <description>&lt;h3&gt;
  
  
  what is it about
&lt;/h3&gt;

&lt;p&gt;Creating a deployment pipeline that installs a Helm release in an EKS cluster. We will see how to create a workflow that takes the Helm chart from CodeCommit --&amp;gt; lints the chart --&amp;gt; packages and uploads it to S3 --&amp;gt; dry run --&amp;gt; approval --&amp;gt; deploys to EKS.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Clone the Repo for the helper files &lt;a href="https://github.com/uptownaravi/EKS_Deployment.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/EKS_Deployment.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 1 - IAM Roles and aws-auth configmap
&lt;/h4&gt;

&lt;p&gt;Create a role to access EKS using the file eks-deploy-role.json, and add a trust relationship for this role with eks-deploy-role-trust-relation.json.&lt;/p&gt;

&lt;p&gt;Add this role name in the aws-auth ConfigMap, then create a Kubernetes Role and RoleBinding for it. Make sure the username matches in the aws-auth ConfigMap and the RoleBinding.&lt;br&gt;
Also be careful when you edit the ConfigMap, as access to the cluster is based on it.&lt;/p&gt;

&lt;p&gt;Refer to &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then create the CodeBuild service role with the file codebuild-pyapp-service-role.json.&lt;br&gt;
CodeBuild needs access to CodeCommit, to S3 for publishing the Helm chart, to the EKS API, and to CloudWatch Logs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The CodeBuild service role should be able to assume the eks-deploy-role, so make sure the trust relationship allows that.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 2 - CodeBuild projects
&lt;/h4&gt;

&lt;p&gt;Two CodeBuild projects are required.&lt;/p&gt;

&lt;p&gt;The first lints the chart, uploads it to S3, and performs a dry run of the install. Use the file buildspec_prepare.yaml to create this CodeBuild project.&lt;/p&gt;

&lt;p&gt;We lint the chart, package it, upload it to S3 (using the helm s3 plugin), and perform a dry run.&lt;/p&gt;

&lt;p&gt;helm s3 plugin reference &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/set-up-a-helm-v3-chart-repository-in-amazon-s3.html" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/set-up-a-helm-v3-chart-repository-in-amazon-s3.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second project performs the actual deployment with helm install/upgrade. Use the file buildspec_deploy.yaml to create it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Replace the account numbers and other variables as required. Add the path to the Helm chart if it's in a different folder.&lt;/p&gt;

&lt;p&gt;Most of the steps in the buildspec file, like installing the tools and plugins, can be baked into a Docker image and used during prepare/deploy. The idea here is to show how the process works, so those commands are listed individually.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Step 3 - Pipeline
&lt;/h4&gt;

&lt;p&gt;Create a CodePipeline with four stages.&lt;/p&gt;

&lt;p&gt;The source stage is the Git repo where the Helm chart is available.&lt;/p&gt;

&lt;p&gt;The second stage is the CodeBuild prepare project, which runs the validation and dry run.&lt;/p&gt;

&lt;p&gt;The third stage is a manual approval, so we can check the output of helm lint and the dry run.&lt;/p&gt;

&lt;p&gt;The fourth stage is the CodeBuild deploy project, which runs helm install/upgrade.&lt;/p&gt;

&lt;p&gt;Please comment your feedback.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>devops</category>
      <category>eks</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Pull Request Validation for AWS CodeCommit using Lambda and CodeBuild</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 08 Jul 2023 17:25:45 +0000</pubDate>
      <link>https://forem.com/harisharavindan/pull-request-validation-for-aws-codecommit-using-lambda-and-codebuild-4dcg</link>
      <guid>https://forem.com/harisharavindan/pull-request-validation-for-aws-codecommit-using-lambda-and-codebuild-4dcg</guid>
      <description>&lt;h3&gt;
  
  
  what is it about
&lt;/h3&gt;

&lt;p&gt;Need to lint a Dockerfile or run a CI test/check when a pull request is raised? Let's see how to build a solution for this on AWS CodeCommit, using CodeBuild and Lambda to perform the check when a PR is raised or updated.&lt;/p&gt;

&lt;h3&gt;
  
  
  overview of what we are building
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvpnis51d2pvkvos4w5p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqvpnis51d2pvkvos4w5p.png" alt="ci for python Dockerfile Lint" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We will take a sample solution here that performs a hadolint check on a Dockerfile when a PR is raised in CodeCommit.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1 - Use the code from GitHub repo to create code build and lambda functions.
&lt;/h4&gt;

&lt;p&gt;Clone the repo:&lt;br&gt;
&lt;a href="https://github.com/uptownaravi/aws_codecommit_pr_validate.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws_codecommit_pr_validate.git&lt;/a&gt;&lt;br&gt;
Note: add the required region, account number, and resource names in the files.&lt;/p&gt;

&lt;p&gt;Use the file buildspec.yaml to create the CodeBuild project,&lt;br&gt;
and refer to the policy file codebuild_role.json for the essential permissions required (CodeBuild needs access to CodeCommit to clone the repo and comment on the PR).&lt;/p&gt;

&lt;p&gt;Create the Lambda function using lambda_function.py; for the required policy, check the file lambda_iam_role.json (permission to start the CodeBuild project).&lt;/p&gt;
&lt;h4&gt;
  
  
  Step 2 - Create the EventBridge rule that connects all the parts together
&lt;/h4&gt;

&lt;p&gt;Create an EventBridge rule that triggers when a PR state change occurs.&lt;/p&gt;

&lt;p&gt;A sample event pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"source"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"aws.codecommit"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"detail-type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"CodeCommit Pull Request State Change"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"resources"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:codecommit:&amp;lt; region &amp;gt;:&amp;lt; account number &amp;gt;:&amp;lt; 
 repository name&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the target as the lambda function which was created in the above step.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3 - Raise a PR and check if Dockerfile is linted and comments are being added
&lt;/h4&gt;

&lt;p&gt;Once you raise a PR in CodeCommit, the EventBridge rule reacts to it. The Lambda function runs, collects the information required for the CI test/check, and starts the CodeBuild project with that information as override parameters and environment values.&lt;br&gt;
CodeBuild then runs hadolint on the Dockerfile and adds the result to the PR in CodeCommit through AWS CLI commands.&lt;/p&gt;
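&lt;p&gt;A sketch of that Lambda-to-CodeBuild handoff (the variable names are illustrative; &lt;code&gt;start_build&lt;/code&gt; and its &lt;code&gt;environmentVariablesOverride&lt;/code&gt; parameter are the real boto3 API):&lt;/p&gt;

```python
# Pass PR details to the build as environment-variable overrides.
def build_env_overrides(pr_id, source_commit, destination_commit, repo_name):
    """Shape environmentVariablesOverride for codebuild.start_build."""
    pairs = {
        "PULL_REQUEST_ID": pr_id,
        "SOURCE_COMMIT": source_commit,
        "DESTINATION_COMMIT": destination_commit,
        "REPOSITORY_NAME": repo_name,
    }
    return [{"name": k, "value": v, "type": "PLAINTEXT"} for k, v in pairs.items()]

if __name__ == "__main__":
    import boto3  # assumption: credentials and a CodeBuild project named pr-validate
    codebuild = boto3.client("codebuild")
    codebuild.start_build(
        projectName="pr-validate",
        environmentVariablesOverride=build_env_overrides("42", "abc123", "def456", "my-repo"),
    )
```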

&lt;p&gt;Comment made in the PR after the check:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr793f0k0viax7et8xl6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjr793f0k0viax7et8xl6.png" alt="PR with updated comment" width="698" height="297"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 4 - Customize for further CI test for pull requests
&lt;/h4&gt;

&lt;p&gt;We saw one sample of a Dockerfile check; this solution can be extended to add different types of tests or checks on the creation/update of pull requests.&lt;/p&gt;

&lt;p&gt;Thank you for reading. Please comment if any suggestions to improve.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>docker</category>
      <category>devops</category>
    </item>
    <item>
      <title>GitHub Action for Commit Message Validation</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Wed, 26 Apr 2023 12:57:07 +0000</pubDate>
      <link>https://forem.com/harisharavindan/github-action-for-commit-message-validation-5b36</link>
      <guid>https://forem.com/harisharavindan/github-action-for-commit-message-validation-5b36</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;Have you been in a situation where a commit message does not convey what the code is intended for?&lt;br&gt;
Well, we can have a validation for that at the repository itself.&lt;/p&gt;

&lt;p&gt;Using GitHub Actions, we can validate that commit messages have relevant details, like the story number for which the code is being added, and so on.&lt;/p&gt;
&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;I have published a GitHub Action that helps with this validation: &lt;a href="https://github.com/marketplace/actions/commit-meessage-check" rel="noopener noreferrer"&gt;https://github.com/marketplace/actions/commit-meessage-check&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use this in your workflow as a step, with the regex for your required validation.&lt;/p&gt;

&lt;p&gt;Here is a sample that checks whether the message has a Jira story ID:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hello_world_job&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;commit-message-validation&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;foo&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;uptownaravi/verify-commit-message-action@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;regex&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(?i)jira-[0-9]{3,}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1h8s1dzf1bsdt95oo547.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1h8s1dzf1bsdt95oo547.png" alt="commit message success" width="586" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If not, the job fails with exit code 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98zb0chwy9n1gg4h67h7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F98zb0chwy9n1gg4h67h7.png" alt="commit check failure" width="586" height="172"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The validation happens in the Python file &lt;a href="https://github.com/uptownaravi/verify-commit-message-action/blob/main/commitcheck.py" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/verify-commit-message-action/blob/main/commitcheck.py&lt;/a&gt;&lt;/p&gt;
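&lt;p&gt;The core of that check boils down to a regex search; a minimal sketch of the same idea (not the exact commitcheck.py code):&lt;/p&gt;

```python
# Validate a commit message against a required pattern; exit 1 on failure.
import re
import sys

def commit_message_ok(message, pattern="(?i)jira-[0-9]{3,}"):
    """True if the commit message matches the required pattern."""
    return re.search(pattern, message) is not None

if __name__ == "__main__":
    # exit code 1 on failure, mirroring the action's behaviour
    sys.exit(0 if commit_message_ok(" ".join(sys.argv[1:])) else 1)
```

&lt;p&gt;With the sample regex, "JIRA-123 add login form" passes and "fix typo" fails.&lt;/p&gt;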

</description>
      <category>github</category>
      <category>actions</category>
      <category>devops</category>
    </item>
    <item>
      <title>Clean up unused aws ebs volumes with lambda function</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Thu, 20 Apr 2023 15:01:49 +0000</pubDate>
      <link>https://forem.com/harisharavindan/clean-up-unused-aws-ebs-volumes-with-lambda-function-bli</link>
      <guid>https://forem.com/harisharavindan/clean-up-unused-aws-ebs-volumes-with-lambda-function-bli</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;I recently came across unused EBS volumes that were inflating our AWS bill; they were left over from testing and development. To automate the cleanup, I wrote the Lambda function below, which scans for unattached volumes, tags them for deletion and sends an email notification. Tagged volumes are then removed after a day.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution overview
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgww9dx5bs2ymrfmyi30k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgww9dx5bs2ymrfmyi30k.png" alt="solution overview" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scan for unused EBS volumes with status "available"&lt;/li&gt;
&lt;li&gt;Tag those volumes for deletion&lt;/li&gt;
&lt;li&gt;Add the list to DynamoDB, so we can check back the next day&lt;/li&gt;
&lt;li&gt;Send an email notification listing the volumes&lt;/li&gt;
&lt;li&gt;The user removes the deletion tag from any volume that is still required&lt;/li&gt;
&lt;li&gt;If the delete tag is still present the next day, the volume is deleted&lt;/li&gt;
&lt;li&gt;Send an email summary&lt;/li&gt;
&lt;/ol&gt;
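&lt;p&gt;The scan-and-tag steps can be sketched with boto3; the function and tag names below are illustrative, and the actual logic lives in cleanupebs.py in the repo:&lt;/p&gt;

```python
# Sketch of the scan-and-tag step (illustrative; see cleanupebs.py in the repo)
DELETE_TAG = {"Key": "cleanup", "Value": "delete"}  # hypothetical tag name

def unattached_volume_ids(describe_volumes_page):
    """Pull the volume IDs out of one describe_volumes response page."""
    return [v["VolumeId"] for v in describe_volumes_page["Volumes"]]

def scan_and_tag():
    import boto3  # AWS SDK; needs credentials, so imported only when run

    ec2 = boto3.client("ec2")
    to_tag = []
    # status 'available' means the volume is not attached to any instance
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        to_tag.extend(unattached_volume_ids(page))
    if to_tag:
        ec2.create_tags(Resources=to_tag, Tags=[DELETE_TAG])
    return to_tag
```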

&lt;h3&gt;
  
  
  Deploying the solution
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;clone the repository &lt;a href="https://github.com/uptownaravi/aws-ebs-cleanup.git" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws-ebs-cleanup.git&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We need a Lambda function, a DynamoDB table, an SNS topic (with an email subscription) and IAM roles set up to run this.&lt;/p&gt;

&lt;p&gt;First, let's create the IAM role using the file iam.json. Edit the account numbers and resource names as required. The file has three different inline policies, which allow the Lambda function to access EBS, DynamoDB and SNS.&lt;/p&gt;
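&lt;p&gt;For reference, the EBS policy would look roughly like the fragment below (an illustrative subset of actions; iam.json in the repo is the source of truth):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeVolumes",
        "ec2:CreateTags",
        "ec2:DeleteVolume"
      ],
      "Resource": "*"
    }
  ]
}
```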

&lt;p&gt;Create the DynamoDB table and the SNS topic (with an email subscribed to that topic to receive the cleanup summary).&lt;/p&gt;

&lt;p&gt;Then create the Lambda function using the file cleanupebs.py.&lt;br&gt;
Use the execution role created in the first step.&lt;/p&gt;

&lt;p&gt;Change the table name and SNS topic ARN&lt;br&gt;
&lt;a href="https://github.com/uptownaravi/aws-ebs-cleanup/blob/main/cleanupebs.py#L9-L10" rel="noopener noreferrer"&gt;https://github.com/uptownaravi/aws-ebs-cleanup/blob/main/cleanupebs.py#L9-L10&lt;/a&gt;&lt;br&gt;
to the ones created in the second step.&lt;/p&gt;

&lt;p&gt;That's it. Try a test run to check that EBS volumes in the available state get tagged, and check your email for the summary.&lt;/p&gt;
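&lt;p&gt;The next-day pass then only needs to check whether the tag survived. A sketch with an illustrative tag name (the exact tag is defined in cleanupebs.py):&lt;/p&gt;

```python
def still_marked_for_deletion(volume, key="cleanup", value="delete"):
    """True if the volume still carries the deletion tag a day later,
    i.e. nobody removed the tag to keep the volume."""
    tags = volume.get("Tags", [])
    return any(t["Key"] == key and t["Value"] == value for t in tags)

def delete_tagged_volumes():
    import boto3  # needs AWS credentials, so imported only when run

    ec2 = boto3.client("ec2")
    deleted = []
    resp = ec2.describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    for vol in resp["Volumes"]:
        if still_marked_for_deletion(vol):
            ec2.delete_volume(VolumeId=vol["VolumeId"])
            deleted.append(vol["VolumeId"])
    return deleted  # list of IDs for the email summary
```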

&lt;h3&gt;
  
  
  Adding a periodic trigger to the Lambda function
&lt;/h3&gt;

&lt;p&gt;Add a cron job using EventBridge Scheduler so that the function can be run every day at a specific time.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F437sq0qaq0gg2vdmgo8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F437sq0qaq0gg2vdmgo8x.png" alt="event bridge scheduler" width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Create schedule, give it a name, and for the schedule pattern:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuj6kb2j2169mice6qgaq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuj6kb2j2169mice6qgaq.png" alt="schedule pattern " width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here I have added the cron expression (0 10 ? * MON-FRI *), which runs at 10 AM, Monday to Friday.&lt;/p&gt;

&lt;p&gt;Adjust the cron as required (I have set the flexible time window to Off) and click Next.&lt;/p&gt;

&lt;p&gt;In Target details, select AWS Lambda Invoke and, in the Invoke section, choose the function we created earlier. No input needs to be passed to the Lambda function.&lt;/p&gt;

&lt;p&gt;Click Next to review the configuration options, click Next again to review all the inputs, and create the schedule.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypi9co9s821wc8jwi8z7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fypi9co9s821wc8jwi8z7.png" alt="creating a schedule" width="800" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The email summary looks like the one below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsxmszov08kf5ea37thm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqsxmszov08kf5ea37thm.png" alt="email summary" width="458" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please share your comments on this solution and what could be improved.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>python</category>
      <category>ebs</category>
    </item>
    <item>
      <title>Pull Request notification on Slack using AWS Lambda</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sat, 04 Feb 2023 06:43:21 +0000</pubDate>
      <link>https://forem.com/harisharavindan/pull-request-notification-on-slack-using-aws-lambda-4mjo</link>
      <guid>https://forem.com/harisharavindan/pull-request-notification-on-slack-using-aws-lambda-4mjo</guid>
      <description>&lt;p&gt;Pull request management can become hectic while working across multiple repositories. Asking for approvals individually is also a long process. How about getting notified through a common channel to save time for reviewers and developers.&lt;/p&gt;

&lt;p&gt;This post details on how to create a notification system on slack for github pull requests.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Requirements&lt;br&gt;
an AWS account, a GitHub repository, and a Slack channel (with permission to create a Slack app and webhook)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1 : Creating a slack app and webhook in a channel  &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Slack has a good tutorial on this; please follow it:
&lt;a href="https://api.slack.com/tutorials/slack-apps-hello-world" rel="noopener noreferrer"&gt;https://api.slack.com/tutorials/slack-apps-hello-world&lt;/a&gt;
Once that is done, Slack gives you a webhook URL similar to
&lt;a href="https://hooks.slack.com/services/aldnfaksndksakd/aljdfkajndkjasn/adfasdfakjdfnaksdfkajldakdnkasndlakjd" rel="noopener noreferrer"&gt;https://hooks.slack.com/services/aldnfaksndksakd/aljdfkajndkjasn/adfasdfakjdfnaksdfkajldakdnkasndlakjd&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can test it with the curl command given in the same tutorial. Note down the webhook URL; it will be used later.&lt;/p&gt;
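&lt;p&gt;If you prefer Python over curl, the same test can be done with the standard library (the URL below is a placeholder; paste in your own webhook):&lt;/p&gt;

```python
import json
import urllib.request

def build_slack_request(webhook_url, text):
    """Build the POST request a Slack incoming webhook expects:
    a JSON body of the form {"text": "..."}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # placeholder URL; use the webhook you noted down from Slack
    req = build_slack_request(
        "https://hooks.slack.com/services/XXX/YYY/ZZZ", "Hello, World!"
    )
    print(urllib.request.urlopen(req).status)
```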

&lt;h3&gt;
  
  
  Step 2 : Creating Lambda and API gateway for processing the pull request event  &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We need to package the code required for the notifications.&lt;br&gt;
Clone or download the repo &lt;a href="https://github.com/uptownaravi/pullRequestSlack" rel="noopener noreferrer"&gt;github.com/uptownaravi/pullRequestSlack&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It contains lambda_function.py (the logic) and requirements.txt (the dependency list).&lt;/p&gt;

&lt;p&gt;We need to install the dependencies and zip the files.&lt;br&gt;
Navigate to the cloned repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip install --target 'path of your current directory' -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once that is done, you will see the folder structure below: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdyivqjfibpn1rdz0hxwe.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fdyivqjfibpn1rdz0hxwe.JPG" alt="folderZip" width="268" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then select all the files there except README.md and requirements.txt and zip them (on Windows, select Send to --&amp;gt; Compressed (zipped) folder).&lt;/p&gt;

&lt;p&gt;Give the zip a name, then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Log into the AWS console and navigate to Lambda&lt;/li&gt;
&lt;li&gt;Click Create function

&lt;ul&gt;
&lt;li&gt;select Author from scratch (it should be the default)&lt;/li&gt;
&lt;li&gt;give the function a name&lt;/li&gt;
&lt;li&gt;select Python 3.8&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fidy8e86sndamgl157fnb.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fidy8e86sndamgl157fnb.JPG" alt="Create Function" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click on Actions --&amp;gt; Upload a .zip file&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwznncj8pm1tqddptapeb.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwznncj8pm1tqddptapeb.JPG" alt="function upload" width="800" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After the upload, the code should load as seen below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fei9ydxr9kxpk11ppzkn8.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fei9ydxr9kxpk11ppzkn8.JPG" alt="lambda Code" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to add the Slack webhook URL we got earlier as an&lt;br&gt;
environment variable in the Lambda function.&lt;/p&gt;

&lt;p&gt;Scroll down on the Lambda screen and click Manage environment variables --&amp;gt; Add environment variable.&lt;br&gt;
Fill in the details as in the screenshot below, using the webhook URL you got from Slack. Make sure the key is slackNotification&lt;br&gt;
(this key is used in the Lambda code to read the value).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmzd4iggvyli74nqpcncj.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fmzd4iggvyli74nqpcncj.JPG" alt="env" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: we can encrypt the value in transit using KMS keys,&lt;br&gt;
        but I wanted to keep this blog simple.&lt;br&gt;
        Use this link if required: &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-encryption" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-envvars-encryption&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  Creating an API Gateway and attaching the Lambda
&lt;/h4&gt;

&lt;p&gt;Navigate to API Gateway in the AWS console and click Create API,&lt;br&gt;
then select HTTP API --&amp;gt; click Build.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0m82rp9x7ppp5fo01v6l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0m82rp9x7ppp5fo01v6l.JPG" alt="apig http" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then fill in the details as shown in the image below.&lt;br&gt;
We need to use Lambda as the integration and select the Lambda we created in the last step (make sure you are using the correct region).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk6cvuhjhbt4d60ti364i.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fk6cvuhjhbt4d60ti364i.JPG" alt="integration" width="800" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Next and add the routes as shown below,&lt;br&gt;
using the POST method here so that GitHub can post to this API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0pjy0njy0rpqehy3jb3c.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0pjy0njy0rpqehy3jb3c.JPG" alt="route" width="800" height="396"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Moving on, add a stage name.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnt7m3hqln1p1c7y98ah0.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnt7m3hqln1p1c7y98ah0.JPG" alt="stage" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, review and click Create.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2wen7ifofevik2udg78l.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F2wen7ifofevik2udg78l.JPG" alt="apig created" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Auto-deployment is already enabled for this API.&lt;/p&gt;

&lt;p&gt;So go to Stages in the Deploy section and copy the invoke URL;&lt;br&gt;
this will be used in the GitHub repository as the webhook to POST to.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F77lypk4t2s9wchs9eof4.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F77lypk4t2s9wchs9eof4.JPG" alt="deploy" width="800" height="251"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3 : Adding the api gateway url to github
&lt;/h3&gt;

&lt;p&gt;Navigate to your GitHub repository --&amp;gt; click Settings --&amp;gt; then Webhooks.&lt;/p&gt;

&lt;p&gt;Add your API Gateway URL in the format &lt;strong&gt;URL/pull&lt;/strong&gt; in the Payload URL section.&lt;br&gt;
Example: &lt;a href="https://sample.execute-api.ap-south-1.amazonaws.com/dev/pull" rel="noopener noreferrer"&gt;https://sample.execute-api.ap-south-1.amazonaws.com/dev/pull&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(/pull is the route in API Gateway; make sure the URL ends with dev/pull so that the route is hit by the POST call.)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnojlh1seyih9ule74l1f.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnojlh1seyih9ule74l1f.JPG" alt="webhook" width="800" height="370"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Let me select individual events&lt;/strong&gt;,&lt;br&gt;
then scroll down, select the Pull requests option, and uncheck Pushes. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ra0d2pmuie451dte319.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0ra0d2pmuie451dte319.JPG" alt="pullRequests" width="796" height="724"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click Add webhook.&lt;/p&gt;

&lt;p&gt;That's it. Create a pull request in that repository and it should trigger a notification in the Slack channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv4l7pesfcjy9bziqyrru.JPG" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fv4l7pesfcjy9bziqyrru.JPG" alt="botChat" width="800" height="102"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can modify the Lambda code to handle the various pull request events.&lt;/p&gt;
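&lt;p&gt;For example, branching on the event type could look like this (the field names come from GitHub's pull_request webhook payload; the action set and message format are illustrative):&lt;/p&gt;

```python
import json

# pull_request webhook actions we choose to notify about (illustrative set)
INTERESTING_ACTIONS = {"opened", "reopened", "closed", "review_requested"}

def summarize_pr_event(body):
    """Turn a GitHub pull_request webhook body into a Slack message text,
    or None for actions we want to ignore."""
    payload = json.loads(body)
    action = payload.get("action")
    if action not in INTERESTING_ACTIONS:
        return None
    pr = payload.get("pull_request", {})
    user = pr.get("user", {}).get("login", "?")
    return "PR {} by {}: {}".format(action, user, pr.get("html_url", ""))
```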

</description>
      <category>git</category>
      <category>aws</category>
      <category>serverless</category>
      <category>slack</category>
    </item>
    <item>
      <title>helm chart for fastAPI</title>
      <dc:creator>Harish Aravindan</dc:creator>
      <pubDate>Sun, 29 Jan 2023 18:45:54 +0000</pubDate>
      <link>https://forem.com/harisharavindan/helm-chart-for-fastapi-2ej1</link>
      <guid>https://forem.com/harisharavindan/helm-chart-for-fastapi-2ej1</guid>
      <description>&lt;h3&gt;
  
  
  What is it about
&lt;/h3&gt;

&lt;p&gt;Packaging a FastAPI app as a Docker image and deploying it as a Helm chart.&lt;/p&gt;

&lt;h3&gt;
  
  
  Building docker image
&lt;/h3&gt;

&lt;p&gt;Clone the repository&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;https://github.com/uptownaravi/LearnfastAPI.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Change directory into FastAPI_HelmChart/.&lt;br&gt;
In the Dockerfile we have an Ubuntu base image, on which we build layers to install the application and its dependencies.&lt;/p&gt;

&lt;p&gt;I am using my sample application from a previous blog:&lt;br&gt;
&lt;a href="https://dev.to/harisharavindan/learning-fastapi-with-a-sample-python-library-5f2n"&gt;https://dev.to/harisharavindan/learning-fastapi-with-a-sample-python-library-5f2n&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Build the image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;-t&lt;/span&gt; fast:v2 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The built Docker image is stored in GitHub Packages (ref &lt;a href="https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry" rel="noopener noreferrer"&gt;https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ghcr.io/uptownaravi/fast:v2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Any required changes can be made to the Dockerfile and a new image built from it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Creating a Helm chart
&lt;/h3&gt;

&lt;p&gt;I have created a basic Helm chart that uses the image built from the Dockerfile above. &lt;/p&gt;

&lt;p&gt;Change directory within the cloned repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;FastAPI_HelmChart/helmChart-fastAPI/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explore the chart templates and values:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;we have set the docker image to ghcr.io/uptownaravi/fast:v2&lt;/li&gt;
&lt;li&gt;health check is at localhost:80/health &lt;/li&gt;
&lt;li&gt;service is exposed at 80 as we had set that in the dockerfile&lt;/li&gt;
&lt;/ul&gt;
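&lt;p&gt;Those settings map to entries in values.yaml roughly like the fragment below (illustrative key names; the chart in the repo is the reference):&lt;/p&gt;

```yaml
# illustrative values.yaml fragment for the chart
image:
  repository: ghcr.io/uptownaravi/fast
  tag: v2
service:
  type: ClusterIP
  port: 80          # matches the port exposed in the Dockerfile
livenessProbe:
  httpGet:
    path: /health   # the health check endpoint mentioned above
    port: 80
```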

&lt;p&gt;Do a dry run to check what will be installed.&lt;br&gt;
Make sure you are in the directory FastAPI_HelmChart/helmChart-fastAPI/, where Chart.yaml is located.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm &lt;span class="nb"&gt;install &lt;/span&gt;fastencode &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--dry-run&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dropping the --dry-run flag and running the same command installs the Helm release fastencode, which creates the deployment, service and related resources.&lt;/p&gt;

&lt;p&gt;Check the details of the installed chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get all &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'app.kubernetes.io/instance=fastencode'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can port-forward the service to check the app:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl port-forward svc/fastapi 8080:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This forwards port 80 of the service to local port 8080.&lt;br&gt;
Check the FastAPI UI at &lt;a href="http://localhost:8080/docs" rel="noopener noreferrer"&gt;http://localhost:8080/docs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To uninstall the release:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm uninstall fastencode
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>helm</category>
      <category>k8s</category>
      <category>kubernetes</category>
      <category>fastapi</category>
    </item>
  </channel>
</rss>
