<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Swati Tyagi</title>
    <description>The latest articles on Forem by Swati Tyagi (@swat_24).</description>
    <link>https://forem.com/swat_24</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2736165%2F67d5937a-8cdd-488b-a357-fd17f024f227.png</url>
      <title>Forem: Swati Tyagi</title>
      <link>https://forem.com/swat_24</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/swat_24"/>
    <language>en</language>
    <item>
      <title>Deploying a Serverless AI Agent with AWS Bedrock, Lambda, and API Gateway</title>
      <dc:creator>Swati Tyagi</dc:creator>
      <pubDate>Sun, 28 Dec 2025 01:15:14 +0000</pubDate>
      <link>https://forem.com/swat_24/deploying-a-serverless-ai-agent-with-aws-bedrock-lambda-and-api-gateway-2123</link>
      <guid>https://forem.com/swat_24/deploying-a-serverless-ai-agent-with-aws-bedrock-lambda-and-api-gateway-2123</guid>
      <description>&lt;p&gt;This guide walks through building a question-answering service powered by GenAI using AWS bedrock. The architecture accepts prompts via HTTP and returns model-generated responses using Amazon Bedrock—all while keeping costs minimal through serverless infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ekdi55u9mhlrdumivdm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ekdi55u9mhlrdumivdm.png" alt="Architecture diagram" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Credit: &lt;a href="https://aws.amazon.com/blogs/architecture/building-an-ai-gateway-to-amazon-bedrock-with-amazon-api-gateway/" rel="noopener noreferrer"&gt;https://aws.amazon.com/blogs/architecture/building-an-ai-gateway-to-amazon-bedrock-with-amazon-api-gateway/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Flow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;External clients send HTTP requests to API Gateway&lt;/li&gt;
&lt;li&gt;API Gateway routes requests to the Lambda function&lt;/li&gt;
&lt;li&gt;Lambda invokes Amazon Bedrock's Nova Micro model&lt;/li&gt;
&lt;li&gt;ECR stores the Lambda container image (deployment artifact)&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;When implementing generative AI services, choosing the right architecture matters. This implementation demonstrates a lightweight GenAI solution that can integrate with existing systems or be exposed externally through an API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Requirements
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Functional Goals
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prompt Processing&lt;/td&gt;
&lt;td&gt;Accept prompts and return Nova Micro completions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HTTP Endpoint&lt;/td&gt;
&lt;td&gt;Expose an endpoint for triggering responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Estimated Volume&lt;/td&gt;
&lt;td&gt;~100 monthly requests (for cost estimation)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Operational Goals
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;Requirement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully automated deployment via GitHub Actions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Availability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.9%+ monthly uptime&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Security&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;IAM-scoped Bedrock access, OpenID Connect auth, HTTPS-only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Structured logging with CloudWatch dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Intentional Omissions
&lt;/h3&gt;

&lt;p&gt;End-user authentication, authorization, and input sanitization are intentionally excluded to keep the focus on the core GenAI implementation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Analysis
&lt;/h2&gt;

&lt;p&gt;Based on an estimated &lt;strong&gt;22 input tokens&lt;/strong&gt; and &lt;strong&gt;232 output tokens&lt;/strong&gt; per request:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bedrock (Nova Micro)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.003&lt;/td&gt;
&lt;td&gt;2,200 input / 23,200 output tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lambda&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Within free tier (1M requests, 400K GB-seconds)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Gateway&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (Year 1)&lt;/td&gt;
&lt;td&gt;~$0.0004/month after&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ECR&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;300MB image after 500MB free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Scaling Projections
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Monthly Requests&lt;/th&gt;
&lt;th&gt;Estimated Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1,000&lt;/td&gt;
&lt;td&gt;~$0.04&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10,000&lt;/td&gt;
&lt;td&gt;~$0.39&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100,000&lt;/td&gt;
&lt;td&gt;~$3.76&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
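&lt;p&gt;The projections above come down to a few lines of arithmetic. Here is a quick sketch; the Nova Micro rates are assumptions based on published on-demand pricing ($0.035 per million input tokens, $0.14 per million output tokens) and should be checked against the current AWS pricing page:&lt;/p&gt;

```typescript
// Sketch: reproduce the Bedrock line of the cost tables.
// Assumed Nova Micro on-demand rates (USD per 1M tokens); verify
// against the current AWS pricing page before relying on them.
const INPUT_RATE_PER_M = 0.035;
const OUTPUT_RATE_PER_M = 0.14;

// Per-request token estimates from the article.
const INPUT_TOKENS = 22;
const OUTPUT_TOKENS = 232;

function bedrockMonthlyCost(requests: number): number {
  const inputCost = (requests * INPUT_TOKENS / 1_000_000) * INPUT_RATE_PER_M;
  const outputCost = (requests * OUTPUT_TOKENS / 1_000_000) * OUTPUT_RATE_PER_M;
  return inputCost + outputCost;
}

console.log(bedrockMonthlyCost(100).toFixed(4));    // 0.0033 -> the "~$0.003" row
console.log(bedrockMonthlyCost(10_000).toFixed(2));
```

&lt;p&gt;For 100 monthly requests this lands at roughly $0.0033, matching the ~$0.003 Bedrock row above; API Gateway, Lambda, and ECR account for the small remainder at higher volumes.&lt;/p&gt;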

&lt;h2&gt;
  
  
  Building the Agent
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Project Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; handler terraform 
&lt;span class="nb"&gt;cd &lt;/span&gt;handler
pnpm init &lt;span class="nt"&gt;-y&lt;/span&gt;
pnpm &lt;span class="nt"&gt;--package&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;typescript dlx tsc &lt;span class="nt"&gt;--init&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; src __tests__
&lt;span class="nb"&gt;touch &lt;/span&gt;src/&lt;span class="o"&gt;{&lt;/span&gt;app,env,index&lt;span class="o"&gt;}&lt;/span&gt;.ts 

pnpm add &lt;span class="nt"&gt;-D&lt;/span&gt; @types/node tsx typescript
pnpm add ai @ai-sdk/amazon-bedrock zod dotenv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Components
&lt;/h3&gt;

&lt;p&gt;The implementation has three layers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart TB
    A["Lambda Handler&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Parses events, returns responses&amp;lt;/i&amp;gt;"] --&amp;gt; B["Application Logic&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Manages prompts &amp;amp; orchestration&amp;lt;/i&amp;gt;"]
    B --&amp;gt; C["Bedrock Integration&amp;lt;br/&amp;gt;&amp;lt;i&amp;gt;Model invocation via AI SDK&amp;lt;/i&amp;gt;"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Lambda Entry Point
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// index.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;any&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Welcome from Warike technologies&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unexpected error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
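&lt;p&gt;The handler can be exercised locally before any infrastructure exists. A minimal harness with &lt;code&gt;main&lt;/code&gt; stubbed out (the real implementation calls Bedrock):&lt;/p&gt;

```typescript
// Minimal local harness for the handler shape above.
// `main` is stubbed here; the real one invokes Bedrock via the AI SDK.
const main = async (prompt: string) => `echo: ${prompt}`;

const handler = async (event: { body?: string }) => {
  try {
    const body = event.body ? JSON.parse(event.body) : {};
    const prompt = body.prompt ?? "Welcome from Warike technologies";
    const response = await main(prompt);
    return { statusCode: 200, body: JSON.stringify({ success: true, data: response }) };
  } catch (error) {
    return {
      statusCode: 500,
      body: JSON.stringify({
        success: false,
        error: error instanceof Error ? error.message : "Unexpected error",
      }),
    };
  }
};

// Simulate an API Gateway proxy event and print the result.
handler({ body: JSON.stringify({ prompt: "ping" }) }).then((res) => {
  console.log(res.statusCode, res.body);
});
```

&lt;p&gt;A malformed body exercises the 500 path, since &lt;code&gt;JSON.parse&lt;/code&gt; throws inside the &lt;code&gt;try&lt;/code&gt; block.&lt;/p&gt;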



&lt;h4&gt;
  
  
  Bedrock Integration
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// utils/bedrock.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;regionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;modelId&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;({});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bedrock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createAmazonBedrock&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;regionId&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`model: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, response: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;, usage: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Variables
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AWS_REGION=us-west-2
AWS_BEDROCK_MODEL='amazon.nova-micro-v1:0'
AWS_BEARER_TOKEN_BEDROCK='aws_bearer_token_bedrock'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Security Note:&lt;/strong&gt; Use short-lived API keys only.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Infrastructure
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build Stage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;node:22-alpine&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;builder&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /usr/src/app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;corepack &lt;span class="nb"&gt;enable&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; package.json pnpm-lock.yaml* ./&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pnpm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--frozen-lockfile&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pnpm run build

&lt;span class="c"&gt;# Runtime Stage&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; public.ecr.aws/lambda/nodejs:22&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; ${LAMBDA_TASK_ROOT}&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /usr/src/app/dist/src ./ &lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=builder /usr/src/app/node_modules ./node_modules&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; [ "index.handler" ]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Terraform Resources
&lt;/h3&gt;

&lt;p&gt;Key infrastructure components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt; — HTTP protocol with Lambda integration, CORS headers, JSON access logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock Permissions&lt;/strong&gt; — Nova Micro inference profile access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Function&lt;/strong&gt; — 900-second timeout, CloudWatch logging enabled&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;📝 &lt;strong&gt;Note:&lt;/strong&gt; The ECR seeding resource requires Docker running locally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  CI/CD Pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A[Push to Main] --&amp;gt; B[Build &amp;amp; Test]
    B --&amp;gt; C[Build Docker Image]
    C --&amp;gt; D[Push to ECR]
    D --&amp;gt; E[Deploy Lambda]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The GitHub Actions workflow handles building, testing, Docker image creation, ECR push, and Lambda deployment—triggered on pushes to main.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="s2"&gt;"https://123456.execute-api.us-west-2.amazonaws.com/dev/"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt":"Heeey hoe gaat het?"}'&lt;/span&gt; | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected Response:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"success"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"data"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Hoi! Het gaat prima, bedankt voor het vragen..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Monitoring
&lt;/h2&gt;

&lt;p&gt;CloudWatch dashboards provide visibility into errors and performance metrics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cleanup
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;terraform destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;✅ Serverless GenAI with API Gateway, Lambda, and Bedrock's Nova Micro delivers a functional, cost-effective solution&lt;/p&gt;

&lt;p&gt;✅ Pricing remains negligible even at significant scale&lt;/p&gt;

&lt;p&gt;✅ Terraform handles infrastructure; GitHub Actions automates deployment&lt;/p&gt;

&lt;p&gt;✅ Foundation readily supports more sophisticated generative AI applications&lt;/p&gt;

</description>
      <category>aws</category>
      <category>lambda</category>
      <category>bedrock</category>
    </item>
    <item>
      <title>Comparing Cloud AI Platforms in 2025: Bedrock, Azure OpenAI, and Gemini</title>
      <dc:creator>Swati Tyagi</dc:creator>
      <pubDate>Sun, 28 Dec 2025 01:06:21 +0000</pubDate>
      <link>https://forem.com/swat_24/comparing-cloud-ai-platforms-in-2025-bedrock-azure-openai-and-gemini-4ij2</link>
      <guid>https://forem.com/swat_24/comparing-cloud-ai-platforms-in-2025-bedrock-azure-openai-and-gemini-4ij2</guid>
      <description>&lt;p&gt;Picking the right cloud AI service comes down to more than raw model performance. Your decision hinges on how well it meshes with your current stack, what it costs, and whether it meets your compliance needs.&lt;/p&gt;

&lt;p&gt;Having shipped production workloads on each of these platforms, I've learned a few things worth sharing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Quick Version
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Ideal Use Case&lt;/th&gt;
&lt;th&gt;What Sets It Apart&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS Bedrock&lt;/td&gt;
&lt;td&gt;Switching between multiple models&lt;/td&gt;
&lt;td&gt;Smart routing that picks the right model automatically&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure OpenAI&lt;/td&gt;
&lt;td&gt;Enterprise access to GPT&lt;/td&gt;
&lt;td&gt;Tight Microsoft 365 connectivity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini API&lt;/td&gt;
&lt;td&gt;Processing huge documents&lt;/td&gt;
&lt;td&gt;Context window up to 2M tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  AWS Bedrock
&lt;/h2&gt;

&lt;p&gt;Bedrock is Amazon's managed gateway to foundation models from Anthropic, Meta, Mistral, Cohere, and others—all through one unified API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it stands out:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Access Claude, Llama, Mistral, and Stable Diffusion without juggling multiple integrations&lt;/li&gt;
&lt;li&gt;Automatic prompt routing selects the most cost-effective model for each request (potential 30% savings)&lt;/li&gt;
&lt;li&gt;Plugs directly into S3, Lambda, and SageMaker&lt;/li&gt;
&lt;li&gt;Native RAG support with built-in vector storage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pricing snapshot (Claude 3.5 Sonnet):&lt;/strong&gt; $3/million input tokens, $15/million output tokens. Batch processing cuts costs in half.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Teams already on AWS who want model flexibility and strong compliance credentials.&lt;/p&gt;

&lt;h2&gt;
  
  
  Azure OpenAI
&lt;/h2&gt;

&lt;p&gt;Microsoft's enterprise-grade wrapper around OpenAI's models, with security and governance baked in.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it stands out:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Direct access to GPT-4o, o1, DALL-E 3, and Whisper&lt;/li&gt;
&lt;li&gt;Seamless hooks into Teams, Power Platform, and the broader Microsoft ecosystem&lt;/li&gt;
&lt;li&gt;Your data stays private and isn't used for training&lt;/li&gt;
&lt;li&gt;Provisioned Throughput Units (PTUs) for predictable billing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pricing snapshot (GPT-4o):&lt;/strong&gt; $2.50/million input tokens, $10/million output tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Organizations already running Microsoft infrastructure who specifically need OpenAI models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemini API
&lt;/h2&gt;

&lt;p&gt;Google's multimodal platform with an industry-leading context window and native support for text, images, audio, and video.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it stands out:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;2M token context—roughly 16x GPT-4 Turbo's 128K window&lt;/li&gt;
&lt;li&gt;True multimodal processing without preprocessing steps&lt;/li&gt;
&lt;li&gt;Built-in web search grounding for real-time information&lt;/li&gt;
&lt;li&gt;Generous free tier (1,500+ daily requests)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pricing snapshot (Gemini 2.5 Pro):&lt;/strong&gt; $1.25/million input tokens (under 200K context), $10/million output tokens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Best fit:&lt;/strong&gt; Document-heavy applications, multimodal use cases, or teams prototyping on a budget.&lt;/p&gt;
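&lt;p&gt;The three pricing snapshots can be compared head-to-head for a concrete workload. A small sketch using the rates quoted in this post (always verify current pricing before budgeting):&lt;/p&gt;

```typescript
// Compare the quoted per-million-token rates for a sample workload.
// Rates are the snapshots quoted above, not live pricing.
const platforms = [
  { name: "Bedrock (Claude 3.5 Sonnet)", input: 3.0, output: 15.0 },
  { name: "Azure OpenAI (GPT-4o)", input: 2.5, output: 10.0 },
  { name: "Gemini 2.5 Pro", input: 1.25, output: 10.0 },
];

// Monthly cost in USD for inM million input and outM million output tokens.
const cost = (p: { input: number; output: number }, inM: number, outM: number) =>
  p.input * inM + p.output * outM;

// Example: 5M input tokens and 1M output tokens per month.
for (const p of platforms) {
  console.log(p.name, cost(p, 5, 1).toFixed(2)); // 30.00, 22.50, 16.25
}
```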

&lt;h2&gt;
  
  
  How to Decide
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Already deep in AWS?&lt;/strong&gt; → Bedrock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need GPT-4 specifically?&lt;/strong&gt; → Azure OpenAI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing documents over 200K tokens?&lt;/strong&gt; → Gemini&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Early-stage or budget-conscious?&lt;/strong&gt; → Gemini's free tier&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want to experiment across models?&lt;/strong&gt; → Bedrock&lt;/li&gt;
&lt;/ol&gt;
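&lt;p&gt;The checklist above folds naturally into a tiny helper. Purely illustrative; the criteria and platform names come straight from the list:&lt;/p&gt;

```typescript
// Illustrative decision helper mirroring the checklist above.
type Needs = {
  onAws?: boolean;
  needsGpt?: boolean;
  contextTokens?: number;
  budgetConscious?: boolean;
  multiModel?: boolean;
};

function pickPlatform(n: Needs): string {
  if (n.onAws) return "AWS Bedrock";
  if (n.needsGpt) return "Azure OpenAI";
  if (n.contextTokens !== undefined && n.contextTokens > 200_000) return "Gemini API";
  if (n.budgetConscious) return "Gemini API";
  if (n.multiModel) return "AWS Bedrock";
  return "evaluate all three";
}

console.log(pickPlatform({ contextTokens: 500_000 })); // Gemini API
```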

&lt;h2&gt;
  
  
  Saving Money
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock:&lt;/strong&gt; Use batch mode and smart routing; enable prompt caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure:&lt;/strong&gt; Reserve PTUs for steady workloads; use batch API for non-urgent tasks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini:&lt;/strong&gt; Max out the free tier during development; use Flash models when speed matters less&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;Each platform excels in different scenarios. Bedrock offers unmatched model flexibility. Azure OpenAI delivers the smoothest experience for Microsoft-centric teams. Gemini's massive context window changes what's possible for document analysis.&lt;/p&gt;

&lt;p&gt;No single platform wins across the board—your best choice depends on your existing infrastructure, specific model requirements, and budget. And honestly? You'll probably end up using more than one. 😅&lt;/p&gt;

</description>
      <category>aws</category>
      <category>bedrock</category>
      <category>openai</category>
      <category>gemini</category>
    </item>
  </channel>
</rss>
