<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Pablo Albaladejo</title>
    <description>The latest articles on Forem by Pablo Albaladejo (@pabloalbaladejo).</description>
    <link>https://forem.com/pabloalbaladejo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2130560%2F9f69ea37-3a48-41f6-9a1e-cd1cdd60e412.jpeg</url>
      <title>Forem: Pablo Albaladejo</title>
      <link>https://forem.com/pabloalbaladejo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/pabloalbaladejo"/>
    <language>en</language>
    <item>
      <title>Deploy Observable AI Streaming on AWS: CDK Stack with Tests</title>
      <dc:creator>Pablo Albaladejo</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:50:13 +0000</pubDate>
      <link>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-4-complete-cdk-stack-38cj</link>
      <guid>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-4-complete-cdk-stack-38cj</guid>
      <description>&lt;p&gt;This is the final post in a four-part series. In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a"&gt;Part 1&lt;/a&gt;, we set up the architecture: API Gateway REST streaming through Lambda with Middy and the Vercel AI SDK. In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;Part 2&lt;/a&gt;, we discovered that Middy's &lt;code&gt;after&lt;/code&gt; hook fires before the stream body is consumed, making it useless for observability.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;, we built the solution: a &lt;code&gt;TransformStream&lt;/code&gt; pipeline that defers logging and data publishing to the &lt;code&gt;flush()&lt;/code&gt; callback, which the WHATWG Streams spec guarantees runs only after all chunks have been consumed.&lt;/p&gt;

&lt;p&gt;Now we deploy it. This post walks through every piece: the CDK infrastructure, the handler composition, the test suite, and the deployment. The companion repo contains the complete, runnable code as &lt;code&gt;step-4&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;By the end, you will have a working stack that streams structured JSON from Bedrock Claude through API Gateway to the client, with observability that fires at the right time: after the stream completes, not before.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; A single CDK stack deploys API Gateway REST with streaming, a Lambda function running Middy + AI SDK v6, and Bedrock IAM permissions. The handler composes AI SDK middlewares (logging + data store) with a Middy middleware (&lt;code&gt;publishLog&lt;/code&gt;) that uses the Dual Middleware Bridge and TransformStream Pipeline patterns. Deploy with &lt;code&gt;npx cdk deploy&lt;/code&gt;, test with &lt;code&gt;curl --no-buffer&lt;/code&gt;. Full test suite included: 13 tests covering flush timing, error classification, and the streaming pipeline.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How do you deploy API Gateway REST streaming with CDK?
&lt;/h2&gt;

&lt;p&gt;The infrastructure is a single CDK stack with three resources: a Lambda function, an API Gateway REST API, and the IAM permissions to call Bedrock.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/streaming-stack.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-apigateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;logs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-logs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:path&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StreamingStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- Lambda Function ---&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;StreamingHandler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODEJS_22_X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;..&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;bundling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODEJS_22_X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bundlingImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-c&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
              &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npx esbuild index.ts --bundle --platform=node --target=node22 --outfile=/asset-output/index.mjs --format=esm --external:@aws-sdk/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
              &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;cp /asset-output/index.mjs /asset-output/index.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &amp;amp;&amp;amp; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="p"&gt;],&lt;/span&gt;
          &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;tryBundle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;outputDir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;execSync&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:child_process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="nf"&gt;execSync&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                  &lt;span class="s2"&gt;`npx esbuild lambda/index.ts --bundle --platform=node --target=node22 --outfile=&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;outputDir&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/index.js --format=esm --external:@aws-sdk/*`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                  &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cwd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;..&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="p"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
              &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
              &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;logRetention&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RetentionDays&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ONE_WEEK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;NODE_OPTIONS&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;--enable-source-maps&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Grant Bedrock model invocation permissions&lt;/span&gt;
    &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToRolePolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bedrock:InvokeModel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bedrock:InvokeModelWithResponseStream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:bedrock:*::foundation-model/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- API Gateway REST API ---&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;StreamingApi&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;restApiName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;streaming-llm-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;API Gateway REST with Lambda response streaming&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;deployOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;stageName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// OPTIONS for CORS preflight&lt;/span&gt;
    &lt;span class="nx"&gt;streamResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addCorsPreflight&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;allowOrigins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Cors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALL_ORIGINS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;allowMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPTIONS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;allowHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// POST /stream -- Lambda proxy integration&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;streamResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addMethod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- CfnMethod escape hatch for streaming ---&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cfnMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;postMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultChild&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CfnMethod&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.ResponseTransferMode&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STREAM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.TimeoutInMillis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.Uri&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${FnArn}/response-streaming-invocations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;FnArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functionArn&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// --- Outputs ---&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ApiUrl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;stream`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST endpoint for streaming LLM responses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things to note.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node.js 22 with esbuild bundling.&lt;/strong&gt; The &lt;code&gt;local.tryBundle&lt;/code&gt; hook runs esbuild on your machine if it is available, falling back to Docker if not.&lt;/p&gt;

&lt;p&gt;We bundle as ESM, exclude &lt;code&gt;@aws-sdk/*&lt;/code&gt; (already in the Lambda runtime), and enable source maps for readable stack traces.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bedrock permissions.&lt;/strong&gt; We grant both &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; and &lt;code&gt;bedrock:InvokeModelWithResponseStream&lt;/code&gt;. The AI SDK uses the streaming variant, but having both keeps the door open for non-streaming calls in the same handler.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The CfnMethod escape hatch.&lt;/strong&gt; As we covered in &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a"&gt;Part 1&lt;/a&gt;, CDK's L2 &lt;code&gt;addMethod&lt;/code&gt; construct does not expose &lt;code&gt;ResponseTransferMode&lt;/code&gt; or the &lt;code&gt;response-streaming-invocations&lt;/code&gt; Lambda invocation URI.&lt;/p&gt;

&lt;p&gt;We drop down to the L1 &lt;code&gt;CfnMethod&lt;/code&gt; and override three properties: the transfer mode, the timeout, and the integration URI. This is the only CDK workaround required to enable response streaming through API Gateway REST.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CORS via &lt;code&gt;addCorsPreflight&lt;/code&gt;.&lt;/strong&gt; This generates the OPTIONS mock integration automatically. The &lt;code&gt;httpCors&lt;/code&gt; Middy middleware adds the headers to the actual POST response.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you compose Middy and AI SDK middlewares for streaming?
&lt;/h2&gt;

&lt;p&gt;The handler wires together two middleware systems: AI SDK middlewares (which operate on the LLM stream) and Middy middlewares (which operate on the HTTP response).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/index.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;httpCors&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/http-cors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;DataStoreMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./ai-middleware/data-store-middleware&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LoggingMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./ai-middleware/logging-middleware&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;publishLog&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./middleware/publish-log&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamErrorHandler&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./middleware/stream-error-handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamingService&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./streaming-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./utils/log-data-store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[INFO]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[WARN]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[ERROR]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onError&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// AI SDK middlewares -- operate on the LanguageModel stream&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="nc"&gt;LoggingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nc"&gt;DataStoreMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the benefits of serverless architecture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamingService&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTextStreamResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="c1"&gt;// Middy middleware chain:&lt;/span&gt;
&lt;span class="c1"&gt;//   1. httpCors         -- adds CORS headers&lt;/span&gt;
&lt;span class="c1"&gt;//   2. publishLog       -- before: clears store&lt;/span&gt;
&lt;span class="c1"&gt;//                          after (streaming): wraps body with TransformStream&lt;/span&gt;
&lt;span class="c1"&gt;//                          after (non-streaming): publishes immediately&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;// Data flow for a streaming response:&lt;/span&gt;
&lt;span class="c1"&gt;//&lt;/span&gt;
&lt;span class="c1"&gt;//   streamText()&lt;/span&gt;
&lt;span class="c1"&gt;//     -&amp;gt; AI SDK stream (LanguageModelV3StreamPart)&lt;/span&gt;
&lt;span class="c1"&gt;//       -&amp;gt; LoggingMiddleware.wrapStream    (flush: logs completion)&lt;/span&gt;
&lt;span class="c1"&gt;//       -&amp;gt; DataStoreMiddleware.wrapStream  (flush: stores data)&lt;/span&gt;
&lt;span class="c1"&gt;//     -&amp;gt; toTextStreamResponse()&lt;/span&gt;
&lt;span class="c1"&gt;//       -&amp;gt; ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/span&gt;
&lt;span class="c1"&gt;//         -&amp;gt; publishLog TransformStream    (flush: publishes stored data)&lt;/span&gt;
&lt;span class="c1"&gt;//           -&amp;gt; Middy pipes to Lambda responseStream&lt;/span&gt;
&lt;span class="c1"&gt;//             -&amp;gt; API Gateway streams to client&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;httpCors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;publishLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamHandler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The data flow comment is the map of the entire pipeline. Read it bottom to top to see how data reaches the client. Read it top to bottom to see where each &lt;code&gt;flush()&lt;/code&gt; fires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A["streamText()"] --&amp;gt; B["LoggingMiddleware\n(TransformStream)"]
    B --&amp;gt; C["DataStoreMiddleware\n(TransformStream)"]
    C --&amp;gt; D["toTextStreamResponse()"]
    D --&amp;gt; E["publishLog\n(TransformStream)"]
    E --&amp;gt; F["Lambda\nresponseStream"]
    F --&amp;gt; G["API Gateway\n→ Client"]

    B -. "flush: log completion" .-&amp;gt; B
    C -. "flush: store data" .-&amp;gt; C
    E -. "flush: publish log" .-&amp;gt; E

    style B fill:#e1f5fe
    style C fill:#e1f5fe
    style E fill:#fff3e0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three &lt;code&gt;TransformStream&lt;/code&gt; instances are chained together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LoggingMiddleware&lt;/strong&gt;: wraps the AI SDK's raw &lt;code&gt;LanguageModelV3StreamPart&lt;/code&gt; stream. On flush, it logs the chunk count and model ID.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DataStoreMiddleware&lt;/strong&gt;: wraps the same stream (after logging). On flush, it writes the full request params and response data into the in-memory store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;publishLog&lt;/strong&gt;: wraps the final &lt;code&gt;ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/code&gt; (after &lt;code&gt;toTextStreamResponse()&lt;/code&gt; converts the AI SDK stream to bytes). On flush, it reads the store and publishes the log.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The in-memory &lt;code&gt;llmLogDataStore&lt;/code&gt; is the bridge between the two middleware systems. AI SDK middlewares write to it. The Middy middleware reads from it.&lt;/p&gt;

&lt;p&gt;Both are connected through &lt;code&gt;flush()&lt;/code&gt; ordering: the AI SDK flushes fire first (they are upstream in the pipeline), then the Middy flush fires last (it is downstream, wrapping the byte stream).&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the Pipeline
&lt;/h2&gt;

&lt;p&gt;We test three units independently: the stream transform utility, the publish log middleware, and the stream error handler.&lt;/p&gt;

&lt;h3&gt;
  
  
  createStreamTransform
&lt;/h3&gt;

&lt;p&gt;This is the generic utility that wraps any &lt;code&gt;ReadableStream&lt;/code&gt; with a &lt;code&gt;TransformStream&lt;/code&gt; and calls a callback on flush.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test/create-stream-transform.test.ts&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;createStreamTransform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;passes all chunks through unchanged&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;world&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;world&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;calls onFlush with all accumulated chunks after stream ends&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;b&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;c&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;b&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;c&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;swallows errors thrown in onFlush&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;mockRejectedValue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;publish failed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toEqual&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;does not call onFlush when stream is cancelled&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;first&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="c1"&gt;// Stream stays open -- simulates an ongoing LLM stream&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;transformed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;transformed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// Read 'first'&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;cancel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;client disconnected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four behaviors, four tests. The critical ones are the last two: flush swallows errors (observability never breaks the client stream), and flush does not fire on cancellation (a client disconnect should not trigger publication of incomplete data).&lt;/p&gt;

&lt;h3&gt;
  
  
  publishLog
&lt;/h3&gt;

&lt;p&gt;This tests the dual-path behavior of the Middy middleware: immediate publication for non-streaming responses, deferred publication for streaming responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test/publish-log.test.ts&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;publishLog middleware&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;publishes immediately for non-streaming responses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createLlmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;middleware&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;publishLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;hello&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{},&lt;/span&gt;
        &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;after&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;never&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM log data (non-streaming)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectContaining&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;defers publication to flush() for streaming responses&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createLlmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;middleware&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;publishLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sourceStream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextEncoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chunk1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TextEncoder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chunk2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;test&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;42&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sourceStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content-type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text/plain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="c1"&gt;// After hook wraps the stream -- does NOT publish yet&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;after&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;never&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Consume the wrapped stream -- flush() fires and publishes&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;readStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;chunk1chunk2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;info&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM log data (streaming)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;objectContaining&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key assertion is in the streaming test: after the &lt;code&gt;after&lt;/code&gt; hook runs, &lt;code&gt;logger.info&lt;/code&gt; has &lt;strong&gt;not&lt;/strong&gt; been called. It is only called after we consume the stream with &lt;code&gt;readStream()&lt;/code&gt;. This proves that the deferred execution model works.&lt;/p&gt;

&lt;h3&gt;
  
  
  streamErrorHandler
&lt;/h3&gt;

&lt;p&gt;Streaming introduces error types you do not see in request/response handlers. The &lt;code&gt;streamErrorHandler&lt;/code&gt; classifies them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test/stream-error-handler.test.ts&lt;/span&gt;

&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs AbortError at warn level&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The operation was aborted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AbortError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs ERR_INVALID_STATE as abort at warn level&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TypeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid state: Reader released&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ERR_INVALID_STATE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;detects nested AbortError in cause chain&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cause&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aborted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cause&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AbortError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AI SDK error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;cause&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;logs non-abort errors at error level&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockLogger&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamErrorHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;connection timeout&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalledOnce&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;not&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toHaveBeenCalled&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AbortError&lt;/code&gt; and &lt;code&gt;ERR_INVALID_STATE&lt;/code&gt; are normal in streaming. They mean the client disconnected. The AI SDK wraps these in its own errors, so we walk the &lt;code&gt;cause&lt;/code&gt; chain.&lt;/p&gt;

&lt;p&gt;These go to &lt;code&gt;warn&lt;/code&gt;, not &lt;code&gt;error&lt;/code&gt;, because they are not actionable failures. Everything else goes to &lt;code&gt;error&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The awslambda mock fixture
&lt;/h3&gt;

&lt;p&gt;Testing &lt;code&gt;streamifyResponse&lt;/code&gt; handlers outside the Lambda runtime requires mocking the &lt;code&gt;awslambda&lt;/code&gt; global. The test fixture creates a minimal implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// test/fixtures/streaming.fixtures.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;setupAwsLambdaStreamingMock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;globalThis&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;awslambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="p"&gt;}).&lt;/span&gt;&lt;span class="nx"&gt;awslambda&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;HttpResponseStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LambdaResponseStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;HttpResponseMetadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;LambdaResponseStream&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;__httpResponseMetadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;streamifyResponse&lt;/code&gt; is a pass-through (it returns the handler as-is). &lt;code&gt;HttpResponseStream.from&lt;/code&gt; attaches metadata to the writable stream. This is enough to test the full Middy chain locally without deploying.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment and Verification
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Node.js 22+&lt;/li&gt;
&lt;li&gt;AWS CLI configured with credentials&lt;/li&gt;
&lt;li&gt;AWS CDK CLI (&lt;code&gt;npm install -g aws-cdk&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;An AWS account with &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html" rel="noopener noreferrer"&gt;Bedrock model access&lt;/a&gt; enabled for Claude 3.5 Sonnet v2&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Deploy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;step-4
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npx cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;CDK will output the API URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Outputs:
StreamingLlmStack.ApiUrl = https://abc123.execute-api.us-east-1.amazonaws.com/v1/stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test with curl
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;--no-buffer&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  https://&amp;lt;api-id&amp;gt;.execute-api.&amp;lt;region&amp;gt;.amazonaws.com/v1/stream &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Explain the benefits of serverless architecture"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--no-buffer&lt;/code&gt; flag tells curl to display chunks as they arrive instead of buffering the entire response. You should see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Headers arrive immediately.&lt;/strong&gt; &lt;code&gt;Transfer-Encoding: chunked&lt;/code&gt;, &lt;code&gt;Content-Type: text/plain; charset=utf-8&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A 1-2 second pause.&lt;/strong&gt; This is Lambda's initial buffering, not a bug. Lambda buffers roughly the first 6 KB before beginning to stream.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunks arriving progressively.&lt;/strong&gt; JSON fragments flowing in as Bedrock generates them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The connection closes&lt;/strong&gt; when the stream is complete.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Verify observability
&lt;/h3&gt;

&lt;p&gt;Open CloudWatch Logs for the Lambda function. After the response finishes streaming, you should see two log entries:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[INFO] LLM streaming call completed&lt;/code&gt;: from the &lt;code&gt;LoggingMiddleware&lt;/code&gt; flush, with chunk count and model ID.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[INFO] LLM log data (streaming)&lt;/code&gt;: from the &lt;code&gt;publishLog&lt;/code&gt; flush, with the full request parameters and response data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both entries appear &lt;strong&gt;after&lt;/strong&gt; the last byte has been sent to the client. This is the whole point.&lt;/p&gt;

&lt;p&gt;If we had relied on Middy's &lt;code&gt;after&lt;/code&gt; hook, these logs would appear before any streaming started, with no data to report.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run the tests locally
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Tear down
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx cdk destroy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What are the production considerations for streaming Lambda responses?
&lt;/h2&gt;

&lt;p&gt;The companion code is intentionally minimal. Here is what you would add before shipping to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured logging.&lt;/strong&gt; Replace the &lt;code&gt;console.log&lt;/code&gt; wrapper with &lt;a href="https://docs.powertools.aws.dev/lambda/typescript/latest/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (TypeScript)&lt;/a&gt;. It gives you structured JSON logs, correlation IDs, and log sampling out of the box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Event publishing.&lt;/strong&gt; The &lt;code&gt;publishLog&lt;/code&gt; middleware currently writes to CloudWatch. In production, replace &lt;code&gt;logger.info('LLM log data (streaming)', { data })&lt;/code&gt; with a publish to SNS, Kinesis Data Firehose, or whatever event bus your platform uses. The &lt;code&gt;flush()&lt;/code&gt; pattern works the same regardless of the destination.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authentication.&lt;/strong&gt; Add a Middy &lt;code&gt;before&lt;/code&gt; hook that validates a JWT or calls an authorizer. It runs before the handler, so it has no interaction with the streaming pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Request validation.&lt;/strong&gt; Add a Zod schema and a parser middleware to validate the incoming body. Reject bad input before you invoke Bedrock.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC considerations.&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Lambda Function URLs do not support response streaming when the function is in a VPC&lt;/a&gt;. API Gateway REST does, because it uses the &lt;code&gt;InvokeWithResponseStream&lt;/code&gt; API from its own managed infrastructure.&lt;/p&gt;

&lt;p&gt;If your Lambda needs VPC access (to reach an RDS database, for example), API Gateway REST is your path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeout alignment.&lt;/strong&gt; Set the API Gateway integration timeout equal to or greater than the Lambda function timeout. In our stack, both are 60 seconds. If the API Gateway timeout is shorter, it will close the connection while Lambda is still streaming, and the client will get a truncated response with no error signal.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing and quotas.&lt;/strong&gt; Lambda response streaming uses &lt;code&gt;InvokeWithResponseStream&lt;/code&gt;, which is billed identically to standard &lt;code&gt;Invoke&lt;/code&gt; (duration plus memory). However, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;streaming responses are not interrupted when the client disconnects&lt;/a&gt;. You are billed for the full function duration even if the client closes the connection mid-stream.&lt;/p&gt;

&lt;p&gt;API Gateway REST has a hard 10 MB payload limit for streamed responses. Lambda supports up to 200 MB via &lt;code&gt;InvokeWithResponseStream&lt;/code&gt;. For LLM responses (typically tens of KB), neither limit is a concern.&lt;/p&gt;

&lt;p&gt;The default &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-execution-service-limits-table.html" rel="noopener noreferrer"&gt;integration timeout of 29 seconds&lt;/a&gt; can be increased for regional and private APIs through Service Quotas.&lt;/p&gt;

&lt;h2&gt;
  
  
  Series Recap
&lt;/h2&gt;

&lt;p&gt;Four posts, one problem, one solution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a"&gt;Part 1&lt;/a&gt;: The Architecture.&lt;/strong&gt; API Gateway REST can stream Lambda responses using &lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt; and the &lt;code&gt;response-streaming-invocations&lt;/code&gt; URI. Middy's &lt;code&gt;streamifyResponse&lt;/code&gt; option preserves the full middleware chain. The AI SDK provides a clean interface to Bedrock with &lt;code&gt;streamText&lt;/code&gt;, &lt;code&gt;Output.object()&lt;/code&gt;, and &lt;code&gt;wrapLanguageModel&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;Part 2&lt;/a&gt;: The Problem.&lt;/strong&gt; Middy's &lt;code&gt;after&lt;/code&gt; hook fires when the handler returns, not when the stream body has been consumed. For streaming responses, the handler returns a &lt;code&gt;ReadableStream&lt;/code&gt; that has not been read yet. Any observability work in &lt;code&gt;after&lt;/code&gt; runs against empty data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;: The Solution.&lt;/strong&gt; The WHATWG Streams &lt;code&gt;TransformStream&lt;/code&gt; has a &lt;code&gt;flush()&lt;/code&gt; callback that fires after all chunks have passed through. We use this as a deferred execution point: accumulate data during &lt;code&gt;transform()&lt;/code&gt;, publish it during &lt;code&gt;flush()&lt;/code&gt;. The AI SDK's &lt;code&gt;wrapStream&lt;/code&gt; middleware pattern lets us do the same at the LLM stream level.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-4-complete-cdk-stack-38cj"&gt;Part 4&lt;/a&gt;: The Implementation.&lt;/strong&gt; A complete CDK stack, a composed Lambda handler, and a test suite that verifies every behavior.&lt;/p&gt;

&lt;p&gt;The key takeaway: &lt;strong&gt;when working with streaming responses in serverless, attach your observability to the data pipeline, not to lifecycle hooks&lt;/strong&gt;. Middleware frameworks were designed for request/response. Streams are a different paradigm.&lt;/p&gt;

&lt;p&gt;The WHATWG Streams spec gives you &lt;code&gt;flush()&lt;/code&gt; as a guaranteed execution point after stream completion. Use it.&lt;/p&gt;

&lt;p&gt;The complete source code is available in the &lt;a href="https://github.com/pablo-albaladejo/streaming-lambda-ai-sdk/tree/step-4/complete-cdk-stack" rel="noopener noreferrer"&gt;&lt;code&gt;step-4&lt;/code&gt; branch&lt;/a&gt; of the companion repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;Building responsive APIs with Amazon API Gateway response streaming&lt;/a&gt;: AWS announcement (Nov 2025)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Response streaming for Lambda functions&lt;/a&gt;: AWS Lambda Developer Guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html" rel="noopener noreferrer"&gt;Bedrock model access configuration&lt;/a&gt;: enabling Claude in your account&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://streams.spec.whatwg.org/#transformstream" rel="noopener noreferrer"&gt;WHATWG Streams Standard&lt;/a&gt;: TransformStream specification&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text" rel="noopener noreferrer"&gt;AI SDK &lt;code&gt;streamText&lt;/code&gt; reference&lt;/a&gt;: Vercel AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/middleware" rel="noopener noreferrer"&gt;AI SDK Middleware&lt;/a&gt;: LanguageModelMiddleware&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/providers/ai-sdk-providers/amazon-bedrock" rel="noopener noreferrer"&gt;AI SDK Amazon Bedrock Provider&lt;/a&gt;: Bedrock integration&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://middy.js.org/docs/integrations/streaming" rel="noopener noreferrer"&gt;Middy streaming integration&lt;/a&gt;: Middy documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.powertools.aws.dev/lambda/typescript/latest/" rel="noopener noreferrer"&gt;Powertools for AWS Lambda (TypeScript)&lt;/a&gt;: structured logging&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/running-code-after-returning-a-response-from-an-aws-lambda-function/" rel="noopener noreferrer"&gt;Running code after returning a response from Lambda&lt;/a&gt;: post-response execution caveats&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>cdk</category>
    </item>
    <item>
      <title>Deferred Observability for Streaming Lambda with TransformStream flush()</title>
      <dc:creator>Pablo Albaladejo</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:50:12 +0000</pubDate>
      <link>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j</link>
      <guid>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;Part 2&lt;/a&gt;, we diagnosed a fundamental timing problem: Middy's &lt;code&gt;after&lt;/code&gt; hook fires before the stream body is consumed by the Lambda runtime. The LLM data we need to publish simply does not exist yet when the hook runs. No amount of clever async orchestration can fix this. The data is not late; it has not been produced.&lt;/p&gt;

&lt;p&gt;That diagnosis led us to a core insight: &lt;strong&gt;observability must follow data, not lifecycle&lt;/strong&gt;. HTTP middleware frameworks model request handling as a sequence of phases: before, handler, after. That model works when the response is a finished value. It breaks when the response is a stream that has not been read yet.&lt;/p&gt;

&lt;p&gt;So we stop relying on lifecycle hooks for publication. Instead, we attach observability to the data pipeline itself.&lt;/p&gt;

&lt;p&gt;The mechanism is a &lt;code&gt;TransformStream&lt;/code&gt; whose &lt;code&gt;flush()&lt;/code&gt; callback fires at exactly the right moment: after every byte has flowed through, right before the stream signals completion to the consumer. This is not an implementation detail we stumbled into. It is a guarantee defined in the WHATWG Streams Standard.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Use &lt;code&gt;TransformStream.flush()&lt;/code&gt; to defer observability work until after a stream completes. The WHATWG Streams spec guarantees &lt;code&gt;flush()&lt;/code&gt; fires only after all chunks have passed through. We build two layers: AI SDK middlewares (&lt;code&gt;wrapStream&lt;/code&gt;) collect data during streaming, and a Middy middleware wraps the HTTP response with a final &lt;code&gt;TransformStream&lt;/code&gt; that publishes everything on &lt;code&gt;flush()&lt;/code&gt;. An in-memory store bridges the two middleware systems (we call this the Dual Middleware Bridge pattern).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How does TransformStream flush() guarantee deferred execution?
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API" rel="noopener noreferrer"&gt;Web Streams API&lt;/a&gt; ships a &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream" rel="noopener noreferrer"&gt;&lt;code&gt;TransformStream&lt;/code&gt;&lt;/a&gt; class that sits between a readable and a writable side. You give it a transformer object with up to three callbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;transform(chunk, controller)&lt;/code&gt;&lt;/strong&gt;: called for every chunk that enters the writable side. You process the chunk (or pass it through unchanged) and enqueue it to the readable side via &lt;code&gt;controller.enqueue()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;flush(controller)&lt;/code&gt;&lt;/strong&gt;: called once, after the writable side closes successfully and every &lt;code&gt;transform()&lt;/code&gt; promise has resolved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;cancel(reason)&lt;/code&gt;&lt;/strong&gt;: called if the readable side is cancelled by the consumer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;flush()&lt;/code&gt; contract comes directly from the &lt;a href="https://streams.spec.whatwg.org/#transformstream" rel="noopener noreferrer"&gt;WHATWG Streams Standard&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;flush()&lt;/code&gt; is called only after all calls to &lt;code&gt;transform()&lt;/code&gt; have completed successfully and the writable side has been closed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When does &lt;code&gt;flush()&lt;/code&gt; &lt;em&gt;not&lt;/em&gt; fire?&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;
&lt;code&gt;flush()&lt;/code&gt; called?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;All chunks processed, stream closed normally&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;transform()&lt;/code&gt; rejects (throws)&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Writable side is aborted&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Readable side is cancelled (e.g., client disconnect)&lt;/td&gt;
&lt;td&gt;No. &lt;code&gt;cancel()&lt;/code&gt; fires instead&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This is exactly the semantic we need. &lt;code&gt;flush()&lt;/code&gt; gives us a guaranteed callback that fires once, after all data has passed through, and only if the stream completed successfully.&lt;/p&gt;

&lt;p&gt;If the client disconnects and the stream aborts, &lt;code&gt;flush()&lt;/code&gt; never fires, which is correct because partial LLM output is not worth publishing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The createStreamTransform Utility
&lt;/h2&gt;

&lt;p&gt;We start with a small, generic utility. All the code in this post comes from &lt;a href="https://github.com/pablo-albaladejo/streaming-lambda-ai-sdk/tree/step-3/transformstream-observability" rel="noopener noreferrer"&gt;step-3 of the companion repo&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/utils/create-stream-transform.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createStreamTransform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeThrough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;TransformStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;

      &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;onFlush&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="c1"&gt;// Swallow errors: observability failures must never&lt;/span&gt;
          &lt;span class="c1"&gt;// break the stream pipeline reaching the client.&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a &lt;strong&gt;tap&lt;/strong&gt;: it observes the stream without modifying it. Every chunk passes through to the next stage unchanged, while being accumulated in an internal array. When the stream completes, &lt;code&gt;onFlush&lt;/code&gt; receives the full array of chunks.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;try/catch&lt;/code&gt; in &lt;code&gt;flush()&lt;/code&gt; is critical. If &lt;code&gt;onFlush&lt;/code&gt; throws (say, an SNS publish fails or a CloudWatch API call times out), the error is swallowed.&lt;/p&gt;

&lt;p&gt;If we let it propagate, the WHATWG spec says the readable side enters an error state, which would break the stream for the client. The rule is simple: observability failures must never become user-facing failures.&lt;/p&gt;

&lt;p&gt;Note the generic type parameter &lt;code&gt;&amp;lt;T&amp;gt;&lt;/code&gt;. This utility works with &lt;code&gt;LanguageModelV3StreamPart&lt;/code&gt; (AI SDK's internal stream type) and &lt;code&gt;Uint8Array&lt;/code&gt; (the serialized HTTP response bytes) alike. We will use it at both levels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Middleware Systems
&lt;/h2&gt;

&lt;p&gt;Our architecture involves two independent middleware systems that need to cooperate:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI SDK middleware&lt;/strong&gt; (&lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/middleware" rel="noopener noreferrer"&gt;&lt;code&gt;LanguageModelMiddleware&lt;/code&gt;&lt;/a&gt; with &lt;code&gt;wrapStream&lt;/code&gt;): operates on the raw LLM stream. Each chunk is a &lt;code&gt;LanguageModelV3StreamPart&lt;/code&gt;: a typed union of token deltas, usage metadata, finish reasons, and so on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Middy middleware&lt;/strong&gt;: operates on the HTTP response after &lt;code&gt;toTextStreamResponse()&lt;/code&gt; has serialized the AI SDK stream into a &lt;code&gt;ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/code&gt;. At this level, chunks are raw bytes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These two systems run at different layers and at different times. The AI SDK middlewares wrap the model before &lt;code&gt;streamText()&lt;/code&gt; is called. The Middy middleware wraps the response body after the handler returns.&lt;/p&gt;

&lt;p&gt;They need a bridge.&lt;/p&gt;

&lt;p&gt;The bridge is an in-memory store: a simple &lt;code&gt;Map&lt;/code&gt; that the AI SDK middleware writes to (during its &lt;code&gt;flush()&lt;/code&gt;) and the Middy middleware reads from (during its own &lt;code&gt;flush()&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/utils/log-data-store.ts&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PropertyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="nl"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PropertyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;LlmLogDataStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;createLlmLogDataStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;LlmLogDataStore&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nb"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;llmLogData&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;set&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createLlmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A &lt;code&gt;Map&lt;/code&gt; is safe here because Lambda is single-threaded. One request at a time, one reader and one writer, no race conditions. The &lt;code&gt;before&lt;/code&gt; hook clears the store at the start of each invocation.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI SDK Middlewares
&lt;/h2&gt;

&lt;p&gt;We build two AI SDK middlewares, both using &lt;code&gt;createStreamTransform&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  LoggingMiddleware
&lt;/h3&gt;

&lt;p&gt;Logs model response metadata (chunk count, model ID) when the stream completes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/ai-middleware/logging-middleware.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LanguageModelMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createStreamTransform&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/create-stream-transform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;LoggingMiddleware&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;LanguageModelMiddleware&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;specificationVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;wrapGenerate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;doGenerate&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doGenerate&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM call completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;modelId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;wrapStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;doStream&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM streaming call completed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;chunkCount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  DataStoreMiddleware
&lt;/h3&gt;

&lt;p&gt;Captures both the input parameters and the streaming result, then writes them to the shared store:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/ai-middleware/data-store-middleware.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LanguageModelMiddleware&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createStreamTransform&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/create-stream-transform&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/log-data-store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;DataStoreMiddleware&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LlmLogData&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;LanguageModelMiddleware&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;specificationVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v3&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

  &lt;span class="na"&gt;wrapGenerate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;doGenerate&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doGenerate&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nf"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PropertyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PropertyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;wrapStream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;doStream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;doStream&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PropertyKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both middlewares follow the same shape. In &lt;code&gt;wrapGenerate&lt;/code&gt;, the data is available synchronously after &lt;code&gt;doGenerate()&lt;/code&gt; resolves, so no TransformStream is needed.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;wrapStream&lt;/code&gt;, data only becomes available after the entire stream is consumed, so we use &lt;code&gt;createStreamTransform&lt;/code&gt; to defer our work to &lt;code&gt;flush()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The middlewares are composed onto the model via &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/wrap-language-model" rel="noopener noreferrer"&gt;&lt;code&gt;wrapLanguageModel()&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="nc"&gt;LoggingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nc"&gt;DataStoreMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;wrapLanguageModel&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;baseModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;middleware&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How do you publish observability data after a stream completes?
&lt;/h2&gt;

&lt;p&gt;This is the Middy middleware that actually solves the timing problem from &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;Part 2&lt;/a&gt;. It is a dual-path middleware: it handles both streaming and non-streaming responses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/middleware/publish-log.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;LlmLogDataStore&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../utils/log-data-store&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;StreamResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;Logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(...&lt;/span&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isStreamResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;StreamResponse&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;body&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;StreamResponse&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publishLog&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;LlmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Logger&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MiddlewareObj&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publishImmediately&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM log data (non-streaming)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to publish LLM log data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publishAfterStreamConsumption&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;StreamResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publishTransform&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;TransformStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;

      &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;flush&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
          &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

          &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;LLM log data (streaming)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Failed to publish LLM log data&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;StreamResponse&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
      &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pipeThrough&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;publishTransform&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;before&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;after&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isStreamResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;publishAfterStreamConsumption&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishImmediately&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishImmediately&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the &lt;code&gt;after&lt;/code&gt; hook carefully. For streaming responses, it does &lt;strong&gt;not&lt;/strong&gt; publish. It &lt;strong&gt;wraps&lt;/strong&gt; the response body with a new TransformStream.&lt;/p&gt;

&lt;p&gt;The actual publication happens inside that TransformStream's &lt;code&gt;flush()&lt;/code&gt;, which fires later, after the Lambda runtime has consumed every byte.&lt;/p&gt;

&lt;p&gt;For non-streaming responses (e.g., a handler that uses &lt;code&gt;generateObject&lt;/code&gt; instead of &lt;code&gt;streamText&lt;/code&gt;), the data is already available, so we publish immediately. The &lt;code&gt;isStreamResponse&lt;/code&gt; type guard decides which path to take.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;before&lt;/code&gt; hook clears the store at the start of each invocation.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;onError&lt;/code&gt; hook publishes whatever data exists if the handler throws before streaming starts. Partial data can still be valuable for debugging failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Complete Pipeline
&lt;/h2&gt;

&lt;p&gt;When the Lambda runtime consumes the response body, data flows through a chain of TransformStreams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;streamText()
  -&amp;gt; AI SDK stream (LanguageModelV3StreamPart)
    -&amp;gt; LoggingMiddleware.wrapStream    (flush: logs completion)
    -&amp;gt; DataStoreMiddleware.wrapStream  (flush: stores data in Map)
  -&amp;gt; toTextStreamResponse()
    -&amp;gt; ReadableStream&amp;lt;Uint8Array&amp;gt;
      -&amp;gt; publishLog TransformStream    (flush: reads Map, publishes)
        -&amp;gt; Middy pipes to Lambda responseStream
          -&amp;gt; API Gateway streams to client
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the same pipeline as a visual diagram:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;flowchart LR
    A["streamText()"] --&amp;gt; B["LoggingMiddleware\n(TransformStream)"]
    B --&amp;gt; C["DataStoreMiddleware\n(TransformStream)"]
    C --&amp;gt; D["toTextStreamResponse()"]
    D --&amp;gt; E["publishLog\n(TransformStream)"]
    E --&amp;gt; F["Lambda\nresponseStream"]
    F --&amp;gt; G["API Gateway\n→ Client"]

    B -. "flush: log completion" .-&amp;gt; B
    C -. "flush: store data" .-&amp;gt; C
    E -. "flush: publish log" .-&amp;gt; E

    style B fill:#e1f5fe
    style C fill:#e1f5fe
    style E fill:#fff3e0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three colored boxes are &lt;code&gt;TransformStream&lt;/code&gt; instances. The blue ones are AI SDK middlewares operating on &lt;code&gt;LanguageModelV3StreamPart&lt;/code&gt; chunks. The orange one is the Middy middleware operating on &lt;code&gt;Uint8Array&lt;/code&gt; bytes after &lt;code&gt;toTextStreamResponse()&lt;/code&gt; serialization.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;flush()&lt;/code&gt; callbacks execute in pipeline order. This is guaranteed by how &lt;code&gt;pipeThrough&lt;/code&gt; chains work: the upstream TransformStream must close its readable side before the downstream TransformStream's writable side closes, which means upstream &lt;code&gt;flush()&lt;/code&gt; completes before downstream &lt;code&gt;flush()&lt;/code&gt; is invoked.&lt;/p&gt;

&lt;p&gt;The sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;LoggingMiddleware.flush()&lt;/strong&gt;: logs the LLM streaming result to CloudWatch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DataStoreMiddleware.flush()&lt;/strong&gt;: writes &lt;code&gt;llmParam&lt;/code&gt; and &lt;code&gt;llmResult&lt;/code&gt; into the in-memory store.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;publishLog.flush()&lt;/strong&gt;: reads from the store and publishes to the observability backend (SNS, Kinesis, CloudWatch, or whatever you need).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 2 sets the data. Step 3 reads it. This ordering is not a coincidence we hope holds. It is a structural guarantee of pipeThrough chains in the WHATWG spec.&lt;/p&gt;

&lt;p&gt;Upstream flushes complete before downstream flushes begin.&lt;/p&gt;

&lt;p&gt;Here is how the handler wires everything together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/index.ts&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="nc"&gt;LoggingMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="nc"&gt;DataStoreMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;set&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the benefits of serverless architecture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamingService&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;aiMiddlewares&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTextStreamResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;httpCors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;publishLog&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamHandler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nothing in this handler blocks on observability. The stream starts flowing to the client immediately. The TransformStreams execute their &lt;code&gt;flush()&lt;/code&gt; callbacks transparently as part of the pipe chain, after every byte has been delivered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is TransformStream flush() better than AI SDK onFinish?
&lt;/h2&gt;

&lt;p&gt;AI SDK provides an &lt;code&gt;onFinish&lt;/code&gt; callback in &lt;code&gt;streamText()&lt;/code&gt;. You might wonder why we did not just use that.&lt;/p&gt;

&lt;p&gt;The problem is that &lt;code&gt;onFinish&lt;/code&gt; is fire-and-forget. The AI SDK calls your callback but does not await the returned promise. From the AI SDK source:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Simplified from AI SDK internals&lt;/span&gt;
&lt;span class="nx"&gt;onFinish&lt;/span&gt;&lt;span class="p"&gt;?.({&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt; &lt;span class="c1"&gt;// No await&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your &lt;code&gt;onFinish&lt;/code&gt; handler does async work (publishing to SNS, writing to a database), the Lambda runtime may freeze the execution context before that work completes.&lt;/li&gt;
&lt;li&gt;There is no backpressure. The stream closes and the response ends regardless of whether your callback finished.&lt;/li&gt;
&lt;li&gt;There is no error propagation. If &lt;code&gt;onFinish&lt;/code&gt; throws, nobody catches it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Our &lt;code&gt;flush()&lt;/code&gt; approach is fundamentally different:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;flush()&lt;/code&gt; is part of the pipeline.&lt;/strong&gt; It executes as a step in the pipe chain. The downstream consumer does not see the stream as "closed" until &lt;code&gt;flush()&lt;/code&gt; resolves. The Lambda runtime will not freeze the context while &lt;code&gt;flush()&lt;/code&gt; is awaiting an SNS publish.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;flush()&lt;/code&gt; only fires on success.&lt;/strong&gt; If the stream aborts (client disconnect, timeout), &lt;code&gt;flush()&lt;/code&gt; never runs. We do not publish partial, potentially misleading data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;flush()&lt;/code&gt; ordering is deterministic.&lt;/strong&gt; In a chain of &lt;code&gt;pipeThrough&lt;/code&gt; calls, upstream flushes complete before downstream flushes begin. We rely on this to ensure the data store is populated before we read from it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The difference is not subtle. &lt;code&gt;onFinish&lt;/code&gt; is a notification. &lt;code&gt;flush()&lt;/code&gt; is a pipeline stage. One is best-effort. The other is structurally guaranteed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recap
&lt;/h2&gt;

&lt;p&gt;The pattern we built in this post can be summarized in one sentence: attach observability to the data pipeline using TransformStream &lt;code&gt;flush()&lt;/code&gt;, instead of relying on middleware lifecycle hooks that fire at the wrong time.&lt;/p&gt;

&lt;p&gt;Three properties make this work:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;WHATWG &lt;code&gt;flush()&lt;/code&gt; guarantee&lt;/strong&gt;: fires once, after all &lt;code&gt;transform()&lt;/code&gt; promises resolve and the writable side closes successfully. This is not a Node.js quirk. It is a web standard implemented consistently across runtimes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pipeline ordering&lt;/strong&gt;: in a &lt;code&gt;pipeThrough&lt;/code&gt; chain, upstream &lt;code&gt;flush()&lt;/code&gt; completes before downstream &lt;code&gt;flush()&lt;/code&gt; begins. This lets us write data in one TransformStream and read it in another, with guaranteed ordering.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error isolation&lt;/strong&gt;: &lt;code&gt;try/catch&lt;/code&gt; inside &lt;code&gt;flush()&lt;/code&gt; ensures observability failures never break the user-facing stream. The client gets their complete response regardless of whether our publish succeeded.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;createStreamTransform&lt;/code&gt; utility is 24 lines of code. The &lt;code&gt;publishLog&lt;/code&gt; middleware is under 90. Together, they solve a problem that has no clean solution in traditional middleware architectures.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;We have the pattern. We have the code. But how do we know it actually works? How do we verify that &lt;code&gt;flush()&lt;/code&gt; fires in the right order, that errors are properly swallowed, that aborted streams skip publication?&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-4-complete-cdk-stack-38cj"&gt;Part 4&lt;/a&gt;, we put it all together: a complete CDK stack you can deploy in 5 minutes, with tests that verify the TransformStream timing guarantees.&lt;/p&gt;

&lt;p&gt;We will write unit tests that prove the flush ordering, integration tests that confirm end-to-end observability, and cover the edge cases: client disconnects, LLM timeouts, and publish failures.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://streams.spec.whatwg.org/#transformstream" rel="noopener noreferrer"&gt;WHATWG Streams Standard: TransformStream&lt;/a&gt;: the specification that guarantees &lt;code&gt;flush()&lt;/code&gt; behavior&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/TransformStream" rel="noopener noreferrer"&gt;MDN: TransformStream&lt;/a&gt;: API reference&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/pipeThrough" rel="noopener noreferrer"&gt;MDN: ReadableStream.pipeThrough()&lt;/a&gt;: chaining TransformStreams&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://nodejs.org/api/webstreams.html" rel="noopener noreferrer"&gt;Node.js Web Streams API&lt;/a&gt;: Node.js 22 implementation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/middleware" rel="noopener noreferrer"&gt;AI SDK Middleware&lt;/a&gt;: &lt;code&gt;LanguageModelMiddleware&lt;/code&gt; with &lt;code&gt;wrapStream&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/wrap-language-model" rel="noopener noreferrer"&gt;AI SDK &lt;code&gt;wrapLanguageModel&lt;/code&gt;&lt;/a&gt;: composing model with middlewares&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://developers.cloudflare.com/workers/runtime-apis/streams/transformstream/" rel="noopener noreferrer"&gt;Cloudflare Workers TransformStream&lt;/a&gt;: validation of the "tap" pattern in production&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://newrelic.com/blog/how-to-relic/new-relic-serverless-response-streaming-with-ai-observability" rel="noopener noreferrer"&gt;New Relic: Serverless response streaming with AI observability&lt;/a&gt;: similar generator proxy pattern&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.vercel.com/t/is-onfinish-fire-and-forget/30552" rel="noopener noreferrer"&gt;Is &lt;code&gt;onFinish&lt;/code&gt; fire-and-forget?&lt;/a&gt;: Vercel community confirmation&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>webstreams</category>
    </item>
    <item>
      <title>The Middy After Hook Problem: Why Streaming Lambda Observability Is Broken</title>
      <dc:creator>Pablo Albaladejo</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:50:08 +0000</pubDate>
      <link>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c</link>
      <guid>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a"&gt;Part 1&lt;/a&gt; we built a streaming Lambda that pipes an LLM response through API Gateway to the client using Middy and the AI SDK. It works. The client sees tokens arrive in real time, structured JSON builds incrementally in the browser, and the architecture is clean.&lt;/p&gt;

&lt;p&gt;Now we want observability. We want to log the full LLM response, publish token usage metrics, and record the structured output for auditing. The natural place for this in a Middy-based Lambda is an &lt;code&gt;after&lt;/code&gt; hook. That is exactly what we tried. And it is exactly what does not work.&lt;/p&gt;

&lt;p&gt;The problem is a timing issue between Middy's middleware lifecycle and the streaming response lifecycle. It is not documented in the Middy docs, the AWS Lambda streaming docs, or any community resource we could find.&lt;/p&gt;

&lt;p&gt;We spent significant time diagnosing it, and this post is the documentation that should have existed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Middy's &lt;code&gt;after&lt;/code&gt; hook fires at T1 (when the handler returns the &lt;code&gt;ReadableStream&lt;/code&gt;), but the stream body is consumed at T2-T4. Any observability code in &lt;code&gt;after&lt;/code&gt; (logging, metrics, publishing) sees empty data because the stream has not been read yet. &lt;code&gt;onFinish&lt;/code&gt; is fire-and-forget and Lambda post-response execution is unreliable. The solution is in &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;: &lt;code&gt;TransformStream.flush()&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why does Middy's after hook see empty data when streaming?
&lt;/h2&gt;

&lt;p&gt;Here is what you would naturally write. A Middy &lt;code&gt;after&lt;/code&gt; hook that reads from a data store populated by the LLM call and publishes it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;publishLlmLog&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MiddlewareObj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;after&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishToSns&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;projectName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern works perfectly for non-streaming responses. The handler calls the LLM, waits for the full result, stores it, and returns a JSON body.&lt;/p&gt;

&lt;p&gt;Middy's &lt;code&gt;after&lt;/code&gt; hook fires after the handler returns, reads the store, publishes the data. Everything is sequential and deterministic.&lt;/p&gt;

&lt;p&gt;For a streaming Lambda, the handler looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the benefits of serverless architecture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamingService&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;onError&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTextStreamResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler returns a &lt;code&gt;ReadableStream&lt;/code&gt; as the body. It does not wait for the stream to finish. The LLM has barely started generating tokens. The data store is empty.&lt;/p&gt;

&lt;h2&gt;
  
  
  When does Middy's after hook fire relative to stream consumption?
&lt;/h2&gt;

&lt;p&gt;This is the exact sequence of events when Middy processes a streaming response:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;T0  Handler returns { body: ReadableStream, statusCode: 200 }
    |
T1  Middy after hook fires             &amp;lt;-- stream is UNREAD
    |                                       llmLogDataStore.get() === undefined
    |
T2  Middy pipes response.body to Lambda responseStream
    |
T3  Client receives chunks             &amp;lt;-- tokens flowing
    |   ...
    |   ...
T4  Stream ends                         &amp;lt;-- data is HERE
        DataStoreMiddleware.flush() populates the store
        But the after hook already ran at T1.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The after hook runs at T1. The data you need only exists at T4. The gap between T1 and T4 can be seconds or minutes, depending on how long the LLM takes to generate the full response.&lt;/p&gt;

&lt;p&gt;Here is another way to see it. In a non-streaming Lambda, the timeline is linear:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Handler executes -&amp;gt; LLM completes -&amp;gt; Store populated -&amp;gt; Handler returns -&amp;gt; After hook fires
                                                                           (store has data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a streaming Lambda, the handler returns before the LLM completes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Handler executes -&amp;gt; LLM starts -&amp;gt; Handler returns -&amp;gt; After hook fires -&amp;gt; LLM streams -&amp;gt; LLM completes
                                                      (store is empty)                   (store has data)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The after hook and the stream consumption are on different timelines. The handler returning does not mean the work is done. It means the work has barely started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Tests Don't Catch It
&lt;/h2&gt;

&lt;p&gt;This is the part that delayed our diagnosis. The unit tests passed. Every time.&lt;/p&gt;

&lt;p&gt;In a typical unit test for this handler, you mock the AI SDK service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;vi&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./streaming-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;streamingService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;toTextStreamResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{"summary":"test"}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The mock returns a resolved &lt;code&gt;Response&lt;/code&gt; with a complete body. When the test calls the handler, the mock populates the data store synchronously. By the time the after hook runs in the test, the store has data. The test passes.&lt;/p&gt;

&lt;p&gt;In production, &lt;code&gt;streamText()&lt;/code&gt; returns immediately with an unresolved stream. The AI SDK creates a &lt;code&gt;ReadableStream&lt;/code&gt; backed by an HTTP connection to the LLM. The data flows asynchronously through internal &lt;code&gt;TransformStream&lt;/code&gt; chains as the model generates tokens.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;DataStoreMiddleware&lt;/code&gt; uses a &lt;code&gt;TransformStream&lt;/code&gt; with a &lt;code&gt;flush()&lt;/code&gt; callback that only fires after the last chunk passes through:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This TransformStream wraps the LLM's internal stream.&lt;/span&gt;
&lt;span class="c1"&gt;// flush() fires only when the stream closes -- at T4.&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;createStreamTransform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;setData&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;llmParam&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;streamParts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;flush()&lt;/code&gt; callback runs at T4. The after hook runs at T1. In the mock, both happen at T0. The test collapses the entire timeline into a single tick and hides the race condition.&lt;/p&gt;

&lt;p&gt;This is not a testing mistake. It is a fundamental limitation of mocking streams. The mock correctly simulates the interface but eliminates the temporal behavior that causes the bug.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Approaches That Don't Work
&lt;/h2&gt;

&lt;p&gt;Before we found the solution (covered in &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;), we explored three approaches that each fail for a different reason.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Move publishing to the AI SDK's &lt;code&gt;onFinish&lt;/code&gt; callback
&lt;/h3&gt;

&lt;p&gt;The AI SDK provides an &lt;code&gt;onFinish&lt;/code&gt; callback on &lt;code&gt;streamText()&lt;/code&gt;. It fires when the stream completes. That sounds like exactly what we need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="na"&gt;onFinish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishToSns&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;llmResult&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;usage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: &lt;code&gt;onFinish&lt;/code&gt; is fire-and-forget. The AI SDK calls it but does not &lt;code&gt;await&lt;/code&gt; the returned promise. The promise resolves (or rejects) silently in the background.&lt;/p&gt;

&lt;p&gt;In Lambda, the execution environment can freeze or terminate after &lt;code&gt;responseStream.end()&lt;/code&gt;. If &lt;code&gt;onFinish&lt;/code&gt; triggers an async operation like an SNS publish, there is no guarantee it completes before the environment is reclaimed.&lt;/p&gt;

&lt;p&gt;This is a known limitation. The Vercel community has &lt;a href="https://community.vercel.com/t/is-onfinish-fire-and-forget/30552" rel="noopener noreferrer"&gt;discussed it&lt;/a&gt; in the context of Next.js serverless functions.&lt;/p&gt;

&lt;p&gt;On Vercel's platform, &lt;code&gt;waitUntil()&lt;/code&gt; can extend the function lifetime. On Lambda, no such mechanism exists for streaming functions.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Await the stream in the after hook, then reconstruct it
&lt;/h3&gt;

&lt;p&gt;You could consume the &lt;code&gt;ReadableStream&lt;/code&gt; in the after hook to extract the data:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;afterHook&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;MiddlewareObj&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;after&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[];&lt;/span&gt;
    &lt;span class="k"&gt;while &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;value&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;reader&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;done&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;value&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Now we have the data... but the stream is consumed.&lt;/span&gt;
    &lt;span class="c1"&gt;// We need to give Middy a new stream to pipe to the client.&lt;/span&gt;
    &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ReadableStream&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishData&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This technically works, but it defeats the purpose of streaming. You buffer the entire response in memory before the client receives a single byte. The TTFB becomes the full LLM generation time.&lt;/p&gt;

&lt;p&gt;You have rebuilt a buffered response with extra steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Use Lambda post-response execution
&lt;/h3&gt;

&lt;p&gt;Lambda continues executing code after &lt;code&gt;responseStream.end()&lt;/code&gt; is called. You could set a flag when the stream completes and run your publish logic after:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// After the stream ends, Lambda still has execution time&lt;/span&gt;
&lt;span class="k"&gt;void &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="k"&gt;await &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;chunk&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textStream&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;passThrough&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="nx"&gt;passThrough&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;end&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Post-stream work&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;llmLogDataStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;publishToSns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;})();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;a href="https://aws.amazon.com/blogs/compute/running-code-after-returning-a-response-from-an-aws-lambda-function/" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt; explicitly warns against relying on this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The runtime does not wait for asynchronous work to complete after the response stream has errored or been destroyed."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the client disconnects mid-stream (a common event in production), the response stream is destroyed. Any async work scheduled after that point may not complete. Your publish call might execute, or it might not. There is no guarantee.&lt;/p&gt;

&lt;p&gt;This approach also moves your observability logic into the handler's streaming IIFE, coupling it to the handler code and bypassing Middy's middleware chain entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you handle errors in streaming Lambda responses?
&lt;/h2&gt;

&lt;p&gt;Before we move to the solution, we need to address a related problem: error handling in a streaming Lambda has two distinct phases, and your error handler must account for both.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pre-stream errors
&lt;/h3&gt;

&lt;p&gt;Before the handler returns a &lt;code&gt;ReadableStream&lt;/code&gt;, errors behave normally. A validation failure, an auth error, a missing parameter: these throw before any bytes are written.&lt;/p&gt;

&lt;p&gt;Middy catches them, runs the &lt;code&gt;onError&lt;/code&gt; chain, and returns a standard HTTP error response with an appropriate status code. This is the familiar path.&lt;/p&gt;

&lt;h3&gt;
  
  
  Mid-stream errors
&lt;/h3&gt;

&lt;p&gt;Once the handler returns a &lt;code&gt;ReadableStream&lt;/code&gt; and Middy starts piping it to the client, the HTTP status code (200) and headers are already sent. If the LLM fails mid-generation, or the client disconnects, or a timeout fires, the error manifests as a stream destruction.&lt;/p&gt;

&lt;p&gt;There is no way to change the status code retroactively.&lt;/p&gt;

&lt;p&gt;The AI SDK wraps these errors in nested cause chains. A client disconnect might surface as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error: Stream processing failed
  cause: TypeError [ERR_INVALID_STATE]: The reader is not attached to a stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or as a direct &lt;code&gt;AbortError&lt;/code&gt; from the &lt;code&gt;AbortController&lt;/code&gt;. Or as an AI SDK wrapper error with a nested &lt;code&gt;AbortError&lt;/code&gt; in the &lt;code&gt;cause&lt;/code&gt; property.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;streamErrorHandler&lt;/code&gt; in the &lt;a href="https://github.com/pablo-albaladejo/streaming-lambda-ai-sdk/tree/step-2/stream-error-handling" rel="noopener noreferrer"&gt;step-2 companion code&lt;/a&gt; handles this by recursively checking the cause chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isAbortError&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AbortError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;TypeError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;code&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
    &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ERR_INVALID_STATE&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cause&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;isAbortError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cause&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function handles three cases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Direct AbortError&lt;/strong&gt;: the &lt;code&gt;AbortController&lt;/code&gt; signal fired (timeout or explicit abort).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ERR_INVALID_STATE&lt;/strong&gt;: a &lt;code&gt;TypeError&lt;/code&gt; thrown when reading from a detached stream reader. This happens when the client disconnects and the underlying stream is destroyed while a &lt;code&gt;for await&lt;/code&gt; loop is still reading from it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nested cause chains&lt;/strong&gt;: the AI SDK wraps errors. A stream error from the HTTP connection gets wrapped in an AI SDK error, which may itself be wrapped in another error by the streaming pipeline. The &lt;code&gt;AbortError&lt;/code&gt; can be several levels deep in the &lt;code&gt;cause&lt;/code&gt; chain.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The log level strategy follows from the semantics: AbortErrors are expected in production (users close tabs, navigate away, connections drop) and get logged at &lt;code&gt;warn&lt;/code&gt; level. All other errors are genuine failures and get logged at &lt;code&gt;error&lt;/code&gt; level.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamErrorHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;}:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt; &lt;span class="p"&gt;}):&lt;/span&gt; &lt;span class="k"&gt;void&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Unknown streaming error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cause&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extractCause&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;isAbortError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Stream aborted (client disconnect or timeout)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;errorType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AbortError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Streaming error&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;cause&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;errorType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;StreamError&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We also extract AI SDK-specific properties (&lt;code&gt;text&lt;/code&gt;, &lt;code&gt;usage&lt;/code&gt;) from the cause when present. The &lt;code&gt;NoObjectGeneratedError&lt;/code&gt;, for example, includes the partial text the model generated before failing and the token usage at the time of failure.&lt;/p&gt;

&lt;p&gt;These are valuable for debugging and cost tracking.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Documentation Gap
&lt;/h2&gt;

&lt;p&gt;We searched for this problem in every place you would expect to find it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://middy.js.org/docs/integrations/streaming" rel="noopener noreferrer"&gt;Middy documentation&lt;/a&gt;&lt;/strong&gt;: The streaming section describes &lt;code&gt;streamifyResponse: true&lt;/code&gt; and the PassThrough/ReadableStream pattern. It does not mention the after hook timing issue.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;AWS Lambda streaming docs&lt;/a&gt;&lt;/strong&gt;: Cover &lt;code&gt;awslambda.streamifyResponse&lt;/code&gt;, response format, and limits. No mention of middleware timing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text" rel="noopener noreferrer"&gt;AI SDK documentation&lt;/a&gt;&lt;/strong&gt;: Covers &lt;code&gt;onFinish&lt;/code&gt;, &lt;code&gt;streamText&lt;/code&gt;, and stream protocols. Does not address the fire-and-forget nature of &lt;code&gt;onFinish&lt;/code&gt; in serverless contexts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Community posts and Stack Overflow&lt;/strong&gt;: Multiple posts about Lambda streaming setup. None about middleware lifecycle interaction with streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The combination of Middy's streaming mode, the AI SDK's stream lifecycle, and Lambda's execution model creates a timing problem that each project documents only their part of. No single source connects the three.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The problem is clear: Middy's after hook fires before the stream is consumed. The data we need for observability only exists after the stream ends. The three obvious workarounds each fail for a different structural reason.&lt;/p&gt;

&lt;p&gt;The solution exists, and it is elegant. In &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;, we will show how &lt;code&gt;TransformStream&lt;/code&gt;'s &lt;code&gt;flush()&lt;/code&gt; callback gives us exactly what we need: a guaranteed execution point after the entire stream is consumed, running inline with the stream pipeline, before the Lambda execution environment can freeze.&lt;/p&gt;

&lt;p&gt;It works within Middy's after hook, preserves the streaming behavior for the client, and requires no changes to the handler code.&lt;/p&gt;

&lt;p&gt;The stream itself becomes the mechanism for its own observability.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://middy.js.org/docs/integrations/streaming" rel="noopener noreferrer"&gt;Middy streaming integration&lt;/a&gt;: Middy documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Response streaming for Lambda functions&lt;/a&gt;: AWS Lambda Developer Guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/running-code-after-returning-a-response-from-an-aws-lambda-function/" rel="noopener noreferrer"&gt;Running code after returning a response from Lambda&lt;/a&gt;: AWS blog on post-response execution&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text" rel="noopener noreferrer"&gt;AI SDK &lt;code&gt;streamText&lt;/code&gt; reference&lt;/a&gt;: Vercel AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://community.vercel.com/t/is-onfinish-fire-and-forget/30552" rel="noopener noreferrer"&gt;Is &lt;code&gt;onFinish&lt;/code&gt; fire-and-forget?&lt;/a&gt;: Vercel community discussion&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/nodejs/node/issues/40692" rel="noopener noreferrer"&gt;Node.js AbortError is not DOMException&lt;/a&gt;: Node.js issue on AbortError behavior&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/vercel/ai/issues/4101" rel="noopener noreferrer"&gt;AI SDK Issue #4101: Invalid state after controller closed&lt;/a&gt;: AI SDK error wrapping behavior&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://betterstack.com/community/guides/scaling-nodejs/understanding-abortcontroller/" rel="noopener noreferrer"&gt;Understanding AbortController in Node.js&lt;/a&gt;: AbortSignal patterns&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>middy</category>
    </item>
    <item>
      <title>Streaming LLM Responses Through API Gateway REST with Lambda</title>
      <dc:creator>Pablo Albaladejo</dc:creator>
      <pubDate>Sun, 15 Feb 2026 11:33:39 +0000</pubDate>
      <link>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a</link>
      <guid>https://forem.com/pabloalbaladejo/observable-ai-streaming-on-aws-part-1-api-gateway-rest-with-lambda-595a</guid>
      <description>&lt;h2&gt;
  
  
  The problem with buffered LLM responses
&lt;/h2&gt;

&lt;p&gt;When you call an LLM from a Lambda function and wait for the full response before returning it, the user stares at a spinner for 5 to 15 seconds. The model is generating tokens the entire time, but the client receives nothing until the very last one. For short completions this is tolerable. For structured output (summaries, extracted entities, multi-field JSON), the delay kills the experience.&lt;/p&gt;

&lt;p&gt;The fix is response streaming: send tokens to the client as the model produces them. The user sees content appearing incrementally. Time-to-first-byte drops from seconds to hundreds of milliseconds. The perceived latency improves dramatically even though the total generation time stays the same.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;November 2025&lt;/a&gt;, AWS added native response streaming support to API Gateway REST APIs. Before that, streaming from Lambda required Function URLs (limited auth, no VPC support) or WebSocket APIs (more infrastructure, more complexity). Now we get streaming through the same REST API most teams already have in production.&lt;/p&gt;

&lt;p&gt;This is the first post in a four-part series:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The infrastructure and handler setup (this post)&lt;/li&gt;
&lt;li&gt;A subtle middleware timing bug that breaks observability when streaming (&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;Part 2&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;A deferred observability pattern that solves it (&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-3-the-transformstream-pipeline-4p4j"&gt;Part 3&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Production-grade testing for the whole thing (&lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-4-complete-cdk-stack-38cj"&gt;Part 4&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every post has a companion step in the repository with working code you can deploy.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; Stream structured LLM responses from Bedrock Claude through API Gateway REST and Lambda using Middy's &lt;code&gt;streamifyResponse&lt;/code&gt; and AI SDK v6 &lt;code&gt;streamText()&lt;/code&gt; with &lt;code&gt;Output.object()&lt;/code&gt;. The CDK stack uses a &lt;code&gt;CfnMethod&lt;/code&gt; escape hatch to enable &lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt;. Time-to-first-byte drops from seconds to hundreds of milliseconds. Full working code included.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What this post covers
&lt;/h2&gt;

&lt;p&gt;We will wire up a complete streaming pipeline: API Gateway REST receives a request, invokes a Lambda function in streaming mode, the Lambda uses Middy and the Vercel AI SDK to call Bedrock Claude via &lt;code&gt;streamText&lt;/code&gt;, and the structured JSON streams back to the client chunk by chunk.&lt;/p&gt;

&lt;p&gt;By the end, you will have a deployed endpoint you can &lt;code&gt;curl&lt;/code&gt; and watch tokens arrive in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you stream LLM responses through API Gateway REST?
&lt;/h2&gt;

&lt;p&gt;Here is the data flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client (curl / browser)
  |
  | POST /stream
  v
API Gateway REST (responseTransferMode: STREAM)
  |
  | response-streaming-invocations
  v
Lambda (Middy, streamifyResponse: true)
  |
  | streamText()
  v
Bedrock Claude 3.5 Sonnet
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The client sends a POST request. API Gateway forwards it to Lambda using the streaming invocation endpoint. Lambda calls Bedrock Claude through the AI SDK's &lt;code&gt;streamText&lt;/code&gt;, which returns a &lt;code&gt;ReadableStream&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Middy detects the stream in the response body and pipes it to Lambda's response stream. API Gateway forwards each chunk to the client as it arrives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why API Gateway REST instead of Function URLs?&lt;/strong&gt; Three reasons:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;VPC support.&lt;/strong&gt; &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Function URLs do not support response streaming from within a VPC&lt;/a&gt;. If your Lambda runs in a VPC to reach a database or private service, Function URLs are not an option.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in auth and policies.&lt;/strong&gt; REST APIs support IAM and Cognito authorizers, request validation, and usage plans that most teams already depend on.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Battle-tested CORS.&lt;/strong&gt; CORS handling through mock integrations and gateway responses is battle-tested. Function URLs require you to handle CORS entirely in your Lambda code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How do you configure API Gateway REST for response streaming?
&lt;/h2&gt;

&lt;p&gt;Two properties on the API Gateway method integration enable streaming:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;&lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt;&lt;/strong&gt; tells API Gateway to forward response bytes to the client as they arrive, rather than buffering the entire response.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The streaming invocation URI&lt;/strong&gt; replaces the standard Lambda invoke path. Instead of &lt;code&gt;.../invocations&lt;/code&gt;, you use &lt;code&gt;.../response-streaming-invocations&lt;/code&gt;. This tells API Gateway to use the &lt;code&gt;InvokeWithResponseStream&lt;/code&gt; API instead of the standard &lt;code&gt;Invoke&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Wire format
&lt;/h3&gt;

&lt;p&gt;When Lambda streams through an API Gateway proxy integration, the stream must follow a specific wire format:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;JSON metadata&lt;/strong&gt;: a valid JSON object containing &lt;code&gt;statusCode&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt;, and optionally &lt;code&gt;cookies&lt;/code&gt;. This can be as minimal as &lt;code&gt;{}&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An 8-null-byte delimiter&lt;/strong&gt;: eight &lt;code&gt;0x00&lt;/code&gt; bytes that separate metadata from body.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The response payload&lt;/strong&gt;: the actual content chunks.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The delimiter must appear within the first 16 KB of the stream. If you use Middy with &lt;code&gt;streamifyResponse: true&lt;/code&gt;, it handles this format automatically. You never write the delimiter yourself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Timeouts and bandwidth
&lt;/h3&gt;

&lt;p&gt;API Gateway REST streaming supports timeouts up to 15 minutes. There are two idle timeouts to be aware of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Regional and private endpoints&lt;/strong&gt;: 5 minutes of idle time before the connection terminates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge-optimized endpoints&lt;/strong&gt;: 30 seconds. For LLM streaming, avoid edge-optimized endpoints.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bandwidth is throttled at two layers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Uncapped window&lt;/th&gt;
&lt;th&gt;Throttled rate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Lambda&lt;/td&gt;
&lt;td&gt;First 6 MB&lt;/td&gt;
&lt;td&gt;2 MB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Gateway&lt;/td&gt;
&lt;td&gt;First 10 MB&lt;/td&gt;
&lt;td&gt;2 MB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For LLM responses (typically tens of KB), you will never hit these limits. They matter for large file downloads, not token streaming.&lt;/p&gt;

&lt;p&gt;There are additional &lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-execution-service-limits-table.html" rel="noopener noreferrer"&gt;quotas&lt;/a&gt; to be aware of when using API Gateway REST with streaming:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Quota&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Max streamed payload (API Gateway)&lt;/td&gt;
&lt;td&gt;10 MB&lt;/td&gt;
&lt;td&gt;Hard limit, not increasable&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Max streamed payload (Lambda)&lt;/td&gt;
&lt;td&gt;200 MB&lt;/td&gt;
&lt;td&gt;Via &lt;code&gt;InvokeWithResponseStream&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Integration timeout (default)&lt;/td&gt;
&lt;td&gt;29 seconds&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-execution-service-limits-table.html" rel="noopener noreferrer"&gt;Increasable&lt;/a&gt; for regional and private APIs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Idle connection timeout&lt;/td&gt;
&lt;td&gt;310 seconds&lt;/td&gt;
&lt;td&gt;General API Gateway limit (separate from the streaming idle timeouts above)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For LLM streaming, the integration timeout is the one that matters most. The default 29-second limit may be too short for complex prompts. You can request an increase through Service Quotas, or use a longer timeout in your CDK stack (we use 60 seconds).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing.&lt;/strong&gt; Lambda response streaming uses &lt;code&gt;InvokeWithResponseStream&lt;/code&gt;, which is billed identically to standard &lt;code&gt;Invoke&lt;/code&gt;: duration plus memory.&lt;/p&gt;

&lt;p&gt;However, &lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;streaming responses are not interrupted when the client disconnects&lt;/a&gt;. You are billed for the full function duration even if the client closes the connection mid-stream. Keep this in mind when setting timeouts.&lt;/p&gt;

&lt;h3&gt;
  
  
  CDK infrastructure
&lt;/h3&gt;

&lt;p&gt;CDK's L2 &lt;code&gt;LambdaIntegration&lt;/code&gt; construct does not expose &lt;code&gt;ResponseTransferMode&lt;/code&gt; or the streaming invocation URI. We use the &lt;code&gt;CfnMethod&lt;/code&gt; escape hatch to override the underlying CloudFormation resource directly.&lt;/p&gt;

&lt;p&gt;Here is the full stack. Read it top to bottom. We will break down the important parts after.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lib/streaming-stack.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-apigateway&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-iam&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;logs&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-cdk-lib/aws-logs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;constructs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;node:path&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;StreamingStack&lt;/span&gt; &lt;span class="kd"&gt;extends&lt;/span&gt; &lt;span class="nc"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Stack&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Construct&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;StackProps&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;super&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;scope&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;props&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;StreamingHandler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;runtime&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODEJS_22_X&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;index.handler&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromAsset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;__dirname&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;..&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;bundling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODEJS_22_X&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;bundlingImage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bash&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-c&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;npx esbuild index.ts --bundle --platform=node --target=node22 --outfile=/asset-output/index.mjs --format=esm --external:@aws-sdk/* &amp;amp;&amp;amp; cp /asset-output/index.mjs /asset-output/index.js&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="na"&gt;memorySize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;512&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Duration&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
      &lt;span class="na"&gt;logRetention&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;RetentionDays&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ONE_WEEK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addToRolePolicy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;iam&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PolicyStatement&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bedrock:InvokeModel&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;bedrock:InvokeModelWithResponseStream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:bedrock:*::foundation-model/*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;RestApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;StreamingApi&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;restApiName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;streaming-llm-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;deployOptions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;stageName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;v1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;root&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addResource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;stream&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;streamResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addCorsPreflight&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;allowOrigins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Cors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ALL_ORIGINS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;allowMethods&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPTIONS&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="na"&gt;allowHeaders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Content-Type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;postMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;streamResource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addMethod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;LambdaIntegration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Escape hatch: override CloudFormation for streaming&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cfnMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;postMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;node&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;defaultChild&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;apigateway&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;CfnMethod&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.ResponseTransferMode&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;STREAM&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.TimeoutInMillis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="nx"&gt;_000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nx"&gt;cfnMethod&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addPropertyOverride&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Integration.Uri&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;Fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;arn:aws:apigateway:${AWS::Region}:lambda:path/2021-11-15/functions/${FnArn}/response-streaming-invocations&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;FnArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;functionArn&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nx"&gt;cdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CfnOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ApiUrl&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;stream`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The three &lt;code&gt;addPropertyOverride&lt;/code&gt; calls are the entire difference between a buffered API and a streaming one. Everything else is standard CDK. Note that we also grant &lt;code&gt;bedrock:InvokeModelWithResponseStream&lt;/code&gt; because Bedrock needs this permission for streaming model calls, not just &lt;code&gt;InvokeModel&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lambda handler with Middy
&lt;/h2&gt;

&lt;p&gt;The Lambda handler is where the streaming behavior is configured on the compute side. Middy's &lt;a href="https://middy.js.org/docs/integrations/streaming" rel="noopener noreferrer"&gt;&lt;code&gt;streamifyResponse: true&lt;/code&gt;&lt;/a&gt; option wraps the handler with &lt;code&gt;awslambda.streamifyResponse()&lt;/code&gt; internally, so we do not call the native AWS streaming API ourselves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/index.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;httpCors&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@middy/http-cors&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;aws-lambda&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;streamingService&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./streaming-service&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ReadableStream&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Uint8Array&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="nl"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamHandler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;{}&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Summarize the benefits of serverless architecture&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;streamingService&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toTextStreamResponse&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromEntries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
    &lt;span class="na"&gt;statusCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;middy&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;APIGatewayProxyEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;HttpStreamResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;streamifyResponse&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;use&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;httpCors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;origins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;*&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;streamHandler&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The handler returns an object with &lt;code&gt;{ body, headers, statusCode }&lt;/code&gt;. When &lt;code&gt;body&lt;/code&gt; is a &lt;code&gt;ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/code&gt;, Middy detects this and pipes it to Lambda's response stream. When &lt;code&gt;body&lt;/code&gt; is a plain string, Middy writes it as a buffered response.&lt;/p&gt;

&lt;p&gt;This is the key difference from the native &lt;code&gt;awslambda.streamifyResponse()&lt;/code&gt; pattern, where you receive a &lt;code&gt;Writable&lt;/code&gt; stream as a handler argument and write to it directly.&lt;/p&gt;

&lt;p&gt;The Middy pattern is valuable because it preserves the middleware chain. Middlewares like &lt;code&gt;httpCors&lt;/code&gt; run in the &lt;code&gt;before&lt;/code&gt; phase (setting up CORS headers) and the &lt;code&gt;after&lt;/code&gt; phase (attaching them to the response).&lt;/p&gt;

&lt;p&gt;The handler remains a pure function that takes an event and returns a response object; the streaming mechanics are abstracted away.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;HttpStreamResponse&lt;/code&gt; type accepts either &lt;code&gt;ReadableStream&amp;lt;Uint8Array&amp;gt;&lt;/code&gt; or &lt;code&gt;string&lt;/code&gt; for the body. This matters: if your handler needs to return an error before starting the stream (validation failure, auth error), it returns a plain string body with the appropriate status code. The stream is only used for the happy path.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do you use AI SDK streamText with Bedrock Claude on Lambda?
&lt;/h2&gt;

&lt;p&gt;The service layer calls Bedrock Claude through the AI SDK's &lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text" rel="noopener noreferrer"&gt;&lt;code&gt;streamText&lt;/code&gt;&lt;/a&gt; function with &lt;code&gt;Output.object()&lt;/code&gt; for structured generation. This sends the prompt to the model and returns a streaming result that produces structured JSON validated against a Zod schema.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// lambda/streaming-service.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;bedrock&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@ai-sdk/amazon-bedrock&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;streamText&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;responseSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;A concise summary of the topic&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;keywords&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Relevant keywords&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;StreamParams&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;streamingService&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;StreamParams&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;bedrock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic.claude-3-5-sonnet-20241022-v2:0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;streamText&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;responseSchema&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;You are a helpful assistant. Respond with a structured JSON object containing a summary and keywords about the topic the user asks about.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;maxOutputTokens&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;streamText&lt;/code&gt; with &lt;code&gt;Output.object()&lt;/code&gt; does three things for us. It instructs the model to produce JSON matching our Zod schema. It streams the response token by token. And it exposes &lt;code&gt;toTextStreamResponse()&lt;/code&gt;, which wraps the token stream in a standard Web &lt;code&gt;Response&lt;/code&gt; object with a &lt;code&gt;ReadableStream&lt;/code&gt; body.&lt;/p&gt;

&lt;p&gt;That &lt;code&gt;Response&lt;/code&gt; is what we destructure in the handler to extract &lt;code&gt;body&lt;/code&gt;, &lt;code&gt;headers&lt;/code&gt;, and &lt;code&gt;status&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why &lt;code&gt;Output.object()&lt;/code&gt; instead of plain &lt;code&gt;streamText&lt;/code&gt;?&lt;/strong&gt; We want structured JSON output. The &lt;code&gt;output: Output.object({ schema })&lt;/code&gt; option instructs the model to generate a JSON object conforming to our Zod schema, and the AI SDK validates the final result against it.&lt;/p&gt;

&lt;p&gt;Without the &lt;code&gt;output&lt;/code&gt; option, &lt;code&gt;streamText&lt;/code&gt; returns free-form text. For structured output use cases (data extraction, form generation, classification results), &lt;code&gt;Output.object()&lt;/code&gt; is the right tool.&lt;/p&gt;

&lt;p&gt;Note that partial objects streamed during generation are not validated against the schema. Only the final complete object is. If you need validation during streaming, you would validate partial results yourself. For most UI rendering use cases, the partial objects are good enough to display incrementally.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing the endpoint
&lt;/h2&gt;

&lt;p&gt;Deploy the stack and grab the API URL from the CloudFormation output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;step-1
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npx cdk deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then test with &lt;code&gt;curl&lt;/code&gt;. The &lt;code&gt;--no-buffer&lt;/code&gt; flag is critical. Without it, &lt;code&gt;curl&lt;/code&gt; buffers output and you will not see the streaming effect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-i&lt;/span&gt; &lt;span class="nt"&gt;--no-buffer&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"prompt": "Explain the CAP theorem"}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  https://your-api-id.execute-api.us-east-1.amazonaws.com/v1/stream
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see the JSON object build up incrementally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP/2 200
content-type: text/plain; charset=utf-8
access-control-allow-origin: *

{"summary":"The CAP theorem, also known as Brewer's theorem, states
that a distributed data store can only guarantee two of the following
three properties simultaneously...","keywords":["CAP theorem",
"consistency","availability","partition tolerance",...]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;A note on first-byte latency.&lt;/strong&gt; You may notice a 1 to 2 second delay before the first chunk arrives, even though streaming is working correctly. This is &lt;a href="https://betterdev.blog/lambda-response-streaming-flush-content/" rel="noopener noreferrer"&gt;expected Lambda behavior&lt;/a&gt;: Lambda buffers chunks smaller than approximately 10 KB before flushing them to the client.&lt;/p&gt;

&lt;p&gt;The first few tokens from the model are small, so they accumulate in the buffer. Once enough data builds up or the buffer timeout triggers, the first chunk flushes and subsequent chunks flow smoothly. This is not a bug in your code or configuration.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we built
&lt;/h2&gt;

&lt;p&gt;We now have a working streaming pipeline. A POST request hits API Gateway, which invokes Lambda in streaming mode. Middy wraps the handler with &lt;code&gt;streamifyResponse&lt;/code&gt;, the AI SDK calls Bedrock Claude with &lt;code&gt;streamText&lt;/code&gt; and &lt;code&gt;Output.object()&lt;/code&gt;, and the structured JSON streams back to the client as it generates.&lt;/p&gt;

&lt;p&gt;The key pieces are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: &lt;code&gt;ResponseTransferMode: STREAM&lt;/code&gt; and the &lt;code&gt;response-streaming-invocations&lt;/code&gt; URI, set through CDK escape hatches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda/Middy&lt;/strong&gt;: &lt;code&gt;streamifyResponse: true&lt;/code&gt; with a handler that returns &lt;code&gt;{ body: ReadableStream }&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI SDK&lt;/strong&gt;: &lt;code&gt;streamText&lt;/code&gt; with &lt;code&gt;Output.object()&lt;/code&gt; and a Zod schema, converted to a Web &lt;code&gt;Response&lt;/code&gt; via &lt;code&gt;toTextStreamResponse()&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is next
&lt;/h2&gt;

&lt;p&gt;This clean architecture hides a subtle timing bug. When Middy streams a response, the &lt;code&gt;after&lt;/code&gt; middleware hook fires at the wrong moment: before the stream has finished writing. Any observability logic you put in &lt;code&gt;after&lt;/code&gt; (logging LLM metrics, recording token counts, publishing to CloudWatch) will execute with incomplete data or not at all.&lt;/p&gt;

&lt;p&gt;Nobody has documented this behavior, and it breaks every middleware-based observability pattern you would naturally reach for.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/pabloalbaladejo/observable-ai-streaming-on-aws-part-2-the-middy-after-hook-problem-3b7c"&gt;next post&lt;/a&gt;, we will reproduce this bug, trace it through Middy's source code, and understand exactly why it happens.&lt;/p&gt;




&lt;p&gt;All code for this series is available in the &lt;a href="https://github.com/pablo-albaladejo/streaming-lambda-ai-sdk" rel="noopener noreferrer"&gt;companion repository&lt;/a&gt;. Each post corresponds to a &lt;code&gt;step-N/*&lt;/code&gt; branch with a self-contained, deployable CDK project.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/building-responsive-apis-with-amazon-api-gateway-response-streaming/" rel="noopener noreferrer"&gt;Building responsive APIs with Amazon API Gateway response streaming&lt;/a&gt;: AWS announcement (Nov 2025)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-response-streaming.html" rel="noopener noreferrer"&gt;Response streaming for Lambda functions&lt;/a&gt;: AWS Lambda Developer Guide&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-response-streaming/" rel="noopener noreferrer"&gt;Introducing AWS Lambda response streaming&lt;/a&gt;: original launch (Apr 2023)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/" rel="noopener noreferrer"&gt;Serverless strategies for streaming LLM responses&lt;/a&gt;: AWS comparison of streaming approaches (Nov 2025)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://middy.js.org/docs/integrations/streaming" rel="noopener noreferrer"&gt;Middy streaming integration&lt;/a&gt;: Middy documentation&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/reference/ai-sdk-core/stream-text" rel="noopener noreferrer"&gt;AI SDK &lt;code&gt;streamText&lt;/code&gt; reference&lt;/a&gt;: Vercel AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/docs/ai-sdk-core/generating-structured-data" rel="noopener noreferrer"&gt;AI SDK Structured Output&lt;/a&gt;: Vercel AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://ai-sdk.dev/providers/ai-sdk-providers/amazon-bedrock" rel="noopener noreferrer"&gt;AI SDK Amazon Bedrock Provider&lt;/a&gt;: Vercel AI SDK&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://betterdev.blog/lambda-response-streaming-flush-content/" rel="noopener noreferrer"&gt;Lambda response streaming flush behavior&lt;/a&gt;: chunk buffering analysis&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>typescript</category>
      <category>ai</category>
    </item>
    <item>
      <title>Stop Chatting, Start Specifying: Spec-Driven Design with Kiro IDE</title>
      <dc:creator>Pablo Albaladejo</dc:creator>
      <pubDate>Mon, 01 Dec 2025 21:36:29 +0000</pubDate>
      <link>https://forem.com/kirodotdev/stop-chatting-start-specifying-spec-driven-design-with-kiro-ide-3b3o</link>
      <guid>https://forem.com/kirodotdev/stop-chatting-start-specifying-spec-driven-design-with-kiro-ide-3b3o</guid>
      <description>&lt;h2&gt;
  
  
  Building Kaiord with Kiro: From Specs to Production
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;How Spec-Driven Development and Test-Driven Development Changed the Way I Code&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  

  &lt;iframe src="https://www.youtube.com/embed/La_QDmY4zfI"&gt;
  &lt;/iframe&gt;



&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Problem with Chat-First Development
&lt;/h2&gt;

&lt;p&gt;We've all been there. You open your AI assistant and start chatting:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Can you help me convert FIT files to JSON?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The AI immediately writes code. But wait, that's not exactly what you wanted. So you clarify:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Actually, I need it to preserve all the workout data..."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;More code appears. Still not quite right. Another message:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"No, I meant it should also handle repetition blocks..."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This back-and-forth conversation fills up the context window. By the time you get working code, there's no space left for the AI to help with the details. &lt;strong&gt;The problem: we're asking AI to write code before understanding what we need.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Enter Spec-Driven Development
&lt;/h2&gt;

&lt;p&gt;When I discovered Kiro and its spec-driven approach, everything changed. Instead of jumping straight to code, I learned to &lt;strong&gt;think first, code later&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here's how it works:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Create a Specification with Kiro
&lt;/h3&gt;

&lt;p&gt;Before writing any code, Kiro helps me create three documents by asking me questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;requirements.md&lt;/strong&gt; - What the feature needs to do&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;design.md&lt;/strong&gt; - How it will work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;tasks.md&lt;/strong&gt; - Step-by-step implementation plan&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I tell Kiro "I need to convert FIT files to KRD format," and Kiro starts asking questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"What data needs to be preserved?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What are the acceptance criteria?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What tolerances are acceptable for round-trip conversion?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on my answers, Kiro generates the specification documents. Let me show you a real example from Kaiord. When I needed to convert FIT files (Garmin's binary format) to KRD (my JSON format), Kiro created this spec:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# fit-to-krd-conversion/requirements.md&lt;/span&gt;

&lt;span class="gu"&gt;## User Story&lt;/span&gt;
As a cyclist or runner, I want to convert FIT workout files to KRD format, 
so that I can edit workout structures in a human-readable format and validate 
them against a standard schema.

&lt;span class="gu"&gt;## Acceptance Criteria&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; WHEN the converter receives a valid FIT file
&lt;span class="p"&gt;-&lt;/span&gt; THEN it SHALL produce a valid KRD document
&lt;span class="p"&gt;-&lt;/span&gt; AND it SHALL preserve all workout steps
&lt;span class="p"&gt;-&lt;/span&gt; AND it SHALL validate against the KRD schema
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see the full spec in &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/specs/core/fit-to-krd-conversion/requirements.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/core/fit-to-krd-conversion/requirements.md&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; The spec becomes Kiro's guide. Instead of guessing what I want, Kiro has a clear document that captures my requirements through our conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Kiro Defines the Architecture
&lt;/h3&gt;

&lt;p&gt;After understanding my requirements, Kiro proposes the architecture in the design document. It asks me questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Should we use hexagonal architecture with ports and adapters?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What external libraries will we use?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"What are the key design goals?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Based on my answers, Kiro generates the design document describing how the conversion will work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# design.md&lt;/span&gt;

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;

We'll create:
&lt;span class="p"&gt;-&lt;/span&gt; Port interface: FitReader (domain/ports)
&lt;span class="p"&gt;-&lt;/span&gt; Adapter implementation: garmin-fitsdk (adapters/fit)
&lt;span class="p"&gt;-&lt;/span&gt; Use case: convertFitToKrd (application/use-cases)

&lt;span class="gu"&gt;### Key Design Goals&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**Round-trip safety**&lt;/span&gt;: FIT → KRD → FIT preserves data within tolerances
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Hexagonal architecture**&lt;/span&gt;: Domain logic isolated from external dependencies
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Dependency injection**&lt;/span&gt;: Swappable FIT SDK implementations via ports
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Schema validation**&lt;/span&gt;: All KRD output validated against JSON schema
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Type safety**&lt;/span&gt;: Strict TypeScript with no implicit &lt;span class="sb"&gt;`any`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This planning phase saved me hours of refactoring later. The code structure was clear before Kiro wrote a single line.&lt;/p&gt;

&lt;p&gt;See &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/architecture.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/architecture.md&lt;/code&gt;&lt;/a&gt; for the full architecture rules.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: Kiro Breaks Down Tasks
&lt;/h3&gt;

&lt;p&gt;Finally, Kiro breaks the work into small, testable pieces in the tasks document:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# tasks.md&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; [ ] Define FitReader port interface
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Implement Garmin FIT SDK adapter
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Create convertFitToKrd use case
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Write unit tests (AAA pattern)
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Add round-trip validation
&lt;span class="p"&gt;-&lt;/span&gt; [ ] Document API examples
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each task is small enough for Kiro to implement in one sitting, making progress visible and momentum sustainable. I review each task, and Kiro implements it following TDD.&lt;/p&gt;




&lt;h2&gt;
  
  
  Steering Documents: Teaching Kiro My Rules
&lt;/h2&gt;

&lt;p&gt;Before Kiro writes any code, I need to teach it how I want things done. This is where &lt;strong&gt;steering documents&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;Steering documents are markdown files in &lt;code&gt;.kiro/steering/&lt;/code&gt; that define the rules Kiro must follow. Think of them as a style guide, architecture manual, and best practices document all in one—but for AI.&lt;/p&gt;

&lt;p&gt;Here's what I defined for Kaiord:&lt;/p&gt;

&lt;h3&gt;
  
  
  My Steering Documents
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.kiro/steering/
├── architecture.md       # Hexagonal architecture rules
├── tdd.md               # Test-driven development workflow
├── code-style.md        # Code quality standards
├── zod-patterns.md      # Schema-first patterns
├── testing.md           # Testing strategies
├── error-patterns.md    # Error handling patterns
└── ... 18 documents total
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each document tells Kiro:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What patterns to follow&lt;/strong&gt; (e.g., "Use hexagonal architecture")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to avoid&lt;/strong&gt; (e.g., "Never use &lt;code&gt;any&lt;/code&gt; types")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How to structure code&lt;/strong&gt; (e.g., "Files ≤ 100 lines, functions &amp;lt; 40 LOC")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to apply rules&lt;/strong&gt; (e.g., "Write tests first, then implementation")&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Kiro reads these documents automatically while working. I don't need to repeat myself in every conversation. The rules are always there, guiding every decision Kiro makes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example: Architecture Steering Document
&lt;/h3&gt;

&lt;p&gt;Here's a snippet from my &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/architecture.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/architecture.md&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# Architecture — Hexagonal &amp;amp; DI&lt;/span&gt;

Layers:
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**domain/**&lt;/span&gt; — pure KRD types &amp;amp; rules
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**application/**&lt;/span&gt; — use-cases; depends on &lt;span class="sb"&gt;`ports/`&lt;/span&gt; only
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**ports/**&lt;/span&gt; — I/O contracts (Fit/Tcx/Zwift Reader/Writer)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**adapters/**&lt;/span&gt; — concrete implementations (e.g., @garmin/fitsdk)
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**cli/**&lt;/span&gt; — end-user commands; depends on &lt;span class="sb"&gt;`application`&lt;/span&gt;

Rules:
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="sb"&gt;`domain`&lt;/span&gt; depends on no one
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`application`&lt;/span&gt; MUST NOT import external libs nor &lt;span class="sb"&gt;`adapters/`&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`adapters`&lt;/span&gt; implement &lt;span class="sb"&gt;`ports`&lt;/span&gt; and may use external libs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Kiro generates code, it automatically follows these architecture rules. If I accidentally try to import an adapter in a use case, Kiro warns me about the violation. I never have to remind Kiro about architectural boundaries—it's all in the steering document.&lt;/p&gt;




&lt;h2&gt;
  
  
  Growing Together: Managing Codebase Complexity
&lt;/h2&gt;

&lt;p&gt;As a codebase grows, it's easy for things to get messy—especially when an AI can generate hundreds of lines in seconds. To keep our collaboration smooth and productive, I found that Kiro works best when we have clear processes and healthy boundaries.&lt;/p&gt;

&lt;p&gt;Think of it as setting up guardrails so we can run fast without falling off a cliff.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Safety Net: Test-Driven Development (TDD)
&lt;/h3&gt;

&lt;p&gt;For me, TDD isn't just a methodology; it's how I communicate my intent to Kiro. It acts as our safety net, allowing us to build features and refactor code with confidence.&lt;/p&gt;

&lt;p&gt;With TDD rules defined in my steering documents, Kiro follows this workflow automatically:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Kiro writes the test first&lt;/strong&gt; (it fails - red)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kiro writes code to pass the test&lt;/strong&gt; (it passes - green)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;We refactor together&lt;/strong&gt; (keep tests green - refactor)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here's a real example from &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/application/use-cases/convert-fit-to-krd.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/application/use-cases/convert-fit-to-krd.ts&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Kiro Writes the Test (Red)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/core/src/application/use-cases/convert-fit-to-krd.test.ts&lt;/span&gt;
&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;convertFitToKrd&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should convert FIT workout to KRD format&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Arrange&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fitBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;loadFitFixture&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;WorkoutIndividualSteps.fit&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;fitReader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createMockFitReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createSchemaValidator&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="c1"&gt;// Act&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;krd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;convertFitToKrd&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fitReader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;)({&lt;/span&gt;
      &lt;span class="nx"&gt;fitBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="c1"&gt;// Assert&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;version&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="kd"&gt;type&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;workout&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;workout&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;steps&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This test fails at first because &lt;code&gt;convertFitToKrd&lt;/code&gt; doesn't exist yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Kiro Writes Minimal Code (Green)
&lt;/h3&gt;

&lt;p&gt;Only after writing the test does Kiro implement the actual conversion function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/core/src/application/use-cases/convert-fit-to-krd.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;convertFitToKrd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;fitReader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FitReader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;SchemaValidator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ConvertFitToKrdParams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;KRD&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Read FIT file&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;krd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fitReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fitBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// Validate result&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KrdValidationError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Validation failed&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Return validated KRD&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The test passes! But we're not done yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3: We Refactor Together
&lt;/h3&gt;

&lt;p&gt;Now Kiro and I improve the code while keeping tests green. Kiro might suggest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extracting helper functions&lt;/li&gt;
&lt;li&gt;Improving error messages&lt;/li&gt;
&lt;li&gt;Adding edge case handling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I review the suggestions, and Kiro applies the changes—all while the tests continue to pass.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AAA Pattern
&lt;/h3&gt;

&lt;p&gt;All our tests follow the &lt;strong&gt;AAA pattern&lt;/strong&gt; (Arrange-Act-Assert):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nf"&gt;it&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;should validate KRD against schema&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Arrange&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createSchemaValidator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;mockLogger&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;invalidKrd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;1.0&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt; &lt;span class="c1"&gt;// missing required fields&lt;/span&gt;

  &lt;span class="c1"&gt;// Act&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;validator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;invalidKrd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Assert&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toHaveLength&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;field&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nx"&gt;field&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toBe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;metadata&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pattern makes tests easy to read and understand. The Arrange section sets up test data, the Act section calls the function, and the Assert section checks the result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; The code works the first time. No debugging sessions trying to figure out why something breaks.&lt;/p&gt;

&lt;p&gt;You can see the TDD guidelines I defined in &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/tdd.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/tdd.md&lt;/code&gt;&lt;/a&gt;. Kiro follows these rules for every function in Kaiord.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Healthy Boundaries: Small Files, Happy Code
&lt;/h3&gt;

&lt;p&gt;Just like us, Kiro does better work when tasks are bite-sized. I've set up some helpful constraints that encourage us to keep things simple:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Keep files small&lt;/strong&gt; (≤ 100 lines)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep functions focused&lt;/strong&gt; (&amp;lt; 40 LOC)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Be specific&lt;/strong&gt; (NO &lt;code&gt;any&lt;/code&gt; types)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These aren't just rigid rules; they actually help Kiro write better, more maintainable code. When a file starts getting too long, Kiro notices and suggests, &lt;em&gt;"Hey, maybe we should split this up?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; When we were building the FIT converter, Kiro naturally organized the code into small, focused modules instead of one giant file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;adapters/fit/
├── garmin-fitsdk.ts          # Main entry (&amp;lt; 100 lines)
├── duration.mapper.ts         # Duration mapping (&amp;lt; 100 lines)
├── target.converter.ts        # Target conversion (&amp;lt; 100 lines)
└── workout.mapper.ts          # Workout mapping (&amp;lt; 100 lines)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each file stays under 100 lines, making the code easy to understand for both of us.&lt;/p&gt;

&lt;p&gt;These automated checks run in the background, gently nudging us back to the right path if we stray. &lt;strong&gt;The result:&lt;/strong&gt; A codebase that feels clean and manageable, no matter how much feature work we do.&lt;/p&gt;




&lt;h3&gt;
  
  
  Architecture Enforcement
&lt;/h3&gt;

&lt;p&gt;I defined hexagonal architecture rules in steering docs that Kiro enforces. The key rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Domain depends on nothing. Application depends only on domain and ports. Adapters implement ports.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When Kiro generates code, it automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Creates port interfaces (contracts)&lt;/li&gt;
&lt;li&gt;Implements adapters (external libraries)&lt;/li&gt;
&lt;li&gt;Writes use cases (business logic)&lt;/li&gt;
&lt;li&gt;Wires everything in providers (dependency injection)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If I accidentally try to import an adapter in a use case, Kiro warns me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;⚠️ Architecture violation detected
   Application layer should not import adapters
   Import the port interface instead
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This keeps the codebase maintainable and testable. See &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/architecture.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/architecture.md&lt;/code&gt;&lt;/a&gt; for the full rules I defined.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real Examples from Kaiord
&lt;/h2&gt;

&lt;p&gt;Let me show you three real examples where spec-driven development with Kiro made a huge difference:&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 1: FIT to KRD Conversion
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; Convert Garmin's binary FIT format to my JSON format without losing data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The spec:&lt;/strong&gt; &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/specs/core/fit-to-krd-conversion/requirements.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/core/fit-to-krd-conversion/requirements.md&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Kiro generated:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port interface in &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/ports/fit-reader.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/ports/fit-reader.ts&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Adapter implementation in &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/adapters/fit/garmin-fitsdk.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/adapters/fit/garmin-fitsdk.ts&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Use case in &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/application/use-cases/convert-fit-to-krd.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/application/use-cases/convert-fit-to-krd.ts&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Complete test suite with 90%+ coverage&lt;/li&gt;
&lt;li&gt;Round-trip validation to ensure no data loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; What would have taken me 2 weeks of trial-and-error took 3 days with clear specifications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 2: Duration Types with Discriminated Unions
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; Support 14 different duration types (time, distance, calories, heart rate conditions, power conditions, etc.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The spec:&lt;/strong&gt; &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/specs/core/zod-schema-migration/design.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/core/zod-schema-migration/design.md&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Kiro generated:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// From https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/domain/schemas/duration.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;durationSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;discriminatedUnion&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="c1"&gt;// Simple durations&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;time&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;distance&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;meters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;calories&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;calories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;

  &lt;span class="c1"&gt;// Conditional durations&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;heart_rate_less_than&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;heartRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;power_greater_than&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;power&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;seconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="p"&gt;}),&lt;/span&gt;

  &lt;span class="c1"&gt;// ... 14 types total&lt;/span&gt;
&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The benefit:&lt;/strong&gt; TypeScript knows exactly which fields exist for each duration type. No more runtime errors from accessing wrong fields!&lt;/p&gt;

&lt;h3&gt;
  
  
  Example 3: Round-Trip Validation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The challenge:&lt;/strong&gt; Ensure conversions preserve data (FIT → KRD → FIT should match original).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// packages/core/src/application/use-cases/validate-round-trip.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validateRoundTrip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;fitReader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FitReader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;fitWriter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;FitWriter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;checker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ToleranceChecker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Logger&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ValidateRoundTripParams&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;void&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Convert FIT → KRD → FIT&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;krd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fitReader&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fitBuffer&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;convertedFit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fitWriter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;krd&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Check tolerances&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;deviations&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;checker&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;checkTolerances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;fitBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;convertedFit&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ToleranceExceededError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Tolerances exceeded&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;deviations&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Tolerances:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time: ±1 second&lt;/li&gt;
&lt;li&gt;Power: ±1 watt or ±1% FTP&lt;/li&gt;
&lt;li&gt;Heart Rate: ±1 bpm&lt;/li&gt;
&lt;li&gt;Cadence: ±1 rpm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures data integrity across all conversions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Automation with Agent Hooks
&lt;/h2&gt;

&lt;p&gt;One of Kiro's most powerful features is agent hooks - automated workflows that trigger based on events.&lt;/p&gt;

&lt;p&gt;I created a hook that automatically generates GitHub pull requests from my specs. Here's how it works:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hook configuration:&lt;/strong&gt; &lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/hooks/create-github-pr.kiro.hook" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/hooks/create-github-pr.kiro.hook&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I finish a feature, the hook:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reads my spec from &lt;code&gt;.kiro/specs/[feature-name]/&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Gets the requirements, design, and tasks&lt;/li&gt;
&lt;li&gt;Generates a PR title from the main goal&lt;/li&gt;
&lt;li&gt;Fills the PR template with spec content&lt;/li&gt;
&lt;li&gt;Creates the PR using GitHub MCP (Model Context Protocol)&lt;/li&gt;
&lt;li&gt;Adds appropriate labels&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Time saved:&lt;/strong&gt; Creating a PR went from 10 minutes to 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bonus:&lt;/strong&gt; Every PR has complete documentation because it's generated from the spec!&lt;/p&gt;




&lt;h2&gt;
  
  
  The Numbers Tell the Story
&lt;/h2&gt;

&lt;p&gt;After building Kaiord with Kiro's spec-driven approach:&lt;/p&gt;

&lt;h3&gt;
  
  
  Code Quality
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;86.5% test coverage&lt;/strong&gt; - More than most production apps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;300+ unit tests&lt;/strong&gt; - TDD from day one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;55+ E2E tests&lt;/strong&gt; - Playwright for integration testing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0 &lt;code&gt;any&lt;/code&gt; types&lt;/strong&gt; - Complete type safety&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;550+ TypeScript files&lt;/strong&gt; - All following architecture rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test Coverage by Package
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Core Package:&lt;/strong&gt; 88% (target: ≥80%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Converters/Mappers:&lt;/strong&gt; 94% (target: ≥90%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI Package:&lt;/strong&gt; 82% (target: ≥70%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web SPA:&lt;/strong&gt; 86.5% (target: ≥70%)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Kiro Integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;30+ specifications&lt;/strong&gt; in &lt;a href="https://github.com/pablo-albaladejo/kaiord/tree/main/.kiro/specs" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18 steering documents&lt;/strong&gt; in &lt;a href="https://github.com/pablo-albaladejo/kaiord/tree/main/.kiro/steering" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;8 agent hooks&lt;/strong&gt; in &lt;a href="https://github.com/pablo-albaladejo/kaiord/tree/main/.kiro/hooks" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/hooks/&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100% of code&lt;/strong&gt; generated with Kiro's help&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Production Ready
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;3 npm packages published:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/@kaiord/core" rel="noopener noreferrer"&gt;@kaiord/core&lt;/a&gt; - Core library&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.npmjs.com/package/@kaiord/cli" rel="noopener noreferrer"&gt;@kaiord/cli&lt;/a&gt; - Command-line tool&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;1 web app deployed:&lt;/strong&gt; &lt;a href="https://pablo-albaladejo.github.io/kaiord/" rel="noopener noreferrer"&gt;Live demo&lt;/a&gt;
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;5 CI/CD workflows&lt;/strong&gt; - Automated testing, linting, deployment&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Planning Saves Time
&lt;/h3&gt;

&lt;p&gt;Writing specs feels slow at first. But it's much faster than:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rewriting code because requirements changed&lt;/li&gt;
&lt;li&gt;Debugging code you don't understand&lt;/li&gt;
&lt;li&gt;Explaining to the AI what you already explained 5 times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; The FIT conversion spec took 2 hours to write. It saved me 2 weeks of confusion.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Tests Give Confidence
&lt;/h3&gt;

&lt;p&gt;Following TDD means I know my code works. When I add features, the tests tell me immediately if I broke something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; When I added support for swimming workouts, my existing tests caught 3 bugs before I even ran the app.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Architecture Matters
&lt;/h3&gt;

&lt;p&gt;Using hexagonal architecture (ports and adapters) made my code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Easy to test (mock the ports)&lt;/li&gt;
&lt;li&gt;Easy to extend (add new adapters)&lt;/li&gt;
&lt;li&gt;Easy to understand (clear boundaries)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; Adding TCX format support required zero changes to existing code - just a new adapter.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. AI Works Better with Structure
&lt;/h3&gt;

&lt;p&gt;Kiro's spec-driven approach gives AI the context it needs to help effectively. Instead of guessing, it implements what's documented.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Specs Save Time
&lt;/h3&gt;

&lt;p&gt;Clear specs mean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Less confusion during implementation&lt;/li&gt;
&lt;li&gt;Fewer bugs from misunderstood requirements&lt;/li&gt;
&lt;li&gt;Better code reviews&lt;/li&gt;
&lt;li&gt;Easier onboarding for new developers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  6. TDD Prevents Bugs
&lt;/h3&gt;

&lt;p&gt;Writing tests first catches bugs early. When we write a test, we think about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What should happen in normal cases?&lt;/li&gt;
&lt;li&gt;What should happen in edge cases?&lt;/li&gt;
&lt;li&gt;What should happen when errors occur?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This thinking prevents many bugs before they're written.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Want to see spec-driven development in action? Check out Kaiord:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Source code:&lt;/strong&gt; &lt;a href="https://github.com/pablo-albaladejo/kaiord" rel="noopener noreferrer"&gt;github.com/pablo-albaladejo/kaiord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live demo:&lt;/strong&gt; &lt;a href="https://pablo-albaladejo.github.io/kaiord/" rel="noopener noreferrer"&gt;pablo-albaladejo.github.io/kaiord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specifications:&lt;/strong&gt; Browse &lt;a href="https://github.com/pablo-albaladejo/kaiord/tree/main/.kiro/specs" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/&lt;/code&gt;&lt;/a&gt; to see real examples&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steering docs:&lt;/strong&gt; Read &lt;a href="https://github.com/pablo-albaladejo/kaiord/tree/main/.kiro/steering" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/&lt;/code&gt;&lt;/a&gt; for architecture rules&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Files to Explore
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Specifications (see how Kiro plans):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/specs/core/fit-to-krd-conversion/requirements.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/core/fit-to-krd-conversion/requirements.md&lt;/code&gt;&lt;/a&gt; - How Kiro planned FIT conversion&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/specs/core/zod-schema-migration/design.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/specs/core/zod-schema-migration/design.md&lt;/code&gt;&lt;/a&gt; - Schema-first approach&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Steering Docs (see the rules Kiro follows):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/architecture.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/architecture.md&lt;/code&gt;&lt;/a&gt; - Hexagonal architecture rules&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/tdd.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/tdd.md&lt;/code&gt;&lt;/a&gt; - Test-driven development workflow&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/.kiro/steering/zod-patterns.md" rel="noopener noreferrer"&gt;&lt;code&gt;.kiro/steering/zod-patterns.md&lt;/code&gt;&lt;/a&gt; - Schema-first patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Implementation (see the code Kiro generated):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/domain/schemas/duration.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/domain/schemas/duration.ts&lt;/code&gt;&lt;/a&gt; - Discriminated unions with Zod&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/application/use-cases/convert-fit-to-krd.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/application/use-cases/convert-fit-to-krd.ts&lt;/code&gt;&lt;/a&gt; - Clean use case&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/ports/fit-reader.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/ports/fit-reader.ts&lt;/code&gt;&lt;/a&gt; - Port interface&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/pablo-albaladejo/kaiord/blob/main/packages/core/src/adapters/fit/garmin-fitsdk.ts" rel="noopener noreferrer"&gt;&lt;code&gt;packages/core/src/adapters/fit/garmin-fitsdk.ts&lt;/code&gt;&lt;/a&gt; - Adapter implementation&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Building Kaiord taught me that &lt;strong&gt;AI doesn't replace thinking - it amplifies it&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;With Kiro's spec-driven approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I think more clearly about what I need&lt;/li&gt;
&lt;li&gt;I document decisions before coding&lt;/li&gt;
&lt;li&gt;I test continuously (TDD)&lt;/li&gt;
&lt;li&gt;I maintain clean architecture&lt;/li&gt;
&lt;li&gt;AI implements what I specify&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result? Production-ready code that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works reliably (86.5% test coverage)&lt;/li&gt;
&lt;li&gt;Makes sense (clean architecture)&lt;/li&gt;
&lt;li&gt;Can grow (extensible design)&lt;/li&gt;
&lt;li&gt;Helps others (open source)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Spec-driven development isn't slower - it's smarter.&lt;/strong&gt; The time you spend planning pays back 10x in code that works the first time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kaiord Repository:&lt;/strong&gt; &lt;a href="https://github.com/pablo-albaladejo/kaiord" rel="noopener noreferrer"&gt;github.com/pablo-albaladejo/kaiord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Web App:&lt;/strong&gt; &lt;a href="https://pablo-albaladejo.github.io/kaiord/" rel="noopener noreferrer"&gt;pablo-albaladejo.github.io/kaiord&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm Packages:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@kaiord/core" rel="noopener noreferrer"&gt;@kaiord/core&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@kaiord/cli" rel="noopener noreferrer"&gt;@kaiord/cli&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Kiro IDE:&lt;/strong&gt; &lt;a href="https://kiro.dev" rel="noopener noreferrer"&gt;kiro.dev&lt;/a&gt;
&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Kiroween Hackathon:&lt;/strong&gt; &lt;a href="https://kiroween.devpost.com/" rel="noopener noreferrer"&gt;kiroween.devpost.com&lt;/a&gt;
&lt;/li&gt;

&lt;/ul&gt;




</description>
      <category>kiro</category>
      <category>typescript</category>
      <category>testing</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
