<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sonia Rahal</title>
    <description>The latest articles on Forem by Sonia Rahal (@soniarahal).</description>
    <link>https://forem.com/soniarahal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3671321%2F4e3e8b64-a9ba-4665-b02f-a34e6365adbc.jpg</url>
      <title>Forem: Sonia Rahal</title>
      <link>https://forem.com/soniarahal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/soniarahal"/>
    <language>en</language>
    <item>
      <title>AWS re:Invent 2025 Montreal Recap: 6 Lightning Demos That Actually Change How You Build</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Fri, 23 Jan 2026 08:01:17 +0000</pubDate>
      <link>https://forem.com/soniarahal/aws-reinvent-2025-montreal-recap-6-lightning-demos-that-actually-change-how-you-build-4a0o</link>
      <guid>https://forem.com/soniarahal/aws-reinvent-2025-montreal-recap-6-lightning-demos-that-actually-change-how-you-build-4a0o</guid>
      <description>&lt;p&gt;I went to a local re:Invent recap meetup in Montreal on January 15, expecting a high-level overview of AWS announcements.&lt;/p&gt;

&lt;p&gt;What I got instead was something much better.&lt;/p&gt;

&lt;p&gt;Six speakers each had ten minutes to demo one concrete feature they were genuinely excited about: not slides, not marketing talk, but &lt;em&gt;“here’s what it does and why it changes things.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m deeply curious about cloud computing and how modern systems are actually built, so this format really worked for me. It wasn’t a deep dive into internals, but it also wasn’t vague or fluffy. It sat in a sweet spot: specific enough to understand what’s new and why it matters, without needing to already be an AWS specialist.&lt;/p&gt;

&lt;p&gt;Here’s a recap of the six features that stood out most, and how they fit into a much bigger shift AWS is making.&lt;/p&gt;




&lt;h2&gt;
  
  
  1) AWS DevOps Agent: AI That Investigates Incidents With You
&lt;/h2&gt;

&lt;p&gt;The first demo showed AWS DevOps Agent, a new AI-powered operational assistant (currently in preview) designed to help teams investigate incidents and find root causes faster.&lt;/p&gt;

&lt;p&gt;Instead of just alerting you that &lt;em&gt;“something is broken,”&lt;/em&gt; the agent actually tries to understand why.&lt;/p&gt;

&lt;p&gt;In the demo, the speaker intentionally broke a Lambda function by misconfiguring its handler. The DevOps Agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detected errors from logs and metrics&lt;/li&gt;
&lt;li&gt;Pulled configuration history&lt;/li&gt;
&lt;li&gt;Built a timeline of what changed&lt;/li&gt;
&lt;li&gt;Mapped dependencies between services&lt;/li&gt;
&lt;li&gt;Suggested the most likely root cause&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also builds an application topology: basically a live map of how your Lambdas, databases, pipelines, and services connect, so it can reason about blast radius and downstream impact.&lt;/p&gt;

&lt;p&gt;What made this feel different from normal observability tooling is that you can interact with the investigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask it follow-up questions&lt;/li&gt;
&lt;li&gt;Tell it where else to look&lt;/li&gt;
&lt;li&gt;Have it post findings to Slack or ServiceNow&lt;/li&gt;
&lt;li&gt;Auto-generate AWS Support cases with context attached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like AWS is trying to turn operations from &lt;em&gt;“alert + panic + dashboards”&lt;/em&gt; into &lt;em&gt;“alert + guided diagnosis + suggested fix.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9e5radeg0mggjef1mnj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9e5radeg0mggjef1mnj.jpeg" alt="Image of the architecture of a DevOps Agent" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2) AWS Transform: AI-Guided Codebase Migration That Isn’t Reckless
&lt;/h2&gt;

&lt;p&gt;The second demo focused on AWS Transform, an AI-powered tool for modernizing large codebases.&lt;/p&gt;

&lt;p&gt;This isn’t just &lt;em&gt;“throw your repo into ChatGPT and pray.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You run it from a CLI, tell it what kind of migration you want (for example: Node.js 16 → Node.js 20, or AWS SDK v1 → v2), and it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans your repository&lt;/li&gt;
&lt;li&gt;Applies a guided refactor across files&lt;/li&gt;
&lt;li&gt;Lets you attach context like:

&lt;ul&gt;
&lt;li&gt;“Don’t break this logging framework”&lt;/li&gt;
&lt;li&gt;“Preserve backward compatibility for this API”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Requires a verification command to pass (like &lt;code&gt;npm test&lt;/code&gt; or &lt;code&gt;mvn verify&lt;/code&gt;); if the tests fail, the migration is considered unsuccessful&lt;/li&gt;

&lt;/ul&gt;
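The verification gate is what makes this controlled rather than reckless. As a rough illustration of that idea only (this is not AWS Transform's actual CLI or API), the core pattern is: run the project's own verification command after the automated refactor, and treat any non-zero exit code as a failed migration:

```python
import subprocess
import sys

def verify_migration(verify_cmd: list) -> bool:
    """Run the project's verification command (e.g. npm test, mvn verify).

    Returns True only if the command exits cleanly; any failure means
    the automated refactor is treated as unsuccessful.
    """
    result = subprocess.run(verify_cmd, capture_output=True, text=True)
    return result.returncode == 0

# illustrative: a command that succeeds vs. one that exits non-zero
ok = verify_migration([sys.executable, "-c", "pass"])
bad = verify_migration([sys.executable, "-c", "raise SystemExit(1)"])
```

The point of the sketch is the contract, not the plumbing: the AI does the rewriting, but an existing deterministic test suite decides whether the result counts.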

&lt;p&gt;What stood out to me was how seriously correctness is treated. This is closer to a controlled migration pipeline than a one-shot AI rewrite.&lt;/p&gt;

&lt;p&gt;The speaker referenced two real AWS case studies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air Canada:&lt;/strong&gt; migrated ~1,000 Lambda functions to a new Node.js runtime
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twitch:&lt;/strong&gt; migrated ~913 Go repositories from AWS SDK v1 → v2
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these migrations reportedly saved ~2,800 developer-days.&lt;/p&gt;

&lt;p&gt;The bigger idea here isn’t just faster refactors. It’s compressing years of technical debt cleanup into weeks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhjy4tfqiokke2vvg8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhjy4tfqiokke2vvg8j.png" alt="Image of comparison of AWS Transform to other competitors" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3) SageMaker Studio: Becoming the Front Door for All Data + AI
&lt;/h2&gt;

&lt;p&gt;The third demo showed the new version of Amazon SageMaker Studio and how AWS is trying to turn it into a single workspace for everything data and AI-related.&lt;/p&gt;

&lt;p&gt;Three concrete things stood out:&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Data Catalog + Discovery
&lt;/h3&gt;

&lt;p&gt;Inside Studio, teams can now browse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Datasets&lt;/li&gt;
&lt;li&gt;Tables&lt;/li&gt;
&lt;li&gt;Models&lt;/li&gt;
&lt;li&gt;Notebooks&lt;/li&gt;
&lt;li&gt;Pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each asset can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Auto-generated descriptions (via Amazon Q)&lt;/li&gt;
&lt;li&gt;Metadata&lt;/li&gt;
&lt;li&gt;Data quality indicators&lt;/li&gt;
&lt;li&gt;Lineage info&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it possible to build a real internal &lt;em&gt;“marketplace”&lt;/em&gt; for data and models instead of everything living in random S3 buckets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Querying + Notebooks Without Leaving Studio
&lt;/h3&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse tables&lt;/li&gt;
&lt;li&gt;Run SQL queries (powered by Athena)&lt;/li&gt;
&lt;li&gt;Preview datasets&lt;/li&gt;
&lt;li&gt;Open Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All from one UI.&lt;br&gt;&lt;br&gt;
Amazon Q is embedded directly into notebooks. In the demo, the speaker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Asked Q to generate SQL&lt;/li&gt;
&lt;li&gt;Asked Q to generate Python&lt;/li&gt;
&lt;li&gt;Asked Q to generate a Matplotlib chart&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns notebooks into an AI-assisted analysis environment instead of a blank coding surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless Airflow Built Into Studio
&lt;/h3&gt;

&lt;p&gt;Studio now integrates Amazon Managed Workflows for Apache Airflow in a serverless form.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No control plane to manage&lt;/li&gt;
&lt;li&gt;No always-on cluster cost&lt;/li&gt;
&lt;li&gt;Native UI integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training pipelines&lt;/li&gt;
&lt;li&gt;Evaluation pipelines&lt;/li&gt;
&lt;li&gt;ML workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directly inside Studio.&lt;br&gt;&lt;br&gt;
It collapses notebooks, orchestration, and ML tooling into one place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2fsv1coefy5t26quong.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2fsv1coefy5t26quong.jpeg" alt="Image of SageMaker Studio catalog and metadata" width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4) Durable Lambda: Serverless That Can Finally Wait
&lt;/h2&gt;

&lt;p&gt;Traditional Lambda breaks down for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long workflows&lt;/li&gt;
&lt;li&gt;Human approvals&lt;/li&gt;
&lt;li&gt;External callbacks&lt;/li&gt;
&lt;li&gt;Multi-step orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So people end up wiring together Step Functions + DynamoDB + retry logic.&lt;/p&gt;

&lt;p&gt;AWS now added Durable Lambda primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wait&lt;/strong&gt;: Pause execution without paying for compute (up to one year)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint&lt;/strong&gt;: Persist state so retries resume from the same point
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait for Callback&lt;/strong&gt;: Send a token to an external system and resume when it returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works in practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a Durable Lambda function in the AWS console.
&lt;/li&gt;
&lt;li&gt;AWS automatically manages the underlying state storage — no DynamoDB or S3 setup needed.
&lt;/li&gt;
&lt;li&gt;Function runtime can pause and resume at checkpoints or callback points.
&lt;/li&gt;
&lt;li&gt;Retry logic is built-in and safe: the function won’t duplicate payments or actions.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the demo, the workflow looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reserve inventory
&lt;/li&gt;
&lt;li&gt;Checkpoint
&lt;/li&gt;
&lt;li&gt;Process payment
&lt;/li&gt;
&lt;li&gt;Checkpoint
&lt;/li&gt;
&lt;li&gt;Wait 15 minutes for user payment
&lt;/li&gt;
&lt;li&gt;Resume
&lt;/li&gt;
&lt;li&gt;Ship product&lt;/li&gt;
&lt;/ol&gt;
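The checkpoint/resume behavior can be approximated in plain Python. This is a minimal sketch of the pattern only, not the Durable Lambda API: each step records its result in persisted state, and on a retry, already-checkpointed steps return their saved result instead of running again:

```python
import json
from pathlib import Path

# stand-in for the AWS-managed state storage (no DynamoDB/S3 setup in the real thing)
STATE_FILE = Path("workflow_state.json")

def run_step(state: dict, name: str, fn):
    """Execute a step at most once; on retries, return the checkpointed result."""
    if name in state:                       # already checkpointed: skip re-execution
        return state[name]
    result = fn()
    state[name] = result                    # checkpoint: persist before moving on
    STATE_FILE.write_text(json.dumps(state))
    return result

def order_workflow(state: dict) -> dict:
    run_step(state, "reserve_inventory", lambda: "reserved")
    run_step(state, "process_payment", lambda: "charged")   # safe: runs at most once
    run_step(state, "ship_product", lambda: "shipped")
    return state

# resume from prior state if a retry, otherwise start fresh
state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
order_workflow(state)
```

If the process dies between "process_payment" and "ship_product", a retry replays the function but the payment step short-circuits to its saved result, which is exactly why duplicate charges don't happen.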

&lt;p&gt;No Step Functions. No external state store.&lt;/p&gt;

&lt;p&gt;Retries also become safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No duplicate payments&lt;/li&gt;
&lt;li&gt;No double reservations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also perfect for AI workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Waiting for long LLM calls&lt;/li&gt;
&lt;li&gt;Waiting for human-in-the-loop approvals&lt;/li&gt;
&lt;li&gt;Waiting for batch embedding jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without paying for idle compute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyusfsdq4aqw2lf4htxv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyusfsdq4aqw2lf4htxv.jpeg" alt="Image of Durable Lambda with AWS console" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5) Lambda on EC2 Capacity Providers: Serverless Without Cold Starts
&lt;/h2&gt;

&lt;p&gt;Lambda can now run on &lt;strong&gt;AWS-managed EC2 instances&lt;/strong&gt;, giving you more control and eliminating cold starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;strong&gt;capacity provider&lt;/strong&gt; in Lambda — AWS provisions and manages EC2 instances for you.
&lt;/li&gt;
&lt;li&gt;Configure instance type, CPU, memory, and architecture (GPU support coming).
&lt;/li&gt;
&lt;li&gt;Lambda functions run on these pre-warmed instances for predictable performance.
&lt;/li&gt;
&lt;li&gt;AWS handles patching, scaling, and lifecycle management — no SSH or instance management needed.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always-warm environments&lt;/li&gt;
&lt;li&gt;No cold starts&lt;/li&gt;
&lt;li&gt;Control over instance types, CPU, memory&lt;/li&gt;
&lt;li&gt;Multi-concurrency per vCPU (GPU support planned)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS still manages the instances (you can’t SSH in or patch anything), but you get predictable performance and much better economics at scale.&lt;/p&gt;

&lt;p&gt;Pricing example from the demo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100M requests / month
&lt;/li&gt;
&lt;li&gt;20ms runtime
&lt;/li&gt;
&lt;li&gt;Default Lambda: ~$3,000/month
&lt;/li&gt;
&lt;li&gt;Lambda on EC2: ~$431/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a massive difference for high-throughput APIs or inference endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qemud9fbpba2hgy44lk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qemud9fbpba2hgy44lk.jpeg" alt="Image of capacity provider creation demo" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6) S3 Vectors: Vector Storage at Object-Store Scale
&lt;/h2&gt;

&lt;p&gt;The last demo started by explaining &lt;strong&gt;what vectors are&lt;/strong&gt; and why they matter for modern AI workflows.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vectors are numeric representations of data (like text, images, or embeddings) that let models compute similarity, find nearest neighbors, or perform semantic search.
&lt;/li&gt;
&lt;li&gt;Modern AI applications - RAG pipelines, recommendation systems, search engines - rely heavily on vectors.
&lt;/li&gt;
&lt;/ul&gt;
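To make "nearest neighbor" concrete, here is a tiny, library-free sketch of cosine-similarity search, the core operation a vector store performs (at vastly larger scale, and with approximate indexes rather than this exact brute-force scan):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest(query, corpus):
    """Return the key of the corpus vector most similar to the query."""
    return max(corpus, key=lambda k: cosine_similarity(query, corpus[k]))

# toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([0.88, 0.12, 0.02], corpus))  # a cat-like query matches "cat"
```

An S3 vector index replaces the brute-force `max` over every vector with an approximate nearest-neighbor (ANN) structure, trading a little accuracy for the ability to search billions of embeddings.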

&lt;p&gt;The problem today: most vector databases are expensive, always-on, and operationally heavy.  &lt;/p&gt;

&lt;p&gt;AWS’s solution: &lt;strong&gt;S3 Vector Buckets&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Vector Buckets are a new type of S3 bucket optimized for storing embeddings. They allow you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store embeddings directly in S3
&lt;/li&gt;
&lt;li&gt;Create vector indexes
&lt;/li&gt;
&lt;li&gt;Run approximate nearest-neighbor (ANN) search
&lt;/li&gt;
&lt;li&gt;Use them in RAG pipelines, Bedrock, and SageMaker
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why S3 Vector Buckets make sense:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: billions of vectors at object-store scale
&lt;/li&gt;
&lt;li&gt;Cost: much cheaper than always-on vector DBs
&lt;/li&gt;
&lt;li&gt;Durability: inherits S3 reliability
&lt;/li&gt;
&lt;li&gt;Integration: works natively with other AWS services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trade-off: higher latency than specialized vector databases like Pinecone or OpenSearch.  &lt;/p&gt;

&lt;p&gt;Ideal use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge bases
&lt;/li&gt;
&lt;li&gt;Large-scale RAG corpora
&lt;/li&gt;
&lt;li&gt;Offline or batch semantic search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkloug33pt83hnlsfqidi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkloug33pt83hnlsfqidi.png" alt="Image of vector bucket creation demo" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Pattern I Took Away
&lt;/h2&gt;

&lt;p&gt;Across all six demos, a clear pattern emerged.  &lt;/p&gt;

&lt;p&gt;AWS is collapsing entire categories of glue infrastructure.  &lt;/p&gt;

&lt;p&gt;What used to require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step Functions
&lt;/li&gt;
&lt;li&gt;DynamoDB state tables
&lt;/li&gt;
&lt;li&gt;Vector databases
&lt;/li&gt;
&lt;li&gt;Orchestration clusters
&lt;/li&gt;
&lt;li&gt;Custom internal catalogs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now lives inside:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SageMaker Studio
&lt;/li&gt;
&lt;li&gt;Durable Lambda
&lt;/li&gt;
&lt;li&gt;S3 Vectors
&lt;/li&gt;
&lt;li&gt;Lambda on EC2
&lt;/li&gt;
&lt;li&gt;Serverless Airflow
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not flashy, but it quietly changes what “simple architecture” even means in 2025.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Note
&lt;/h2&gt;

&lt;p&gt;The meetup ended with an amazing giveaway.  &lt;/p&gt;

&lt;p&gt;By pure luck, I won. And so did the two people next to me.  &lt;/p&gt;

&lt;p&gt;So maybe that same luck carries over to you reading this: hope one of these features ends up being exactly what unlocks your next project.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>serverless</category>
      <category>cloud</category>
    </item>
    <item>
      <title>When GPU Compute Moves Closer to Users: Rethinking CPU↔GPU Boundaries in Cloud Architecture</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Tue, 06 Jan 2026 17:01:55 +0000</pubDate>
      <link>https://forem.com/soniarahal/when-gpu-compute-moves-closer-to-users-rethinking-cpu-gpu-boundaries-in-cloud-architecture-2dpm</link>
      <guid>https://forem.com/soniarahal/when-gpu-compute-moves-closer-to-users-rethinking-cpu-gpu-boundaries-in-cloud-architecture-2dpm</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;Following my &lt;a href="https://dev.to/soniv/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b"&gt;previous post&lt;/a&gt; on the availability of GPU cloud instances in new regions (Hong Kong), I became curious about the &lt;strong&gt;bottlenecks and architectural implications&lt;/strong&gt; when GPU compute moves closer to users. As cloud providers expand GPU availability, assumptions about CPU↔GPU boundaries in cloud VMs are starting to break.&lt;/p&gt;

&lt;p&gt;GPU-accelerated cloud compute is expanding rapidly as AI, ML, real-time graphics, and simulations become more central to modern applications. Historically, GPU instances were limited to a few regions, creating a mental model where GPUs were &lt;strong&gt;centralized accelerators&lt;/strong&gt;, and CPU↔GPU interactions were a controlled, high-latency boundary.&lt;/p&gt;

&lt;p&gt;In this post, I’ll explore what changes when GPUs move closer to users, why the CPU↔GPU boundary matters architecturally, and what design considerations engineers should keep in mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is the CPU↔GPU Boundary?
&lt;/h2&gt;

&lt;p&gt;At a high level, the CPU↔GPU boundary defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU responsibilities:&lt;/strong&gt; control flow, scheduling, orchestration, I/O, system calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU responsibilities:&lt;/strong&gt; parallel computation, vectorized operations, specialized kernels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data transfer:&lt;/strong&gt; CPU memory ↔ GPU memory via PCIe (Peripheral Component Interconnect Express)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditionally in cloud VMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU resources were centralized and scarce&lt;/li&gt;
&lt;li&gt;Workloads were batch-oriented and tolerant of latency&lt;/li&gt;
&lt;li&gt;CPU↔GPU transfers happened infrequently and in large chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This boundary dictated &lt;strong&gt;service decomposition&lt;/strong&gt;, &lt;strong&gt;batching strategies&lt;/strong&gt;, and &lt;strong&gt;elasticity planning&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How CPU↔GPU Interactions Work (PCIe &amp;amp; Coding Example)
&lt;/h2&gt;

&lt;p&gt;The CPU↔GPU boundary is &lt;strong&gt;implemented via PCIe&lt;/strong&gt;, which moves data between the CPU and GPU memory (VRAM). GPU frameworks like CUDA, PyTorch, or TensorFlow handle these transfers automatically.&lt;/p&gt;

&lt;p&gt;Here’s an example in Python using PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="c1"&gt;# create data on CPU
&lt;/span&gt;&lt;span class="n"&gt;x_cpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# move data to GPU via PCIe
&lt;/span&gt;&lt;span class="n"&gt;x_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_cpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# computation now happens on GPU
&lt;/span&gt;&lt;span class="n"&gt;y_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_gpu&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;x_gpu&lt;/span&gt;  &lt;span class="c1"&gt;# matrix multiplication
&lt;/span&gt;
&lt;span class="c1"&gt;# bring result back to CPU
&lt;/span&gt;&lt;span class="n"&gt;y_cpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.to("cuda")&lt;/code&gt; triggers the PCIe transfer.
&lt;/li&gt;
&lt;li&gt;GPU computation is fast, but PCIe transfers have &lt;strong&gt;limited bandwidth&lt;/strong&gt; and &lt;strong&gt;non-negligible latency&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Frequent small transfers can &lt;strong&gt;bottleneck performance&lt;/strong&gt;, especially for interactive workloads.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why PCIe Can Be a Bottleneck
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited bandwidth:&lt;/strong&gt; PCIe Gen 4 moves roughly 2 GB/s per lane, or about 32 GB/s on a typical x16 GPU link; fast, but small relative to GPU compute throughput.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency for interactive workloads:&lt;/strong&gt; Small, frequent transfers amplify CPU↔GPU latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple GPUs:&lt;/strong&gt; Each GPU has its own PCIe link; scaling horizontally increases potential bottlenecks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic cloud instances:&lt;/strong&gt; Each new GPU instance defines a &lt;strong&gt;new CPU↔GPU boundary&lt;/strong&gt;, making scheduling more complex.
&lt;/li&gt;
&lt;/ul&gt;
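A back-of-the-envelope model shows why transfer granularity matters. Using illustrative round numbers (a 10 µs fixed per-transfer overhead and 16 GB/s of usable bandwidth; these are assumptions for the sketch, not measured figures), moving the same 64 MB as thousands of small chunks costs an order of magnitude more than one bulk copy:

```python
LATENCY_S = 10e-6   # assumed fixed per-transfer overhead (illustrative)
BANDWIDTH = 16e9    # assumed usable bandwidth in bytes/s (illustrative)

def transfer_time(total_bytes, n_chunks):
    """Total time to move total_bytes split across n_chunks transfers."""
    per_chunk = total_bytes / n_chunks
    return n_chunks * (LATENCY_S + per_chunk / BANDWIDTH)

total = 64 * 1024**2                      # 64 MB of tensors
one_big = transfer_time(total, 1)         # single bulk copy: ~4.2 ms
many_small = transfer_time(total, 4096)   # 16 KB chunks: ~45 ms
print(f"1 transfer:     {one_big * 1e3:.2f} ms")
print(f"4096 transfers: {many_small * 1e3:.2f} ms")
```

The bandwidth term is identical in both cases; the fixed per-transfer latency is what explodes, which is why interactive workloads that make frequent small CPU↔GPU calls feel the boundary so acutely.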




&lt;h3&gt;
  
  
  Why Regional GPU Availability Matters
&lt;/h3&gt;

&lt;p&gt;When cloud providers launch GPUs in more regions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPUs are &lt;strong&gt;physically closer to end-users and storage&lt;/strong&gt;, reducing network latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive applications&lt;/strong&gt; (AI inference, simulations, rendering) benefit because network latency no longer dominates total response time.
&lt;/li&gt;
&lt;li&gt;Scaling workloads becomes more flexible; elastic GPU instances can spin up &lt;strong&gt;closer to data&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architectural implication:&lt;/strong&gt;  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The CPU↔GPU boundary is no longer just “how fast PCIe moves data,” but “how far is the data from the CPU↔GPU interface in the first place?”  &lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Conceptual Diagram
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      User / Data Source
             │
             ▼
       Regional Network
             │
    +--------+--------+
    |       CPU       |
    | Control / I/O   |
    +--------+--------+
             │ PCIe transfer
             ▼
    +--------+--------+
    |       GPU       |
    | Parallel Compute|
    +--------+--------+
             │
           VRAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Adding more regions moves the CPU↔GPU block closer to users/data, reducing network latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PCIe remains a bottleneck inside the VM, but overall system latency decreases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
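These two points can be quantified with a simple latency budget (the millisecond figures below are illustrative assumptions, not benchmarks): the PCIe and compute terms inside the VM stay fixed, while the network term shrinks dramatically when the GPU moves to a nearby region:

```python
def total_latency_ms(network_rtt_ms, pcie_ms=0.5, compute_ms=2.0):
    """End-to-end request latency: network + CPU->GPU transfer + GPU compute.

    pcie_ms and compute_ms are assumed fixed inside the VM; only the
    network term changes when GPUs move to a closer region.
    """
    return network_rtt_ms + pcie_ms + compute_ms

distant = total_latency_ms(network_rtt_ms=150)  # cross-continent region
regional = total_latency_ms(network_rtt_ms=10)  # nearby region
print(f"distant GPU:  {distant:.1f} ms")   # network dominates
print(f"regional GPU: {regional:.1f} ms")  # PCIe is now a larger share
```

Notice what regional placement does to the shape of the budget: with a distant GPU, PCIe is noise; with a regional GPU, the in-VM boundary becomes a meaningful fraction of total latency, which is exactly why it re-enters the design conversation.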




&lt;h2&gt;
  
  
  Architectural Implications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lower Latency Matters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Previously, sending data to a distant GPU was negligible for batch workloads.
&lt;/li&gt;
&lt;li&gt;Regional GPUs make &lt;strong&gt;interactive workloads latency-sensitive&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GPU Workloads Become More Interactive
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Smaller, frequent GPU calls are now feasible.
&lt;/li&gt;
&lt;li&gt;GPUs participate directly in request paths rather than only batch jobs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Elasticity Changes Design Choices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each new GPU instance introduces a new CPU↔GPU boundary.
&lt;/li&gt;
&lt;li&gt;Architects must ask: move data to GPU or move workload to data?
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Locality Becomes Critical
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Moving data across regions may cost more than computation.
&lt;/li&gt;
&lt;li&gt;CPU↔GPU transfers must be considered alongside storage and network placement.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Bottlenecks to Watch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bottleneck&lt;/th&gt;
&lt;th&gt;Traditional Model&lt;/th&gt;
&lt;th&gt;Regional GPU Model&lt;/th&gt;
&lt;th&gt;Implication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PCIe Bandwidth&lt;/td&gt;
&lt;td&gt;Large infrequent transfers&lt;/td&gt;
&lt;td&gt;Frequent smaller transfers&lt;/td&gt;
&lt;td&gt;May limit interactive performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;Batch-tolerant&lt;/td&gt;
&lt;td&gt;Sensitive, local GPU&lt;/td&gt;
&lt;td&gt;Requires redesigned request paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticity&lt;/td&gt;
&lt;td&gt;Rare, long-running&lt;/td&gt;
&lt;td&gt;Frequent scaling&lt;/td&gt;
&lt;td&gt;Complex scheduling and data partitioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Gravity&lt;/td&gt;
&lt;td&gt;Centralized storage&lt;/td&gt;
&lt;td&gt;Regional GPUs&lt;/td&gt;
&lt;td&gt;Must rethink storage placement and pipeline design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redefine the CPU↔GPU contract:&lt;/strong&gt; GPUs are local compute primitives, not just accelerators.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan for latency-sensitive workloads:&lt;/strong&gt; Micro-batching, asynchronous pipelines, and request scheduling matter.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for dynamic boundaries:&lt;/strong&gt; Elastic GPU instances change how workloads are partitioned.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider regional data placement:&lt;/strong&gt; Moving computation to data can outperform moving data to GPUs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor new bottlenecks:&lt;/strong&gt; PCIe, memory bandwidth, and network congestion may become critical in new architectures.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Discussion / Next Steps
&lt;/h2&gt;

&lt;p&gt;Regional GPU availability is changing cloud design assumptions. Engineers and architects should ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When does regional GPU placement actually improve performance or reduce cost?
&lt;/li&gt;
&lt;li&gt;Which workloads remain centralized, and which move closer to users?
&lt;/li&gt;
&lt;li&gt;How should elasticity, PCIe, and network bottlenecks factor into architecture diagrams?
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Cloud GPUs are no longer distant, static resources. As they move closer to users and data, they force us to rethink &lt;strong&gt;how compute is distributed, how workloads are scheduled, and how architectural assumptions evolve&lt;/strong&gt;. Understanding these shifts now will help engineers design &lt;strong&gt;more resilient, scalable, and efficient cloud systems.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>cuda</category>
      <category>gpu</category>
      <category>ai</category>
    </item>
    <item>
      <title>Amazon EC2 G5 Instances Now Available in Asia Pacific (Hong Kong)</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Tue, 06 Jan 2026 04:56:12 +0000</pubDate>
      <link>https://forem.com/soniarahal/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b</link>
      <guid>https://forem.com/soniarahal/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b</guid>
      <description>&lt;p&gt;&lt;strong&gt;Today, AWS makes Amazon EC2 G5 instances available in the Asia Pacific (Hong Kong) Region&lt;/strong&gt;, expanding access to &lt;strong&gt;GPU-powered compute&lt;/strong&gt; for customers running &lt;strong&gt;graphics-intensive and machine learning workloads&lt;/strong&gt; in Asia Pacific.&lt;/p&gt;

&lt;p&gt;This post explains what &lt;strong&gt;EC2&lt;/strong&gt; and &lt;strong&gt;G5 instances&lt;/strong&gt; are and shows how to &lt;strong&gt;launch a G5 instance using code&lt;/strong&gt;, along with key details about GPU usage, PCIe, and regional context.&lt;/p&gt;




&lt;h2&gt;
  
  
  GPU Cloud Trends
&lt;/h2&gt;

&lt;p&gt;GPU-accelerated cloud computing is growing rapidly as &lt;strong&gt;AI, machine learning, and real-time graphics workloads&lt;/strong&gt; become central to modern applications. Cloud GPU instances like &lt;strong&gt;EC2 G5&lt;/strong&gt; let teams scale high-performance compute &lt;strong&gt;without owning physical hardware&lt;/strong&gt;, supporting workloads across AI, media, research, simulation, and more.&lt;/p&gt;




&lt;h2&gt;
  
  
  What EC2 Is
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon EC2&lt;/strong&gt; provides virtual machines in the cloud that you control like physical servers. Each instance is defined by:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html" rel="noopener noreferrer"&gt;AMI&lt;/a&gt; (Amazon Machine Image)&lt;/strong&gt; — a template including the operating system, pre-installed software, and default settings
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance type&lt;/strong&gt; — CPU, memory, networking, GPU
&lt;/li&gt;
&lt;li&gt;Storage and network configuration
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;EC2 is called &lt;strong&gt;“Elastic”&lt;/strong&gt; because its capacity can &lt;strong&gt;expand or shrink based on demand&lt;/strong&gt;. You can launch many instances when workloads spike and terminate them when they’re no longer needed; if demand is steady, you simply keep a minimal baseline running. Elasticity works in both directions: &lt;strong&gt;scaling out&lt;/strong&gt; under load and &lt;strong&gt;scaling in&lt;/strong&gt; when it subsides.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;GPU workloads&lt;/strong&gt;, this flexibility is especially useful:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spin up G5 instances &lt;strong&gt;on-demand&lt;/strong&gt; for bursty tasks like AI training or video rendering
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;reserved G5 instances&lt;/strong&gt; for continuous workloads like inference or simulations
&lt;/li&gt;
&lt;/ul&gt;
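
&lt;p&gt;To make the on-demand vs. reserved trade-off concrete, here is a back-of-envelope cost comparison. The hourly prices are &lt;strong&gt;made-up placeholders&lt;/strong&gt;, not AWS list prices; check the EC2 pricing page for real numbers:&lt;/p&gt;

```python
# Hypothetical hourly prices -- NOT AWS list prices; check the pricing page.
ON_DEMAND_PER_HOUR = 1.60   # billed only while the instance runs
RESERVED_PER_HOUR = 1.00    # billed for every hour of the term

def monthly_cost(hours_used, hours_in_month=730):
    """Return (cost, cheapest_option) for a given monthly GPU usage."""
    on_demand = hours_used * ON_DEMAND_PER_HOUR
    reserved = hours_in_month * RESERVED_PER_HOUR
    return min((on_demand, "on-demand"), (reserved, "reserved"))

# Bursty workload (100 GPU-hours/month): on-demand wins.
print(monthly_cost(100))   # (160.0, 'on-demand')
# Continuous inference (every hour of the month): reserved wins.
print(monthly_cost(730))   # (730.0, 'reserved')
```

&lt;p&gt;The crossover point depends entirely on real prices and utilization, but the shape of the decision is the same: bursty usage favors on-demand, near-continuous usage favors reserved capacity.&lt;/p&gt;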




&lt;h2&gt;
  
  
  Launching a G5 Instance (Example Code)
&lt;/h2&gt;

&lt;p&gt;Instances can be launched via the console or programmatically. Using Python (&lt;code&gt;boto3&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;ec2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ec2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_instances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ImageId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ami-12345678&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# AMI = Amazon Machine Image (OS + software template)
&lt;/span&gt;    &lt;span class="n"&gt;InstanceType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MinCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaxCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;g5.xlarge&lt;/code&gt; launches a &lt;strong&gt;virtual machine with a GPU attached&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/LaunchingAndUsingInstances.html" rel="noopener noreferrer"&gt;EC2 Launch Guide&lt;/a&gt;&lt;/p&gt;
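
&lt;p&gt;A practical variation on the snippet above: building the arguments as a plain dictionary first makes them easy to validate or unit-test before any AWS call is made. The AMI ID and key name below are placeholders; &lt;code&gt;wait_until_running()&lt;/code&gt; is the standard &lt;code&gt;boto3&lt;/code&gt; resource waiter for blocking until the instance is up:&lt;/p&gt;

```python
# Build the create_instances arguments up front so they can be inspected
# (or unit-tested) before touching AWS. AMI ID and key name are placeholders.
def g5_launch_params(ami_id, key_name, count=1):
    return {
        "ImageId": ami_id,              # AMI: OS + software template
        "InstanceType": "g5.xlarge",    # 1x NVIDIA A10G GPU
        "KeyName": key_name,            # SSH key pair for login
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "workload", "Value": "gpu-demo"}],
        }],
    }

params = g5_launch_params("ami-12345678", "my-key")
# With boto3 this would be: instances = ec2.create_instances(**params),
# then instances[0].wait_until_running() to block until the VM is ready.
```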




&lt;h2&gt;
  
  
  What “G5” Means
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;G&lt;/strong&gt; in G5 stands for &lt;strong&gt;GPU / Graphics&lt;/strong&gt;, indicating that these instances are optimized for &lt;strong&gt;GPU-accelerated workloads&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;5&lt;/strong&gt; represents the &lt;strong&gt;generation&lt;/strong&gt; of the GPU instance family:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;G4&lt;/strong&gt; = previous generation (NVIDIA T4 GPUs)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G5&lt;/strong&gt; = current generation (NVIDIA A10G GPUs), offering &lt;strong&gt;more GPU cores, faster memory, higher network bandwidth, and improved performance&lt;/strong&gt; for machine learning, AI training, and real-time graphics workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G6 and beyond&lt;/strong&gt; = newer generations with updated GPUs and further performance improvements.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, &lt;strong&gt;G5 = the fifth-generation, high-performance GPU instance line from AWS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If the &lt;strong&gt;instance type starts with &lt;code&gt;g5&lt;/code&gt;&lt;/strong&gt;, AWS will:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attach &lt;strong&gt;NVIDIA A10G Tensor Core GPUs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Expose them to the OS via &lt;strong&gt;PCIe&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Make them available to GPU-enabled software
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Non-GPU instance types (&lt;code&gt;m&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;, &lt;code&gt;t&lt;/code&gt;) include &lt;strong&gt;no GPU&lt;/strong&gt;. The difference is decided at &lt;strong&gt;instance creation&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html" rel="noopener noreferrer"&gt;Accelerated Computing Instances&lt;/a&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://aws.amazon.com/ec2/instance-types/g5/" rel="noopener noreferrer"&gt;G5 Instance Types&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What PCIe Is (Briefly)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;PCIe&lt;/strong&gt; is the high-speed interface connecting the GPU to the CPU. You don’t program PCIe directly — frameworks like CUDA, PyTorch, TensorFlow, and OpenGL handle it.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# CPU memory
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# PCIe transfer to GPU memory
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the data is in VRAM, GPU computation runs without touching PCIe; the bus is involved again only when results move back to the CPU (e.g. &lt;code&gt;x.cpu()&lt;/code&gt;). Think of PCIe as the &lt;strong&gt;high-speed lane&lt;/strong&gt; moving data between CPU and GPU.&lt;/p&gt;
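
&lt;p&gt;A rough back-of-envelope estimate shows why minimizing CPU-to-GPU transfers matters. The bandwidth figure below is an &lt;strong&gt;assumption&lt;/strong&gt; (roughly effective PCIe 4.0 x16 throughput); real-world numbers vary by instance and driver:&lt;/p&gt;

```python
# Back-of-envelope PCIe transfer estimate. The bandwidth is an assumption
# (roughly PCIe 4.0 x16 effective throughput); real numbers vary.
BYTES_PER_FLOAT32 = 4
PCIE_BANDWIDTH_BYTES_PER_S = 16e9   # ~16 GB/s, assumed

tensor_bytes = 1024 * 1024 * BYTES_PER_FLOAT32   # the randn(1024, 1024) above
transfer_ms = tensor_bytes / PCIE_BANDWIDTH_BYTES_PER_S * 1000

print(f"{tensor_bytes / 2**20:.0f} MiB tensor, ~{transfer_ms:.2f} ms over PCIe")
```

&lt;p&gt;A fraction of a millisecond is negligible once, but a transfer per batch inside a training loop adds up, which is why GPU code tries to keep data resident in VRAM.&lt;/p&gt;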




&lt;h2&gt;
  
  
  EC2 Does Not Automatically Use the GPU
&lt;/h2&gt;

&lt;p&gt;EC2 only exposes the GPU; your code decides how to use it. Typical workflow:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install NVIDIA drivers
&lt;/li&gt;
&lt;li&gt;Install CUDA or GPU-enabled libraries
&lt;/li&gt;
&lt;li&gt;Run software targeting the GPU
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Verify GPU availability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nvidia-smi&lt;/code&gt; shows attached GPUs, memory usage, and utilization.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html" rel="noopener noreferrer"&gt;Install NVIDIA Driver&lt;/a&gt;&lt;/p&gt;
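
&lt;p&gt;A small guard can make scripts fail with a useful hint instead of a stack trace when the driver isn’t installed yet (a minimal sketch using only the standard library):&lt;/p&gt;

```python
import shutil
import subprocess

def gpu_status():
    """Return nvidia-smi output if a driver is installed, else a hint."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: install the NVIDIA driver first"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout

print(gpu_status())
```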




&lt;h2&gt;
  
  
  Why Hong Kong
&lt;/h2&gt;

&lt;p&gt;With &lt;strong&gt;G5 instances now available in Hong Kong&lt;/strong&gt;, GPU compute is closer to the people and teams who need it.  &lt;/p&gt;

&lt;p&gt;This matters because Hong Kong has &lt;strong&gt;high demand for GPU-intensive workloads&lt;/strong&gt; such as:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI and machine learning&lt;/strong&gt; — training and inference run faster with local GPUs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time graphics and simulations&lt;/strong&gt; — rendering, cloud gaming, and design applications benefit from reduced latency
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid experimentation&lt;/strong&gt; — teams can prototype and iterate on GPU-powered applications without relying on distant regions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By providing GPU compute locally, AWS enables developers in Hong Kong to &lt;strong&gt;move faster, test more, and deploy GPU-driven projects efficiently&lt;/strong&gt;, making it easier to innovate on compute-heavy workloads.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html" rel="noopener noreferrer"&gt;Regions &amp;amp; Availability Zones&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EC2&lt;/strong&gt; = virtual machines you control
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic&lt;/strong&gt; = can scale up/down based on demand; relevant for bursty vs constant GPU workloads
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G5&lt;/strong&gt; = GPU-enabled EC2 instances
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU usage&lt;/strong&gt; = controlled by your code, not EC2
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PCIe&lt;/strong&gt; = the interface that moves data between CPU and GPU
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AMI&lt;/strong&gt; = the template EC2 uses to launch the instance, including OS and software
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Launching a G5 instance today gives you GPU acceleration &lt;strong&gt;through the same APIs and workflows you already know&lt;/strong&gt;, making high-performance computing accessible, scalable, and programmable in the cloud.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>cloud</category>
      <category>gpu</category>
    </item>
    <item>
      <title>What I Learned at the CNCF Montreal KubeCon NA 2025 Recap</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Sat, 20 Dec 2025 14:59:42 +0000</pubDate>
      <link>https://forem.com/soniarahal/what-i-learned-at-the-cncf-montreal-kubecon-na-2025-recap-13l9</link>
      <guid>https://forem.com/soniarahal/what-i-learned-at-the-cncf-montreal-kubecon-na-2025-recap-13l9</guid>
      <description>&lt;p&gt;On December 10th, the &lt;strong&gt;Cloud Native Montreal&lt;/strong&gt; community hosted a recap of &lt;strong&gt;KubeCon NA 2025 in Atlanta&lt;/strong&gt;. Rather than being a traditional conference, this was a community-driven evening with lightning talks and reflections on where the cloud-native ecosystem is heading.&lt;/p&gt;

&lt;p&gt;Instead of focusing on slides or announcements, the event emphasized &lt;strong&gt;patterns and lessons&lt;/strong&gt; emerging across the ecosystem — from AI agents and observability to GitOps and energy-aware infrastructure.&lt;/p&gt;

&lt;p&gt;Here are the key takeaways that stood out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cloud Native Is Becoming AI-Native
&lt;/h2&gt;

&lt;p&gt;One recurring theme was that &lt;strong&gt;AI workloads are now first-class citizens&lt;/strong&gt; in cloud-native environments.&lt;/p&gt;

&lt;p&gt;Traditional observability answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the service up?&lt;/li&gt;
&lt;li&gt;Is latency within SLOs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI systems introduce new operational questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What prompt triggered this behavior?&lt;/li&gt;
&lt;li&gt;Which model call was expensive?&lt;/li&gt;
&lt;li&gt;Why did this agent take a specific action?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools such as &lt;strong&gt;OpenLLMetry&lt;/strong&gt; extend OpenTelemetry with instrumentation for LLM and agent workflows, while &lt;strong&gt;OpenCost&lt;/strong&gt; provides visibility into Kubernetes and cloud spend across workloads, teams, and environments.&lt;/p&gt;

&lt;p&gt;The takeaway is clear:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;You can’t scale AI systems you can’t observe or financially understand.&lt;/strong&gt;&lt;/p&gt;
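
&lt;p&gt;As a toy illustration of what “financially understanding” model calls means, here is a &lt;strong&gt;hypothetical&lt;/strong&gt; cost-tracking decorator. The prices and the stubbed model call are made up, and real tooling like OpenLLMetry does this through OpenTelemetry instrumentation rather than manual wrappers:&lt;/p&gt;

```python
import functools

# Assumed per-1K-token prices for a hypothetical model -- not real list prices.
PRICE_PER_1K_TOKENS = {"input": 0.003, "output": 0.015}

def track_cost(fn):
    """Record token usage and dollar cost for each model call (a toy
    stand-in for what OpenLLMetry/OpenCost-style tooling does for you)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        reply, usage = fn(*args, **kwargs)   # fn returns (text, token counts)
        cost = (usage["input"] * PRICE_PER_1K_TOKENS["input"]
                + usage["output"] * PRICE_PER_1K_TOKENS["output"]) / 1000
        wrapper.calls.append({"usage": usage, "cost_usd": round(cost, 6)})
        return reply
    wrapper.calls = []
    return wrapper

@track_cost
def fake_model_call(prompt):
    """Stubbed LLM call with made-up token counts."""
    return "stub reply", {"input": 200, "output": 100}

fake_model_call("Why did the pod restart?")
print(fake_model_call.calls[0])
# {'usage': {'input': 200, 'output': 100}, 'cost_usd': 0.0021}
```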




&lt;h2&gt;
  
  
  Observability Is Shifting From Dashboards to Agents
&lt;/h2&gt;

&lt;p&gt;Observability is evolving beyond dashboards and alerts toward &lt;strong&gt;agent-assisted operations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of engineers manually correlating metrics, logs, and recent deployments, emerging tools aim to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform root-cause analysis&lt;/li&gt;
&lt;li&gt;Triage alerts&lt;/li&gt;
&lt;li&gt;Recommend remediation steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Projects like &lt;strong&gt;k8sgpt&lt;/strong&gt;, &lt;strong&gt;Seraph&lt;/strong&gt;, and newer agentic SRE tools suggest a future where observability systems don’t just surface data — they actively reason over it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Several tools highlighted this shift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/k8sgpt-ai/k8sgpt" rel="noopener noreferrer"&gt;k8sgpt&lt;/a&gt;&lt;/strong&gt; — AI-native Kubernetes troubleshooting
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/holmesgpt/holmesgpt" rel="noopener noreferrer"&gt;HolmesGPT&lt;/a&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;a href="https://github.com/seraph-ai/seraph" rel="noopener noreferrer"&gt;Seraph&lt;/a&gt;&lt;/strong&gt; — Automated root cause analysis and alert mitigation
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Emerging Agent-Based Platforms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/blogs/aws/aws-devops-agent-helps-you-accelerate-incident-response-and-improve-system-reliability-preview/" rel="noopener noreferrer"&gt;AWS DevOps Agent (Preview)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://azure.microsoft.com/en-us/products/sre-agent" rel="noopener noreferrer"&gt;Azure SRE Agent (Preview)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://cleric.ai/" rel="noopener noreferrer"&gt;Cleric&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/PatrickKalkman/kube-whisper" rel="noopener noreferrer"&gt;Kube Whisperer&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These agents correlate &lt;strong&gt;logs, metrics, deployments, and incidents&lt;/strong&gt; to assist on-call engineers and reduce &lt;strong&gt;alert fatigue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This doesn’t replace engineers, but it changes the workflow: less time searching for signals, more time making informed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3ffxf5m2xdwb9idja2o.jpg" alt="Image of agentic SRE tools" width="800" height="762"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Abstraction Helps — but Security Must Follow
&lt;/h2&gt;

&lt;p&gt;Another major topic was &lt;strong&gt;Cyclops&lt;/strong&gt;, an open-source platform that simplifies Kubernetes by replacing raw YAML with structured, form-based abstractions.&lt;/p&gt;

&lt;p&gt;Cyclops introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modules&lt;/strong&gt; — logical groupings of all Kubernetes resources an application needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templates&lt;/strong&gt; — mappings that translate module inputs into valid Kubernetes manifests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How Cyclops works with Helm:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Helm charts&lt;/strong&gt; define the Kubernetes resources (Deployments, Services, Ingress, etc.) using templated YAML.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cyclops wraps those Helm charts and &lt;strong&gt;exposes their values as validated forms&lt;/strong&gt; instead of free-text YAML edits.&lt;/li&gt;
&lt;li&gt;Users fill in forms, and Cyclops renders the underlying Helm templates into valid Kubernetes manifests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cyclops also supports AI-driven operations through a &lt;strong&gt;Model Context Protocol (MCP) server&lt;/strong&gt;, allowing agents to manage applications using natural language rather than direct cluster access.&lt;/p&gt;

&lt;p&gt;The key lesson here wasn’t blind automation, but caution:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Code generated by AI should be treated as untrusted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Security risks still apply. As abstraction increases, &lt;strong&gt;guardrails, validation, and testing become even more critical&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitOps Works Best When Designed for Teams
&lt;/h2&gt;

&lt;p&gt;A practical GitOps case study highlighted that &lt;strong&gt;repository structure matters as much as tooling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Key principles discussed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Align configuration structure with team ownership&lt;/li&gt;
&lt;li&gt;Centralize configuration while keeping environments explicit&lt;/li&gt;
&lt;li&gt;Keep related files close together (“proximity matters”)&lt;/li&gt;
&lt;li&gt;Optimize for developer experience, not just correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using &lt;strong&gt;ArgoCD&lt;/strong&gt;, deployments become automated, auditable, and consistent — but only when GitOps is treated as both a &lt;strong&gt;technical and organizational design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28wlswjoai77dkwt2rbt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28wlswjoai77dkwt2rbt.jpg" alt="Image of Before/After Gitops Repository Structure" width="800" height="911"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Energy Efficiency Is Becoming a Platform Concern
&lt;/h2&gt;

&lt;p&gt;The final talk focused on &lt;strong&gt;Kepler&lt;/strong&gt;, a CNCF project designed to expose energy consumption at the container level.&lt;/p&gt;

&lt;p&gt;Kepler provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained container and process power metrics&lt;/li&gt;
&lt;li&gt;Support for CPUs, GPUs, and heterogeneous hardware&lt;/li&gt;
&lt;li&gt;Low overhead using eBPF&lt;/li&gt;
&lt;li&gt;Integration with existing observability stacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As GPU-heavy and AI workloads grow, energy usage and cooling costs are becoming operational concerns.&lt;/p&gt;

&lt;p&gt;The key message:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sustainability is now part of platform engineering, not just hardware planning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfpf7df3ifejedzgvi0w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfpf7df3ifejedzgvi0w.jpg" alt="Image of the 8 concepts of Kepler Project" width="800" height="908"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Reflection
&lt;/h2&gt;

&lt;p&gt;This KubeCon recap wasn’t about memorizing tools — it was about understanding direction.&lt;/p&gt;

&lt;p&gt;Across talks, a consistent shift emerged:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From reactive monitoring to &lt;strong&gt;AI-assisted operations&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From raw YAML to &lt;strong&gt;safe, opinionated abstractions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From cost surprises to &lt;strong&gt;cost-aware platforms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From performance-only metrics to &lt;strong&gt;energy-aware infrastructure&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Community-driven events like this help connect individual technologies into a cohesive mental model of where cloud-native systems are heading next.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>community</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS Bedrock AgentCore Hands-On Workshop: A Recap</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Sat, 20 Dec 2025 02:16:01 +0000</pubDate>
      <link>https://forem.com/soniarahal/aws-bedrock-agentcore-hands-on-workshop-a-recap-3pap</link>
      <guid>https://forem.com/soniarahal/aws-bedrock-agentcore-hands-on-workshop-a-recap-3pap</guid>
      <description>&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Montréal AWS User Group &lt;br&gt;
 &lt;strong&gt;Date:&lt;/strong&gt; December 18, 2025&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;This workshop was a hands-on journey through &lt;strong&gt;Amazon Bedrock AgentCore&lt;/strong&gt; (a platform to run AI agents at scale), covering &lt;strong&gt;Runtime, Gateway, Identity, Memory, Built-in Tools, and Observability&lt;/strong&gt;. Participants learned how to take AI agents from simple PoC (Proof of Concept) to &lt;strong&gt;secure, enterprise-ready applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Each demo shown here is &lt;strong&gt;just one example&lt;/strong&gt;, and the tools mentioned are a subset of what was explored during the workshop, not exhaustive.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Story: Why Cloud and Agents Matter
&lt;/h2&gt;

&lt;p&gt;Getting into cloud development isn’t just about learning services—it’s about &lt;strong&gt;understanding the real problem first&lt;/strong&gt;. Code is a tool for reliability, not the final asset. The bigger picture is knowing &lt;strong&gt;why a company would use Amazon Bedrock AgentCore&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Enterprises want AI agents that can go from experiments to real-life, &lt;strong&gt;secure, scalable, and observable applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This workshop helped me &lt;strong&gt;connect the dots&lt;/strong&gt;: how modules and tools work together to create agents that are not just smart, but &lt;strong&gt;reliable and trustworthy&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Target audience:&lt;/strong&gt; Enterprises or developers wanting AI agents without managing all the complex infrastructure themselves. Their goals include building reliable agents, scaling safely, integrating with external systems, and having full visibility (observability) into agent operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workshop Modules: A Story Through Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Runtime (Demo: Weather + Calculator Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine you want to create an agent that can tell the weather or perform calculations for users. &lt;strong&gt;Runtime&lt;/strong&gt; is the engine that makes this possible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A secure environment that runs your agent (the software that answers questions or performs tasks), handling infrastructure, scaling, and session management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Developers can focus on &lt;strong&gt;what the agent does&lt;/strong&gt; instead of worrying about servers or security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Weather + Calculator agent. Runtime handled all container orchestration and session isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;How is the weather?&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools Used:&lt;/strong&gt; Strands Agent, Elastic Container Registry, Terminal prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Takeaway:&lt;/strong&gt; Runtime is the backbone that turns a prototype into a &lt;strong&gt;production-ready agent&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Gateway (Demo: Mars Weather Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine your agent needs data from external sources, like NASA’s weather data for Mars. &lt;strong&gt;Gateway&lt;/strong&gt; is what connects your agent to the outside world.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; The integration layer that allows agents to interact with external systems or APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; To provide real-world insights, agents need access to external information safely and reliably. Gateway allows defining &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/server/tools" rel="noopener noreferrer"&gt;tools&lt;/a&gt; with metadata about name, description, input/output schemas, and behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Mars Weather agent called NASA’s Open APIs using an API key. &lt;a href="https://api.nasa.gov/insight_weather/?api_key=DEMO_KEY&amp;amp;feedtype=json&amp;amp;ver=1.0" rel="noopener noreferrer"&gt;Here is an API response example&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"Hi, can you list all tools available to you"&lt;/code&gt; &lt;code&gt;"What is the weather in northern part of the Mars"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; REST APIs, AgentCore Gateway, API keys&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Gateway bridges the agent and external systems, enabling &lt;strong&gt;actionable intelligence&lt;/strong&gt; and structured tool integration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
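
&lt;p&gt;A sketch of the tool metadata such integrations rely on. The shape follows the MCP tools specification linked above; the Mars-weather tool itself is &lt;strong&gt;hypothetical&lt;/strong&gt;:&lt;/p&gt;

```python
# Hypothetical tool definition in the MCP tools shape: a name, a description
# the model can reason over, and a JSON Schema for the expected input.
mars_weather_tool = {
    "name": "get_mars_weather",
    "description": "Fetch the latest InSight weather data for a Mars region.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "region": {"type": "string", "description": "e.g. 'northern'"},
        },
        "required": ["region"],
    },
}

def validate_tool(tool):
    """Minimal check that a tool definition has the required fields."""
    return all(k in tool for k in ("name", "description", "inputSchema"))

print(validate_tool(mars_weather_tool))   # True
```

&lt;p&gt;The point of the schema is that the gateway, not the model, enforces what a valid call looks like.&lt;/p&gt;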




&lt;h3&gt;
  
  
  Identity (Demo: AgentCore Runtime with vs without Authorization)
&lt;/h3&gt;

&lt;p&gt;Imagine that not everyone should be able to use your agent, or some tasks require special permissions. &lt;strong&gt;Identity&lt;/strong&gt; handles that.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Manages who can invoke agents and what they can access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Protects sensitive data and ensures compliance in enterprise environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Weather agent invoked with authorization worked; without authorization, it returned an error &lt;code&gt;AccessDeniedException&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"How is the weather?"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; Amazon Cognito, JWT tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Identity ensures &lt;strong&gt;only authorized users or systems interact with agents&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Memory (Demo: AI Learning Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine talking to an agent that remembers you and what you’ve discussed before. &lt;strong&gt;Memory&lt;/strong&gt; makes this possible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Stores context for multi-turn conversations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-term memory:&lt;/strong&gt; remembers context during a session (e.g., last few questions)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term memory:&lt;/strong&gt; preserves key information across sessions (e.g., user preferences, summaries)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Memory enables agents to give &lt;strong&gt;personalized and context-aware responses&lt;/strong&gt;, improving over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; The agent remembered the user’s name (Alex) and topics of interest in AI across sessions.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;br&gt;
User: &lt;code&gt;"My name is Alex and I'm interested in learning about AI."&lt;/code&gt;&lt;br&gt;
Agent: &lt;code&gt;"Hi Alex! I’m excited to help you learn about AI!"&lt;/code&gt;&lt;br&gt;
Later:&lt;br&gt;
User: &lt;code&gt;"What was my name again?"&lt;/code&gt;&lt;br&gt;
Agent: &lt;code&gt;"Your name is Alex!"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; AgentCore Memory, Strands MetricsClient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Short-term memory provides &lt;strong&gt;session-level context&lt;/strong&gt;, long-term memory provides &lt;strong&gt;persistent context&lt;/strong&gt; that improves user experience and enables agents to maintain continuity over time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
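
&lt;p&gt;The split between the two memory tiers can be sketched with a toy class; this is &lt;strong&gt;not the AgentCore API&lt;/strong&gt;, just an illustration of the idea that session context is ephemeral while promoted facts persist:&lt;/p&gt;

```python
# Toy illustration of the short- vs long-term split (not the AgentCore API):
# session turns live in a per-session list; durable facts are promoted to a
# store that survives across sessions.
class AgentMemory:
    def __init__(self):
        self.long_term = {}   # persists across sessions
        self.session = []     # cleared on every new session

    def new_session(self):
        self.session = []

    def remember_turn(self, user, agent):
        self.session.append((user, agent))

    def promote(self, key, value):
        self.long_term[key] = value   # e.g. an extracted name or preference

memory = AgentMemory()
memory.remember_turn("My name is Alex...", "Hi Alex!")
memory.promote("user_name", "Alex")
memory.new_session()                    # short-term context is gone...
print(memory.session)                   # []
print(memory.long_term["user_name"])    # ...but the promoted fact survives
```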




&lt;h3&gt;
  
  
  Built-in Tools (Demo: Amazon Revenue Extraction)
&lt;/h3&gt;

&lt;p&gt;Imagine your agent needs to not just answer questions but &lt;strong&gt;extract and process data&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Pre-built tools like Browser or Code Interpreter extend agent capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Agents can perform specialized tasks safely and efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Extract Amazon revenue data from a website using Browser tool with Nova Act SDK.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"Extract and return Amazon revenue for the last 4 years from stockanalysis.com."&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; Browser Tool, Code Interpreter, Nova Act SDK&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Built-in tools enable agents to &lt;strong&gt;handle complex tasks&lt;/strong&gt;, making them more useful in enterprise contexts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Observability (Demo: CrewAI Travel Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine launching an agent in production and needing insight into its behavior. Observability solves this.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Monitoring and logging for agent workflows, tool usage, performance, and errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Ensures agents are &lt;strong&gt;traceable, measurable, and debuggable&lt;/strong&gt;, which builds trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo Workflow:&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Create a runtime-ready &lt;a href="https://docs.crewai.com/en/concepts/agents#basic-research-agent" rel="noopener noreferrer"&gt;CrewAI agent&lt;/a&gt; using Amazon Bedrock, defining notably its &lt;strong&gt;role, goal, backstory, and task&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Instrument the agent with &lt;code&gt;CrewAIInstrumentor().instrument()&lt;/code&gt;
to enable observability.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Boto3&lt;/strong&gt; to invoke the agent: &lt;code&gt;prompt = "What are some rodeo events happening in Oklahoma?"&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;The agent gathers multiple responses in parallel.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dashboards on CloudWatch show runtime metrics across all agents, and clicking on a specific agent shows detailed metrics with custom time-frame filtering.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Tools Used:&lt;/strong&gt; Amazon CloudWatch, Boto3 SDK, CrewAI, Scarf, AWS Distro for OpenTelemetry&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Takeaway:&lt;/strong&gt; Observability ensures &lt;strong&gt;production agents are monitored and performance is visible&lt;/strong&gt;, supporting reliability and optimization.&lt;/li&gt;

&lt;/ul&gt;
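&lt;p&gt;The workflow above can be sketched as follows. Treat this as a hedged outline, not the demo's exact code: the agent ARN, region, and the &lt;code&gt;{"prompt": ...}&lt;/code&gt; payload shape are assumptions drawn from common AgentCore samples, and the &lt;code&gt;CrewAIInstrumentor&lt;/code&gt; import reflects the OpenInference instrumentation package; verify both against your deployment:&lt;/p&gt;

```python
# Hedged sketch of the observability workflow: switch on tracing for a
# CrewAI agent, then invoke the deployed runtime through Boto3.
# The ARN, region, and payload shape are placeholders (assumptions);
# check them against your own AgentCore deployment.
import json

def build_invoke_payload(prompt: str) -> bytes:
    """Serialize the prompt as the JSON body sent to the agent runtime."""
    return json.dumps({"prompt": prompt}).encode("utf-8")

if __name__ == "__main__":
    payload = build_invoke_payload(
        "What are some rodeo events happening in Oklahoma?"
    )
    print(payload)
    # The calls below need AWS credentials, a deployed agent, and the
    # openinference-instrumentation-crewai package, so they stay commented:
    # from openinference.instrumentation.crewai import CrewAIInstrumentor
    # CrewAIInstrumentor().instrument()  # emit an OpenTelemetry span per step
    #
    # import boto3
    # client = boto3.client("bedrock-agentcore", region_name="us-east-1")
    # response = client.invoke_agent_runtime(
    #     agentRuntimeArn="arn:aws:bedrock-agentcore:...",  # placeholder ARN
    #     payload=payload,
    # )
```

&lt;p&gt;With the instrumentor active, each agent step, tool call, and LLM invocation is exported as a trace, which is what the CloudWatch dashboards in the demo surface.&lt;/p&gt;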




&lt;h2&gt;
  
  
  Why Amazon Bedrock AgentCore Matters
&lt;/h2&gt;

&lt;p&gt;Enterprises adopt Bedrock AgentCore to move from &lt;strong&gt;proof of concept to production-ready AI applications&lt;/strong&gt;. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalable deployment&lt;/strong&gt; without managing infrastructure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure, authorized execution&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual and persistent memory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with external systems and workflows&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full observability for performance and errors&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these modules helps developers &lt;strong&gt;deliver AI solutions that meet enterprise goals&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cloud development is about seeing the &lt;strong&gt;big picture&lt;/strong&gt;, not just writing code.
&lt;/li&gt;
&lt;li&gt;AgentCore offers a &lt;strong&gt;sandbox to experiment safely&lt;/strong&gt; with enterprise-grade agents.
&lt;/li&gt;
&lt;li&gt;Observability ensures live agents can be &lt;strong&gt;monitored, optimized, and trusted&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Hands-on workshops and community engagement are invaluable for &lt;strong&gt;learning how tools solve real-world problems&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>cloud</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
