<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Sonia Rahal</title>
    <description>The latest articles on Forem by Sonia Rahal (@soniarahal).</description>
    <link>https://forem.com/soniarahal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3671321%2F4e3e8b64-a9ba-4665-b02f-a34e6365adbc.jpg</url>
      <title>Forem: Sonia Rahal</title>
      <link>https://forem.com/soniarahal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/soniarahal"/>
    <language>en</language>
    <item>
      <title>AWS re:Invent 2025 Montreal Recap: 6 Lightning Demos That Actually Change How You Build</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Fri, 23 Jan 2026 08:01:17 +0000</pubDate>
      <link>https://forem.com/soniarahal/aws-reinvent-2025-montreal-recap-6-lightning-demos-that-actually-change-how-you-build-4a0o</link>
      <guid>https://forem.com/soniarahal/aws-reinvent-2025-montreal-recap-6-lightning-demos-that-actually-change-how-you-build-4a0o</guid>
      <description>&lt;p&gt;I went to a local re:Invent recap meetup in Montreal on January 15, expecting a high-level overview of AWS announcements.&lt;/p&gt;

&lt;p&gt;What I got instead was something much better.&lt;/p&gt;

&lt;p&gt;Six speakers each had ten minutes to demo one concrete feature they were genuinely excited about: not slides, not marketing talk, but &lt;em&gt;“here’s what it does and why it changes things.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I’m deeply curious about cloud computing and how modern systems are actually built, so this format really worked for me. It wasn’t a deep dive into internals, but it also wasn’t vague or fluffy. It sat in a sweet spot: specific enough to understand what’s new and why it matters, without needing to already be an AWS specialist.&lt;/p&gt;

&lt;p&gt;Here’s a recap of the six features that stood out most, and how they fit into a much bigger shift AWS is making.&lt;/p&gt;




&lt;h2&gt;
  
  
  1) AWS DevOps Agent: AI That Investigates Incidents With You
&lt;/h2&gt;

&lt;p&gt;The first demo showed AWS DevOps Agent, a new AI-powered operational assistant (currently in preview) designed to help teams investigate incidents and find root causes faster.&lt;/p&gt;

&lt;p&gt;Instead of just alerting you that &lt;em&gt;“something is broken,”&lt;/em&gt; the agent actually tries to understand why.&lt;/p&gt;

&lt;p&gt;In the demo, the speaker intentionally broke a Lambda function by misconfiguring its handler. The DevOps Agent:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detected errors from logs and metrics&lt;/li&gt;
&lt;li&gt;Pulled configuration history&lt;/li&gt;
&lt;li&gt;Built a timeline of what changed&lt;/li&gt;
&lt;li&gt;Mapped dependencies between services&lt;/li&gt;
&lt;li&gt;Suggested the most likely root cause&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It also builds an application topology: basically a live map of how your Lambdas, databases, pipelines, and services connect, so it can reason about blast radius and downstream impact.&lt;/p&gt;

&lt;p&gt;What made this feel different from normal observability tooling is that you can interact with the investigation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask it follow-up questions&lt;/li&gt;
&lt;li&gt;Tell it where else to look&lt;/li&gt;
&lt;li&gt;Have it post findings to Slack or ServiceNow&lt;/li&gt;
&lt;li&gt;Auto-generate AWS Support cases with context attached&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It feels like AWS is trying to turn operations from &lt;em&gt;“alert + panic + dashboards”&lt;/em&gt; into &lt;em&gt;“alert + guided diagnosis + suggested fix.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9e5radeg0mggjef1mnj.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9e5radeg0mggjef1mnj.jpeg" alt="Image of the architecture of a DevOps Agent" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  2) AWS Transform: AI-Guided Codebase Migration That Isn’t Reckless
&lt;/h2&gt;

&lt;p&gt;The second demo focused on AWS Transform, an AI-powered tool for modernizing large codebases.&lt;/p&gt;

&lt;p&gt;This isn’t just &lt;em&gt;“throw your repo into ChatGPT and pray.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You run it from a CLI, tell it what kind of migration you want (for example: Node.js 16 → Node.js 20, or AWS SDK v1 → v2), and it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scans your repository&lt;/li&gt;
&lt;li&gt;Applies a guided refactor across files&lt;/li&gt;
&lt;li&gt;Lets you attach context like:

&lt;ul&gt;
&lt;li&gt;“Don’t break this logging framework”&lt;/li&gt;
&lt;li&gt;“Preserve backward compatibility for this API”&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Requires a verification command to pass (like &lt;code&gt;npm test&lt;/code&gt; or &lt;code&gt;mvn verify&lt;/code&gt;); if the tests fail, the migration is considered unsuccessful&lt;/li&gt;

&lt;/ul&gt;
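The verification gate is what makes this controlled rather than reckless. As a rough illustration of that idea only (this is not AWS Transform's actual CLI or API), the core pattern is: run the project's own verification command after the automated refactor, and treat any non-zero exit code as a failed migration:

```python
import subprocess
import sys

def verify_migration(verify_cmd: list) -> bool:
    """Run the project's verification command (e.g. npm test, mvn verify).

    Returns True only if the command exits cleanly; any failure means
    the automated refactor is treated as unsuccessful.
    """
    result = subprocess.run(verify_cmd, capture_output=True, text=True)
    return result.returncode == 0

# illustrative: a command that succeeds vs. one that exits non-zero
ok = verify_migration([sys.executable, "-c", "pass"])
bad = verify_migration([sys.executable, "-c", "raise SystemExit(1)"])
```

The point of the sketch is the contract, not the plumbing: the AI does the rewriting, but an existing deterministic test suite decides whether the result counts.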

&lt;p&gt;What stood out to me was how seriously correctness is treated. This is closer to a controlled migration pipeline than a one-shot AI rewrite.&lt;/p&gt;

&lt;p&gt;The speaker referenced two real AWS case studies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Air Canada:&lt;/strong&gt; migrated ~1,000 Lambda functions to a new Node.js runtime
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Twitch:&lt;/strong&gt; migrated ~913 Go repositories from AWS SDK v1 → v2
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these migrations reportedly saved ~2,800 developer-days.&lt;/p&gt;

&lt;p&gt;The bigger idea here isn’t just faster refactors. It’s compressing years of technical debt cleanup into weeks.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhjy4tfqiokke2vvg8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fawhjy4tfqiokke2vvg8j.png" alt="Image of comparison of AWS Transform to other competitors" width="800" height="634"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  3) SageMaker Studio: Becoming the Front Door for All Data + AI
&lt;/h2&gt;

&lt;p&gt;The third demo showed the new version of Amazon SageMaker Studio and how AWS is trying to turn it into a single workspace for everything data and AI-related.&lt;/p&gt;

&lt;p&gt;Three concrete things stood out:&lt;/p&gt;

&lt;h3&gt;
  
  
  Built-in Data Catalog + Discovery
&lt;/h3&gt;

&lt;p&gt;Inside Studio, teams can now browse:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Datasets&lt;/li&gt;
&lt;li&gt;Tables&lt;/li&gt;
&lt;li&gt;Models&lt;/li&gt;
&lt;li&gt;Notebooks&lt;/li&gt;
&lt;li&gt;Pipelines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each asset can include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;Auto-generated descriptions (via Amazon Q)&lt;/li&gt;
&lt;li&gt;Metadata&lt;/li&gt;
&lt;li&gt;Data quality indicators&lt;/li&gt;
&lt;li&gt;Lineage info&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it possible to build a real internal &lt;em&gt;“marketplace”&lt;/em&gt; for data and models instead of everything living in random S3 buckets.&lt;/p&gt;

&lt;h3&gt;
  
  
  Querying + Notebooks Without Leaving Studio
&lt;/h3&gt;

&lt;p&gt;You can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse tables&lt;/li&gt;
&lt;li&gt;Run SQL queries (powered by Athena)&lt;/li&gt;
&lt;li&gt;Preview datasets&lt;/li&gt;
&lt;li&gt;Open Jupyter notebooks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All from one UI.&lt;br&gt;&lt;br&gt;
Amazon Q is embedded directly into notebooks. In the demo, the speaker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Asked Q to generate SQL&lt;/li&gt;
&lt;li&gt;Asked Q to generate Python&lt;/li&gt;
&lt;li&gt;Asked Q to generate a Matplotlib chart&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This turns notebooks into an AI-assisted analysis environment instead of a blank coding surface.&lt;/p&gt;

&lt;h3&gt;
  
  
  Serverless Airflow Built Into Studio
&lt;/h3&gt;

&lt;p&gt;Studio now integrates Amazon Managed Workflows for Apache Airflow in a serverless form.&lt;/p&gt;

&lt;p&gt;That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No control plane to manage&lt;/li&gt;
&lt;li&gt;No always-on cluster cost&lt;/li&gt;
&lt;li&gt;Native UI integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Training pipelines&lt;/li&gt;
&lt;li&gt;Evaluation pipelines&lt;/li&gt;
&lt;li&gt;ML workflows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Directly inside Studio.&lt;br&gt;&lt;br&gt;
It collapses notebooks, orchestration, and ML tooling into one place.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2fsv1coefy5t26quong.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm2fsv1coefy5t26quong.jpeg" alt="Image of SageMaker Studio catalog and metadata" width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  4) Durable Lambda: Serverless That Can Finally Wait
&lt;/h2&gt;

&lt;p&gt;Traditional Lambda breaks down for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Long workflows&lt;/li&gt;
&lt;li&gt;Human approvals&lt;/li&gt;
&lt;li&gt;External callbacks&lt;/li&gt;
&lt;li&gt;Multi-step orchestration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So people end up wiring together Step Functions + DynamoDB + retry logic.&lt;/p&gt;

&lt;p&gt;AWS now added Durable Lambda primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Wait&lt;/strong&gt;: Pause execution without paying for compute (up to one year)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoint&lt;/strong&gt;: Persist state so retries resume from the same point
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wait for Callback&lt;/strong&gt;: Send a token to an external system and resume when it returns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;How it works in practice:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a Durable Lambda function in the AWS console.
&lt;/li&gt;
&lt;li&gt;AWS automatically manages the underlying state storage — no DynamoDB or S3 setup needed.
&lt;/li&gt;
&lt;li&gt;Function runtime can pause and resume at checkpoints or callback points.
&lt;/li&gt;
&lt;li&gt;Retry logic is built-in and safe: the function won’t duplicate payments or actions.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the demo, the workflow looked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reserve inventory
&lt;/li&gt;
&lt;li&gt;Checkpoint
&lt;/li&gt;
&lt;li&gt;Process payment
&lt;/li&gt;
&lt;li&gt;Checkpoint
&lt;/li&gt;
&lt;li&gt;Wait 15 minutes for user payment
&lt;/li&gt;
&lt;li&gt;Resume
&lt;/li&gt;
&lt;li&gt;Ship product&lt;/li&gt;
&lt;/ol&gt;
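The checkpoint/resume behavior can be approximated in plain Python. This is a minimal sketch of the pattern only, not the Durable Lambda API: each step records its result in persisted state, and on a retry, already-checkpointed steps return their saved result instead of running again:

```python
import json
from pathlib import Path

# stand-in for the AWS-managed state storage (no DynamoDB/S3 setup in the real thing)
STATE_FILE = Path("workflow_state.json")

def run_step(state: dict, name: str, fn):
    """Execute a step at most once; on retries, return the checkpointed result."""
    if name in state:                       # already checkpointed: skip re-execution
        return state[name]
    result = fn()
    state[name] = result                    # checkpoint: persist before moving on
    STATE_FILE.write_text(json.dumps(state))
    return result

def order_workflow(state: dict) -> dict:
    run_step(state, "reserve_inventory", lambda: "reserved")
    run_step(state, "process_payment", lambda: "charged")   # safe: runs at most once
    run_step(state, "ship_product", lambda: "shipped")
    return state

# resume from prior state if a retry, otherwise start fresh
state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
order_workflow(state)
```

If the process dies between "process_payment" and "ship_product", a retry replays the function but the payment step short-circuits to its saved result, which is exactly why duplicate charges don't happen.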

&lt;p&gt;No Step Functions. No external state store.&lt;/p&gt;

&lt;p&gt;Retries also become safe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No duplicate payments&lt;/li&gt;
&lt;li&gt;No double reservations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also perfect for AI workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Waiting for long LLM calls&lt;/li&gt;
&lt;li&gt;Waiting for human-in-the-loop approvals&lt;/li&gt;
&lt;li&gt;Waiting for batch embedding jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All without paying for idle compute.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyusfsdq4aqw2lf4htxv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftyusfsdq4aqw2lf4htxv.jpeg" alt="Image of Durable Lambda with AWS console" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  5) Lambda on EC2 Capacity Providers: Serverless Without Cold Starts
&lt;/h2&gt;

&lt;p&gt;Lambda can now run on &lt;strong&gt;AWS-managed EC2 instances&lt;/strong&gt;, giving you more control and eliminating cold starts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a &lt;strong&gt;capacity provider&lt;/strong&gt; in Lambda — AWS provisions and manages EC2 instances for you.
&lt;/li&gt;
&lt;li&gt;Configure instance type, CPU, memory, and architecture (GPU support coming).
&lt;/li&gt;
&lt;li&gt;Lambda functions run on these pre-warmed instances for predictable performance.
&lt;/li&gt;
&lt;li&gt;AWS handles patching, scaling, and lifecycle management — no SSH or instance management needed.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always-warm environments&lt;/li&gt;
&lt;li&gt;No cold starts&lt;/li&gt;
&lt;li&gt;Control over instance types, CPU, memory&lt;/li&gt;
&lt;li&gt;Multi-concurrency per vCPU (GPU support planned)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AWS still manages the instances (you can’t SSH in or patch anything), but you get predictable performance and much better economics at scale.&lt;/p&gt;

&lt;p&gt;Pricing example from the demo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100M requests / month
&lt;/li&gt;
&lt;li&gt;20ms runtime
&lt;/li&gt;
&lt;li&gt;Default Lambda: ~$3,000/month
&lt;/li&gt;
&lt;li&gt;Lambda on EC2: ~$431/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a massive difference for high-throughput APIs or inference endpoints.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qemud9fbpba2hgy44lk.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9qemud9fbpba2hgy44lk.jpeg" alt="Image of capacity provider creation demo" width="800" height="537"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  6) S3 Vectors: Vector Storage at Object-Store Scale
&lt;/h2&gt;

&lt;p&gt;The last demo started by explaining &lt;strong&gt;what vectors are&lt;/strong&gt; and why they matter for modern AI workflows.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vectors are numeric representations of data (like text, images, or embeddings) that let models compute similarity, find nearest neighbors, or perform semantic search.
&lt;/li&gt;
&lt;li&gt;Modern AI applications - RAG pipelines, recommendation systems, search engines - rely heavily on vectors.
&lt;/li&gt;
&lt;/ul&gt;
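To make "nearest neighbor" concrete, here is a tiny, library-free sketch of cosine-similarity search, the core operation a vector store performs (at vastly larger scale, and with approximate indexes rather than this exact brute-force scan):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 = same direction, 0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest(query, corpus):
    """Return the key of the corpus vector most similar to the query."""
    return max(corpus, key=lambda k: cosine_similarity(query, corpus[k]))

# toy 3-dimensional "embeddings"; real ones have hundreds of dimensions
corpus = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
print(nearest([0.88, 0.12, 0.02], corpus))  # a cat-like query matches "cat"
```

An S3 vector index replaces the brute-force `max` over every vector with an approximate nearest-neighbor (ANN) structure, trading a little accuracy for the ability to search billions of embeddings.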

&lt;p&gt;The problem today: most vector databases are expensive, always-on, and operationally heavy.  &lt;/p&gt;

&lt;p&gt;AWS’s solution: &lt;strong&gt;S3 Vector Buckets&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Vector Buckets are a new type of S3 bucket optimized for storing embeddings. They allow you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Store embeddings directly in S3
&lt;/li&gt;
&lt;li&gt;Create vector indexes
&lt;/li&gt;
&lt;li&gt;Run approximate nearest-neighbor (ANN) search
&lt;/li&gt;
&lt;li&gt;Use them in RAG pipelines, Bedrock, and SageMaker
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why S3 Vector Buckets make sense:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scalability: billions of vectors at object-store scale
&lt;/li&gt;
&lt;li&gt;Cost: much cheaper than always-on vector DBs
&lt;/li&gt;
&lt;li&gt;Durability: inherits S3 reliability
&lt;/li&gt;
&lt;li&gt;Integration: works natively with other AWS services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Trade-off: higher latency than specialized vector databases like Pinecone or OpenSearch.  &lt;/p&gt;

&lt;p&gt;Ideal use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge bases
&lt;/li&gt;
&lt;li&gt;Large-scale RAG corpora
&lt;/li&gt;
&lt;li&gt;Offline or batch semantic search&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkloug33pt83hnlsfqidi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkloug33pt83hnlsfqidi.png" alt="Image of vector bucket creation demo" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bigger Pattern I Took Away
&lt;/h2&gt;

&lt;p&gt;Across all six demos, a clear pattern emerged.  &lt;/p&gt;

&lt;p&gt;AWS is collapsing entire categories of glue infrastructure.  &lt;/p&gt;

&lt;p&gt;What used to require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Step Functions
&lt;/li&gt;
&lt;li&gt;DynamoDB state tables
&lt;/li&gt;
&lt;li&gt;Vector databases
&lt;/li&gt;
&lt;li&gt;Orchestration clusters
&lt;/li&gt;
&lt;li&gt;Custom internal catalogs
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now lives inside:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SageMaker Studio
&lt;/li&gt;
&lt;li&gt;Durable Lambda
&lt;/li&gt;
&lt;li&gt;S3 Vectors
&lt;/li&gt;
&lt;li&gt;Lambda on EC2
&lt;/li&gt;
&lt;li&gt;Serverless Airflow
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not flashy, but it quietly changes what “simple architecture” even means in 2025.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Note
&lt;/h2&gt;

&lt;p&gt;The meetup ended with an amazing giveaway.  &lt;/p&gt;

&lt;p&gt;By pure luck, I won. And so did the two people next to me.  &lt;/p&gt;

&lt;p&gt;So maybe that same luck carries over to you reading this: hope one of these features ends up being exactly what unlocks your next project.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>serverless</category>
      <category>cloud</category>
    </item>
    <item>
      <title>When GPU Compute Moves Closer to Users: Rethinking CPU↔GPU Boundaries in Cloud Architecture</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Tue, 06 Jan 2026 17:01:55 +0000</pubDate>
      <link>https://forem.com/soniarahal/when-gpu-compute-moves-closer-to-users-rethinking-cpu-gpu-boundaries-in-cloud-architecture-2dpm</link>
      <guid>https://forem.com/soniarahal/when-gpu-compute-moves-closer-to-users-rethinking-cpu-gpu-boundaries-in-cloud-architecture-2dpm</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;Following my &lt;a href="https://dev.to/soniv/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b"&gt;previous post&lt;/a&gt; on the availability of GPU cloud instances in new regions (Hong Kong), I became curious about the &lt;strong&gt;bottlenecks and architectural implications&lt;/strong&gt; when GPU compute moves closer to users. As cloud providers expand GPU availability, assumptions about CPU↔GPU boundaries in cloud VMs are starting to break.&lt;/p&gt;

&lt;p&gt;GPU-accelerated cloud compute is expanding rapidly as AI, ML, real-time graphics, and simulations become more central to modern applications. Historically, GPU instances were limited to a few regions, creating a mental model where GPUs were &lt;strong&gt;centralized accelerators&lt;/strong&gt;, and CPU↔GPU interactions were a controlled, high-latency boundary.&lt;/p&gt;

&lt;p&gt;In this post, I’ll explore what changes when GPUs move closer to users, why the CPU↔GPU boundary matters architecturally, and what design considerations engineers should keep in mind.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is the CPU↔GPU Boundary?
&lt;/h2&gt;

&lt;p&gt;At a high level, the CPU↔GPU boundary defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CPU responsibilities:&lt;/strong&gt; control flow, scheduling, orchestration, I/O, system calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU responsibilities:&lt;/strong&gt; parallel computation, vectorized operations, specialized kernels&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data transfer:&lt;/strong&gt; CPU memory ↔ GPU memory via PCIe (Peripheral Component Interconnect Express)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditionally in cloud VMs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU resources were centralized and scarce&lt;/li&gt;
&lt;li&gt;Workloads were batch-oriented and tolerant of latency&lt;/li&gt;
&lt;li&gt;CPU↔GPU transfers happened infrequently and in large chunks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This boundary dictated &lt;strong&gt;service decomposition&lt;/strong&gt;, &lt;strong&gt;batching strategies&lt;/strong&gt;, and &lt;strong&gt;elasticity planning&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How CPU↔GPU Interactions Work (PCIe &amp;amp; Coding Example)
&lt;/h2&gt;

&lt;p&gt;The CPU↔GPU boundary is &lt;strong&gt;implemented via PCIe&lt;/strong&gt;, which moves data between the CPU and GPU memory (VRAM). GPU frameworks like CUDA, PyTorch, or TensorFlow handle these transfers automatically.&lt;/p&gt;

&lt;p&gt;Here’s an example in Python using PyTorch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="c1"&gt;# create data on CPU
&lt;/span&gt;&lt;span class="n"&gt;x_cpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# move data to GPU via PCIe
&lt;/span&gt;&lt;span class="n"&gt;x_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_cpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# computation now happens on GPU
&lt;/span&gt;&lt;span class="n"&gt;y_gpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_gpu&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt; &lt;span class="n"&gt;x_gpu&lt;/span&gt;  &lt;span class="c1"&gt;# matrix multiplication
&lt;/span&gt;
&lt;span class="c1"&gt;# bring result back to CPU
&lt;/span&gt;&lt;span class="n"&gt;y_cpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;y_gpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.to("cuda")&lt;/code&gt; triggers the PCIe transfer.
&lt;/li&gt;
&lt;li&gt;GPU computation is fast, but PCIe transfers have &lt;strong&gt;limited bandwidth&lt;/strong&gt; and &lt;strong&gt;non-negligible latency&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Frequent small transfers can &lt;strong&gt;bottleneck performance&lt;/strong&gt;, especially for interactive workloads.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Why PCIe Can Be a Bottleneck
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Limited bandwidth:&lt;/strong&gt; PCIe Gen 4 moves roughly 2 GB/s per lane, or about 32 GB/s on a typical x16 GPU link; fast, but small relative to GPU compute throughput.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency for interactive workloads:&lt;/strong&gt; Small, frequent transfers amplify CPU↔GPU latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multiple GPUs:&lt;/strong&gt; Each GPU has its own PCIe link; scaling horizontally increases potential bottlenecks.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic cloud instances:&lt;/strong&gt; Each new GPU instance defines a &lt;strong&gt;new CPU↔GPU boundary&lt;/strong&gt;, making scheduling more complex.
&lt;/li&gt;
&lt;/ul&gt;
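A back-of-the-envelope model shows why transfer granularity matters. Using illustrative round numbers (a 10 µs fixed per-transfer overhead and 16 GB/s of usable bandwidth; these are assumptions for the sketch, not measured figures), moving the same 64 MB as thousands of small chunks costs an order of magnitude more than one bulk copy:

```python
LATENCY_S = 10e-6   # assumed fixed per-transfer overhead (illustrative)
BANDWIDTH = 16e9    # assumed usable bandwidth in bytes/s (illustrative)

def transfer_time(total_bytes, n_chunks):
    """Total time to move total_bytes split across n_chunks transfers."""
    per_chunk = total_bytes / n_chunks
    return n_chunks * (LATENCY_S + per_chunk / BANDWIDTH)

total = 64 * 1024**2                      # 64 MB of tensors
one_big = transfer_time(total, 1)         # single bulk copy: ~4.2 ms
many_small = transfer_time(total, 4096)   # 16 KB chunks: ~45 ms
print(f"1 transfer:     {one_big * 1e3:.2f} ms")
print(f"4096 transfers: {many_small * 1e3:.2f} ms")
```

The bandwidth term is identical in both cases; the fixed per-transfer latency is what explodes, which is why interactive workloads that make frequent small CPU↔GPU calls feel the boundary so acutely.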




&lt;h3&gt;
  
  
  Why Regional GPU Availability Matters
&lt;/h3&gt;

&lt;p&gt;When cloud providers launch GPUs in more regions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPUs are &lt;strong&gt;physically closer to end-users and storage&lt;/strong&gt;, reducing network latency.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Interactive applications&lt;/strong&gt; (AI inference, simulations, rendering) benefit because network latency no longer dominates total response time.
&lt;/li&gt;
&lt;li&gt;Scaling workloads becomes more flexible; elastic GPU instances can spin up &lt;strong&gt;closer to data&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architectural implication:&lt;/strong&gt;  &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The CPU↔GPU boundary is no longer just “how fast PCIe moves data,” but “how far is the data from the CPU↔GPU interface in the first place?”  &lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Conceptual Diagram
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      User / Data Source
             │
             ▼
       Regional Network
             │
    +--------+--------+
    |       CPU       |
    | Control / I/O   |
    +--------+--------+
             │ PCIe transfer
             ▼
    +--------+--------+
    |       GPU       |
    | Parallel Compute|
    +--------+--------+
             │
           VRAM
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Adding more regions moves the CPU↔GPU block closer to users/data, reducing network latency.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;PCIe remains a bottleneck inside the VM, but overall system latency decreases.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
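These two points can be quantified with a simple latency budget (the millisecond figures below are illustrative assumptions, not benchmarks): the PCIe and compute terms inside the VM stay fixed, while the network term shrinks dramatically when the GPU moves to a nearby region:

```python
def total_latency_ms(network_rtt_ms, pcie_ms=0.5, compute_ms=2.0):
    """End-to-end request latency: network + CPU->GPU transfer + GPU compute.

    pcie_ms and compute_ms are assumed fixed inside the VM; only the
    network term changes when GPUs move to a closer region.
    """
    return network_rtt_ms + pcie_ms + compute_ms

distant = total_latency_ms(network_rtt_ms=150)  # cross-continent region
regional = total_latency_ms(network_rtt_ms=10)  # nearby region
print(f"distant GPU:  {distant:.1f} ms")   # network dominates
print(f"regional GPU: {regional:.1f} ms")  # PCIe is now a larger share
```

Notice what regional placement does to the shape of the budget: with a distant GPU, PCIe is noise; with a regional GPU, the in-VM boundary becomes a meaningful fraction of total latency, which is exactly why it re-enters the design conversation.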




&lt;h2&gt;
  
  
  Architectural Implications
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lower Latency Matters
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Previously, sending data to a distant GPU was negligible for batch workloads.
&lt;/li&gt;
&lt;li&gt;Regional GPUs make &lt;strong&gt;interactive workloads latency-sensitive&lt;/strong&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  GPU Workloads Become More Interactive
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Smaller, frequent GPU calls are now feasible.
&lt;/li&gt;
&lt;li&gt;GPUs participate directly in request paths rather than only batch jobs.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Elasticity Changes Design Choices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each new GPU instance introduces a new CPU↔GPU boundary.
&lt;/li&gt;
&lt;li&gt;Architects must ask: move data to GPU or move workload to data?
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data Locality Becomes Critical
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Moving data across regions may cost more than computation.
&lt;/li&gt;
&lt;li&gt;CPU↔GPU transfers must be considered alongside storage and network placement.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Bottlenecks to Watch
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Bottleneck&lt;/th&gt;
&lt;th&gt;Traditional Model&lt;/th&gt;
&lt;th&gt;Regional GPU Model&lt;/th&gt;
&lt;th&gt;Implication&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;PCIe Bandwidth&lt;/td&gt;
&lt;td&gt;Large infrequent transfers&lt;/td&gt;
&lt;td&gt;Frequent smaller transfers&lt;/td&gt;
&lt;td&gt;May limit interactive performance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Latency&lt;/td&gt;
&lt;td&gt;Batch-tolerant&lt;/td&gt;
&lt;td&gt;Sensitive, local GPU&lt;/td&gt;
&lt;td&gt;Requires redesigned request paths&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Elasticity&lt;/td&gt;
&lt;td&gt;Rare, long-running&lt;/td&gt;
&lt;td&gt;Frequent scaling&lt;/td&gt;
&lt;td&gt;Complex scheduling and data partitioning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data Gravity&lt;/td&gt;
&lt;td&gt;Centralized storage&lt;/td&gt;
&lt;td&gt;Regional GPUs&lt;/td&gt;
&lt;td&gt;Must rethink storage placement and pipeline design&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redefine the CPU↔GPU contract:&lt;/strong&gt; GPUs are local compute primitives, not just accelerators.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plan for latency-sensitive workloads:&lt;/strong&gt; Micro-batching, asynchronous pipelines, and request scheduling matter.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for dynamic boundaries:&lt;/strong&gt; Elastic GPU instances change how workloads are partitioned.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider regional data placement:&lt;/strong&gt; Moving computation to data can outperform moving data to GPUs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor new bottlenecks:&lt;/strong&gt; PCIe, memory bandwidth, and network congestion may become critical in new architectures.
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Discussion / Next Steps
&lt;/h2&gt;

&lt;p&gt;Regional GPU availability is changing cloud design assumptions. Engineers and architects should ask:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When does regional GPU placement actually improve performance or reduce cost?
&lt;/li&gt;
&lt;li&gt;Which workloads remain centralized, and which move closer to users?
&lt;/li&gt;
&lt;li&gt;How should elasticity, PCIe, and network bottlenecks factor into architecture diagrams?
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;Cloud GPUs are no longer distant, static resources. As they move closer to users and data, they force us to rethink &lt;strong&gt;how compute is distributed, how workloads are scheduled, and how architectural assumptions evolve&lt;/strong&gt;. Understanding these shifts now will help engineers design &lt;strong&gt;more resilient, scalable, and efficient cloud systems.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>cloud</category>
      <category>cuda</category>
      <category>gpu</category>
      <category>ai</category>
    </item>
    <item>
      <title>Amazon EC2 G5 Instances Now Available in Asia Pacific (Hong Kong)</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Tue, 06 Jan 2026 04:56:12 +0000</pubDate>
      <link>https://forem.com/soniarahal/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b</link>
      <guid>https://forem.com/soniarahal/amazon-ec2-g5-instances-now-available-in-asia-pacific-hong-kong-m1b</guid>
      <description>&lt;p&gt;&lt;strong&gt;Today, AWS makes Amazon EC2 G5 instances available in the Asia Pacific (Hong Kong) Region&lt;/strong&gt;, expanding access to &lt;strong&gt;GPU-powered compute&lt;/strong&gt; for customers running &lt;strong&gt;graphics-intensive and machine learning workloads&lt;/strong&gt; in Asia Pacific.&lt;/p&gt;

&lt;p&gt;This post explains what &lt;strong&gt;EC2&lt;/strong&gt; and &lt;strong&gt;G5 instances&lt;/strong&gt; are and shows how to &lt;strong&gt;launch a G5 instance using code&lt;/strong&gt;, along with key details about GPU usage, PCIe, and regional context.&lt;/p&gt;




&lt;h2&gt;
  
  
  GPU Cloud Trends
&lt;/h2&gt;

&lt;p&gt;GPU-accelerated cloud computing is growing rapidly as &lt;strong&gt;AI, machine learning, and real-time graphics workloads&lt;/strong&gt; become central to modern applications. Cloud GPU instances like &lt;strong&gt;EC2 G5&lt;/strong&gt; let teams scale high-performance compute &lt;strong&gt;without owning physical hardware&lt;/strong&gt;, supporting workloads across AI, media, research, simulation, and more.&lt;/p&gt;




&lt;h2&gt;
  
  
  What EC2 Is
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Amazon EC2&lt;/strong&gt; provides virtual machines in the cloud that you control like physical servers. Each instance is defined by:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html" rel="noopener noreferrer"&gt;AMI&lt;/a&gt; (Amazon Machine Image)&lt;/strong&gt; — a template including the operating system, pre-installed software, and default settings
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Instance type&lt;/strong&gt; — CPU, memory, networking, GPU
&lt;/li&gt;
&lt;li&gt;Storage and network configuration
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;EC2 is called &lt;strong&gt;“Elastic”&lt;/strong&gt; because its capacity can &lt;strong&gt;expand or shrink based on demand&lt;/strong&gt;. You can launch many instances when workloads spike and terminate them when they’re no longer needed; if demand is steady, you simply keep a minimal baseline running. Elasticity works in both directions: &lt;strong&gt;scaling out&lt;/strong&gt; under load and &lt;strong&gt;scaling in&lt;/strong&gt; when it subsides.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;GPU workloads&lt;/strong&gt;, this flexibility is especially useful:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Spin up G5 instances &lt;strong&gt;on-demand&lt;/strong&gt; for bursty tasks like AI training or video rendering
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;reserved G5 instances&lt;/strong&gt; for continuous workloads like inference or simulations
&lt;/li&gt;
&lt;/ul&gt;
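
&lt;p&gt;To make the on-demand vs. reserved trade-off concrete, here is a back-of-envelope cost comparison. The hourly prices are &lt;strong&gt;made-up placeholders&lt;/strong&gt;, not AWS list prices; check the EC2 pricing page for real numbers:&lt;/p&gt;

```python
# Hypothetical hourly prices -- NOT AWS list prices; check the pricing page.
ON_DEMAND_PER_HOUR = 1.60   # billed only while the instance runs
RESERVED_PER_HOUR = 1.00    # billed for every hour of the term

def monthly_cost(hours_used, hours_in_month=730):
    """Return (cost, cheapest_option) for a given monthly GPU usage."""
    on_demand = hours_used * ON_DEMAND_PER_HOUR
    reserved = hours_in_month * RESERVED_PER_HOUR
    return min((on_demand, "on-demand"), (reserved, "reserved"))

# Bursty workload (100 GPU-hours/month): on-demand wins.
print(monthly_cost(100))   # (160.0, 'on-demand')
# Continuous inference (every hour of the month): reserved wins.
print(monthly_cost(730))   # (730.0, 'reserved')
```

&lt;p&gt;The crossover point depends entirely on real prices and utilization, but the shape of the decision is the same: bursty usage favors on-demand, near-continuous usage favors reserved capacity.&lt;/p&gt;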




&lt;h2&gt;
  
  
  Launching a G5 Instance (Example Code)
&lt;/h2&gt;

&lt;p&gt;Instances can be launched via the console or programmatically. Using Python (&lt;code&gt;boto3&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;

&lt;span class="n"&gt;ec2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ec2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;instance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ec2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_instances&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;ImageId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ami-12345678&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# AMI = Amazon Machine Image (OS + software template)
&lt;/span&gt;    &lt;span class="n"&gt;InstanceType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;g5.xlarge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MinCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;MaxCount&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instance&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;g5.xlarge&lt;/code&gt; launches a &lt;strong&gt;virtual machine with a GPU attached&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/LaunchingAndUsingInstances.html" rel="noopener noreferrer"&gt;EC2 Launch Guide&lt;/a&gt;&lt;/p&gt;
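
&lt;p&gt;A practical variation on the snippet above: building the arguments as a plain dictionary first makes them easy to validate or unit-test before any AWS call is made. The AMI ID and key name below are placeholders; &lt;code&gt;wait_until_running()&lt;/code&gt; is the standard &lt;code&gt;boto3&lt;/code&gt; resource waiter for blocking until the instance is up:&lt;/p&gt;

```python
# Build the create_instances arguments up front so they can be inspected
# (or unit-tested) before touching AWS. AMI ID and key name are placeholders.
def g5_launch_params(ami_id, key_name, count=1):
    return {
        "ImageId": ami_id,              # AMI: OS + software template
        "InstanceType": "g5.xlarge",    # 1x NVIDIA A10G GPU
        "KeyName": key_name,            # SSH key pair for login
        "MinCount": count,
        "MaxCount": count,
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "workload", "Value": "gpu-demo"}],
        }],
    }

params = g5_launch_params("ami-12345678", "my-key")
# With boto3 this would be: instances = ec2.create_instances(**params),
# then instances[0].wait_until_running() to block until the VM is ready.
```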




&lt;h2&gt;
  
  
  What “G5” Means
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;G&lt;/strong&gt; in G5 stands for &lt;strong&gt;GPU / Graphics&lt;/strong&gt;, indicating that these instances are optimized for &lt;strong&gt;GPU-accelerated workloads&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;5&lt;/strong&gt; represents the &lt;strong&gt;generation&lt;/strong&gt; of the GPU instance family:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;G4&lt;/strong&gt; = previous generation (NVIDIA T4 GPUs)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G5&lt;/strong&gt; = current generation (NVIDIA A10G GPUs), offering &lt;strong&gt;more GPU cores, faster memory, higher network bandwidth, and improved performance&lt;/strong&gt; for machine learning, AI training, and real-time graphics workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G6 and beyond&lt;/strong&gt; = newer generations with updated GPUs and further performance improvements.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, &lt;strong&gt;G5 = the fifth-generation, high-performance GPU instance line from AWS&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If the &lt;strong&gt;instance type starts with &lt;code&gt;g5&lt;/code&gt;&lt;/strong&gt;, AWS will:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Attach &lt;strong&gt;NVIDIA A10G Tensor Core GPUs&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Expose them to the OS via &lt;strong&gt;PCIe&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Make them available to GPU-enabled software
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Non-GPU instance types (&lt;code&gt;m&lt;/code&gt;, &lt;code&gt;c&lt;/code&gt;, &lt;code&gt;t&lt;/code&gt;) include &lt;strong&gt;no GPU&lt;/strong&gt;. The difference is decided at &lt;strong&gt;instance creation&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html" rel="noopener noreferrer"&gt;Accelerated Computing Instances&lt;/a&gt;&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://aws.amazon.com/ec2/instance-types/g5/" rel="noopener noreferrer"&gt;G5 Instance Types&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What PCIe Is (Briefly)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;PCIe&lt;/strong&gt; is the high-speed interface connecting the GPU to the CPU. You don’t program PCIe directly — frameworks like CUDA, PyTorch, TensorFlow, and OpenGL handle it.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# CPU memory
&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;             &lt;span class="c1"&gt;# PCIe transfer to GPU memory
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the data is in VRAM, GPU computation runs without touching PCIe; the bus is involved again only when results move back to the CPU (e.g. &lt;code&gt;x.cpu()&lt;/code&gt;). Think of PCIe as the &lt;strong&gt;high-speed lane&lt;/strong&gt; moving data between CPU and GPU.&lt;/p&gt;
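
&lt;p&gt;A rough back-of-envelope estimate shows why minimizing CPU-to-GPU transfers matters. The bandwidth figure below is an &lt;strong&gt;assumption&lt;/strong&gt; (roughly effective PCIe 4.0 x16 throughput); real-world numbers vary by instance and driver:&lt;/p&gt;

```python
# Back-of-envelope PCIe transfer estimate. The bandwidth is an assumption
# (roughly PCIe 4.0 x16 effective throughput); real numbers vary.
BYTES_PER_FLOAT32 = 4
PCIE_BANDWIDTH_BYTES_PER_S = 16e9   # ~16 GB/s, assumed

tensor_bytes = 1024 * 1024 * BYTES_PER_FLOAT32   # the randn(1024, 1024) above
transfer_ms = tensor_bytes / PCIE_BANDWIDTH_BYTES_PER_S * 1000

print(f"{tensor_bytes / 2**20:.0f} MiB tensor, ~{transfer_ms:.2f} ms over PCIe")
```

&lt;p&gt;A fraction of a millisecond is negligible once, but a transfer per batch inside a training loop adds up, which is why GPU code tries to keep data resident in VRAM.&lt;/p&gt;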




&lt;h2&gt;
  
  
  EC2 Does Not Automatically Use the GPU
&lt;/h2&gt;

&lt;p&gt;EC2 only exposes the GPU; your code decides how to use it. Typical workflow:  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install NVIDIA drivers
&lt;/li&gt;
&lt;li&gt;Install CUDA or GPU-enabled libraries
&lt;/li&gt;
&lt;li&gt;Run software targeting the GPU
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Verify GPU availability:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;nvidia-smi&lt;/code&gt; shows attached GPUs, memory usage, and utilization.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/install-nvidia-driver.html" rel="noopener noreferrer"&gt;Install NVIDIA Driver&lt;/a&gt;&lt;/p&gt;
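
&lt;p&gt;A small guard can make scripts fail with a useful hint instead of a stack trace when the driver isn’t installed yet (a minimal sketch using only the standard library):&lt;/p&gt;

```python
import shutil
import subprocess

def gpu_status():
    """Return nvidia-smi output if a driver is installed, else a hint."""
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found: install the NVIDIA driver first"
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout

print(gpu_status())
```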




&lt;h2&gt;
  
  
  Why Hong Kong
&lt;/h2&gt;

&lt;p&gt;With &lt;strong&gt;G5 instances now available in Hong Kong&lt;/strong&gt;, GPU compute is closer to the people and teams who need it.  &lt;/p&gt;

&lt;p&gt;This matters because Hong Kong has &lt;strong&gt;high demand for GPU-intensive workloads&lt;/strong&gt; such as:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI and machine learning&lt;/strong&gt; — training and inference run faster with local GPUs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time graphics and simulations&lt;/strong&gt; — rendering, cloud gaming, and design applications benefit from reduced latency
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid experimentation&lt;/strong&gt; — teams can prototype and iterate on GPU-powered applications without relying on distant regions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By providing GPU compute locally, AWS enables developers in Hong Kong to &lt;strong&gt;move faster, test more, and deploy GPU-driven projects efficiently&lt;/strong&gt;, making it easier to innovate on compute-heavy workloads.  &lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html" rel="noopener noreferrer"&gt;Regions &amp;amp; Availability Zones&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EC2&lt;/strong&gt; = virtual machines you control
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic&lt;/strong&gt; = can scale up/down based on demand; relevant for bursty vs constant GPU workloads
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;G5&lt;/strong&gt; = GPU-enabled EC2 instances
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPU usage&lt;/strong&gt; = controlled by your code, not EC2
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PCIe&lt;/strong&gt; = the interface that moves data between CPU and GPU
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AMI&lt;/strong&gt; = the template EC2 uses to launch the instance, including OS and software
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Launching a G5 instance today gives you GPU acceleration &lt;strong&gt;through the same APIs and workflows you already know&lt;/strong&gt;, making high-performance computing accessible, scalable, and programmable in the cloud.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>cloud</category>
      <category>gpu</category>
    </item>
    <item>
      <title>What I Learned at the CNCF Montreal KubeCon NA 2025 Recap</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Sat, 20 Dec 2025 14:59:42 +0000</pubDate>
      <link>https://forem.com/soniarahal/what-i-learned-at-the-cncf-montreal-kubecon-na-2025-recap-13l9</link>
      <guid>https://forem.com/soniarahal/what-i-learned-at-the-cncf-montreal-kubecon-na-2025-recap-13l9</guid>
      <description>&lt;p&gt;On December 10th, the &lt;strong&gt;Cloud Native Montreal&lt;/strong&gt; community hosted a recap of &lt;strong&gt;KubeCon NA 2025 in Atlanta&lt;/strong&gt;. Rather than being a traditional conference, this was a community-driven evening with lightning talks and reflections on where the cloud-native ecosystem is heading.&lt;/p&gt;

&lt;p&gt;Instead of focusing on slides or announcements, the event emphasized &lt;strong&gt;patterns and lessons&lt;/strong&gt; emerging across the ecosystem — from AI agents and observability to GitOps and energy-aware infrastructure.&lt;/p&gt;

&lt;p&gt;Here are the key takeaways that stood out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cloud Native Is Becoming AI-Native
&lt;/h2&gt;

&lt;p&gt;One recurring theme was that &lt;strong&gt;AI workloads are now first-class citizens&lt;/strong&gt; in cloud-native environments.&lt;/p&gt;

&lt;p&gt;Traditional observability answers questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is the service up?&lt;/li&gt;
&lt;li&gt;Is latency within SLOs?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;AI systems introduce new operational questions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What prompt triggered this behavior?&lt;/li&gt;
&lt;li&gt;Which model call was expensive?&lt;/li&gt;
&lt;li&gt;Why did this agent take a specific action?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tools such as &lt;strong&gt;OpenLLMetry&lt;/strong&gt; extend OpenTelemetry with instrumentation for LLM and agent workflows, while &lt;strong&gt;OpenCost&lt;/strong&gt; provides visibility into Kubernetes and cloud spend across workloads, teams, and environments.&lt;/p&gt;

&lt;p&gt;The takeaway is clear:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;You can’t scale AI systems you can’t observe or financially understand.&lt;/strong&gt;&lt;/p&gt;
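
&lt;p&gt;As a toy illustration of what “financially understanding” model calls means, here is a &lt;strong&gt;hypothetical&lt;/strong&gt; cost-tracking decorator. The prices and the stubbed model call are made up, and real tooling like OpenLLMetry does this through OpenTelemetry instrumentation rather than manual wrappers:&lt;/p&gt;

```python
import functools

# Assumed per-1K-token prices for a hypothetical model -- not real list prices.
PRICE_PER_1K_TOKENS = {"input": 0.003, "output": 0.015}

def track_cost(fn):
    """Record token usage and dollar cost for each model call (a toy
    stand-in for what OpenLLMetry/OpenCost-style tooling does for you)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        reply, usage = fn(*args, **kwargs)   # fn returns (text, token counts)
        cost = (usage["input"] * PRICE_PER_1K_TOKENS["input"]
                + usage["output"] * PRICE_PER_1K_TOKENS["output"]) / 1000
        wrapper.calls.append({"usage": usage, "cost_usd": round(cost, 6)})
        return reply
    wrapper.calls = []
    return wrapper

@track_cost
def fake_model_call(prompt):
    """Stubbed LLM call with made-up token counts."""
    return "stub reply", {"input": 200, "output": 100}

fake_model_call("Why did the pod restart?")
print(fake_model_call.calls[0])
# {'usage': {'input': 200, 'output': 100}, 'cost_usd': 0.0021}
```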




&lt;h2&gt;
  
  
  Observability Is Shifting From Dashboards to Agents
&lt;/h2&gt;

&lt;p&gt;Observability is evolving beyond dashboards and alerts toward &lt;strong&gt;agent-assisted operations&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of engineers manually correlating metrics, logs, and recent deployments, emerging tools aim to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Perform root-cause analysis&lt;/li&gt;
&lt;li&gt;Triage alerts&lt;/li&gt;
&lt;li&gt;Recommend remediation steps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Projects like &lt;strong&gt;k8sgpt&lt;/strong&gt;, &lt;strong&gt;Seraph&lt;/strong&gt;, and newer agentic SRE tools suggest a future where observability systems don’t just surface data — they actively reason over it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Several tools highlighted this shift:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/k8sgpt-ai/k8sgpt" rel="noopener noreferrer"&gt;k8sgpt&lt;/a&gt;&lt;/strong&gt; — AI-native Kubernetes troubleshooting
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://github.com/holmesgpt/holmesgpt" rel="noopener noreferrer"&gt;HolmesGPT&lt;/a&gt;&lt;/strong&gt; / &lt;strong&gt;&lt;a href="https://github.com/seraph-ai/seraph" rel="noopener noreferrer"&gt;Seraph&lt;/a&gt;&lt;/strong&gt; — Automated root cause analysis and alert mitigation
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Emerging Agent-Based Platforms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://aws.amazon.com/blogs/aws/aws-devops-agent-helps-you-accelerate-incident-response-and-improve-system-reliability-preview/" rel="noopener noreferrer"&gt;AWS DevOps Agent (Preview)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://azure.microsoft.com/en-us/products/sre-agent" rel="noopener noreferrer"&gt;Azure SRE Agent (Preview)&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://cleric.ai/" rel="noopener noreferrer"&gt;Cleric&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;a href="https://github.com/PatrickKalkman/kube-whisper" rel="noopener noreferrer"&gt;Kube Whisperer&lt;/a&gt;&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These agents correlate &lt;strong&gt;logs, metrics, deployments, and incidents&lt;/strong&gt; to assist on-call engineers and reduce &lt;strong&gt;alert fatigue&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This doesn’t replace engineers, but it changes the workflow: less time searching for signals, more time making informed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg3ffxf5m2xdwb9idja2o.jpg" alt="Image of agentic SRE tools" width="800" height="762"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Abstraction Helps — but Security Must Follow
&lt;/h2&gt;

&lt;p&gt;Another major topic was &lt;strong&gt;Cyclops&lt;/strong&gt;, an open-source platform that simplifies Kubernetes by replacing raw YAML with structured, form-based abstractions.&lt;/p&gt;

&lt;p&gt;Cyclops introduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Modules&lt;/strong&gt; — logical groupings of all Kubernetes resources an application needs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Templates&lt;/strong&gt; — mappings that translate module inputs into valid Kubernetes manifests&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How Cyclops works with Helm:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Helm charts&lt;/strong&gt; define the Kubernetes resources (Deployments, Services, Ingress, etc.) using templated YAML.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cyclops wraps those Helm charts and &lt;strong&gt;exposes their values as validated forms&lt;/strong&gt; instead of free-text YAML edits.&lt;/li&gt;
&lt;li&gt;Users fill in forms, and Cyclops renders the underlying Helm templates into valid Kubernetes manifests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cyclops also supports AI-driven operations through a &lt;strong&gt;Model Context Protocol (MCP) server&lt;/strong&gt;, allowing agents to manage applications using natural language rather than direct cluster access.&lt;/p&gt;

&lt;p&gt;The key lesson here wasn’t blind automation, but caution:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Code generated by AI should be treated as untrusted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Security risks still apply. As abstraction increases, &lt;strong&gt;guardrails, validation, and testing become even more critical&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  GitOps Works Best When Designed for Teams
&lt;/h2&gt;

&lt;p&gt;A practical GitOps case study highlighted that &lt;strong&gt;repository structure matters as much as tooling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Key principles discussed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Align configuration structure with team ownership&lt;/li&gt;
&lt;li&gt;Centralize configuration while keeping environments explicit&lt;/li&gt;
&lt;li&gt;Keep related files close together (“proximity matters”)&lt;/li&gt;
&lt;li&gt;Optimize for developer experience, not just correctness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using &lt;strong&gt;ArgoCD&lt;/strong&gt;, deployments become automated, auditable, and consistent — but only when GitOps is treated as both a &lt;strong&gt;technical and organizational design&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28wlswjoai77dkwt2rbt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F28wlswjoai77dkwt2rbt.jpg" alt="Image of Before/After Gitops Repository Structure" width="800" height="911"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Energy Efficiency Is Becoming a Platform Concern
&lt;/h2&gt;

&lt;p&gt;The final talk focused on &lt;strong&gt;Kepler&lt;/strong&gt;, a CNCF project designed to expose energy consumption at the container level.&lt;/p&gt;

&lt;p&gt;Kepler provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fine-grained container and process power metrics&lt;/li&gt;
&lt;li&gt;Support for CPUs, GPUs, and heterogeneous hardware&lt;/li&gt;
&lt;li&gt;Low overhead using eBPF&lt;/li&gt;
&lt;li&gt;Integration with existing observability stacks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As GPU-heavy and AI workloads grow, energy usage and cooling costs are becoming operational concerns.&lt;/p&gt;

&lt;p&gt;The key message:&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Sustainability is now part of platform engineering, not just hardware planning.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfpf7df3ifejedzgvi0w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfpf7df3ifejedzgvi0w.jpg" alt="Image of the 8 concepts of Kepler Project" width="800" height="908"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Reflection
&lt;/h2&gt;

&lt;p&gt;This KubeCon recap wasn’t about memorizing tools — it was about understanding direction.&lt;/p&gt;

&lt;p&gt;Across talks, a consistent shift emerged:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;From reactive monitoring to &lt;strong&gt;AI-assisted operations&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From raw YAML to &lt;strong&gt;safe, opinionated abstractions&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From cost surprises to &lt;strong&gt;cost-aware platforms&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;From performance-only metrics to &lt;strong&gt;energy-aware infrastructure&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Community-driven events like this help connect individual technologies into a cohesive mental model of where cloud-native systems are heading next.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>community</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>AWS Bedrock AgentCore Hands-On Workshop: A Recap</title>
      <dc:creator>Sonia Rahal</dc:creator>
      <pubDate>Sat, 20 Dec 2025 02:16:01 +0000</pubDate>
      <link>https://forem.com/soniarahal/aws-bedrock-agentcore-hands-on-workshop-a-recap-3pap</link>
      <guid>https://forem.com/soniarahal/aws-bedrock-agentcore-hands-on-workshop-a-recap-3pap</guid>
      <description>&lt;p&gt;&lt;strong&gt;Location:&lt;/strong&gt; Montréal AWS User Group &lt;br&gt;
 &lt;strong&gt;Date:&lt;/strong&gt; December 18, 2025&lt;/p&gt;




&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;p&gt;This workshop was a hands-on journey through &lt;strong&gt;Amazon Bedrock AgentCore&lt;/strong&gt; (a platform to run AI agents at scale), covering &lt;strong&gt;Runtime, Gateway, Identity, Memory, Built-in Tools, and Observability&lt;/strong&gt;. Participants learned how to take AI agents from simple PoC (Proof of Concept) to &lt;strong&gt;secure, enterprise-ready applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Each demo shown here is &lt;strong&gt;just one example&lt;/strong&gt;, and the tools mentioned are a subset of what was explored during the workshop, not exhaustive.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Story: Why Cloud and Agents Matter
&lt;/h2&gt;

&lt;p&gt;Getting into cloud development isn’t just about learning services—it’s about &lt;strong&gt;understanding the real problem first&lt;/strong&gt;. Code is a tool for reliability, not the final asset. The bigger picture is knowing &lt;strong&gt;why a company would use Amazon Bedrock AgentCore&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;Enterprises want AI agents that can go from experiments to real-life, &lt;strong&gt;secure, scalable, and observable applications&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This workshop helped me &lt;strong&gt;connect the dots&lt;/strong&gt;: how modules and tools work together to create agents that are not just smart, but &lt;strong&gt;reliable and trustworthy&lt;/strong&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Target audience:&lt;/strong&gt; Enterprises or developers wanting AI agents without managing all the complex infrastructure themselves. Their goals include building reliable agents, scaling safely, integrating with external systems, and having full visibility (observability) into agent operations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workshop Modules: A Story Through Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Runtime (Demo: Weather + Calculator Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine you want to create an agent that can tell the weather or perform calculations for users. &lt;strong&gt;Runtime&lt;/strong&gt; is the engine that makes this possible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; A secure environment that runs your agent (the software that answers questions or performs tasks), handling infrastructure, scaling, and session management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Developers can focus on &lt;strong&gt;what the agent does&lt;/strong&gt; instead of worrying about servers or security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Weather + Calculator agent. Runtime handled all container orchestration and session isolation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;How is the weather?&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tools Used:&lt;/strong&gt; Strands Agent, Elastic Container Registry, Terminal prompts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Takeaway:&lt;/strong&gt; Runtime is the backbone that turns a prototype into a &lt;strong&gt;production-ready agent&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Gateway (Demo: Mars Weather Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine your agent needs data from external sources, like NASA’s weather data for Mars. &lt;strong&gt;Gateway&lt;/strong&gt; is what connects your agent to the outside world.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; The integration layer that allows agents to interact with external systems or APIs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; To provide real-world insights, agents need access to external information safely and reliably. Gateway allows defining &lt;a href="https://modelcontextprotocol.io/specification/2025-06-18/server/tools" rel="noopener noreferrer"&gt;tools&lt;/a&gt; with metadata about name, description, input/output schemas, and behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Mars Weather agent called NASA’s Open APIs using an API key. &lt;a href="https://api.nasa.gov/insight_weather/?api_key=DEMO_KEY&amp;amp;feedtype=json&amp;amp;ver=1.0" rel="noopener noreferrer"&gt;Here is an API response example&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"Hi, can you list all tools available to you"&lt;/code&gt; &lt;code&gt;"What is the weather in northern part of the Mars"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; REST APIs, AgentCore Gateway, API keys&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Gateway bridges the agent and external systems, enabling &lt;strong&gt;actionable intelligence&lt;/strong&gt; and structured tool integration.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
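
&lt;p&gt;A sketch of the tool metadata such integrations rely on. The shape follows the MCP tools specification linked above; the Mars-weather tool itself is &lt;strong&gt;hypothetical&lt;/strong&gt;:&lt;/p&gt;

```python
# Hypothetical tool definition in the MCP tools shape: a name, a description
# the model can reason over, and a JSON Schema for the expected input.
mars_weather_tool = {
    "name": "get_mars_weather",
    "description": "Fetch the latest InSight weather data for a Mars region.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "region": {"type": "string", "description": "e.g. 'northern'"},
        },
        "required": ["region"],
    },
}

def validate_tool(tool):
    """Minimal check that a tool definition has the required fields."""
    return all(k in tool for k in ("name", "description", "inputSchema"))

print(validate_tool(mars_weather_tool))   # True
```

&lt;p&gt;The point of the schema is that the gateway, not the model, enforces what a valid call looks like.&lt;/p&gt;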




&lt;h3&gt;
  
  
  Identity (Demo: AgentCore Runtime with vs without Authorization)
&lt;/h3&gt;

&lt;p&gt;Imagine that not everyone should be able to use your agent, or some tasks require special permissions. &lt;strong&gt;Identity&lt;/strong&gt; handles that.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Manages who can invoke agents and what they can access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Protects sensitive data and ensures compliance in enterprise environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Weather agent invoked with authorization worked; without authorization, it returned an error &lt;code&gt;AccessDeniedException&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"How is the weather?"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; Amazon Cognito, JWT tokens&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Identity ensures &lt;strong&gt;only authorized users or systems interact with agents&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Memory (Demo: AI Learning Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine talking to an agent that remembers you and what you’ve discussed before. &lt;strong&gt;Memory&lt;/strong&gt; makes this possible.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Stores context for multi-turn conversations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Short-term memory:&lt;/strong&gt; remembers context during a session (e.g., last few questions)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term memory:&lt;/strong&gt; preserves key information across sessions (e.g., user preferences, summaries)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Memory enables agents to give &lt;strong&gt;personalized and context-aware responses&lt;/strong&gt;, improving over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; The agent remembered the user’s name (Alex) and topics of interest in AI across sessions.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;br&gt;
User: &lt;code&gt;"My name is Alex and I'm interested in learning about AI."&lt;/code&gt;&lt;br&gt;
Agent: &lt;code&gt;"Hi Alex! I’m excited to help you learn about AI!"&lt;/code&gt;&lt;br&gt;
Later:&lt;br&gt;
User: &lt;code&gt;"What was my name again?"&lt;/code&gt;&lt;br&gt;
Agent: &lt;code&gt;"Your name is Alex!"&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; AgentCore Memory, Strands MetricsClient&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Short-term memory provides &lt;strong&gt;session-level context&lt;/strong&gt;, long-term memory provides &lt;strong&gt;persistent context&lt;/strong&gt; that improves user experience and enables agents to maintain continuity over time.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
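
&lt;p&gt;The split between the two memory tiers can be sketched with a toy class; this is &lt;strong&gt;not the AgentCore API&lt;/strong&gt;, just an illustration of the idea that session context is ephemeral while promoted facts persist:&lt;/p&gt;

```python
# Toy illustration of the short- vs long-term split (not the AgentCore API):
# session turns live in a per-session list; durable facts are promoted to a
# store that survives across sessions.
class AgentMemory:
    def __init__(self):
        self.long_term = {}   # persists across sessions
        self.session = []     # cleared on every new session

    def new_session(self):
        self.session = []

    def remember_turn(self, user, agent):
        self.session.append((user, agent))

    def promote(self, key, value):
        self.long_term[key] = value   # e.g. an extracted name or preference

memory = AgentMemory()
memory.remember_turn("My name is Alex...", "Hi Alex!")
memory.promote("user_name", "Alex")
memory.new_session()                    # short-term context is gone...
print(memory.session)                   # []
print(memory.long_term["user_name"])    # ...but the promoted fact survives
```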




&lt;h3&gt;
  
  
  Built-in Tools (Demo: Amazon Revenue Extraction)
&lt;/h3&gt;

&lt;p&gt;Imagine your agent needs to not just answer questions but &lt;strong&gt;extract and process data&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Pre-built tools like Browser or Code Interpreter extend agent capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Agents can perform specialized tasks safely and efficiently.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo:&lt;/strong&gt; Extract Amazon revenue data from a website using Browser tool with Nova Act SDK.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt Example:&lt;/strong&gt; &lt;code&gt;"Extract and return Amazon revenue for the last 4 years from stockanalysis.com."&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools Used:&lt;/strong&gt; Browser Tool, Code Interpreter, Nova Act SDK&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Takeaway:&lt;/strong&gt; Built-in tools enable agents to &lt;strong&gt;handle complex tasks&lt;/strong&gt;, making them more useful in enterprise contexts.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Observability (Demo: CrewAI Travel Agent)
&lt;/h3&gt;

&lt;p&gt;Imagine launching an agent in production and needing insight into its behavior. Observability solves this.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What it is:&lt;/strong&gt; Monitoring and logging for agent workflows, tool usage, performance, and errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why it matters:&lt;/strong&gt; Ensures agents are &lt;strong&gt;traceable, measurable, and debuggable&lt;/strong&gt;, which builds trust.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example Demo Workflow:&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Create a runtime-ready &lt;a href="https://docs.crewai.com/en/concepts/agents#basic-research-agent" rel="noopener noreferrer"&gt;CrewAI agent&lt;/a&gt; using Amazon Bedrock, defining notably its &lt;strong&gt;role, goal, backstory, and task&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Instrument the agent with &lt;code&gt;CrewAIInstrumentor().instrument()&lt;/code&gt;
to enable observability.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Boto3&lt;/strong&gt; to invoke the agent: &lt;code&gt;prompt = "What are some rodeo events happening in Oklahoma?"&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;The agent gathers multiple responses in parallel.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Dashboards on CloudWatch show runtime metrics across all agents, and clicking on a specific agent shows detailed metrics with custom time-frame filtering.&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Tools Used:&lt;/strong&gt; Amazon CloudWatch, Boto3 SDK, CrewAI, Scarf, AWS Distro for OpenTelemetry&lt;/li&gt;

&lt;li&gt;

&lt;strong&gt;Takeaway:&lt;/strong&gt; Observability ensures &lt;strong&gt;production agents are monitored and performance is visible&lt;/strong&gt;, supporting reliability and optimization.&lt;/li&gt;

&lt;/ul&gt;
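&lt;p&gt;The workflow above can be sketched as follows. Treat this as a hedged outline, not the demo's exact code: the agent ARN, region, and the &lt;code&gt;{"prompt": ...}&lt;/code&gt; payload shape are assumptions drawn from common AgentCore samples, and the &lt;code&gt;CrewAIInstrumentor&lt;/code&gt; import reflects the OpenInference instrumentation package; verify both against your deployment:&lt;/p&gt;

```python
# Hedged sketch of the observability workflow: switch on tracing for a
# CrewAI agent, then invoke the deployed runtime through Boto3.
# The ARN, region, and payload shape are placeholders (assumptions);
# check them against your own AgentCore deployment.
import json

def build_invoke_payload(prompt: str) -> bytes:
    """Serialize the prompt as the JSON body sent to the agent runtime."""
    return json.dumps({"prompt": prompt}).encode("utf-8")

if __name__ == "__main__":
    payload = build_invoke_payload(
        "What are some rodeo events happening in Oklahoma?"
    )
    print(payload)
    # The calls below need AWS credentials, a deployed agent, and the
    # openinference-instrumentation-crewai package, so they stay commented:
    # from openinference.instrumentation.crewai import CrewAIInstrumentor
    # CrewAIInstrumentor().instrument()  # emit an OpenTelemetry span per step
    #
    # import boto3
    # client = boto3.client("bedrock-agentcore", region_name="us-east-1")
    # response = client.invoke_agent_runtime(
    #     agentRuntimeArn="arn:aws:bedrock-agentcore:...",  # placeholder ARN
    #     payload=payload,
    # )
```

&lt;p&gt;With the instrumentor active, each agent step, tool call, and LLM invocation is exported as a trace, which is what the CloudWatch dashboards in the demo surface.&lt;/p&gt;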




&lt;h2&gt;
  
  
  Why Amazon Bedrock AgentCore Matters
&lt;/h2&gt;

&lt;p&gt;Enterprises adopt Bedrock AgentCore to move from &lt;strong&gt;proof of concept to production-ready AI applications&lt;/strong&gt;. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalable deployment&lt;/strong&gt; without managing infrastructure
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure, authorized execution&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contextual and persistent memory&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Integration with external systems and workflows&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full observability for performance and errors&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding these modules helps developers &lt;strong&gt;deliver AI solutions that meet enterprise goals&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cloud development is about seeing the &lt;strong&gt;big picture&lt;/strong&gt;, not just writing code.
&lt;/li&gt;
&lt;li&gt;AgentCore offers a &lt;strong&gt;sandbox to experiment safely&lt;/strong&gt; with enterprise-grade agents.
&lt;/li&gt;
&lt;li&gt;Observability ensures live agents can be &lt;strong&gt;monitored, optimized, and trusted&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;Hands-on workshops and community engagement are invaluable for &lt;strong&gt;learning how tools solve real-world problems&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>aws</category>
      <category>cloud</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
