<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Todd Sharp</title>
    <description>The latest articles on Forem by Todd Sharp (@recursivecodes).</description>
    <link>https://forem.com/recursivecodes</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F895331%2F59774920-1938-4c2e-b293-efa14fe46948.jpg</url>
      <title>Forem: Todd Sharp</title>
      <link>https://forem.com/recursivecodes</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/recursivecodes"/>
    <language>en</language>
    <item>
      <title>Creating Interactive Gaming Experiences with Amazon GameLift Streams and Amazon Interactive Video Service</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Wed, 11 Feb 2026 00:32:08 +0000</pubDate>
      <link>https://forem.com/aws/creating-interactive-gaming-experiences-with-amazon-gamelift-streams-and-amazon-interactive-video-2l9j</link>
      <guid>https://forem.com/aws/creating-interactive-gaming-experiences-with-amazon-gamelift-streams-and-amazon-interactive-video-2l9j</guid>
      <description>&lt;p&gt;If you're in the business of creating video games, you know that creating and marketing a game is not easy. Game studios face numerous challenges related to sales and marketing, gamer experience, and of course testing &amp;amp; development. For example, studios might struggle with choosing a distribution and monetization strategy as well as think about player acquisition and retention, piracy and cheating, platform and hardware support, and much more. Not to mention the actual development related issues like choosing an engine, language, framework and how to conduct play testing and validation. &lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/4Fa_B3LxEVM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;There's no easy road to making a profitable game that is fun to play and stands out from the crowd. You're swimming in a pool full of talented sharks, all competing for a small piece of every gamer's time, money, and attention. One of the best paths to success is creating a unique, engaging experience with endless playability and high potential for viral moments. This requires a solution that creates shared community experiences around your gameplay. In this post, we'll look at a solution that we've been working on called "Project Engage" that can help you to create dynamic, engaging gaming experiences with AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enabling Your Game’s Success with AWS
&lt;/h2&gt;

&lt;p&gt;Everyone knows that viral moments on social media are the cheat code that can skyrocket your game into the hearts and minds of players around the world. User-generated content live streaming platforms like Twitch have launched many games from small, indie studios into viral community superstardom. The combination of Amazon GameLift Streams, Amazon Interactive Video Service (Amazon IVS), and AWS AppSync is one approach that you can use to create unique, interactive gaming experiences that delight both your players and your community.&lt;/p&gt;

&lt;p&gt;So what are these services and how can you use them?&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 1: Boost Your Game with the Cloud
&lt;/h2&gt;

&lt;p&gt;Amazon GameLift Streams is cloud gaming on AWS: a service that enables publishers to stream on-demand, low-latency games to players around the globe at up to 1080p and 60 FPS. Anyone with an internet connection can play fun and exciting games without high-end hardware. Players don't need a $3000 rig with a specialized GPU - just a stable connection and a controller or keyboard and mouse. Amazon GameLift Streams is easy to get started with, and it directly addresses some of the distribution challenges that traditionally plague game studios. As we'll see below, when you combine it with Amazon IVS, you'll also be equipped to address some of the player acquisition and retention challenges that keep you up at night.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foa4i3tqh7lim1d5cjfb0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foa4i3tqh7lim1d5cjfb0.png" alt="Amazon GameLift Streams"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 2: User-Generated Content Is the New Meta
&lt;/h2&gt;

&lt;p&gt;Amazon IVS is managed live streaming at scale on AWS. Like Amazon GameLift Streams, it has a proud lineage: it was born from the popular live-streaming platform Twitch. Since it was built by video experts and is powered by a global network of purpose-built, video-optimized infrastructure, it enables you to focus on what matters: your community and your gamers’ experience. It’s not some watered-down, knock-off version - it’s the same servers and pipes that deliver Twitch’s traffic, fully available for your live streaming workloads. This means that you can build high-quality, globally scalable, ultra-low latency interactive live streaming applications. &lt;/p&gt;

&lt;p&gt;We’ll dig deeper into the different ways that you can broadcast your Amazon GameLift Streams gameplay to Amazon IVS below, but there are two main approaches for this solution. The first (and most performant) option is to stream directly from the Amazon GameLift Streams instance via a small "sidecar" binary, like so:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5egrqbil7nkj9digeul7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5egrqbil7nkj9digeul7.png" alt="GameLift Streams Sidecar Broadcast"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The second option is to utilize the Amazon IVS Web Broadcast SDK to re-stream the gameplay from the player’s browser.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhewfbcqtcl6mkwsqcjx4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhewfbcqtcl6mkwsqcjx4.png" alt="GameLift Streams Web Broadcast"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Want recordings for AI-based analysis or VOD playback later? Amazon IVS can record your live streams directly to Amazon Simple Storage Service (Amazon S3), so with some simple configuration your streams are automatically stored for future use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dcsve5t7l5v878t4oly.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8dcsve5t7l5v878t4oly.png" alt="GameLift Streams Recorded to S3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 3: Chat, Is This Real?
&lt;/h2&gt;

&lt;p&gt;AWS AppSync is the "glue" that binds these experiences by empowering you to send low-latency, high throughput chat and messages between your viewers and players. This messaging channel enables you to create dynamic gameplay that changes based on live-stream viewer feedback. Want to spawn a health pickup when viewers send their love to the player? Done. Want to modify the environment, or spawn enemies based on community interactions? No problem! What about giving your players new monetization options based on the popularity of their streams or their connection with the community? Easy! &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvfilqosxnhx6x08dwfl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvfilqosxnhx6x08dwfl.png" alt="Adding Interactivity with AppSync"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Level 4: Opening the Loot Box
&lt;/h2&gt;

&lt;p&gt;This combination of AWS services helps game developers solve some of their biggest problems. Betting and regulated gaming studios, for example, can give players real-time interaction with stream viewers, and those viewers can place wagers directly on the action for another level of excitement and entertainment. Instead of "leaning back" and consuming, they're "leaning forward" and participating. In this video, the game player is on the &lt;strong&gt;left&lt;/strong&gt;, and the live stream viewer is on the &lt;strong&gt;right&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/Q82XVKCzCf4"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Cheat Code for Viral Success
&lt;/h2&gt;

&lt;p&gt;Depending on your use case and requirements, there are several ways to approach this solution. In either approach, gameplay is delivered to the player via an ultra-low latency WebRTC connection from Amazon GameLift Streams. &lt;/p&gt;

&lt;h3&gt;
  
  
  Direct Broadcasting From the GameLift Streams Instance (Sidecar)
&lt;/h3&gt;

&lt;p&gt;For use cases that demand less network and resource utilization on the client side, the Amazon IVS broadcast can be sent directly from the Amazon GameLift Streams instance via the sidecar approach mentioned earlier in this post. In this approach, a small binary is packaged and deployed alongside the game binary. When the Amazon GameLift Streams session is launched, the necessary configuration is passed to the sidecar as environment variables via the Amazon GameLift Streams API. These variables contain an Amazon IVS participant token, as well as several video configuration arguments (for resolution, FPS, bitrate, etc.).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq34unrvojulag2y6hmt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq34unrvojulag2y6hmt.png" alt="Full Arch (Direct Broadcast)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see in the following video, the latency with the sidecar approach is nearly imperceptible. Because the viewer interactions are handled via AWS AppSync, there is no need to sacrifice any user interactivity and engagement. Again, the player is on the &lt;strong&gt;left&lt;/strong&gt;, and the viewer is on the &lt;strong&gt;right&lt;/strong&gt;. If you’re interested in the measured latency between the player and viewer, pause the video at different points and check the timer in the upper left corner.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/FQeGqvR4OR8"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  Rebroadcasting From the Player's Client (Amazon IVS Web Broadcast SDK)
&lt;/h3&gt;

&lt;p&gt;Another option is to rebroadcast the gameplay. Once the GameLift Streams session is received on the player side, you can capture that stream in the player’s browser and re-stream it to Amazon IVS via the Amazon IVS Web Broadcast SDK. This requires a bit more bandwidth on the player’s side, but usually not enough to be noticeable or to have much impact on the experience. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt22cokpeb7etys9f2ih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpt22cokpeb7etys9f2ih.png" alt="Full Arch (Web Broadcast)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture shown above is demonstrated in the following video. On the &lt;strong&gt;left&lt;/strong&gt; side of the screen, the player is playing in fullscreen mode, and the live stream view is shown on the &lt;strong&gt;right&lt;/strong&gt;-hand side of the screen. There’s a barely perceptible amount of latency at times, but it’s not enough to impact the viewer interactivity. Notice how the viewer interactions (chat messages and emotes) are integrated into the game, so the player can feel the community support and engagement. Near the end of the video, when the viewer’s view exits fullscreen, you can see another advantage of using Amazon IVS real-time stages for the broadcast. Since Amazon IVS real-time stages support multiple publishing participants, the player’s webcam and microphone are published as a separate stream so that viewers can see and hear the player without a webcam overlay blocking the gameplay.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/svwXRzBABcI"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Easter Egg: Passing Control Between Multiple Participants
&lt;/h2&gt;

&lt;p&gt;Play testing should happen early and often, but approaches vary. Some studios have dedicated testers, while others contract out and request recorded sessions for analysis. Imagine a system where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Play testers play directly from their browser without downloading the game (+1XP)&lt;/li&gt;
&lt;li&gt;You could join interactive sessions and provide real-time feedback (+10XP)&lt;/li&gt;
&lt;li&gt;Sessions automatically record to Amazon S3 for analysis (+100XP)&lt;/li&gt;
&lt;li&gt;Participants could pass gameplay control between each other (+10000XP)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since Amazon GameLift Streams supports reconnecting to recently disconnected streams, this is achievable. Implement a control request system via AppSync messages - when a request is approved, the current session disconnects and passes the reconnect ID to the new player, who reconnects and broadcasts gameplay to the other participants. &lt;/p&gt;
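
&lt;p&gt;Here's a rough sketch of what the backend side of that handoff could look like, assuming &lt;code&gt;create_stream_session_connection&lt;/code&gt; as the reconnection call and an entirely illustrative message shape:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

gls = boto3.client("gameliftstreams", region_name="us-east-1")

def handle_control_request(message: dict) -&gt; dict:
    """Handle a control request received over AppSync (fields are illustrative)."""
    if not message.get("approved"):
        return {"status": "denied"}

    # The current player's client tears down its WebRTC connection first
    # (signaled via an AppSync message - not shown). The new player then
    # reconnects to the existing session with a fresh WebRTC offer.
    connection = gls.create_stream_session_connection(
        Identifier=message["streamGroupId"],
        StreamSessionIdentifier=message["streamSessionArn"],
        SignalRequest=message["newPlayerSdpOffer"],
    )
    # The SDP answer lets the new player's client complete the WebRTC
    # handshake and take over gameplay.
    return {"status": "approved", "signalResponse": connection["SignalResponse"]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;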

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/3WTIY871ddw"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Will You Accept the Quest?
&lt;/h2&gt;

&lt;p&gt;To try this solution, we have created a couple of sample repos on GitHub that can be used with your AWS account. First, you will need to build the &lt;a href="https://github.com/aws-samples/sample-gamelift-streams-ivs-broadcast-client" rel="noopener noreferrer"&gt;sidecar client application&lt;/a&gt; and place it in the same directory as your game binary. Then upload the sidecar and game binary to an S3 bucket and create a GameLift Streams Application and Stream Group in your AWS account. Once you have the Stream Group and Application IDs, you can deploy the &lt;a href="https://github.com/aws-samples/sample-amazon-gamelift-streams-multiview-amazon-ivs-react-app" rel="noopener noreferrer"&gt;sample multi-view React web app&lt;/a&gt; to your AWS account according to the deployment instructions in the repo. &lt;/p&gt;

&lt;p&gt;Amazon GameLift Streams, Amazon IVS, and AWS AppSync offer numerous opportunities to improve your game development lifecycle and differentiate your games in a competitive market. Together, these services help create unique, interactive gaming experiences. It's time to level up your games on AWS.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>gamedev</category>
      <category>amazonivs</category>
      <category>amazongameliftstreams</category>
    </item>
    <item>
      <title>Virtual Assistants with Amazon IVS and OpenAI's Realtime API</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Wed, 01 Oct 2025 12:52:35 +0000</pubDate>
      <link>https://forem.com/aws/virtual-assistants-with-amazon-ivs-and-openais-realtime-api-1oa0</link>
      <guid>https://forem.com/aws/virtual-assistants-with-amazon-ivs-and-openais-realtime-api-1oa0</guid>
      <description>&lt;p&gt;We recently &lt;a href="https://dev.to/aws/building-virtual-agents-with-amazon-nova-sonic-s2s-23m7"&gt;explored building a virtual agent&lt;/a&gt; that seamlessly joins your Amazon Interactive Video Service (Amazon IVS) real-time streams as a conversational assistant with vision capabilities. While Amazon Nova Sonic S2S delivers solid performance and integrates beautifully with the AWS ecosystem, you might be curious about harnessing OpenAI's cutting-edge real-time model for your agents. This post walks you through a demo integration that makes this possible.&lt;/p&gt;

&lt;p&gt;Before we dive into the technical details, here's a glimpse of what a conversation looks like with OpenAI's real-time model working behind the scenes.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/ku3in2Lzdiw"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;p&gt;The results speak for themselves: crisp, natural voice quality, lightning-fast response times, and remarkably accurate visual analysis that nails what it's looking at. It's genuinely impressive how well everything comes together.&lt;/p&gt;

&lt;h2&gt;
  
  
  ❓ How Is This Different?
&lt;/h2&gt;

&lt;p&gt;This integration leverages the Python &lt;code&gt;aiortc&lt;/code&gt; library to tap into the remote participant's stream, channeling their audio directly into the &lt;code&gt;gpt-realtime&lt;/code&gt; model. Meanwhile, the agent establishes itself as a publisher on the Amazon IVS stage. When &lt;code&gt;gpt-realtime&lt;/code&gt; generates its response, that audio flows right back into the agent's published feed. Need vision? The agent captures the current frame from the remote participant and instantly analyzes what's happening. It's elegantly straightforward - the &lt;code&gt;gpt-realtime&lt;/code&gt; model essentially becomes another participant that both publishes and subscribes to the stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wsdlymwc123lqpf2iyc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9wsdlymwc123lqpf2iyc.png" alt="gpt-realtime virtual agent arch"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🤖 Try It Out!
&lt;/h2&gt;

&lt;p&gt;Ready to get your hands dirty? Start by spinning up a new Amazon IVS stage and generating a remote participant token for your agent (refer to the &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/what-is.html" rel="noopener noreferrer"&gt;user guide&lt;/a&gt; for more info if you're new to IVS).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ivs-realtime create-stage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"my-stage"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; us-east-1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--participant-token-configurations&lt;/span&gt; &lt;span class="s1"&gt;'[
    {
      "duration": 720,
      "attributes": {
        "username": "gpt-realtime-agent"
      }
    },
    {
      "duration": 720,
      "attributes": {
        "username": "local-participant"
      }
    }
  ]'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s1"&gt;'{stageArn: stage.arn, gptParticipant: {participantId: participantTokens[0].participantId, token: participantTokens[0].token}, localParticipant: {participantId: participantTokens[1].participantId, token: participantTokens[1].token}}'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-cli-pager&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will produce output similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stageArn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ivs:us-east-1:639934345351:stage/abcdef123456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gptParticipant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"participantId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"6APQqRu2XnqK"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eyJhbGciOiJLTVMiLCJ0eXAiOiJKV1QifQ...."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"localParticipant"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"participantId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"m243ru5A2idL"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eyJhbGciOiJLTVMiLCJ0eXAiOiJKV1QifQ...."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No client app yet? No problem - you can publish directly from the AWS Management Console. If you are using the AWS Console, make sure to grab your participant ID from the 'Stage sessions' section by clicking the active session ID. &lt;/p&gt;

&lt;p&gt;You can also try things out with &lt;a href="https://codepen.io/amazon-ivs/full/ZEqgrpo" rel="noopener noreferrer"&gt;this CodePen&lt;/a&gt;. Paste the &lt;code&gt;token&lt;/code&gt; from the &lt;code&gt;localParticipant&lt;/code&gt; into the &lt;strong&gt;Token&lt;/strong&gt; input box and click 'Join' to publish to your new stage.&lt;/p&gt;

&lt;p&gt;Next, clone the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/" rel="noopener noreferrer"&gt;sample repo&lt;/a&gt; and navigate to the &lt;code&gt;stages-gpt-realtime&lt;/code&gt; directory. The &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-gpt-realtime/README.md" rel="noopener noreferrer"&gt;README&lt;/a&gt; there walks you through the setup process. There's even a &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-gpt-realtime/MANAGING_OPENAI_ASSISTANT_DEMO.md" rel="noopener noreferrer"&gt;simple management script&lt;/a&gt; that you can use to launch agent instances via WebSocket messages from the frontend. When you're ready to roll, fire up the script with your credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;ivs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;gpt&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;GPT_PARTICIPANT_TOKEN&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;LOCAL_PARTICIPANT_ID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent will hop onto the stage, subscribe to your local participant, and you're off to the races with real-time AI conversations!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lxc2w06ec0zslzv08jx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9lxc2w06ec0zslzv08jx.png" alt="Real-time virtual assistant"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The script provides a simplified video track with an audio visualization, which can be overridden for your use case. I personally like to create a client-side visualization, which tends to be a bit cleaner and more responsive (as you can see in the video above).&lt;/p&gt;

&lt;h2&gt;
  
  
  👏 Extending The Solution
&lt;/h2&gt;

&lt;p&gt;This sample gives you the foundation to start building something amazing. Dig into the repo code and OpenAI's documentation to discover how you can supercharge the solution - add custom tools for function calling, or transform the &lt;code&gt;gpt-realtime&lt;/code&gt; model into a smart transcription service for your stages. Drop a comment and let me know what creative solutions you'll build with Amazon IVS and &lt;code&gt;gpt-realtime&lt;/code&gt;!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>ai</category>
      <category>openai</category>
    </item>
    <item>
      <title>Live Streaming to Amazon IVS on an ESP32 Microcontroller with Embedded Sensor Metadata</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Fri, 19 Sep 2025 12:32:53 +0000</pubDate>
      <link>https://forem.com/recursivecodes/live-streaming-to-amazon-ivs-on-an-esp32-microcontroller-with-embedded-sensor-metadata-17mj</link>
      <guid>https://forem.com/recursivecodes/live-streaming-to-amazon-ivs-on-an-esp32-microcontroller-with-embedded-sensor-metadata-17mj</guid>
      <description>&lt;p&gt;Until recently, quality live streaming from embedded devices like an ESP32 was severely limited. ESP32-CAM boards can only produce low-quality, low frame rate (5-15 FPS) streams of MJPEG images via RTSP at resolutions like 320x240. These microcontrollers simply didn't have enough CPU and RAM for real video processing with modern codecs. Then Espressif changed everything by releasing the &lt;a href="https://docs.espressif.com/projects/esp-dev-kits/en/latest/esp32p4/esp32-p4-function-ev-board/index.html" rel="noopener noreferrer"&gt;ESP32-P4-Function-EV-Board&lt;/a&gt;, which features a dual-core 400MHz RISC-V processor, 32MB PSRAM, and most importantly, a hardware H.264 encoder capable of 1080P@30fps streaming. It's a game changer to be sure, but all of that hardware is nothing without a software library to push those beautifully encoded H264 streams to a valid destination. Thankfully Espressif came through &lt;em&gt;again&lt;/em&gt; with the &lt;code&gt;esp-webrtc-solution&lt;/code&gt; &lt;a href="https://github.com/espressif/esp-webrtc-solution/" rel="noopener noreferrer"&gt;library&lt;/a&gt;. So does it live up to they hype? More importantly, can we integrate it with a managed live streaming service like Amazon IVS? Let's find out!&lt;/p&gt;

&lt;p&gt;⚠️ &lt;strong&gt;Disclaimer&lt;/strong&gt;: This is a personal project!&lt;/p&gt;

&lt;h2&gt;
  
  
  📟 The Board
&lt;/h2&gt;

&lt;p&gt;🛑 Before we go any further - just look at this thing!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih9et015u6g8mn0j0ee9.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih9et015u6g8mn0j0ee9.jpg" alt="ESP32-P4-Function-EV" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⛔️ &lt;strong&gt;Note&lt;/strong&gt;: Ignore that connected sensor for a moment. We'll get to that!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's beautiful! The form factor is reminiscent of a Raspberry Pi - a huge step up from the tiny ESP32 boards we're used to seeing. Granted, sometimes a smaller board is better when we're integrating it into a larger project that we want to be compact, but with all the horsepower 🐎 this thing has, it was bound to be a little larger than usual. The kit I bought even comes with a full display, but I'm not using it for now. Side note: I'd love a nice project case for this kit, but I haven't found any available yet. To give you the full picture, here are the specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dual-core 400MHz RISC-V processor&lt;/strong&gt; with 32MB PSRAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware H.264 video encoder&lt;/strong&gt; for real-time streaming&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1920x1080 Full HD video&lt;/strong&gt; at 25fps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIPI-CSI camera interface&lt;/strong&gt; (2MP camera included)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MIPI-DSI display output&lt;/strong&gt; (7" 1024x600 touchscreen included)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wi-Fi 6 &amp;amp; Bluetooth 5 LE&lt;/strong&gt; via ESP32-C6 module&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10/100 Ethernet port&lt;/strong&gt; for wired networking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dual USB 2.0 ports&lt;/strong&gt; (Type-A host + Type-C device)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Professional audio system&lt;/strong&gt; with codec, microphone, and 3W amplifier (bring your own speaker)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;40-pin GPIO header&lt;/strong&gt; for sensor/peripheral expansion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MicroSD card slot&lt;/strong&gt; for storage expansion&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in USB Serial/JTAG&lt;/strong&gt; debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom Line: The ESP32-P4-Function-EV isn't just another microcontroller - it's a complete multimedia development platform that brings professional-grade video, audio, and connectivity features that would require multiple separate boards and components with traditional ESP32 setups. If I had one complaint, it's that the video color from the included 2MP SC2336 camera seems a bit off (see screenshot below). It's not terrible, but we're used to much higher quality cameras these days. I may try to find a way to upgrade that in the future.&lt;/p&gt;

&lt;h2&gt;
  
  
  🤝 Integrating with Amazon IVS
&lt;/h2&gt;

&lt;p&gt;About a month ago, my friend &lt;a href="https://kiro.dev" rel="noopener noreferrer"&gt;Kiro&lt;/a&gt; and I started trying to get the ESP32-P4 working with Amazon IVS real-time stages. Since stages &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/obs-whip-support.html" rel="noopener noreferrer"&gt;support WHIP ingest&lt;/a&gt;, I started with the &lt;a href="https://github.com/espressif/esp-webrtc-solution/tree/main/solutions/whip_demo" rel="noopener noreferrer"&gt;&lt;code&gt;whip_demo&lt;/code&gt;&lt;/a&gt; sample solution. I quickly ran into a few issues with the core library - one major blocker in particular was that it didn't properly support redirects during SDP negotiation. This meant that when requesting an SDP for the IVS stage, the participant token was stripped from the request, resulting in a &lt;code&gt;400&lt;/code&gt; error. I worked on a fix for this (and a few other feature requests) and submitted a few PRs that have all been integrated into the core library (thank you &lt;a href="https://github.com/TempoTian" rel="noopener noreferrer"&gt;@TempoTian&lt;/a&gt;!).&lt;/p&gt;
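
&lt;p&gt;To illustrate why the redirect handling matters, here's a minimal Python sketch of the WHIP negotiation: POST the SDP offer with the participant token as a bearer token, and re-attach the &lt;code&gt;Authorization&lt;/code&gt; header when the endpoint redirects. The endpoint and flow here are illustrative - the actual fix lives in the C library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests  # pip install requests

WHIP_ENDPOINT = "https://global.whip.live-video.net"

def negotiate(sdp_offer: str, participant_token: str) -&gt; str:
    """POST the SDP offer; follow redirects without losing the token."""
    headers = {
        "Authorization": f"Bearer {participant_token}",
        "Content-Type": "application/sdp",
    }
    resp = requests.post(WHIP_ENDPOINT, data=sdp_offer,
                         headers=headers, allow_redirects=False)
    # The fix: re-attach Authorization on each redirect instead of letting
    # it get dropped, which is what caused the 400 during SDP negotiation.
    while resp.status_code in (301, 302, 307, 308):
        resp = requests.post(resp.headers["Location"], data=sdp_offer,
                             headers=headers, allow_redirects=False)
    resp.raise_for_status()
    return resp.text  # the SDP answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;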

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvquiw01iypdbr3istdm8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvquiw01iypdbr3istdm8.png" alt="Published live stream" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yeah, the color is a bit off. But the resolution and latency are amazing! &lt;/p&gt;

&lt;h2&gt;
  
  
  🏷️ SEI Support
&lt;/h2&gt;

&lt;p&gt;Once I got the board broadcasting to Amazon IVS, the next step was pretty obvious (to me at least). This is a microcontroller with 55 programmable GPIO pins. It was &lt;em&gt;made&lt;/em&gt; to read sensors, so why not come up with a way to publish all that potential data? And since real-time stages have support for &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/web-publish-subscribe.html#web-publish-subscribe-sei-attributes" rel="noopener noreferrer"&gt;Supplemental Enhancement Information (SEI)&lt;/a&gt;, why not add support for publishing sensor readings (or really any data) as SEI? This required another PR to add a hook into the core lib that exposes each encoded video frame so that we can insert SEI NAL units. Once this hook was implemented, Kiro and I were able to build an SEI publishing system. This is why the image of the board above has a DHT-11 sensor attached. If you're interested in how this works, check the DHT-11 markdown doc in the repo (link below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmp94bvtsw8iwiyz8le4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frmp94bvtsw8iwiyz8le4.png" alt="SEI Sensor Data" width="800" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  🎉 Try it Out!
&lt;/h2&gt;

&lt;p&gt;To try it out for yourself, check out the &lt;a href="https://github.com/recursivecodes/ivs-esp-whip-demo" rel="noopener noreferrer"&gt;&lt;code&gt;ivs-esp-whip-demo&lt;/code&gt;&lt;/a&gt; on GitHub. &lt;/p&gt;

&lt;p&gt;The documentation is pretty solid in the repo and should help you get it up and running. If you have any questions, post them below. For problems getting it running or enhancement ideas, please file an issue on the repo! &lt;/p&gt;

&lt;p&gt;Happy streaming!&lt;/p&gt;

</description>
      <category>livestreaming</category>
      <category>esp32</category>
      <category>aws</category>
      <category>amazonivs</category>
    </item>
    <item>
      <title>4 Ways to Supercharge Your HLS Live Streaming App with AI-Powered Analysis</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Wed, 17 Sep 2025 13:52:36 +0000</pubDate>
      <link>https://forem.com/aws/4-ways-to-supercharge-your-hls-live-streaming-app-with-ai-powered-analysis-4mob</link>
      <guid>https://forem.com/aws/4-ways-to-supercharge-your-hls-live-streaming-app-with-ai-powered-analysis-4mob</guid>
      <description>&lt;p&gt;Amazon IVS customers building live streaming platforms want to focus on creating engaging experiences, not wrestling with complex video analysis pipelines. While we handle the heavy lifting of low-latency HLS delivery, they get to focus on the things they are great at - content discovery 🔍, moderation 🛡️, and user engagement 💫. In our &lt;a href="https://dev.to/aws/5-ways-to-improve-your-ugc-live-streaming-app-with-ai-1ib9"&gt;last post&lt;/a&gt;, we looked at how to solve some of these problems in real-time WebRTC based streams. Since that post, I've gotten many questions from developers keep asking us about AI solutions for low-latency (RTMP+HLS) channel analysis 🤖, so in this post we're going to dig into four powerful ways to use AI and open-source tools to transform your live streaming app's capabilities!&lt;/p&gt;

&lt;h2&gt;
  
  
  1️⃣ Real-Time Frame Analysis for Content Discovery
&lt;/h2&gt;

&lt;p&gt;Want to know what's actually happening in your streams beyond just titles and tags? Frame-by-frame analysis using Claude via Amazon Bedrock can provide incredibly detailed descriptions of live content, perfect for content discovery, accessibility, and moderation.&lt;/p&gt;
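
&lt;p&gt;Under the hood, this kind of frame analysis boils down to a single Bedrock Converse call per sampled frame. A minimal sketch - the prompt is illustrative, and the frame bytes would come from your HLS decoder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def analyze_frame(jpeg_bytes: bytes) -&gt; str:
    """Ask Claude to describe a single video frame."""
    response = bedrock.converse(
        modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": jpeg_bytes}}},
                {"text": "Describe what is happening in this live stream frame "
                         "for content discovery and moderation purposes."},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;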

&lt;p&gt;Check out the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/channels-subscribe/ivs-channel-subscribe-analyze-frames.py" rel="noopener noreferrer"&gt;ivs-channel-subscribe-analyze-frames.py&lt;/a&gt; script in &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/" rel="noopener noreferrer"&gt;this repo&lt;/a&gt;. Once configured, run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python channels-subscribe/ivs-channel-subscribe-analyze-frames.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--playlist-url&lt;/span&gt; &lt;span class="s2"&gt;"https://example.com/playlist.m3u8"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--highest-quality&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--analysis-interval&lt;/span&gt; 30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives you detailed frame analysis output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-15 15:48:35,507 - INFO - 🤖 VideoFrameAnalyzer initialized with 3.0s interval
2025-09-15 15:48:35,507 - INFO - 🌍 Using Bedrock region: us-east-1
2025-09-15 15:48:35,507 - INFO - 🧠 Using model: us.anthropic.claude-sonnet-4-20250514-v1:0
2025-09-15 15:48:35,507 - INFO - 🤖 Video frame analysis enabled (every 3.0s)
2025-09-15 15:48:36,073 - INFO - 📊 Stream FPS: 30.0
2025-09-15 15:48:36,074 - INFO - 🚫 Video display disabled - press Ctrl+C to quit
2025-09-15 15:48:36,254 - INFO - 🔍 Analyzing frame...
2025-09-15 15:48:47,551 - INFO - ✅ Frame analysis completed
2025-09-15 15:48:47,551 - INFO - 📝 Analysis: 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This video frame shows a professional kitchen environment with a chef actively cooking. Here's what I observe:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Person:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chef wearing traditional white chef's attire (white chef's coat and white apron)&lt;/li&gt;
&lt;li&gt;The chef appears to be middle-aged and is focused on cooking at the stovetop&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Kitchen Environment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Commercial-grade stainless steel kitchen with professional equipment&lt;/li&gt;
&lt;li&gt;Multiple cooking stations visible with gas burners&lt;/li&gt;
&lt;li&gt;Industrial ventilation hood system overhead with exhaust fans&lt;/li&gt;
&lt;li&gt;Stainless steel prep surfaces and countertops throughout&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cooking Activity:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The chef is working at a stovetop with what appears to be a pan containing yellow/golden colored food (possibly eggs, butter, or a sauce)&lt;/li&gt;
&lt;li&gt;Active cooking in progress, suggesting this is during meal preparation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Equipment &amp;amp; Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Professional-grade gas range with multiple burners&lt;/li&gt;
&lt;li&gt;Overhead heat lamps/warming lights visible&lt;/li&gt;
&lt;li&gt;Industrial kitchen ventilation system&lt;/li&gt;
&lt;li&gt;Red rubber anti-fatigue floor mats for safety and comfort&lt;/li&gt;
&lt;li&gt;Various kitchen tools and equipment visible in the background&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Setting:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This appears to be a high-end restaurant or commercial kitchen&lt;/li&gt;
&lt;li&gt;The lighting suggests this could be during active service hours&lt;/li&gt;
&lt;li&gt;The setup indicates this is likely a live cooking demonstration or behind-the-scenes view of restaurant operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The overall scene captures the fast-paced, professional environment of commercial food preparation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This analysis is perfect for automatically categorizing streams, generating accessibility descriptions, or powering content-based recommendations! 👨‍🍳🍝&lt;/p&gt;

&lt;h2&gt;
  
  
  2️⃣ Live Transcription with Timed Metadata Publishing
&lt;/h2&gt;

&lt;p&gt;Real-time transcription isn't just great for accessibility - it's a goldmine for content discovery and moderation. The &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/channels-subscribe/ivs-channel-subscribe-transcribe.py" rel="noopener noreferrer"&gt;ivs-channel-subscribe-transcribe.py&lt;/a&gt; script uses OpenAI Whisper for multi-language transcription and can even publish results back to your IVS channel as timed metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python channels-subscribe/ivs-channel-subscribe-transcribe.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--playlist-url&lt;/span&gt; &lt;span class="s2"&gt;"https://example.com/playlist.m3u8"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--language&lt;/span&gt; en &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--whisper-model&lt;/span&gt; base &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--publish-transcript-as-timed-metadata&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will produce output that looks similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-15 15:52:46,480 - INFO - 🎤 Starting audio chunk recording
2025-09-15 15:52:53,317 - INFO - ⏹️  Stopping recording after 6.84s with 346 frames
2025-09-15 15:52:53,338 - INFO - Processing audio chunk 6 in memory (shape: (118101,))
[TRANSCRIPT] A little bit of salt, a little bit of cracked pepper. We're gonna let that sweat a little bit. I have a big pot of pasta water here.
2025-09-15 15:52:54,167 - INFO - ✅ Published transcript metadata chunk 1/1 (150 bytes)
...
[TRANSCRIPT] right before I don't want to go into brown brown brown so we're going to put some Italian parsley in there.
...
[TRANSCRIPT] And it's not just for color, it's fragrant, it's earthy balances, the dish. I'm gonna go in with our spaghetti.
2025-09-15 15:53:07,786 - INFO - ✅ Published transcript metadata chunk 1/1 (129 bytes)
2025-09-15 15:53:07,786 - INFO - 🎉 Successfully published 1/1 transcript chunks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
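
&lt;p&gt;The publishing side of that flow is essentially one API call. A minimal boto3 sketch - the channel ARN is a placeholder, and each timed metadata payload is limited to 1 KB, which is why longer transcripts get chunked:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

ivs = boto3.client("ivs", region_name="us-east-1")

def publish_transcript(text: str) -&gt; None:
    """Publish one transcript chunk to the channel as timed metadata."""
    ivs.put_metadata(
        channelArn="arn:aws:ivs:us-east-1:111111111111:channel/AbCdEf123456",
        metadata=text,  # must fit within the 1 KB timed metadata limit
    )

publish_transcript('{"type": "caption", "text": "I have a big pot of pasta water here."}')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;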



&lt;p&gt;On the player side, we can &lt;a href="https://docs.aws.amazon.com/ivs/latest/LowLatencyUserGuide/metadata.html#metadata-consuming" rel="noopener noreferrer"&gt;consume the metadata&lt;/a&gt; and render it as a caption or translate it as necessary.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyckzcg0js2860h1kw0ak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyckzcg0js2860h1kw0ak.png" alt="Metadata transcript" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course, this data can also be persisted to your app's database and queried later, which means your viewers can search for streams by actual spoken content, in addition to using the transcripts for captions and automated content moderation! 🗣️📝&lt;/p&gt;

&lt;h2&gt;
  
  
  3️⃣ Comprehensive Video Segment Analysis
&lt;/h2&gt;

&lt;p&gt;Another option is to analyze the video content itself (not just a single frame). The &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/channels-subscribe/ivs-channel-subscribe-analyze-video.py" rel="noopener noreferrer"&gt;ivs-channel-subscribe-analyze-video.py&lt;/a&gt; script records video segments and analyzes them using TwelveLabs Pegasus for holistic content understanding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python channels-subscribe/ivs-channel-subscribe-analyze-video.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--playlist-url&lt;/span&gt; &lt;span class="s2"&gt;"https://example.com/playlist.m3u8"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--analysis-duration&lt;/span&gt; 15 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--highest-quality&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This provides incredibly rich analysis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-15 15:57:14,710 - INFO - 🤖 VideoAnalyzer initialized with 15.0s recording duration
2025-09-15 15:57:14,710 - INFO - 🌍 Using Bedrock region: us-west-2
2025-09-15 15:57:14,710 - INFO - 🧠 Using model: us.twelvelabs.pegasus-1-2-v1:0
2025-09-15 15:57:14,710 - INFO - 🤖 Video analysis enabled (recording duration: 15.0s)
2025-09-15 15:57:15,220 - INFO - 📊 Stream FPS: 30.0
2025-09-15 15:57:15,221 - INFO - 🚫 Video display disabled - press Ctrl+C to quit
2025-09-15 15:57:15,226 - INFO - 🎥 Starting audio and video chunk recording
2025-09-15 15:57:30,238 - INFO - ⏹️  Stopping recording after 15.01s
2025-09-15 15:57:30,238 - INFO - ✅ Recorded 349 video frames
2025-09-15 15:57:30,238 - INFO - ✅ Recorded 0 audio frames
2025-09-15 15:57:32,890 - INFO - ✅ Successfully encoded video: 3428047 bytes, 4570732 base64 chars
2025-09-15 15:57:32,891 - INFO - 🔍 Analyzing video...
2025-09-15 15:57:52,693 - INFO - ✅ Video analysis completed
2025-09-15 15:57:52,693 - INFO - 📝 Analysis:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;In the video from a live stream, a chef dressed in a white uniform is seen in a kitchen setting. He is initially holding a small piece of red chili pepper, which he brings to his mouth to taste. The text overlay on the video reads "ONE PASTAS" and "PASTA AGIO OLIO WITH PEPPERONCINO." After tasting the chili pepper, the chef puts it down and picks up a knife to chop the chili pepper on a white cutting board. The text overlay changes to indicate that he likes very spicy food and is going to use two chili peppers without removing the seeds, suggesting the dish will be quite hot. The chef continues to chop the chili pepper while the camera focuses on his actions. The background of the kitchen includes various equipment and ingredients, but the primary focus remains on the chef and his task of preparing the spicy pasta dish.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This analysis is perfect for content categorization, highlight generation, and creating searchable content libraries!&lt;/p&gt;

&lt;h2&gt;
  
  
  4️⃣ Advanced Audio-Video Synchronization
&lt;/h2&gt;

&lt;p&gt;For the most sophisticated analysis, &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/channels-subscribe/ivs-channel-subscribe-analyze-audio-video.py" rel="noopener noreferrer"&gt;ivs-channel-subscribe-analyze-audio-video.py&lt;/a&gt; uses PyAV for proper audio-video stream handling with perfect synchronization.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python channels-subscribe/ivs-channel-subscribe-analyze-audio-video.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--playlist-url&lt;/span&gt; &lt;span class="s2"&gt;"https://example.com/playlist.m3u8"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--analysis-duration&lt;/span&gt; 20 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--highest-quality&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
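
&lt;p&gt;The essence of the PyAV approach: open the playlist once and demux audio and video packets from the same container, so every decoded frame carries a presentation timestamp on a shared clock. A minimal sketch (the playlist URL is a placeholder):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import av  # pip install av

container = av.open("https://example.com/playlist.m3u8")
video = container.streams.video[0]
audio = container.streams.audio[0]

video_frames, audio_frames = [], []
# Demuxing both streams from one container keeps their timestamps aligned.
for packet in container.demux(video, audio):
    for frame in packet.decode():
        if packet.stream is video:
            video_frames.append(frame)
        else:
            audio_frames.append(frame)
    # frame.time is the presentation timestamp in seconds.
    if video_frames and video_frames[-1].time is not None and video_frames[-1].time &gt; 20.0:
        break

container.close()
print(f"captured {len(video_frames)} video / {len(audio_frames)} audio frames")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;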



&lt;p&gt;This approach ensures your AI analysis captures the complete context of both audio and video streams, making it ideal for complex content like educational streams, gaming commentary, or multi-person discussions where timing matters! 🎯⚡ Here's the output for this script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;2025-09-15 16:00:06,306 - INFO - 🤖 VideoAnalyzer initialized with 15.0s recording duration
2025-09-15 16:00:06,306 - INFO - 🌍 Using Bedrock region: us-west-2
2025-09-15 16:00:06,306 - INFO - 🧠 Using model: us.twelvelabs.pegasus-1-2-v1:0
2025-09-15 16:00:06,306 - INFO - 🤖 Video analysis enabled &lt;span class="o"&gt;(&lt;/span&gt;recording duration: 15.0s&lt;span class="o"&gt;)&lt;/span&gt;
2025-09-15 16:00:06,306 - INFO - 🚫 Video display disabled - press Ctrl+C to quit
2025-09-15 16:00:06,306 - INFO - 🔗 Opening stream with PyAV...
2025-09-15 16:00:07,096 - INFO - 🔊 Found audio stream: aac, 48000Hz, 2 channels
2025-09-15 16:00:07,096 - INFO - 📺 Found video stream: h264, 360x640
2025-09-15 16:00:07,100 - INFO - 🎥 Starting audio and video chunk recording
2025-09-15 16:00:24,348 - INFO - ⏹️  Stopping recording after 17.25s
2025-09-15 16:00:24,348 - INFO - ✅ Recorded 615 video frames
2025-09-15 16:00:24,348 - INFO - ✅ Recorded 963 audio frames
2025-09-15 16:00:29,360 - INFO - ✅ Successfully encoded video: 6016865 bytes, 8022488 &lt;span class="nb"&gt;base64 &lt;/span&gt;chars
2025-09-15 16:00:29,367 - INFO - 🔍 Analyzing video...
2025-09-15 16:01:06,321 - INFO - ✅ Video analysis completed &lt;span class="o"&gt;(&lt;/span&gt;finish reason: unknown&lt;span class="o"&gt;)&lt;/span&gt;
2025-09-15 16:01:06,321 - INFO - 📝 Analysis:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;In the video, a chef is preparing a dish of pasta with pesto. Early on, the chef uses tongs to transfer the pasta from a pan to a plate, where the pasta is coated in a green pesto sauce with visible red pepper flakes. The chef then grates some fresh pecorino cheese over the pasta using a metal grater, as mentioned in the dialogue. The chef carefully distributes the pasta and cheese, ensuring a well-presented dish. The camera captures these actions in detail, focusing on the chef's hand movements and the final presentation of the pasta.&lt;/p&gt;

&lt;p&gt;Throughout the video, there are clear and readable captions overlaid on the screen, providing additional information about the dish and the chef's actions. The captions mention the type of pasta, the sauce used, and the chef's name. The overall presentation is professional and visually appealing, making it suitable for content discovery, moderation, or accessibility purposes. The chef holds up the plate towards the end, giving a clear view of the vibrant colors and textures of the pasta dish.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  🏁 Summary
&lt;/h2&gt;

&lt;p&gt;AI-powered channel analysis opens up incredible possibilities for live streaming platforms:&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Content Discovery&lt;/strong&gt;: Let users find streams by actual content, not just titles 🔍&lt;br&gt;
• &lt;strong&gt;Accessibility&lt;/strong&gt;: Automatic captions and descriptions for all users ♿&lt;br&gt;
• &lt;strong&gt;Moderation&lt;/strong&gt;: Real-time content analysis for safety and compliance 🛡️&lt;br&gt;
• &lt;strong&gt;Monetization&lt;/strong&gt;: Target ads based on actual stream content 💰&lt;br&gt;
• &lt;strong&gt;Engagement&lt;/strong&gt;: Generate highlights and clips automatically ✨&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/" rel="noopener noreferrer"&gt;sample repository&lt;/a&gt; provides everything you need to get started. What creative ways will you use AI to enhance your streaming platform? Drop your ideas in the comments! &lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>ai</category>
      <category>livestreaming</category>
    </item>
    <item>
      <title>5 Ways to Improve Your UGC Live Streaming App with AI</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Wed, 03 Sep 2025 14:45:51 +0000</pubDate>
      <link>https://forem.com/aws/5-ways-to-improve-your-ugc-live-streaming-app-with-ai-1ib9</link>
      <guid>https://forem.com/aws/5-ways-to-improve-your-ugc-live-streaming-app-with-ai-1ib9</guid>
      <description>&lt;p&gt;Most of the customers that depend on Amazon IVS for managed live streaming would rather focus on what they do best - creating safe and engaging communities - instead of worrying about the challenges related to delivering low-latency live video to a global audience at scale. Which makes total sense - if they can depend on us to handle the complexities related to video, they can focus on creating the best possible user experience. There are enough challenges that they have to worry about. Things like monetization, moderation, content discovery, analytics (and much more) are tricky enough to manage without having to worry about transcoding, packaging and delivery of video bits around the world. As I talk to developers around the world, many are starting to ask about how they might use AI to help solve some of these problems. So in this post, we'll take a look at three possible ways to use AI and various open-source tools to make it easier to create a better UX for a social UGC live streaming application. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;🐉 &lt;strong&gt;Here Be Dragons!&lt;/strong&gt; Some of the methods that we'll look at below all utilize the WebRTC-HTTP Egress Protocol - or WHEP - to subscribe to an Amazon IVS real-time stage. Technically this is an &lt;em&gt;unsupported&lt;/em&gt; protocol on Amazon IVS. This means that the usage of WHEP and the endpoints related to subscribing to a real-time stream are not documented, and you'll be unable to receive support related to WHEP if you have any issues. That said, all of the scripts below have been tested to work as of the initial publish date of this blog post. Keep this in mind before implementing any of these solutions and consider this post experimental and educational!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1️⃣ Transcribing a Real-Time Stream
&lt;/h2&gt;

&lt;p&gt;Providing captions (and even translations) for a live stream is great for viewer engagement and helps to make your app accessible for all users, but it's not the only reason why you might want to transcribe a real-time stream. Once you've got a running transcript, you can use AI to summarize the content of that stream and use it for moderation and discovery purposes. Now your viewers can find the perfect live stream based on the actual content of the stream, not just based on what the streamer decided to use for a title, description and tags. &lt;/p&gt;
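
&lt;p&gt;As a rough sketch of what that summarization step could look like once you've collected the transcript text (the script below produces it), here's one way to do it with the Amazon Bedrock &lt;code&gt;converse&lt;/code&gt; API. Note that the model ID here is just a placeholder - swap in whichever text model you have enabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A minimal sketch, assuming the transcript has already been collected
# as a single string. The model ID below is a placeholder.
import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_transcript(transcript: str) -&gt; str:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # placeholder
        messages=[{
            "role": "user",
            "content": [{"text": (
                "Summarize this live stream transcript in 2-3 sentences "
                "for content discovery and moderation purposes:\n\n" + transcript
            )}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;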

&lt;p&gt;To transcribe a real-time live stream, check out the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-subscribe/ivs-stage-subscribe-transcribe.py" rel="noopener noreferrer"&gt;&lt;code&gt;ivs-stage-subscribe-transcribe.py&lt;/code&gt;&lt;/a&gt; script in &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/" rel="noopener noreferrer"&gt;this repo&lt;/a&gt;. Once you've configured the script, you can run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;ivs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;participant&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;participant123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will give you output that looks similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-02 15:10:21,022 - INFO - Processing audio chunk 1 in memory (shape: (80000,))
[TRANSCRIPT] or a correction? I mean, that's kind of, you kind of glance over the...
2025-09-02 15:10:26,037 - INFO - Processing audio chunk 2 in memory (shape: (80000,))
[TRANSCRIPT] important part there. that's a mighty big that ellipsis in that reply.
2025-09-02 15:10:31,037 - INFO - Processing audio chunk 3 in memory (shape: (80000,))
[TRANSCRIPT] it is doing a whole heck of a lot of work right there. She kind of went
2025-09-02 15:10:36,020 - INFO - Processing audio chunk 4 in memory (shape: (80000,))
[TRANSCRIPT] from zero to a billion in one giant step.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This script is a great starting point. You can modify it for your needs by persisting each transcript to a database, or publishing it over a WebSocket. &lt;/p&gt;
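
&lt;p&gt;For example, here's a minimal sketch of the WebSocket idea, using the &lt;code&gt;websockets&lt;/code&gt; library (an assumption on my part - any WebSocket server would do). You'd call &lt;code&gt;broadcast()&lt;/code&gt; wherever the script currently prints the &lt;code&gt;[TRANSCRIPT]&lt;/code&gt; lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A minimal sketch of pushing transcript lines to connected clients over
# a WebSocket. Assumes a recent release of the `websockets` package.
import asyncio
import websockets

connected = set()

async def handler(websocket):
    # Track each client until it disconnects
    connected.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        connected.discard(websocket)

async def broadcast(transcript_line: str):
    # Send the new transcript line to every connected client
    for ws in set(connected):
        try:
            await ws.send(transcript_line)
        except websockets.ConnectionClosed:
            connected.discard(ws)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

# asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;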

&lt;h2&gt;
  
  
  2️⃣ Analyzing Individual Frames of a Real-Time Stream
&lt;/h2&gt;

&lt;p&gt;As we just saw, transcribing can provide a huge amount of context for discovery, moderation and more. But it's only part of the story - sometimes what is unspoken can provide the rest. For this, we can grab individual frames from the live stream and analyze them with Amazon Bedrock. The repo that we looked at earlier has an example of frame analysis in the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-subscribe/ivs-stage-subscribe-analyze-frames.py" rel="noopener noreferrer"&gt;&lt;code&gt;ivs-stage-subscribe-analyze-frames.py&lt;/code&gt;&lt;/a&gt; demo. Once this script is ready to run (check out the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/README.md" rel="noopener noreferrer"&gt;README&lt;/a&gt; for setup instructions) you can run it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;ivs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;frames&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;participant123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;interval&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will provide output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-02 15:18:38,449 - INFO - 🤖 Frame analysis enabled (interval: 5.0s)
2025-09-02 15:18:38,552 - INFO - 🧊 ICE connection state changed to: completed
2025-09-02 15:18:38,598 - INFO - 🔗 Connection state changed to: connected
2025-09-02 15:18:44,140 - INFO - 🔍 Analyzing frame for participant y1rhVboBJgQb...
2025-09-02 15:18:51,417 - INFO - ✅ Frame analysis completed for participant y1rhVboBJgQb
2025-09-02 15:18:51,417 - INFO - 📝 Analysis: 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's the analysis broken out for readability:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This is a screenshot from a Twitch live stream showing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Main Subject&lt;/strong&gt;: A middle-aged man with a gray beard wearing a black baseball cap and what appears to be a white hoodie or jacket with orange/red accents. He's speaking into a black microphone and appears to be seated in a dark gaming chair.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting&lt;/strong&gt;: The background shows a dimly lit room with purple/blue ambient lighting. There's a framed picture or artwork visible on the wall behind him, and the Twitch logo is prominently displayed in purple neon-style lighting on the left side of the frame.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;UI Elements&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chat message is visible at the bottom from user "raymondcamden" asking "hey betty - can you explain the entirety of all existence, starting from the big bang to the current time?"&lt;/li&gt;
&lt;li&gt;There's an anime-style avatar/VTuber character visible in the bottom right corner - appears to be a figure with green hair&lt;/li&gt;
&lt;li&gt;An orange circular logo/symbol is visible in the upper right&lt;/li&gt;
&lt;li&gt;The typical Twitch streaming interface elements are present&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Activity&lt;/strong&gt;: This appears to be an interactive streaming session where the streamer is responding to viewer questions or comments, likely in a talk show or Q&amp;amp;A format rather than gaming content.&lt;/p&gt;

&lt;p&gt;The overall aesthetic suggests this is a professional streaming setup with deliberate lighting and branding elements.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This kind of overview provides far more context than anything the streamer themselves would typically supply in a title or description.&lt;/p&gt;

&lt;h2&gt;
  
  
  3️⃣ Analyzing Audio/Video Chunks in a Real-Time Stream
&lt;/h2&gt;

&lt;p&gt;We can take #2 one step further, and instead of analyzing a single video frame we can use TwelveLabs Pegasus to analyze a chunk of actual video (including audio). This combines the effectiveness of transcription (audio context) and frame analysis into a single, holistic analysis.&lt;/p&gt;

&lt;p&gt;Try &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-subscribe/ivs-stage-subscribe-analyze-video.py" rel="noopener noreferrer"&gt;&lt;code&gt;ivs-stage-subscribe-analyze-video.py&lt;/code&gt;&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;ivs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;analyze&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;participant123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eyJ0eXAiOiJKV1QiLCJhbGciOiJFUzM4NCJ9...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;analysis&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Which will provide output like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;2025-09-02 15:22:37,494 - INFO - 🔍 Video analysis active - recording 10.0 second videos
2025-09-02 15:22:37,494 - INFO - 🔄 1 connection(s) active. Press Ctrl+C to exit.
2025-09-02 15:22:37,494 - INFO - 🎵 Starting audio processing task
2025-09-02 15:22:37,494 - INFO - 🧐 Starting audio and video processing task
2025-09-02 15:22:37,494 - INFO - 🤖 Video analysis enabled (duration: 10.0s)
2025-09-02 15:22:37,596 - INFO - 🧊 ICE connection state changed to: completed
2025-09-02 15:22:37,647 - INFO - 🔗 Connection state changed to: connected
2025-09-02 15:22:37,745 - INFO - 🎥 Starting audio and video chunk recording
2025-09-02 15:22:47,776 - INFO - ⏹️  Stopping recording after 10.03s
2025-09-02 15:22:47,776 - INFO - ✅ Recorded 301 video frames
2025-09-02 15:22:47,776 - INFO - ✅ Recorded 501 audio frames
2025-09-02 15:22:56,227 - INFO - ✅ Successfully encoded video: 1001838 bytes, 1335784 base64 chars
2025-09-02 15:22:56,233 - INFO - 🔍 Analyzing video for participant eBJWcH5N8zSk...
2025-09-02 15:23:02,471 - INFO - ✅ Video analysis completed for participant eBJWcH5N8zSk (finish reason: unknown)
2025-09-02 15:23:02,471 - INFO - 📝 Analysis: 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, breaking out the analysis for readability:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;In the video, a man is seated in a chair, speaking into a microphone. &lt;/p&gt;

&lt;p&gt;He is wearing a black cap and a gray hoodie. The background includes a Twitch logo on the left side and a world map on the right side, with a cartoon image of a girl with green hair also visible on the right. &lt;/p&gt;

&lt;p&gt;The man appears to be engaging in a conversation with an audience or a specific person named Betty, as indicated by the text overlay that reads, "Ask a question in chat with Betty!" and "Hey Betty! Can you explain the entirety of the universe starting from the big bang to the current time?" &lt;/p&gt;

&lt;p&gt;The dialogue suggests that the man is asking about the origins of the universe. The overall setting is that of a live stream or video call, where the man is addressing viewers and possibly discussing topics related to gaming or history.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is &lt;em&gt;super helpful&lt;/em&gt; for content discovery, moderation and analysis. Imagine being able to monetize a stream based on the exact content within it!&lt;/p&gt;

&lt;h2&gt;
  
  
  4️⃣ Indexing VODs with TwelveLabs Marengo
&lt;/h2&gt;

&lt;p&gt;Analyzing live video is amazing and powerful, but VODs are just as important to any social UGC video platform. TwelveLabs Marengo can be used via Amazon Bedrock to generate embeddings from video, text, audio, or image inputs. These embeddings can be used for similarity search, clustering, and other machine learning tasks which means your users can easily search for and discover VODs based on the actual video content itself! Check out &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-marengo.html?trk=769a1a2b-8c19-4976-9c45-b6b1226c7d20&amp;amp;sc_channel=el" rel="noopener noreferrer"&gt;the docs&lt;/a&gt; to learn more.&lt;/p&gt;
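
&lt;p&gt;The Bedrock request/response shapes for Marengo are covered in the docs linked above, so here's just a sketch of the search side once you have embeddings stored. The &lt;code&gt;get_embedding()&lt;/code&gt; helper is hypothetical - it stands in for the Marengo call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A minimal sketch of similarity search over stored VOD embeddings.
# get_embedding() is hypothetical - it stands in for the Marengo-on-Bedrock
# call described in the docs linked above.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -&gt; float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def search_vods(query_text: str, vod_index: dict, top_k: int = 5):
    """Rank VODs by similarity between the query embedding and each VOD's embedding."""
    query_vec = get_embedding(query_text)  # hypothetical Marengo text embedding
    scored = [(vod_id, cosine_similarity(query_vec, vec))
              for vod_id, vec in vod_index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;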

&lt;h2&gt;
  
  
  5️⃣ Curating VOD Highlights
&lt;/h2&gt;

&lt;p&gt;The key to growth for any social UGC video application is social clip sharing. I've previously blogged about a simple way you can curate clips for social sharing which can help your users easily create their next viral highlight. Check out &lt;a href="https://dev.to/aws/auto-generating-clips-for-social-media-from-live-streams-with-the-strands-agents-sdk-1kkj"&gt;this post&lt;/a&gt; to learn how!&lt;/p&gt;

&lt;h2&gt;
  
  
  🏁 Summary
&lt;/h2&gt;

&lt;p&gt;There are tons of ways to improve the UX of your social UGC live streaming application, and AI is only making it easier for developers. What is your favorite way to use AI with video? Post it in the comments below!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>livestreaming</category>
      <category>ai</category>
    </item>
    <item>
      <title>Publishing Real-Time Video via WHIP to Amazon IVS</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Fri, 29 Aug 2025 15:19:12 +0000</pubDate>
      <link>https://forem.com/aws/publishing-real-time-video-via-whip-to-amazon-ivs-p7f</link>
      <guid>https://forem.com/aws/publishing-real-time-video-via-whip-to-amazon-ivs-p7f</guid>
<description>&lt;p&gt;Traditionally, Amazon IVS real-time stages are used in mobile or web applications. For these types of applications, we offer great SDKs for &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/broadcast-web.html" rel="noopener noreferrer"&gt;web&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/broadcast-ios.html" rel="noopener noreferrer"&gt;iOS&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/broadcast-android.html" rel="noopener noreferrer"&gt;Android&lt;/a&gt;. But sometimes your application &lt;em&gt;isn't&lt;/em&gt; a web or mobile application. Or, sometimes your &lt;em&gt;viewers&lt;/em&gt; are on web or mobile but your &lt;em&gt;publisher&lt;/em&gt; isn't. There are several use cases that fall into the "need to publish from something other than a browser or mobile application" bucket, and in this post I'd like to highlight a few options. All of the options that we'll look at below are possible because Amazon IVS stages support the WebRTC-HTTP Ingest Protocol (&lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/rt-stream-ingest.html" rel="noopener noreferrer"&gt;WHIP&lt;/a&gt;). So if you're looking to broadcast from a desktop app, headless server, or any other platform that supports WHIP, read on to learn more!&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a Stage
&lt;/h2&gt;

&lt;p&gt;Before we can test out any of the following methods, we'll need to create an Amazon IVS real-time stage. The following AWS CLI command creates a stage and a single participant token that we can use to test all of the methods below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws ivs-realtime create-stage &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt; &lt;span class="s2"&gt;"whip-demo-stage"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--participant-token-configurations&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nv"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;demo-whip-broadcaster,capabilities&lt;span class="o"&gt;=&lt;/span&gt;PUBLISH,SUBSCRIBE,duration&lt;span class="o"&gt;=&lt;/span&gt;720
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will give us output similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"stage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"arn"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"whip-demo-stage"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"autoParticipantRecordingConfiguration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"endpoints"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"whip"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://f99084460c35.global-bm.whip.live-video.net"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"participantTokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"participantId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"[redacted]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"token"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"eyJhbGciOiJLTVMiLCJ0eXAiOiJKV1QifQ.[redacted]"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"demo-whip-broadcaster"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"duration"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;720&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"PUBLISH"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
                &lt;/span&gt;&lt;span class="s2"&gt;"SUBSCRIBE"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've removed some bits for brevity, but the important part going forward is the &lt;code&gt;token&lt;/code&gt; value. Take note of it, and let's dig into the various options.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Note:&lt;/strong&gt; You'll notice that the WHIP endpoint returned is not the WHIP endpoint that we'll use below. The one we'll use is a &lt;em&gt;global&lt;/em&gt; endpoint (&lt;code&gt;https://global.whip.live-video.net&lt;/code&gt;) which is load-balanced and will ultimately redirect to the most appropriate endpoint for each session. For more information, refer to the &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/obs-whip-support.html" rel="noopener noreferrer"&gt;docs&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  1️⃣ Broadcasting with OBS
&lt;/h2&gt;

&lt;p&gt;Broadcasting via WHIP with OBS is quite easy! In &lt;em&gt;Settings&lt;/em&gt; -&amp;gt; &lt;em&gt;Stream&lt;/em&gt;, choose &lt;code&gt;WHIP&lt;/code&gt; under &lt;em&gt;Service&lt;/em&gt;. Then enter &lt;code&gt;https://global.whip.live-video.net&lt;/code&gt; as the endpoint for your stage and your participant &lt;code&gt;token&lt;/code&gt; as the &lt;em&gt;Bearer Token&lt;/em&gt;. For best performance, make sure your stream matches the &lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/obs-whip-support.html" rel="noopener noreferrer"&gt;recommended settings&lt;/a&gt; in the IVS docs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmc820sf51zy9rz2nrzp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhmc820sf51zy9rz2nrzp.png" alt="OBS Settings" width="800" height="215"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We're ready to start streaming! Click &lt;em&gt;Start Streaming&lt;/em&gt;, then go to the stage details in the AWS console to view your stream.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjikv0wikqxdup8p9prnx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjikv0wikqxdup8p9prnx.png" alt="OBS Console Subscribe" width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the next few options, we'll be using the command line. To simplify the inputs, set your stage token as an environment variable.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;STAGE_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;eyJhbGciOiJLTVMiLCJ0eXAiOiJKV1QifQ.[redacted]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2️⃣ Broadcasting with Python's &lt;code&gt;aiortc&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Another option for broadcasting to a stage with WHIP is &lt;code&gt;aiortc&lt;/code&gt;, a library for Web Real-Time Communication (WebRTC) and Object Real-Time Communication (ORTC) in Python. With &lt;code&gt;aiortc&lt;/code&gt;, we can publish from a user's webcam and microphone, or from a pre-recorded video. I recently published a full &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos" rel="noopener noreferrer"&gt;repo&lt;/a&gt; of demos showing various use cases of &lt;code&gt;aiortc&lt;/code&gt; with Amazon IVS; to test out publishing an MP4, refer to the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-python-demos/blob/mainline/stages-publish/ivs-stage-publish.py" rel="noopener noreferrer"&gt;&lt;code&gt;ivs-stage-publish.py&lt;/code&gt;&lt;/a&gt; script in that repo. Once you've checked out the project and configured it as necessary, you can publish an MP4 like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;ivs&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;stage&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="n"&gt;STAGE_TOKEN&lt;/span&gt; \
  &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;mp4&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;an&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mp4&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script output is pretty detailed, but once you see the following message, your video is ready to view.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;2025-08-29 10:44:16,786 - INFO - 🎉 WebRTC publishing established!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aabuxoaa1aniccno7is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aabuxoaa1aniccno7is.png" alt="aiortc console view" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3️⃣ Broadcasting with FFmpeg
&lt;/h2&gt;

&lt;p&gt;Another option, which is relatively new, is to broadcast to an Amazon IVS stage via FFmpeg. Back in June, FFmpeg &lt;a href="https://www.phoronix.com/news/FFmpeg-Lands-WHIP-Muxer" rel="noopener noreferrer"&gt;merged the long-outstanding WHIP muxer&lt;/a&gt; into the mainline repo. You'll need to &lt;a href="https://trac.ffmpeg.org/wiki/CompilationGuide/macOS#CompilingFFmpegyourself" rel="noopener noreferrer"&gt;compile it from source&lt;/a&gt; to enable WHIP (make sure to include the &lt;code&gt;--enable-muxer=whip&lt;/code&gt; and &lt;code&gt;--enable-libharfbuzz&lt;/code&gt; options). Once it's compiled, you can use the same &lt;code&gt;endpoint&lt;/code&gt; and &lt;code&gt;token&lt;/code&gt; from earlier to publish an MP4.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg &lt;span class="nt"&gt;-re&lt;/span&gt; &lt;span class="nt"&gt;-stream_loop&lt;/span&gt; &lt;span class="nt"&gt;-1&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-i&lt;/span&gt; /path/to/an.mp4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt;:v libx264 &lt;span class="nt"&gt;-profile&lt;/span&gt;:v baseline &lt;span class="nt"&gt;-c&lt;/span&gt;:a copy &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; whip &lt;span class="nt"&gt;-authorization&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$STAGE_TOKEN&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://global.whip.live-video.net"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, take a look at the AWS console to view your stream. (Yes, I'm using the same picture from earlier - it's literally the same result 😆).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aabuxoaa1aniccno7is.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8aabuxoaa1aniccno7is.png" alt="ffmpeg console view" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4️⃣ Broadcasting with &lt;code&gt;srtc&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The final option that we'll look at in this post is an open-source library called &lt;code&gt;srtc&lt;/code&gt; (&lt;a href="https://github.com/kmansoft/srtc/" rel="noopener noreferrer"&gt;repo&lt;/a&gt;). This is a "simple" WebRTC implementation written in C++ as a side project by Kostya Vasilyev, one of the brilliant Amazon IVS engineers. You can use the library to create your own WebRTC application, but for simple testing you can download the &lt;a href="https://github.com/kmansoft/srtc/releases" rel="noopener noreferrer"&gt;latest release&lt;/a&gt; for your OS and the &lt;code&gt;sintel.h264&lt;/code&gt; file from the release page. Then, you can publish a test stream with the &lt;code&gt;srtc_publish&lt;/code&gt; binary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;./srtc_publish &lt;span class="nt"&gt;--url&lt;/span&gt; https://global.whip.live-video.net &lt;span class="nt"&gt;--token&lt;/span&gt; &lt;span class="nv"&gt;$STAGE_TOKEN&lt;/span&gt; &lt;span class="nt"&gt;--file&lt;/span&gt; ./sintel.h264 &lt;span class="nt"&gt;--loop&lt;/span&gt;
&lt;span class="k"&gt;***&lt;/span&gt; Using &lt;span class="nb"&gt;source &lt;/span&gt;file: ./sintel.h264
&lt;span class="k"&gt;***&lt;/span&gt; Using WHIP URL:    https://global.whip.live-video.net
&lt;span class="k"&gt;***&lt;/span&gt; Loading ./sintel.h264
&lt;span class="k"&gt;***&lt;/span&gt; PeerConnection state: connecting
&lt;span class="k"&gt;***&lt;/span&gt; PeerConnection state: connected
Played    25 video frames
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;🔇 &lt;strong&gt;Note:&lt;/strong&gt; The command line demo publishes &lt;strong&gt;video only&lt;/strong&gt;. The project is intended to be used as a dependency for building your own applications, so audio is not included in the command line demo utility.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Again, preview the stream in the AWS console.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvxf7ut99uanoyvw17k6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flvxf7ut99uanoyvw17k6.png" alt="srtc preview" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post, we've looked at several ways to publish to an Amazon IVS real-time stage using the WebRTC-HTTP Ingest Protocol (WHIP). I hope this post helped you realize the possibilities of WebRTC and real-time streaming outside of traditional web and mobile applications, and maybe inspired you to think beyond the "usual" approach to publishing real-time video. If you have questions, or ideas about how you might use WHIP with Amazon IVS, please share them below!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>webrtc</category>
      <category>whip</category>
    </item>
    <item>
      <title>Auto-Generating Clips for Social Media from Live Streams with the Strands Agents SDK</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Mon, 25 Aug 2025 12:41:47 +0000</pubDate>
      <link>https://forem.com/aws/strands-agents-sdk-woshi-yong-siteraibusutorimukarasosiyarumedeiayong-nokuritupuwozi-dong-sheng-cheng-suru-219n</link>
      <guid>https://forem.com/aws/strands-agents-sdk-woshi-yong-siteraibusutorimukarasosiyarumedeiayong-nokuritupuwozi-dong-sheng-cheng-suru-219n</guid>
<description>&lt;p&gt;Generative AI excels at analyzing large amounts of data, identifying the meaningful parts based on your instructions, and summarizing them. The source of the data doesn't matter - it could be your email inbox, a large text file on your machine, or a web page on the internet. Even when it can't perfectly handle a big task involving a lot of data, it can give you an efficient head start. One topic developers often ask me about is how AI can be used to analyze video content. Multimodal LLMs have come a long way, but they're not yet at the point where you can hand one an hour-long live stream recording and ask it to summarize the stream or suggest clips to share on social media. Solving that problem requires the slightly different approach that we'll look at in this post.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Overview
&lt;/h2&gt;

&lt;p&gt;To build a tool that analyzes a live stream recording (VOD), I took the following approach:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download the HLS media locally and convert it to MP4 format&lt;/li&gt;
&lt;li&gt;Create a transcript of the MP4 file&lt;/li&gt;
&lt;li&gt;Use an AI agent to identify start/end timestamps for candidate clips based on the transcript&lt;/li&gt;
&lt;li&gt;Create MP4 "clips" based on the identified timestamps&lt;/li&gt;
&lt;li&gt;Generate a summary of the work&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Solution Architecture
&lt;/h2&gt;

&lt;p&gt;To implement this solution, I decided to try out the new &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;. It's a relatively new framework for building agents, and it makes it easy to define the tools an agent can use while generating a response to a prompt. This solution requires &lt;a href="https://ffmpeg.org/" rel="noopener noreferrer"&gt;FFmpeg&lt;/a&gt; and &lt;a href="https://github.com/openai/whisper" rel="noopener noreferrer"&gt;Whisper&lt;/a&gt; to be installed on the machine where the agent runs. I'm working locally, but it could easily be converted to a server-based solution with FastAPI or another web framework and deployed to the cloud as a Docker or Podman container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1dfsddm8sdf1t9swb45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1dfsddm8sdf1t9swb45.png" alt="Strands Clipping Solution"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining the Tools
&lt;/h2&gt;

&lt;p&gt;The official documentation covers the basics of Strands Agents, so I'll skip that here. Instead, let's see it in action with a demo.&lt;/p&gt;

&lt;p&gt;First, create a new Python virtual environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, declare the required dependencies in a &lt;code&gt;requirements.txt&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FFmpeg-python
git+https://github.com/openai/whisper.git
strands-agents
strands-agents-tools 
click
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create an &lt;code&gt;agent.py&lt;/code&gt; file and define the imports.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whisper.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_writer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_aws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a temporary working directory and define a few variables for the temporary files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdtemp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mp4_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;mp4_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp4_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;transcript_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.vtt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;transcript_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can define the tools the agent needs to complete its task. Tools are defined with the &lt;code&gt;@tool&lt;/code&gt; decorator.&lt;br&gt;
The first tool uses FFmpeg to download the HLS playlist (the &lt;code&gt;m3u8&lt;/code&gt; file) at a given URL and save it as an &lt;code&gt;mp4&lt;/code&gt; file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_m3u8_to_mp4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_m3u8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_m3u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;acodec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vcodec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;crf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bsf:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac_adtstoasc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quiet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved stream to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next tool transcribes the MP4 file using Whisper.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe_mp4&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transcribing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vtt_writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vtt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;vtt_writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transcribe complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The last tool generates a new clip from the original &lt;code&gt;mp4&lt;/code&gt; file for a given time range, based on the start/end timestamps the agent identifies.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_mp4_clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;FFmpeg を用いて mp4 をクリップする&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quiet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;FFmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Prompting the Agent
&lt;/h2&gt;

&lt;p&gt;Now that the tools are defined, let's create an instance of the agent. We pass it the custom tools we just created, along with built-in tools it can use as needed (&lt;code&gt;file_read&lt;/code&gt;, &lt;code&gt;file_write&lt;/code&gt;, &lt;code&gt;use_aws&lt;/code&gt;, etc.).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;create_mp4_clip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;convert_m3u8_to_mp4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;transcribe_mp4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;file_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;file_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;use_aws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we ask the user whether to enable &lt;code&gt;verbose&lt;/code&gt; mode (which shows the standard output from FFmpeg and Whisper) and prompt for the &lt;code&gt;m3u8&lt;/code&gt; (HLS playlist) URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;confirm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enable verbose output?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;m3u8_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter M3U8 URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we pass the prompt to the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      your purpose is to generate small clips from a larger live stream 
      so that they can be shared on social media.
      your task involves the following steps:

      1. download the HLS segments and convert them to an MP4 file
      2. transcribe the downloaded MP4 using whisper
      3. analyze the transcript to identify potential clips
      4. create the clips
      5. provide a summary of your work

      here are some details:

      Step 1: using the tool `convert_m3u8_to_mp4`:
      given the HLS manifest file located at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m3u8_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,
      convert the live stream to an MP4 and save it to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

      Step 2: using the tool `transcribe_mp4`:
      transcribe the MP4 and save the transcript to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.
      this will produce a timestamped transcript from the live stream. 

      Step 3: take a look at the transcript and identify a list of suggested 
      timestamps (start and end) for clips that might be good to share 
      on social media. provide a suggested title, the start and end 
      timestamp, the transcript in that section, a summary/description 
      and several tags.

      Step 4: using the tool `create_mp4_clip`:
      after you have identified them, create an mp4 for each based on 
      the timestamp identified. use the original mp4 that was created 
      in the first step that is located at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.
      each output file should use a kebab case mp4 filename
      generated from your suggested title. 
      save the new clips in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

      do not cleanup the temp directory at this time, even though 
      you have a tool that can perform this step. 

      Step 5: provide a summary of your work, 
      in the summary, output the following information for each clip:
        - Title
        - Start Time
        - End Time 
        - Transcript
        - Summary
        - Tags
        - File Location of Clip
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the Application
&lt;/h2&gt;

&lt;p&gt;When we run the application using &lt;code&gt;python agent.py&lt;/code&gt;, we'll be prompted to enable verbose mode and provide the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Enable verbose output? &lt;span class="o"&gt;[&lt;/span&gt;y/N]: 
Enter M3U8 URL:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I tested the agent with a &lt;a href="https://www.twitch.tv/videos/2427797850" rel="noopener noreferrer"&gt;recent live stream&lt;/a&gt; from my friend &lt;a href="https://rup12.net/" rel="noopener noreferrer"&gt;Darko Mesaroš&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I'll help you generate small clips from the larger live stream following the steps you've outlined. Let's begin:

## Step 1: Download the HLS segments and convert them to an MP4 file

First, I'll convert the HLS manifest file to an MP4.
Tool #1: convert_m3u8_to_mp4
Saved stream to /[redacted]/0a464c96-e071-4300-83d7-a74eeee82e28.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just as expected - the MP4 file was created successfully, exactly as the prompt specified.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Step 2: Transcribe the MP4 file

Now, I'll transcribe the downloaded MP4 file:
Tool #3: transcribe_mp4
Transcribing /[redacted]/0a464c96-e071-4300-83d7-a74eeee82e28.mp4 to 55c5f5f6-85ed-46d4-80f9-d455f6c51fcb.vtt
/[redacted]/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: English
Transcribe complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the transcription is complete! Next, the agent should analyze the transcript and suggest candidate clips.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Step 3: Analyze the transcript to identify potential clips

Based on the transcript, I'll identify several potential clips that could be good to share on social media. These will be moments that are engaging, informative, or entertaining.

### Clip 1: Stream Introduction and Technical Difficulties

**Title:** "Starting a Creative Coding Stream with Rust"

**Start Time:** 01:26.580
**End Time:** 02:12.220

**Transcript:**
"NOW! We're live. Yes. Okay. Now there's audio. There's audio. Okay. I was like, ah, um, so last moment I had to cut off my, um, what is it called? My, uh, my audio. Not only my whole Thunderbolt docked here because it just froze. So lost all my audio and everything else as I went along. So how do you, how do we, Jay? How are you doing my friend? It has been way too long. So I hope you're doing good. Um, welcome to the show. Welcome. Welcome. It has been actually, um, actually even a good old while for myself in the terms of live streaming. I think I have a stream for like 20 days. So incredibly happy to be back. So yeah, let's do some live streaming."

**Summary:** The streamer begins their first live stream in 20 days, explaining how they had technical difficulties with their audio setup when trying to start the stream. Despite the challenges, they're enthusiastic about being back and connecting with viewers.

**Tags:** #RustDevelopment #LiveCoding #StreamingTech #CreativeCoding #TechnicalDifficulties

### Clip 2: Introducing GGEZ - A Rust Game Engine

**Title:** "What is GGEZ? A Rust Game Engine for Creative Coding"

**Start Time:** 02:30.860
**End Time:** 03:29.820

**Transcript:**
"Um, okay. So the plan for today, my friends, is to do some stuff with, well, as always, Rust. So I'm going to do some creative coding today. Something with, um, um, Rust and a, and a framework I was using over the last few days called, GGE. It is a, I think they call themselves like a rust game engine thing. Like not terribly serious. It's not a full blown game engine. Um, it's a, it's a thing and it kind of does graphics. It is very similar to like P5JS, which is a very much, uh, an artistic type of coding framework, which I used in the past stream a long time ago. And, and of course, P5JS comes from processing. If anybody knows what processing is, um, so processing is an old framework. I think it uses its own bespoke language to do like graphics and like stuff. So, um, that's the whole plan today to do a little bit of that. Um, and have some fun."

**Summary:** The streamer introduces GGEZ, a Rust game engine framework they'll be using for creative coding. They explain how it's similar to P5JS and Processing, highlighting that it's not a full game engine but a tool for making graphics and creative coding projects.

**Tags:** #RustProgramming #GGEZ #GameDevelopment #CreativeCoding #GraphicsProgramming

### Clip 3: Drawing Fire - Creative Coding Goal

**Title:** "Let's Draw Fire with Rust: My Creative Coding Challenge"

**Start Time:** 06:03.980
**End Time:** 07:09.740

**Transcript:**
"So what, why would I like to do is I would love to do some creative coding, but in Rust. So, so that's the plan for today. Do a little bit of that. Um, there's no certain point to today. Like, in a sense, like we're not building up to, we're going to share. We're going to try to draw some pixels on the screen. We're not going to go too crazy and be like S.D.L type of approach. And I'm like, just super complicated. We're going to just do, um, squares and circles and whatnot. So, um, I want to try to draw flames. This is one of my, one of my tests when I draw this, these kinds of, when I work with these, a kind of artistic tools is like, can I draw fire, right? A C doughnut. Is, is a C doughnut, so. I know a C, oh, oh, wow. This would be a, I mean, probably at one point, I'm not as good as, as being able to do something like this. I know this whole thing code that actually is shaped like doughnut and it's just math. I'm not as good as math in this, but, um, yeah, yeah, yeah. We're going to try to draw some flames just simple as that. And I've done that in the past in Lua, but I want to try to do a Darwin rust. And I want to use this GGZ to set it up."

**Summary:** The streamer explains their goal for the creative coding session: to draw flames using Rust and GGEZ. They mention this as their go-to test when trying out new creative coding frameworks, comparing it to the famous "donut" code visualization but clarifying they'll stick to something simpler that they've previously implemented in Lua.

**Tags:** #RustCreativeCoding #DigitalArt #FlameSimulation #CreativeTech #VisualProgramming

### Clip 4: Setting Up GGEZ Fundamentals

**Title:** "How GGEZ Works: The Core Architecture of a Rust Game Engine"

**Start Time:** 12:46.140
**End Time:** 13:21.340

**Transcript:**
"So the way this works is that, um, your application has, um, basically a state, a main state where it kind of keeps stuff between frames. And, um, it has specific events on that state. That means as your application runs, there is a draw event, and there's an update event. And within those two events, you can do certain things. Now, of course, in the draw event, you draw on the screen, and the update event, you update whatever you're doing. And those are the two, two main important things we're going to be creating. And GGZ requires you to do it in a specific way."

**Summary:** The streamer explains the fundamental architecture of GGEZ, detailing how the game loop works with a main state that maintains data between frames, and the two key events - draw and update - that form the backbone of any GGEZ application.

**Tags:** #GameDevelopment #RustProgramming #GameLoop #GGEZ #GraphicsProgramming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a lot of output, but it's exactly what we wanted. The agent analyzed the 20-minute live stream that we provided and suggested four candidate clips that are "engaging, informative, or entertaining". Judging from this summary, it does a really good job of finding short clips from a longer stream that are well suited for sharing on social media.&lt;br&gt;
The next step is to generate the actual &lt;code&gt;mp4&lt;/code&gt; clip files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Now I'll create MP4 clips for each of the identified segments:
Tool #6: create_mp4_clip
Tool #7: create_mp4_clip
Tool #8: create_mp4_clip
Tool #9: create_mp4_clip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, the agent provides a summary of all of its work.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Step 5: Summary of Work

### Clip 1: Starting a Creative Coding Stream with Rust
- **Title:** Starting a Creative Coding Stream with Rust
- **Start Time:** 00:01:26.580
- **End Time:** 00:02:12.220
- **Transcript:** "NOW! We're live. Yes. Okay. Now there's audio. There's audio. Okay. I was like, ah, um, so last moment I had to cut off my, um, what is it called? My, uh, my audio. Not only my whole Thunderbolt docked here because it just froze. So lost all my audio and everything else as I went along. So how do you, how do we, Jay? How are you doing my friend? It has been way too long. So I hope you're doing good. Um, welcome to the show. Welcome. Welcome. It has been actually, um, actually even a good old while for myself in the terms of live streaming. I think I have a stream for like 20 days. So incredibly happy to be back. So yeah, let's do some live streaming."
- **Summary:** The streamer begins their first live stream in 20 days, explaining how they had technical difficulties with their audio setup when trying to start the stream. Despite the challenges, they're enthusiastic about being back and connecting with viewers.
- **Tags:** #RustDevelopment #LiveCoding #StreamingTech #CreativeCoding #TechnicalDifficulties
- **File Location:** /[redacted]/starting-a-creative-coding-stream-with-rust.mp4

[trimmed for brevity]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's an example of one of the clips that the agent generated from the longer live stream:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/h_rND7go2nU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Extending the Agent
&lt;/h2&gt;

&lt;p&gt;If needed, you could instruct the agent to upload the generated &lt;code&gt;mp4&lt;/code&gt; files to an S3 bucket, for example. In that case, you'd probably also want to create a tool that cleans up the temporary files produced along the way. There's no limit to the tools you can add to extend the agent.&lt;/p&gt;
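
&lt;p&gt;To make that concrete, here's a minimal sketch of what those two tools might look like. The tool names (&lt;code&gt;upload_clip_to_s3&lt;/code&gt;, &lt;code&gt;cleanup_temp_dir&lt;/code&gt;) and the use of &lt;code&gt;boto3&lt;/code&gt; are assumptions for illustration - they aren't part of the demo code above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import shutil

import boto3  # assumed dependency for this sketch

from strands import tool


@tool
def upload_clip_to_s3(file_path: str, bucket_name: str, object_key: str):
    """upload a generated clip file to an S3 bucket"""
    s3 = boto3.client("s3")
    s3.upload_file(file_path, bucket_name, object_key)
    print(f"Uploaded {file_path} to s3://{bucket_name}/{object_key}")


@tool
def cleanup_temp_dir():
    """remove the temp working directory and all generated files"""
    # temp_dir is the module-level variable created in agent.py
    shutil.rmtree(temp_dir, ignore_errors=True)
    print(f"Removed {temp_dir}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You would then register both tools in the &lt;code&gt;tools&lt;/code&gt; list when constructing the &lt;code&gt;Agent&lt;/code&gt;, and mention them in the prompt.&lt;/p&gt;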

&lt;p&gt;What enhancements would you like to add to this social clip generation agent? How are you using the Strands Agents framework? Let us know in the comments!&lt;/p&gt;

&lt;p&gt;The original post is available &lt;a href="https://dev.to/aws/auto-generating-clips-for-social-media-from-live-streams-with-the-strands-agents-sdk-1kkj"&gt;here&lt;/a&gt;.&lt;br&gt;
This post was translated by &lt;a href="https://x.com/prog893" rel="noopener noreferrer"&gt;Torgayev Tamirlan&lt;/a&gt;, a Startup Solutions Architect at Amazon Web Services Japan.&lt;/p&gt;

&lt;p&gt;Photo credit: &lt;a href="https://unsplash.com/@sangharsh_l?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Sangharsh Lohakare&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/a-close-up-of-a-blue-and-purple-structure-8o_LkMpo8ug?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>livestreaming</category>
      <category>strandsagents</category>
    </item>
    <item>
      <title>Building Virtual Agents with Amazon Nova Sonic S2S</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Wed, 13 Aug 2025 16:59:04 +0000</pubDate>
      <link>https://forem.com/aws/building-virtual-agents-with-amazon-nova-sonic-s2s-23m7</link>
      <guid>https://forem.com/aws/building-virtual-agents-with-amazon-nova-sonic-s2s-23m7</guid>
      <description>&lt;p&gt;One of the more intriguing and helpful use cases of generative AI is voice agents. Advancements in speech recognition and synthesis have blurred the lines between human and machine to the point where it no longer feels like you're speaking to a terrible robot from a 1980's sci-fi flick. There's something strangely soothing to me about speaking to a virtual agent these days. There's no need to worry that the person on the other end of the line has had an awful day, or just spilled their coffee on their lap. Just a friendly, no-nonsense bot who is happy to help you. Of course there are exceptions to this - but I'd rather talk to a bot than a human for most customer service interactions these days. So what does it take to create your own virtual agent that understands your speech and responds to you with a friendly demeanor and a human-like voice? Let's take a look! &lt;/p&gt;

&lt;h2&gt;
  
  
  Agent Demo
&lt;/h2&gt;

&lt;p&gt;Before we dig into things, here's a quick video showing the agent in action. There's room for a bit of improvement here since the Nova S2S agent is not optimized for WebRTC communication. We've got to buffer the audio a bit to make things sound clear and smooth when streaming over WebRTC. Using tools (as we'll talk about in a bit) will also incur a bit of delay, since the agent needs to first invoke the tool before it can parse and summarize the data and respond. But overall, it's an amazing experience and opens up the door to many possibilities.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/pH_GrzPjzX0"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Amazon Nova Sonic
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/ai/responsible-ai/nova-sonic/overview.html" rel="noopener noreferrer"&gt;Amazon Nova Sonic&lt;/a&gt; is a foundation model created by &lt;a href="https://labs.amazon.science/" rel="noopener noreferrer"&gt;Amazon AGI Labs&lt;/a&gt; with a bidirectional streaming API. This means you can speak to it, and it can respond by speaking to you. No need to convert speech-to-text, or text-to-speech  - the model supports direct audio input and output. Right off the top, this is a huge win. TTS and STS are not terribly difficult to implement these days, but removing the need to concern yourself with these tasks lets you focus on building the experience of your application and enhancing your agent with custom tools that can provide it with deep contextual knowledge. Want to build a reservation system for your customers? No problem - just wire up your agent with the tools that it needs to access the user's data and it'll be able to retrieve and modify reservations through simple conversations. Practically any use case that you can think of is possible. &lt;a href="https://nova.amazon.com/sonic" rel="noopener noreferrer"&gt;Try it out&lt;/a&gt; for yourself without writing a single line of code. When you're ready to implement your own agent, check out the &lt;a href="https://docs.aws.amazon.com/nova/latest/userguide/speech.html" rel="noopener noreferrer"&gt;user guide&lt;/a&gt; - or read on in this post to learn how I created a demo that shows you how to integrate your Nova Sonic agent into an Amazon IVS real-time stream.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building A Live-Stream Agent
&lt;/h2&gt;

&lt;p&gt;Running in the browser or on the command line is an excellent use-case for a voice agent. But wouldn't it be super cool if the agent could actually join as a virtual participant in a WebRTC-based video call? Yeah, I thought so too - so about a month ago I started playing around with the Python library &lt;a href="https://github.com/aiortc/aiortc/" rel="noopener noreferrer"&gt;&lt;code&gt;aiortc&lt;/code&gt;&lt;/a&gt; to provide an integration between Amazon IVS and a Nova Sonic agent. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User joins an Amazon IVS real-time stage with camera and microphone in the browser&lt;/li&gt;
&lt;li&gt;Agent script is launched&lt;/li&gt;
&lt;li&gt;Agent subscribes to a single user's audio and video

&lt;ul&gt;
&lt;li&gt;Pipes the user's audio into Nova Sonic&lt;/li&gt;
&lt;li&gt;Captures video frames &lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Agent joins as a publisher, with dynamically generated speech visualization video track&lt;/li&gt;

&lt;li&gt;Agent listens for user audio, and responds (invoking 'tools' as necessary)&lt;/li&gt;

&lt;li&gt;Agent audio is published as an audio track to the stage&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Tools ⚒️
&lt;/h2&gt;

&lt;p&gt;The demo agent has a few tools to demonstrate Nova Sonic's ability to augment the agent's contextual and domain-specific knowledge. The simplest example is asking the agent for the current time. Because the agent is trained on a static dataset, it has no way to determine the current date or time. But if we expose a tool for this, it's really easy to add this potentially important context to the agent. The tool can use simple Python (&lt;code&gt;datetime.now()&lt;/code&gt;) or can be more complex, like calling third-party APIs or using SDKs to retrieve dynamic data. For example, the demo in the repo linked in this post has a weather tool that can get the current weather via a remote API.&lt;/p&gt;
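
&lt;p&gt;The heart of such a time tool really can be a one-liner. Here's a simplified sketch - not the exact code from the demo repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from datetime import datetime, timezone


def get_current_time() -&gt; str:
    """Return the current UTC date and time as an ISO-8601 string."""
    # The model is trained on a static dataset, so the tool supplies "now".
    return datetime.now(timezone.utc).isoformat()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;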

&lt;h2&gt;
  
  
  Giving the Agent Vision 🤖👀
&lt;/h2&gt;

&lt;p&gt;Because the agent script is subscribed to the user's video feed, it seemed logical to give the agent the ability to "see" the user. You can literally ask it to describe what you look like or what it sees in the environment around you. You can even ask it to tell you what you're holding in your hand. When it determines that it needs to use its "vision" tool, it grabs the latest video frame and passes it to a multi-modal LLM for image analysis. The analysis is returned to the agent which summarizes it and tells you what it saw.&lt;/p&gt;
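
&lt;p&gt;Stripped down, the vision tool boils down to something like the sketch below. The model ID and the use of the Amazon Bedrock Converse API are assumptions on my part - see the demo repo for the actual implementation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

bedrock = boto3.client("bedrock-runtime")


def describe_frame(frame_bytes: bytes) -&gt; str:
    """Send the latest captured video frame to a multimodal model for analysis."""
    response = bedrock.converse(
        modelId="us.amazon.nova-lite-v1:0",  # assumed model choice for this sketch
        messages=[{
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": frame_bytes}}},
                {"text": "Briefly describe what you see in this image."},
            ],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;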

&lt;h2&gt;
  
  
  Try It Out!
&lt;/h2&gt;

&lt;p&gt;If you'd like to try it out for yourself, clone the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-aiortc-demos" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt; to your machine and refer to the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-aiortc-demos/blob/mainline/README.md" rel="noopener noreferrer"&gt;README&lt;/a&gt;. There are several different use cases in that repo, so refer to the &lt;code&gt;stages-nova-s2s&lt;/code&gt; directory for this demo (but check out the other demos too!). There's also a &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-aiortc-demos/blob/mainline/stages-nova-s2s/README.md" rel="noopener noreferrer"&gt;Nova Sonic specific README&lt;/a&gt; in that directory that goes into a bit more detail.&lt;/p&gt;

&lt;p&gt;Once you've cloned the repo, create an Amazon IVS real-time stage and publish to it. You can do this right from the Amazon IVS Console. Once you're publishing to the stage, you'll need to find your participant ID. On the stage details page (the page that you're using to publish your camera), scroll down to &lt;strong&gt;Stage sessions&lt;/strong&gt; and select the &lt;strong&gt;ACTIVE&lt;/strong&gt; session.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5s81zk2hy3wy0uyrd8k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5s81zk2hy3wy0uyrd8k.png" alt="Stage sessions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the session details page, scroll down to &lt;strong&gt;Stage participants&lt;/strong&gt; and find the &lt;strong&gt;CONNECTED&lt;/strong&gt; participant and copy the &lt;strong&gt;Participant Id&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36na4bn715wxcajkg0sa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F36na4bn715wxcajkg0sa.png" alt="Stage participants"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Head back to the stage details page and generate a participant token.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0yf845xl3h1kymbdhff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft0yf845xl3h1kymbdhff.png" alt="Create token"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7lcq29q2bz4kfr3s9ub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr7lcq29q2bz4kfr3s9ub.png" alt="Token dialog"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, copy the newly generated token.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonjpee7u60iyuxuokw3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fonjpee7u60iyuxuokw3k.png" alt="Copy token"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Launch the new agent with the agent's generated token and the original participant's id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python ivs-stage-nova-s2s.py &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--token&lt;/span&gt;  eyJhbGciOiJLTVMiLCJ0eXAiOiJKV1QifQ... &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--subscribe-to&lt;/span&gt; 43QZqauB2sFz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a few seconds, you'll see your agent join the stage and you'll be able to chat with it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jww9hfd7buwdg3917zt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5jww9hfd7buwdg3917zt.png" alt="Stage with agent"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Feel free to check out the source code in the &lt;a href="https://github.com/aws-samples/sample-amazon-ivs-aiortc-demos" rel="noopener noreferrer"&gt;demo repo&lt;/a&gt; to learn how to enhance and customize the agent in your own applications. Try to add your own custom tools and give the agent domain specific superpowers! This demo should give you a great start towards building your own voice agents powered by Amazon IVS and Amazon Nova Sonic. &lt;/p&gt;

&lt;p&gt;For more information, refer to the user guides and documentation for Amazon IVS and Amazon Nova Sonic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ivs/latest/RealTimeUserGuide/what-is.html" rel="noopener noreferrer"&gt;Amazon IVS Real-Time User Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/ai/responsible-ai/nova-sonic/overview.html" rel="noopener noreferrer"&gt;Amazon Nova Sonic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nova.amazon.com/sonic" rel="noopener noreferrer"&gt;Try Nova Sonic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ivs.rocks/real-time" rel="noopener noreferrer"&gt;IVS Rocks&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So what will you build with Amazon Nova Sonic and Amazon IVS?&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ai</category>
      <category>amazonivs</category>
      <category>livestreaming</category>
    </item>
    <item>
      <title>Auto-Generating Clips for Social Media from Live Streams with the Strands Agents SDK</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Fri, 06 Jun 2025 17:12:34 +0000</pubDate>
      <link>https://forem.com/aws/auto-generating-clips-for-social-media-from-live-streams-with-the-strands-agents-sdk-1kkj</link>
      <guid>https://forem.com/aws/auto-generating-clips-for-social-media-from-live-streams-with-the-strands-agents-sdk-1kkj</guid>
      <description>&lt;p&gt;Generative AI is excellent at analyzing large amounts of data and identifying and summarizing a meaningful subset of the data based on a guided set of instructions. The source of the data could be anything - your email inbox, a large text file on your machine or a webpage on the internet. It's not perfect, but it can give you a head start on large tasks that involve a lot of data. One of the things I get asked about a lot by developers is how they can use AI to analyze video content. Multimodal LLMs have come a long way, but they're not quite at the point where they could ingest an hour's worth of video from a live stream and provide a summary or identify potential clips for sharing on social media. But there's another approach that we can take to solve this problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution Overview
&lt;/h2&gt;

&lt;p&gt;In order to create a tool to analyze a video-on-demand (VOD) from a live stream, I settled on the following approach.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Download/convert the HLS media to a local MP4&lt;/li&gt;
&lt;li&gt;Transcribe the MP4 &lt;/li&gt;
&lt;li&gt;Ask an AI agent to identify start/end timestamps for potential clips based on the transcript&lt;/li&gt;
&lt;li&gt;Create MP4 'clips' based on the identified timestamps&lt;/li&gt;
&lt;li&gt;Summarize the work&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Solution Architecture
&lt;/h2&gt;

&lt;p&gt;To accomplish this task, I decided to try out the new &lt;a href="https://strandsagents.com/" rel="noopener noreferrer"&gt;Strands Agents SDK&lt;/a&gt;. It's a fairly new framework for building agents that has a simple way to define tools that the agent can use to assist in responding to prompts. For this solution, we'll need &lt;a href="https://ffmpeg.org" rel="noopener noreferrer"&gt;FFMPEG&lt;/a&gt; and &lt;a href="https://github.com/openai/whisper" rel="noopener noreferrer"&gt;Whisper&lt;/a&gt; installed on the machine where the agent runs. I'll be working locally, but this could easily be converted to a server-based solution using FastAPI or another web framework and deployed to the cloud in a Docker/Podman container. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1dfsddm8sdf1t9swb45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm1dfsddm8sdf1t9swb45.png" alt="Strands Clipping Solution"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Tools
&lt;/h2&gt;

&lt;p&gt;I won't cover how to get started with Strands Agents since the docs do a great job of that. Instead, we'll walk through this demo and see how it works. &lt;/p&gt;

&lt;p&gt;Of course we start with a fresh virtual environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we'll need to install our dependencies. Place the following in &lt;code&gt;requirements.txt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ffmpeg-python
git+https://github.com/openai/whisper.git
strands-agents
strands-agents-tools 
click
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And install the dependencies with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, in &lt;code&gt;agent.py&lt;/code&gt;, we'll define our imports.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uuid&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;shutil&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;whisper.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_writer&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strands_tools&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;file_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_aws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a temp working directory and some variables for our temp files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mkdtemp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;mp4_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;mp4_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mp4_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;transcript_file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.vtt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;transcript_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can start defining the tools that our agent will need to complete the task. For this we use the &lt;code&gt;@tool&lt;/code&gt; decorator. The first tool will take the &lt;code&gt;m3u8&lt;/code&gt; URL for the HLS playlist and download it as an &lt;code&gt;mp4&lt;/code&gt; with FFMPEG.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_m3u8_to_mp4&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_m3u8&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;path_to_m3u8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;acodec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vcodec&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;crf&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bsf:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac_adtstoasc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quiet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Saved stream to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next tool will transcribe the MP4 with Whisper.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe_mp4&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transcribing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;whisper&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transcribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;verbose&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;vtt_writer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vtt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;vtt_writer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;transcript_file&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Transcribe complete&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And finally, a tool that takes the start/end timestamps that the agent identifies and generates a new clip from the original &lt;code&gt;mp4&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tool&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_mp4_clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;clip an mp4 using ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ss&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;end_time&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;output_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;copy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;loglevel&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verbose&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;quiet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ffmpeg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;overwrite_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
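

&lt;p&gt;Since both FFMPEG tools are thin wrappers, it can help to see the CLI equivalent. The clip tool above runs roughly the same command as this (the timestamps are just examples):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg -ss 00:01:26.580 -to 00:02:12.220 -i input.mp4 -c copy clip.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;One note on the &lt;code&gt;c="copy"&lt;/code&gt; option: because the streams are copied rather than re-encoded, the cut snaps to the nearest keyframe, which is usually close enough for social clips.&lt;/p&gt;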



&lt;h2&gt;
  
  
  Prompting the Agent
&lt;/h2&gt;

&lt;p&gt;Now that our tools are defined, we can create an instance of the agent. Here we are passing our custom tools, as well as some built-in tools like &lt;code&gt;file_read&lt;/code&gt;, &lt;code&gt;file_write&lt;/code&gt; and &lt;code&gt;use_aws&lt;/code&gt; that the agent can use as necessary.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;create_mp4_clip&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;convert_m3u8_to_mp4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;transcribe_mp4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;file_read&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;file_write&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;use_aws&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We'll ask the user if they'd like to run in &lt;code&gt;verbose&lt;/code&gt; mode (which will provide STDIO output from FFMPEG and Whisper) and prompt them for the &lt;code&gt;m3u8&lt;/code&gt; (HLS playlist) URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;verbose&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;confirm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enable verbose output?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;m3u8_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;click&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Enter M3U8 URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And now we can pass a prompt to the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
      your purpose is to generate small clips from a larger live stream 
      so that they can be shared on social media.
      your task involves the following steps:

      1. download the HLS segments and convert them to an MP4 file
      2. transcribe the downloaded MP4 using whisper
      3. analyze the transcript to identify potential clips
      4. create the clips
      5. provide a summary of your work

      here are some details:

      Step 1: using the tool `convert_m3u8_to_mp4`:
      given the HLS manifest file located at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m3u8_url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,
      convert the live stream to an MP4 and save it to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

      Step 2: using the tool `transcribe_mp4`:
      transcribe the MP4 and save the transcript to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcript_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.
      this will produce a timestamped transcript from the live stream. 

      Step 3: take a look at the transcript and identify a list of suggested 
      timestamps (start and end) for clips that might be good to share 
      on social media. provide a suggested title, the start and end 
      timestamp, the transcript in that section, a summary/description 
      and several tags.

      Step 4: using the tool `create_mp4_clip`:
      after you have identified them, create an mp4 for each based on 
      the timestamp identified. use the original mp4 that was created 
      in the first step that is located at &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mp4_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.
      each output file should use a kebab case mp4 filename
      generated from your suggested title. 
      save the new clips in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;temp_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.

      do not cleanup the temp directory at this time, even though 
      you have a tool that can perform this step. 

      Step 5: provide a summary of your work, 
      in the summary, output the following information for each clip:
        - Title
        - Start Time
        - End Time 
        - Transcript
        - Summary
        - Tags
        - File Location of Clip
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running the Application
&lt;/h2&gt;

&lt;p&gt;When we run the application using &lt;code&gt;python agent.py&lt;/code&gt; we'll be prompted to enable verbose mode and provide the URL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Enable verbose output? &lt;span class="o"&gt;[&lt;/span&gt;y/N]: 
Enter M3U8 URL:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I decided to test the agent out with a &lt;a href="https://www.twitch.tv/videos/2427797850" rel="noopener noreferrer"&gt;recent live stream&lt;/a&gt; from my friend &lt;a href="https://rup12.net/" rel="noopener noreferrer"&gt;Darko Mesaroš&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'll help you generate small clips from the larger live stream following the steps you've outlined. Let's begin:

&lt;span class="gu"&gt;## Step 1: Download the HLS segments and convert them to an MP4 file&lt;/span&gt;

First, I'll convert the HLS manifest file to an MP4.
Tool #1: convert_m3u8_to_mp4
Saved stream to /[redacted]/0a464c96-e071-4300-83d7-a74eeee82e28.mp4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks good - the MP4 file was generated as expected based on our prompt.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Step 2: Transcribe the MP4 file&lt;/span&gt;

Now, I'll transcribe the downloaded MP4 file:
Tool #3: transcribe_mp4
Transcribing /[redacted]/0a464c96-e071-4300-83d7-a74eeee82e28.mp4 to 55c5f5f6-85ed-46d4-80f9-d455f6c51fcb.vtt
/[redacted]/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Detected language: English
Transcribe complete
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the transcript was generated! Now it should analyze the transcript to identify some clips.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Step 3: Analyze the transcript to identify potential clips&lt;/span&gt;

Based on the transcript, I'll identify several potential clips that could be good to share on social media. These will be moments that are engaging, informative, or entertaining.

&lt;span class="gu"&gt;### Clip 1: Stream Introduction and Technical Difficulties&lt;/span&gt;

&lt;span class="gs"&gt;**Title:**&lt;/span&gt; "Starting a Creative Coding Stream with Rust"

&lt;span class="gs"&gt;**Start Time:**&lt;/span&gt; 01:26.580
&lt;span class="gs"&gt;**End Time:**&lt;/span&gt; 02:12.220

&lt;span class="gs"&gt;**Transcript:**&lt;/span&gt;
"NOW! We're live. Yes. Okay. Now there's audio. There's audio. Okay. I was like, ah, um, so last moment I had to cut off my, um, what is it called? My, uh, my audio. Not only my whole Thunderbolt docked here because it just froze. So lost all my audio and everything else as I went along. So how do you, how do we, Jay? How are you doing my friend? It has been way too long. So I hope you're doing good. Um, welcome to the show. Welcome. Welcome. It has been actually, um, actually even a good old while for myself in the terms of live streaming. I think I have a stream for like 20 days. So incredibly happy to be back. So yeah, let's do some live streaming."

&lt;span class="gs"&gt;**Summary:**&lt;/span&gt; The streamer begins their first live stream in 20 days, explaining how they had technical difficulties with their audio setup when trying to start the stream. Despite the challenges, they're enthusiastic about being back and connecting with viewers.

&lt;span class="gs"&gt;**Tags:**&lt;/span&gt; #RustDevelopment #LiveCoding #StreamingTech #CreativeCoding #TechnicalDifficulties

&lt;span class="gu"&gt;### Clip 2: Introducing GGEZ - A Rust Game Engine&lt;/span&gt;

&lt;span class="gs"&gt;**Title:**&lt;/span&gt; "What is GGEZ? A Rust Game Engine for Creative Coding"

&lt;span class="gs"&gt;**Start Time:**&lt;/span&gt; 02:30.860
&lt;span class="gs"&gt;**End Time:**&lt;/span&gt; 03:29.820

&lt;span class="gs"&gt;**Transcript:**&lt;/span&gt;
"Um, okay. So the plan for today, my friends, is to do some stuff with, well, as always, Rust. So I'm going to do some creative coding today. Something with, um, um, Rust and a, and a framework I was using over the last few days called, GGE. It is a, I think they call themselves like a rust game engine thing. Like not terribly serious. It's not a full blown game engine. Um, it's a, it's a thing and it kind of does graphics. It is very similar to like P5JS, which is a very much, uh, an artistic type of coding framework, which I used in the past stream a long time ago. And, and of course, P5JS comes from processing. If anybody knows what processing is, um, so processing is an old framework. I think it uses its own bespoke language to do like graphics and like stuff. So, um, that's the whole plan today to do a little bit of that. Um, and have some fun."

&lt;span class="gs"&gt;**Summary:**&lt;/span&gt; The streamer introduces GGEZ, a Rust game engine framework they'll be using for creative coding. They explain how it's similar to P5JS and Processing, highlighting that it's not a full game engine but a tool for making graphics and creative coding projects.

&lt;span class="gs"&gt;**Tags:**&lt;/span&gt; #RustProgramming #GGEZ #GameDevelopment #CreativeCoding #GraphicsProgramming

&lt;span class="gu"&gt;### Clip 3: Drawing Fire - Creative Coding Goal&lt;/span&gt;

&lt;span class="gs"&gt;**Title:**&lt;/span&gt; "Let's Draw Fire with Rust: My Creative Coding Challenge"

&lt;span class="gs"&gt;**Start Time:**&lt;/span&gt; 06:03.980
&lt;span class="gs"&gt;**End Time:**&lt;/span&gt; 07:09.740

&lt;span class="gs"&gt;**Transcript:**&lt;/span&gt;
"So what, why would I like to do is I would love to do some creative coding, but in Rust. So, so that's the plan for today. Do a little bit of that. Um, there's no certain point to today. Like, in a sense, like we're not building up to, we're going to share. We're going to try to draw some pixels on the screen. We're not going to go too crazy and be like S.D.L type of approach. And I'm like, just super complicated. We're going to just do, um, squares and circles and whatnot. So, um, I want to try to draw flames. This is one of my, one of my tests when I draw this, these kinds of, when I work with these, a kind of artistic tools is like, can I draw fire, right? A C doughnut. Is, is a C doughnut, so. I know a C, oh, oh, wow. This would be a, I mean, probably at one point, I'm not as good as, as being able to do something like this. I know this whole thing code that actually is shaped like doughnut and it's just math. I'm not as good as math in this, but, um, yeah, yeah, yeah. We're going to try to draw some flames just simple as that. And I've done that in the past in Lua, but I want to try to do a Darwin rust. And I want to use this GGZ to set it up."

&lt;span class="gs"&gt;**Summary:**&lt;/span&gt; The streamer explains their goal for the creative coding session: to draw flames using Rust and GGEZ. They mention this as their go-to test when trying out new creative coding frameworks, comparing it to the famous "donut" code visualization but clarifying they'll stick to something simpler that they've previously implemented in Lua.

&lt;span class="gs"&gt;**Tags:**&lt;/span&gt; #RustCreativeCoding #DigitalArt #FlameSimulation #CreativeTech #VisualProgramming

&lt;span class="gu"&gt;### Clip 4: Setting Up GGEZ Fundamentals&lt;/span&gt;

&lt;span class="gs"&gt;**Title:**&lt;/span&gt; "How GGEZ Works: The Core Architecture of a Rust Game Engine"

&lt;span class="gs"&gt;**Start Time:**&lt;/span&gt; 12:46.140
&lt;span class="gs"&gt;**End Time:**&lt;/span&gt; 13:21.340

&lt;span class="gs"&gt;**Transcript:**&lt;/span&gt;
"So the way this works is that, um, your application has, um, basically a state, a main state where it kind of keeps stuff between frames. And, um, it has specific events on that state. That means as your application runs, there is a draw event, and there's an update event. And within those two events, you can do certain things. Now, of course, in the draw event, you draw on the screen, and the update event, you update whatever you're doing. And those are the two, two main important things we're going to be creating. And GGZ requires you to do it in a specific way."

&lt;span class="gs"&gt;**Summary:**&lt;/span&gt; The streamer explains the fundamental architecture of GGEZ, detailing how the game loop works with a main state that maintains data between frames, and the two key events - draw and update - that form the backbone of any GGEZ application.

&lt;span class="gs"&gt;**Tags:**&lt;/span&gt; #GameDevelopment #RustProgramming #GameLoop #GGEZ #GraphicsProgramming
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's a lot of output, but it's done exactly what we wanted. It analyzed the 20-minute stream that I provided and identified 4 clips that would be "engaging, informative, or entertaining". Based on the summaries, I think it does a really good job of finding short clips from the longer stream that would be good for sharing on social media.&lt;/p&gt;

&lt;p&gt;The next step is to generate the &lt;code&gt;mp4&lt;/code&gt; files.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Now I'll create MP4 clips for each of the identified segments:
Tool #6: create_mp4_clip
Tool #7: create_mp4_clip
Tool #8: create_mp4_clip
Tool #9: create_mp4_clip
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, the agent summarizes all of the work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Step 5: Summary of Work&lt;/span&gt;

&lt;span class="gu"&gt;### Clip 1: Starting a Creative Coding Stream with Rust&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Title:**&lt;/span&gt; Starting a Creative Coding Stream with Rust
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Start Time:**&lt;/span&gt; 00:01:26.580
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**End Time:**&lt;/span&gt; 00:02:12.220
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Transcript:**&lt;/span&gt; "NOW! We're live. Yes. Okay. Now there's audio. There's audio. Okay. I was like, ah, um, so last moment I had to cut off my, um, what is it called? My, uh, my audio. Not only my whole Thunderbolt docked here because it just froze. So lost all my audio and everything else as I went along. So how do you, how do we, Jay? How are you doing my friend? It has been way too long. So I hope you're doing good. Um, welcome to the show. Welcome. Welcome. It has been actually, um, actually even a good old while for myself in the terms of live streaming. I think I have a stream for like 20 days. So incredibly happy to be back. So yeah, let's do some live streaming."
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Summary:**&lt;/span&gt; The streamer begins their first live stream in 20 days, explaining how they had technical difficulties with their audio setup when trying to start the stream. Despite the challenges, they're enthusiastic about being back and connecting with viewers.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Tags:**&lt;/span&gt; #RustDevelopment #LiveCoding #StreamingTech #CreativeCoding #TechnicalDifficulties
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**File Location:**&lt;/span&gt; /[redacted]/starting-a-creative-coding-stream-with-rust.mp4

[trimmed for brevity]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's an example of one of the clips that our agent generated for us from the longer live stream.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/h_rND7go2nU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Enhancing the Agent
&lt;/h2&gt;

&lt;p&gt;If we wanted to, we could prompt the agent to upload the resulting &lt;code&gt;mp4&lt;/code&gt; files to an S3 bucket. If we did that, we might also want to create a tool that cleans up the temp files that were generated during the process. There's no limit to the number of tools we can create to enhance the agent.&lt;/p&gt;
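
&lt;p&gt;For example, the upload step of such a tool might boil down to something like the sketch below, which uses the AWS SDK for JavaScript v3. The bucket, key, and file path are hypothetical placeholders, and this is just an illustration of the idea rather than the agent's actual tool.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: upload a generated clip to S3 (bucket, key, and path are placeholders).
import { readFile } from 'node:fs/promises';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

const uploadClip = async (filePath, bucket, key) =&amp;gt; {
  const body = await readFile(filePath);
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: 'video/mp4',
  }));
  return `s3://${bucket}/${key}`;
};

console.log(await uploadClip(
  '/tmp/clips/starting-a-creative-coding-stream-with-rust.mp4',
  'example-clip-bucket',
  'clips/starting-a-creative-coding-stream-with-rust.mp4',
));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;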

&lt;p&gt;What kind of enhancements would you want to see with this social clip generating agent? How are you using the Strands Agents framework? Let me know in the comments.&lt;/p&gt;

&lt;p&gt;Photo by &lt;a href="https://unsplash.com/@sangharsh_l?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Sangharsh Lohakare&lt;/a&gt; on &lt;a href="https://unsplash.com/photos/a-close-up-of-a-blue-and-purple-structure-8o_LkMpo8ug?utm_content=creditCopyText&amp;amp;utm_medium=referral&amp;amp;utm_source=unsplash" rel="noopener noreferrer"&gt;Unsplash&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>livestreaming</category>
      <category>strandsagents</category>
    </item>
    <item>
<title>Enhance the Amazon Q Developer CLI with MCP Servers</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Fri, 02 May 2025 16:24:31 +0000</pubDate>
      <link>https://forem.com/aws/mcp-sabawohuo-yong-site-amazon-q-deberotupaclinoneng-li-woqiang-hua-siyou-1ka0</link>
      <guid>https://forem.com/aws/mcp-sabawohuo-yong-site-amazon-q-deberotupaclinoneng-li-woqiang-hua-siyou-1ka0</guid>
      <description>&lt;p&gt;少し前に、Model Context Protocol (MCP) が LLM アプリケーションに不足しているコンテキストを補完する方法について、&lt;a href="https://dev.to/aws/hype-or-hope-mcphapazurunopisuka-2kn4"&gt;4部構成のブログシリーズ&lt;/a&gt;で紹介しました。このシリーズでは、AWS リソース管理を支援する&lt;a href="https://dev.to/aws/aws-sabisutolian-xi-surutamenokasutamu-mcp-sabawozuo-cheng-suru-risosunoguan-li-50bj"&gt;カスタムMCPサーバー&lt;/a&gt;を構築しました。次に、&lt;a href="https://dev.to/aws/aws-sabisutolian-xi-surutamenokasutamu-mcp-sabawozuo-cheng-suru-rag-naretuzibesutoyuteiriteiwozhui-jia-suru-4a1a"&gt;RAG ナレッジベースとユーティリティ&lt;/a&gt;をサーバーに追加して、AWS サービスのドメイン固有知識を持たせることで、新しいアプリケーションの学習と開発を迅速に進められるようにしました。最後に、MCP サーバーが提供するツールを利用して Amazon Bedrock を通じて Claude 3.7 Sonnet を呼び出す&lt;a href="https://dev.to/aws/aws-nosabisutorisosuwoxue-xi-oyobiguan-li-surutamenokasutamu-mcp-kuraiantowozuo-cheng-suru-4dp6"&gt;カスタム MCP クライアント&lt;/a&gt;を作成しました。&lt;/p&gt;

&lt;p&gt;Building the custom MCP client was a great exercise for deepening our understanding of MCP and Amazon Bedrock, but in practice our MCP server can work with any client that supports the MCP protocol. In this post, I'll show you how to use the custom MCP server we've built with the Amazon Q Developer CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon Q Developer
&lt;/h2&gt;

&lt;p&gt;If you haven't tried &lt;a href="https://aws.amazon.com/jp/q/developer/?nc1=h_ls" rel="noopener noreferrer"&gt;Amazon Q Developer&lt;/a&gt; yet, I recommend installing it and giving it a try. It's an excellent generative AI-powered coding assistant that integrates with your favorite IDE and provides strong support for your development work. It also includes the Amazon Q Developer CLI, a command line based assistant that you can launch with the &lt;code&gt;q chat&lt;/code&gt; command. As of this writing, the &lt;a href="https://aws.amazon.com/jp/blogs/devops/extend-the-amazon-q-developer-cli-with-mcp/" rel="noopener noreferrer"&gt;Amazon Q Developer CLI&lt;/a&gt; has just added support for custom MCP servers. Check out the &lt;a href="https://aws.amazon.com/jp/blogs/news/extend-the-amazon-q-developer-cli-with-mcp/" rel="noopener noreferrer"&gt;announcement blog&lt;/a&gt; for the details of this new feature; here, let's actually add our MCP server for Amazon IVS and try it out!&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a Custom MCP Server to the Amazon Q Developer CLI
&lt;/h2&gt;

&lt;p&gt;First, &lt;a href="https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-installing.html?b=cli&amp;amp;p=overview&amp;amp;s=hero" rel="noopener noreferrer"&gt;install Amazon Q&lt;/a&gt; and authenticate with your AWS Builder ID. If you already have it installed, make sure that your version is &lt;code&gt;1.9.0&lt;/code&gt; or later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgdwix0m4lbrbax0s1ez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbgdwix0m4lbrbax0s1ez.png" alt="Amazon Q Release Notes" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, let's define the MCP server in a file called &lt;code&gt;mcp.json&lt;/code&gt; under &lt;code&gt;~/.aws/amazonq&lt;/code&gt;. This file must describe everything that Amazon Q needs in order to work with the MCP server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"amazon-ivs-mcp-server"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/projects/demos/ivs-mcp/amazon-ivs-mcp-server-demo/index.js"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"env"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the &lt;code&gt;mcpServers&lt;/code&gt; object, we define an object under the key &lt;code&gt;amazon-ivs-mcp-server&lt;/code&gt;. You can name this key whatever you like, but it's best to choose something that conveys what the server you're adding does. The object for that key specifies the command to execute (node) and the array of arguments (args) to pass to the MCP server. If necessary, you can also define environment variables in the env object. In my case, my environment variables are set in my session, so there's no need to define them explicitly in this file.&lt;/p&gt;
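
&lt;p&gt;For reference, if you did want to pass settings through &lt;code&gt;env&lt;/code&gt;, the entry might look like the sketch below. The variable names here are just common AWS SDK settings used as placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "amazon-ivs-mcp-server": {
      "command": "node",
      "args": ["/projects/demos/ivs-mcp/amazon-ivs-mcp-server-demo/index.js"],
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_PROFILE": "default"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;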

&lt;p&gt;Finally, let's launch the Amazon Q Developer CLI with the &lt;code&gt;q chat&lt;/code&gt; command. If the Amazon IVS MCP server that we created in this blog series is configured correctly, it will show up in the list of available tools at startup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;$ q chat
To learn more about MCP safety, see https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/command-line-mcp-security.html
⚠ amazon_ivs_mcp_server has the following warning:
The following tool names are changed:
&lt;span class="p"&gt; -&lt;/span&gt; amazon_ivs_mcp_server___get-logging-configuration -&amp;gt; amazon_ivs_mcp_server___getloggingconfiguration
&lt;span class="p"&gt; -&lt;/span&gt; amazon_ivs_mcp_server___get-playback-restriction-policy -&amp;gt; amazon_ivs_mcp_server___getplaybackrestrictionpolicy
...more warnings
✓ 1 of 1 mcp servers initialized
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't worry about the ⚠️ warnings. They just tell you that Amazon Q changed the tool names to match its naming conventions. The important part is "✓ 1 of 1 mcp servers initialized". Our MCP server and the Amazon Q Developer CLI are now working together!&lt;/p&gt;

&lt;h2&gt;
  
  
  Managing Tools
&lt;/h2&gt;

&lt;p&gt;Several commands are available for managing tools. You can see a list of the available commands with the &lt;code&gt;/help&lt;/code&gt; command. Here are the tool-related commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/tools        View and manage tools and permissions
  &lt;span class="nb"&gt;help        &lt;/span&gt;Show an explanation &lt;span class="k"&gt;for &lt;/span&gt;the trust &lt;span class="nb"&gt;command
  &lt;/span&gt;trust       Trust a specific tool or tools &lt;span class="k"&gt;for &lt;/span&gt;the session
  untrust     Revert a tool or tools to per-request confirmation
  trustall    Trust all tools &lt;span class="o"&gt;(&lt;/span&gt;equivalent to deprecated /acceptall&lt;span class="o"&gt;)&lt;/span&gt;
  reset       Reset all tools to default permission levels
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a tool is not trusted, the Amazon Q Developer CLI will ask you to confirm whether you trust it whenever it determines that it needs to use that tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using the Amazon IVS MCP Server from the Amazon Q Developer CLI
&lt;/h2&gt;

&lt;p&gt;Now we can ask Amazon Q questions about an account's Amazon IVS resources. For example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;tell me about the last streaming session on demo-stage&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When we ask that, we get a response like the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'll help you find information about the last streaming session on "demo-stage". To do this, I need to first check if this stage exists and then retrieve
its session information.

Let me first list the available IVS Real-Time stages to find the one named "demo-stage":

🛠️  Using tool: list-stages from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running list-stages with the param:
 ⋮  {
 ⋮    "arguments": {},
 ⋮    "name": "list-stages"
 ⋮  }
Allow this action? Use 't' to trust (always allow) this tool for the session. [y/n/t]:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Type 't' to trust the &lt;code&gt;list-stages&lt;/code&gt; tool and it will run. Amazon Q keeps running the various tools in sequence until it has gathered the information it needs to answer the question.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I found the "demo-stage" in the list. Now let me get the stage sessions for this stage to find the last streaming session:

🛠️  Using tool: list-stage-sessions from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running list-stage-sessions with the param:
 ⋮  {
 ⋮    "arguments": {
 ⋮      "arn": "arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]"
 ⋮    },
 ⋮    "name": "list-stage-sessions"
 ⋮  }

🛠️  Using tool: get-stage-session from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running get-stage-session with the param:
 ⋮  {
 ⋮    "arguments": {
 ⋮      "sessionId": "st-[redacted]",
 ⋮      "stageArn": "arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]"
 ⋮    },
 ⋮    "name": "get-stage-session"
 ⋮  }

🛠️  Using tool: list-participants from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running list-participants with the param:
 ⋮  {
 ⋮    "arguments": {
 ⋮      "arn": "arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]",
 ⋮      "sessionId": "st-[redacted]"
 ⋮    },
 ⋮    "name": "list-participants"
 ⋮  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, it prints a detailed summary of the stage session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;Based on the information I've gathered, here are the details about the last streaming session on "demo-stage":

&lt;span class="gu"&gt;## Last Streaming Session Details&lt;/span&gt;

• &lt;span class="gs"&gt;**Stage Name**&lt;/span&gt;: demo-stage
• &lt;span class="gs"&gt;**Session ID**&lt;/span&gt;: st-[redacted]
• &lt;span class="gs"&gt;**Start Time**&lt;/span&gt;: April 24, 2025 at 15:50:38 UTC
• &lt;span class="gs"&gt;**End Time**&lt;/span&gt;: April 24, 2025 at 16:55:06 UTC
• &lt;span class="gs"&gt;**Duration**&lt;/span&gt;: Approximately 1 hour and 5 minutes

&lt;span class="gu"&gt;### Participant Information:&lt;/span&gt;
• Total participants: 8
• 6 participants published content during the session
• 1 participant was identified with user ID "[redacted]"
• All participants are now disconnected

The session had multiple participants joining at different times, with the first participant joining at 15:50:37 UTC and the last one joining at 16:41:06 UTC. None of the participants had recording enabled during this session.

This was the most recent session on the demo-stage, which ended about 6 days ago.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Rapid Prototyping for Amazon IVS with the Amazon Q Developer CLI
&lt;/h2&gt;

&lt;p&gt;Just like with the custom MCP client we built in the previous post, we can ask Amazon Q to create a simple prototype application, and it can take advantage of its Amazon IVS domain knowledge and tools along the way.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;using the latest version of the amazon ivs web broadcast sdk, create a simple prototype application to broadcast to an amazon ivs low-latency channel. within the UI, include input elements for ingest endpoint and stream key, and let the user choose their webcam and mic. refer to the latest documentation as necessary: &lt;a href="https://aws.github.io/amazon-ivs-web-broadcast/docs/low-latency-sdk-guides/introduction" rel="noopener noreferrer"&gt;https://aws.github.io/amazon-ivs-web-broadcast/docs/low-latency-sdk-guides/introduction&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;p&gt;In response to this prompt, Amazon Q generates a simple HTML application that can broadcast to an Amazon IVS low-latency channel.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczjuezynvezhdtwdo8wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczjuezynvezhdtwdo8wy.png" alt="Low-latency Broadcast Prototype" width="800" height="922"&gt;&lt;/a&gt;&lt;/p&gt;
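
&lt;p&gt;For context, a prototype like this ultimately rests on a handful of calls from the &lt;code&gt;amazon-ivs-web-broadcast&lt;/code&gt; SDK. Here's a minimal sketch of those core calls (not the exact code Amazon Q generated); the ingest endpoint and stream key are placeholders that the generated UI reads from its input fields.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch of the core broadcast flow (endpoint and stream key are placeholders).
import IVSBroadcastClient from 'amazon-ivs-web-broadcast';

const client = IVSBroadcastClient.create({
  streamConfig: IVSBroadcastClient.STANDARD_LANDSCAPE,
  ingestEndpoint: 'UNIQUE_ID.global-contribute.live-video.net',
});

// Ask for permission first so enumerateDevices() returns full device info,
// then grab the first webcam and mic (a real UI would let the user choose).
await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
const devices = await navigator.mediaDevices.enumerateDevices();
const cam = devices.find((d) =&amp;gt; d.kind === 'videoinput');
const mic = devices.find((d) =&amp;gt; d.kind === 'audioinput');

const camStream = await navigator.mediaDevices.getUserMedia({ video: { deviceId: cam.deviceId } });
const micStream = await navigator.mediaDevices.getUserMedia({ audio: { deviceId: mic.deviceId } });
client.addVideoInputDevice(camStream, 'camera1', { index: 0 });
client.addAudioInputDevice(micStream, 'mic1');

// Start broadcasting to the low-latency channel.
await client.startBroadcast('STREAM_KEY');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;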

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post, we looked at how to configure a custom MCP server for use with the Amazon Q Developer CLI. This capability is incredibly powerful for developers, because you can extend Amazon Q not only with your own MCP servers but also by adding existing MCP servers that you trust. Go install it and give it a try!&lt;/p&gt;

&lt;p&gt;The original English version of this post is &lt;a href="https://dev.to/aws/live-streaming-from-the-amazon-q-developer-cli-4phb"&gt;here&lt;/a&gt;.&lt;br&gt;
This post was translated by &lt;a href="https://x.com/prog893" rel="noopener noreferrer"&gt;Torgayev Tamirlan&lt;/a&gt;, a Startup Solutions Architect at Amazon Web Services Japan.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>mcp</category>
      <category>amazonivs</category>
    </item>
    <item>
      <title>What Would a Live Stream Look Like on a Commodore 64?</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Fri, 02 May 2025 15:56:43 +0000</pubDate>
      <link>https://forem.com/aws/what-would-a-live-stream-look-like-on-a-commodore-64-42p4</link>
      <guid>https://forem.com/aws/what-would-a-live-stream-look-like-on-a-commodore-64-42p4</guid>
      <description>&lt;p&gt;This morning my buddy &lt;a href="https://raymondcamden.com" rel="noopener noreferrer"&gt;Raymond Camden&lt;/a&gt; shared a fun little experiment. He has been playing around with a bit of "vibe coding", and decided to ask Claude to create a simple HTML file to take an image from the DOM and generate a "pixel art" version of that image. I was immediately curious if I could get &lt;a href="https://aws.amazon.com/q/developer" rel="noopener noreferrer"&gt;Amazon Q Developer&lt;/a&gt; to create a similar demo to pixelate a webcam in real-time. This led me down quite a fun path that ended up with an absolutely amazing solution. Let me tell you about it...&lt;/p&gt;

&lt;h2&gt;
  
  
  The Inspiration
&lt;/h2&gt;

&lt;p&gt;Here's Raymond's solution. A fun, but rather basic pixel art demo.&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/cfjedimaster/embed/OPPQedp?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I loved the effect, but thought it would be even better if applied to a webcam stream in real-time!&lt;/p&gt;

&lt;h2&gt;
  
  
  First Iteration - Pixelate a Webcam
&lt;/h2&gt;

&lt;p&gt;My first prompt simply asked Amazon Q to create an HTML file that uses the &lt;a href="https://developer.mozilla.org/en-US/docs/Web/API/Insertable_Streams_for_MediaStreamTrack_API" rel="noopener noreferrer"&gt;Insertable Streams for MediaStreamTrack API&lt;/a&gt; to pixelate a webcam. The result was pretty good!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd5myo36tzilkf2mwk79.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd5myo36tzilkf2mwk79.png" alt="Pixelated Video"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It even added a parameter to specify the pixel size, so you could modify how much pixelization to add to the resulting video.&lt;/p&gt;
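
&lt;p&gt;To give you a feel for the approach, here's a minimal sketch of frame-level pixelation with Insertable Streams (my own illustration, not the exact file Amazon Q generated). Note that &lt;code&gt;MediaStreamTrackProcessor&lt;/code&gt; and &lt;code&gt;MediaStreamTrackGenerator&lt;/code&gt; are currently Chromium-only.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: pixelate a webcam track by drawing each frame small, then scaling
// it back up with image smoothing disabled.
const pixelSize = 8;
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

const processor = new MediaStreamTrackProcessor({ track });
const generator = new MediaStreamTrackGenerator({ kind: 'video' });

const small = new OffscreenCanvas(1, 1);
const full = new OffscreenCanvas(1, 1);

const pixelate = new TransformStream({
  async transform(frame, controller) {
    small.width = frame.displayWidth / pixelSize;
    small.height = frame.displayHeight / pixelSize;
    full.width = frame.displayWidth;
    full.height = frame.displayHeight;

    // Downscale the frame, then stretch the tiny image back to full size.
    small.getContext('2d').drawImage(frame, 0, 0, small.width, small.height);
    const fullCtx = full.getContext('2d');
    fullCtx.imageSmoothingEnabled = false;
    fullCtx.drawImage(small, 0, 0, full.width, full.height);

    const pixelated = new VideoFrame(full, { timestamp: frame.timestamp });
    frame.close();
    controller.enqueue(pixelated);
  },
});

processor.readable.pipeThrough(pixelate).pipeTo(generator.writable);

// Play the pixelated track in a &amp;lt;video&amp;gt; element.
document.querySelector('video').srcObject = new MediaStream([generator]);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;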

&lt;p&gt;Here's an 8 pixel version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhysqj15oj67edg1du16.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzhysqj15oj67edg1du16.png" alt="Light Pixels"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And a 16 pixel version:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgteoh27admwlvk4tlwtm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgteoh27admwlvk4tlwtm.png" alt="Heavy Pixels"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Second Iteration - Commodore 64 Style
&lt;/h2&gt;

&lt;p&gt;The pixel effect was great, and it immediately made me think of the "old days" and the low-level graphics that I grew up with. I shared it with my buddy and well-known retro computing enthusiast &lt;a href="https://rup12.net" rel="noopener noreferrer"&gt;Darko Mesaroš&lt;/a&gt; and told him that it reminded me of what a live stream would look like on a Commodore 64. Being the expert that he is, he mentioned that the C64 pixels were double-wide, so I went back to Amazon Q and asked it to create a version that would reflect what a live stream would look like on a C64. It knew the proper pixel ratio, and even took the initiative to map the pixel colors to the authentic 16-color C64 palette! Glorious!!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ferdcapnomomvbru0i2wh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ferdcapnomomvbru0i2wh.png" alt="C64 Pixels!"&gt;&lt;/a&gt;&lt;/p&gt;
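
&lt;p&gt;Under the hood, the C64 look boils down to two tweaks to the pixelation pass: sample blocks that are twice as wide as they are tall, and snap each block's color to its nearest neighbor in the 16-color palette. Here's a rough sketch of that color-snapping step. The RGB values are the commonly cited "Pepto" approximations of the C64 palette, so treat them as one reasonable rendering rather than the exact values Amazon Q chose.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// One common approximation of the 16-color C64 palette (the "Pepto" values).
const C64_PALETTE = [
  [0x00, 0x00, 0x00], [0xff, 0xff, 0xff], [0x68, 0x37, 0x2b], [0x70, 0xa4, 0xb2],
  [0x6f, 0x3d, 0x86], [0x58, 0x8d, 0x43], [0x35, 0x28, 0x79], [0xb8, 0xc7, 0x6f],
  [0x6f, 0x4f, 0x25], [0x43, 0x39, 0x00], [0x9a, 0x67, 0x59], [0x44, 0x44, 0x44],
  [0x6c, 0x6c, 0x6c], [0x9a, 0xd2, 0x84], [0x6c, 0x5e, 0xb5], [0x95, 0x95, 0x95],
];

// Snap one RGB color to its nearest palette entry (Euclidean distance in RGB).
const nearestC64Color = (r, g, b) =&amp;gt; {
  let best = C64_PALETTE[0];
  let bestDist = Infinity;
  for (const [pr, pg, pb] of C64_PALETTE) {
    const dist = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2;
    if (dist &amp;lt; bestDist) {
      bestDist = dist;
      best = [pr, pg, pb];
    }
  }
  return best;
};

// Example: the averaged color of each (double-wide) block gets snapped.
console.log(nearestC64Color(120, 80, 60));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;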

&lt;h2&gt;
  
  
  Final Iteration - The Full C64 Webcam Simulator
&lt;/h2&gt;

&lt;p&gt;Feeling quite happy with the results, I decided to prompt Amazon Q to update the UI to a full C64 experience. My prompt was rather vague, as I decided to see what it could come up with. The final result was so much more than I could have expected. Amazon Q even decided to include a full terminal with a C64 style blinking cursor. You can try it out for yourself below. Choose a camera, click 'START CAMERA', and then type &lt;code&gt;RUN&lt;/code&gt; at the C64 prompt!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;Note&lt;/strong&gt;: CodePen can't get webcam permissions when running in embedded mode here on dev.to. Visit the pen in a new browser tab to &lt;a href="https://codepen.io/Todd-Sharp/full/KwwoPNw" rel="noopener noreferrer"&gt;try it out&lt;/a&gt;!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/Todd-Sharp/embed/KwwoPNw?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;This was a fun experiment to see what I could create with the Amazon Q Developer CLI, and it helped me learn more about using the Insertable Streams API to modify a video track at the frame level. Sometimes learning new things can be fun just by vibing with an assistant to build a silly, throwaway app like this. &lt;/p&gt;

&lt;p&gt;Did this post inspire you to create something similar? Post your ideas and screenshots in the comments below!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>amazonivs</category>
      <category>retrocomputing</category>
      <category>commodore64</category>
    </item>
    <item>
      <title>Live Streaming From the Amazon Q Developer CLI</title>
      <dc:creator>Todd Sharp</dc:creator>
      <pubDate>Thu, 01 May 2025 15:51:08 +0000</pubDate>
      <link>https://forem.com/aws/live-streaming-from-the-amazon-q-developer-cli-4phb</link>
      <guid>https://forem.com/aws/live-streaming-from-the-amazon-q-developer-cli-4phb</guid>
      <description>&lt;p&gt;I've blogged recently about building your own Model Context Protocol (MCP) &lt;a href="https://dev.to/recursivecodes/series/31222"&gt;server and client&lt;/a&gt; to help manage your Amazon Interactive Video Service (Amazon IVS) resources (or, any AWS service!). We also saw how to &lt;a href="https://dev.to/aws/turbo-charge-the-amazon-q-developer-cli-with-mcp-servers-fl5"&gt;turbo charge&lt;/a&gt; 🏎️ the Amazon Q Developer CLI with the custom MCP server that we created in the previous series. Today I want to talk about another tool 🛠️ you can put in your live streaming developer toolbox 🧰. In this post we'll look at a way to actually start a live stream test broadcast directly from the Amazon Q Developer CLI.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait, What 🤯?
&lt;/h2&gt;

&lt;p&gt;Yes, you read that correctly 😍! In this post we're going to add a way to start a test broadcast to an Amazon IVS real-time stage with Amazon Q. Since we've built a custom MCP server and given Amazon Q access to all those tools, it's much easier to get started prototyping live streaming applications. As you're building new applications, you often need to test things out. What better way to test a live streaming application than to start a simple test stream?&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding a WebRTC Test Tool
&lt;/h2&gt;

&lt;p&gt;Broadcasting to a WebRTC endpoint from the command line has always been notoriously difficult. There's a WHIP plugin for GStreamer, but you have to install Rust and compile it yourself. And after that pain, you still have to construct a GStreamer pipeline, which can be equally painful. Thankfully, my co-worker Kostya Vasilyev - one of the amazing engineers on the Amazon IVS video team - has created an open-source project in his free time called &lt;code&gt;srtc&lt;/code&gt; (which stands for "simple" WebRTC [&lt;a href="https://github.com/kmansoft/srtc/" rel="noopener noreferrer"&gt;repo&lt;/a&gt;]). This amazing project includes a sample implementation that runs from the command line and can broadcast H264 video to a WHIP endpoint like the kind supported by Amazon IVS real-time stages. This means that we can compile the command line demo and use the resulting binary (which is less than 1MB!) to broadcast to a stage with a very simple command from the terminal. No convoluted and time consuming plugin compiling, no third-party dependencies (except OpenSSL, which you've most likely already got installed), no messy pipelines!&lt;/p&gt;

&lt;p&gt;To get started, visit the &lt;code&gt;srtc&lt;/code&gt; repo and follow the instructions in the &lt;code&gt;README.md&lt;/code&gt; to compile the &lt;code&gt;cmdline_demo&lt;/code&gt;. Rename the compiled binary to &lt;code&gt;srtc&lt;/code&gt; and make it executable with &lt;code&gt;chmod +x srtc&lt;/code&gt;. To avoid licensing and third-party dependencies, &lt;code&gt;srtc&lt;/code&gt; publishes raw H264, so you will have to convert your videos to raw H264, but this is easy to do with FFmpeg. Plus, you only have to do this once (and &lt;code&gt;srtc&lt;/code&gt; includes a sample &lt;code&gt;.h264&lt;/code&gt; that you can use). Here's how to convert an &lt;code&gt;.mp4&lt;/code&gt; to &lt;code&gt;.h264&lt;/code&gt; if you want to use your own video instead of the sample included in the repo:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ffmpeg &lt;span class="nt"&gt;-i&lt;/span&gt; /path/to/a.mp4 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-c&lt;/span&gt;:v libx264 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-profile&lt;/span&gt;:v baseline &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-level&lt;/span&gt; 3.0 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-preset&lt;/span&gt; medium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-an&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; h264 &lt;span class="se"&gt;\&lt;/span&gt;
  test.h264
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Starting a Test Stream with Amazon Q 🎥
&lt;/h2&gt;

&lt;p&gt;Once you've got the compiled binary, you can run it directly from the command line. You don't really need Amazon Q for that, but running it manually would require you to look up a stage ARN and create a participant token yourself. Since the custom Amazon IVS MCP server can easily do that, we can just ask Amazon Q to run a test broadcast for us!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh2k4zhmz999qnnx0qwx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyh2k4zhmz999qnnx0qwx.png" alt="Amazon Q Invoking srtc" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡&lt;strong&gt;Tip&lt;/strong&gt;: The Amazon Q Developer CLI already has access to the AWS CLI, so you don't &lt;em&gt;need&lt;/em&gt; to use the Amazon IVS MCP Server, but since it includes so many other helpful tools, why wouldn't you?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's prompt Amazon Q to start a test stream with &lt;code&gt;srtc&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;using the tool called 'srtc' located at /path/to/srtc start a test stream to Amazon IVS. run '--help' to determine the necessary parameters. generate a participant token for 'demo-stage' and use the whip endpoint &lt;a href="https://global.whip.live-video.net" rel="noopener noreferrer"&gt;https://global.whip.live-video.net&lt;/a&gt;. there is a file called 'test.h264' in that directory that you can use. use screen to run the command in the background.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'll help you start a test stream to Amazon IVS using the srtc tool with screen to run it in the background.

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command:
/path/to/srtc -h

Usage: /path/to/srtc [options]
Options:
  -f, --file &lt;span class="nt"&gt;&amp;lt;path&amp;gt;&lt;/span&gt;    Path to H.264 file (default: sintel.h264)
  -u, --url &lt;span class="nt"&gt;&amp;lt;url&amp;gt;&lt;/span&gt;      WHIP server URL (default: http://localhost:8080/whip)
  -t, --token &lt;span class="nt"&gt;&amp;lt;token&amp;gt;&lt;/span&gt;  WHIP authorization token
  -h, --help           Show this help message
 ⋮
 ● Completed in 0.26s

🛠️  Using tool: list-stages (trusted) from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running list-stages with the param:
 ⋮  {
 ⋮    "arguments": {},
 ⋮    "name": "list-stages"
 ⋮  }
 ⋮
 ● Completed in 0.204s

🛠️  Using tool: create-stage-participant-token (trusted) from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running create-stage-participant-token with the param:
 ⋮  {
 ⋮    "arguments": {
 ⋮      "capabilities": [
 ⋮        "PUBLISH",
 ⋮        "SUBSCRIBE"
 ⋮      ],
 ⋮      "duration": 60,
 ⋮      "stageArn": "arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]",
 ⋮      "userId": "srtc-test-user"
 ⋮    },
 ⋮    "name": "create-stage-participant-token"
 ⋮  }
 ⋮
 ● Completed in 0.85s

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command:
cd /path/to/srtc &amp;amp;&amp;amp; ls -la test.h264

-rw-r--r--  1 shartodd  staff  16674002 Apr 30 14:33 test.h264

 ⋮
 ● Completed in 0.28s

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command: which screen

/usr/bin/screen

 ⋮
 ● Completed in 0.29s

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command:
cd /path/to/srtc &amp;amp;&amp;amp; screen -dmS srtc-test /path/to/srtc -f test.h264 -u https://global.whip.live-video.net -t eyJhbGciOiJ[redacted] &amp;amp;&amp;amp; echo "Started srtc in screen session named 'srtc-test'"

Started srtc in screen session named 'srtc-test'

 ⋮
 ● Completed in 0.235s

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command: screen -ls

There is a screen on:
&lt;span class="p"&gt;    78126.&lt;/span&gt;srtc-test (Detached)
1 Socket in /var/folders/gg/wkf00k7d60z26pxcv3s4_tlr0000gr/T/.screen.

 ⋮
 ● Completed in 0.20s

I've successfully started a test stream to Amazon IVS using the srtc tool in a screen session. Here's what I did:
&lt;span class="p"&gt;
1.&lt;/span&gt; First, I checked the help information for the srtc tool to understand the required parameters.
&lt;span class="p"&gt;2.&lt;/span&gt; I found the demo-stage in the list of available stages with ARN: arn:aws:ivs:us-east-1:[redacted]:stage/[redacted].
&lt;span class="p"&gt;3.&lt;/span&gt; I generated a participant token for the demo-stage with both PUBLISH and SUBSCRIBE capabilities, valid for 60 minutes.
&lt;span class="p"&gt;4.&lt;/span&gt; I verified that the test.h264 file exists in the specified directory.
&lt;span class="p"&gt;5.&lt;/span&gt; I confirmed that screen is available on your system.
&lt;span class="p"&gt;6.&lt;/span&gt; I started the srtc tool in a detached screen session named 'srtc-test' with the following parameters:
&lt;span class="p"&gt;
*&lt;/span&gt; File: test.h264
&lt;span class="p"&gt;*&lt;/span&gt; URL: https://global.whip.live-video.net
&lt;span class="p"&gt;*&lt;/span&gt; Token: The participant token generated for the demo-stage

The screen session is now running in the background with the name 'srtc-test' and session ID 78126. The srtc tool is streaming the test.h264 file to your
Amazon IVS Real-Time stage.

To manage this screen session later, you can use these commands:
• To list all screen sessions: screen -ls
• To reattach to the session: screen -r srtc-test
• To terminate the session: screen -X -S srtc-test quit

The stream will continue running in the background until you manually stop it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If we head over to the AWS Console and check the details for &lt;code&gt;demo-stage&lt;/code&gt;, we can confirm that the test video is being broadcasted!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyivd2299t0qsgrv74y8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feyivd2299t0qsgrv74y8.png" alt="Test Subscribe" width="800" height="651"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To stop the broadcast we can manually run the commands that Amazon Q suggested, or ask it to run the command for us.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'll stop the test stream that's running in the screen session for you.

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command:
screen -X -S srtc-test quit
 ⋮
 ● Completed in 0.33s

🛠️  Using tool: execute_bash (trusted)
 ⋮
 ● I will run the following shell command: screen -ls
No Sockets found in /var/folders/gg/wkf00k7d60z26pxcv3s4_tlr0000gr/T/.screen.
 ⋮
 ● Completed in 0.28s

I've successfully stopped the test stream. The screen session named 'srtc-test' has been terminated, and the srtc tool is no longer streaming to Amazon IVS.

I verified that there are no more screen sessions running on your system. The test broadcast to the demo-stage has been completely stopped.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Adding Test Broadcast to a Custom MCP Server
&lt;/h2&gt;

&lt;p&gt;Again, this isn't 100% necessary, but I do like the fact that we can wrap the necessary logic in a few MCP server tools and let the server handle the token generation and starting the stream. Here is a diagram to help you visualize how this approach is different from the one we discussed above.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs793m6igu6seqfsjcy99.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs793m6igu6seqfsjcy99.png" alt="srtc mcp overview" width="800" height="281"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Adding Server Tools
&lt;/h2&gt;

&lt;p&gt;To broadcast a test stream from Amazon Q, we'll add two tools to the custom MCP server that we created in the previous series. We'll need to pass the path to &lt;code&gt;srtc&lt;/code&gt; into the server as an environment variable. This path can be set in your terminal, or hardcoded into the &lt;code&gt;mcp.json&lt;/code&gt; configuration (see my &lt;a href="https://dev.to/aws/turbo-charge-the-amazon-q-developer-cli-with-mcp-servers-fl5"&gt;last post&lt;/a&gt; for how to do that).&lt;br&gt;
&lt;/p&gt;
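
&lt;p&gt;If you'd rather hardcode it, the &lt;code&gt;mcp.json&lt;/code&gt; entry might look something like this sketch (the path is a placeholder):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "mcpServers": {
    "amazon-ivs-mcp-server": {
      "command": "node",
      "args": ["/projects/demos/ivs-mcp/amazon-ivs-mcp-server-demo/index.js"],
      "env": {
        "TEST_BROADCAST_PATH": "/path/to/srtc-directory"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With that in place, here are the two tools:&lt;br&gt;
&lt;/p&gt;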

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;testBroadcastPath&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;TEST_BROADCAST_PATH&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;testBroadcastPath&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// broadcast-test-stream-to-stage&lt;/span&gt;
  &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;broadcast-test-stream-to-stage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Initiate a test stream broadcast to a stage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;stageArn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ARN of the stage to which this token is scoped.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="nx"&gt;stageArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// get a participant token for the stage&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;test-user&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;capabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PUBLISH&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SUBSCRIBE&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;720&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;stageArn&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;duration&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CreateParticipantTokenCommand&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;input&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ivsRealTimeClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;srtcProcess&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;srtcProcess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;testBroadcastPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/srtc`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-f&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;testBroadcastPath&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/test.h264`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-u&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://global.whip.live-video.net&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-t&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;participantToken&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;token&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;detached&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;stdio&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ignore&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
        &lt;span class="nx"&gt;pid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;srtcProcess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="nx"&gt;srtcProcess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unref&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;started&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pid&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// stop-broadcast-test-stream-to-stage&lt;/span&gt;
  &lt;span class="nx"&gt;server&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stop-broadcast-test-stream-to-stage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Stops the test stream broadcast to a stage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The pid of the test stream process.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;pid&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;spawn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kill&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;-9&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;shell&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;e&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;stopped&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;}]&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Live Streaming from the Amazon Q CLI!
&lt;/h2&gt;

&lt;p&gt;Now we can just ask Amazon Q to start a test stream on 'demo-stage'.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;lets broadcast a test stream to 'demo-stage'&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I'll help you broadcast a test stream to a stage named 'demo-stage'. First, let's check if this stage exists in your AWS IVS account.&lt;span class="sb"&gt;


&lt;/span&gt;🛠️  Using tool: list-stages from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running list-stages with the param:
 ⋮  {
 ⋮    "arguments": {},
 ⋮    "name": "list-stages"
 ⋮  }
 ⋮
 ● Completed in 0.248s&lt;span class="sb"&gt;


&lt;/span&gt;I found a stage named 'demo-stage' in your AWS IVS account. Let's broadcast a test stream to it.

🛠️  Using tool: broadcast-test-stream-to-stage from mcp server amazon_ivs_mcp_server
 ⋮
 ● Running broadcast-test-stream-to-stage with the param:
 ⋮  {
 ⋮    "arguments": {
 ⋮      "stageArn": "arn:aws:ivs:us-east-1:[redacted]:stage/[redacted]"
 ⋮    },
 ⋮    "name": "broadcast-test-stream-to-stage"
 ⋮  }
 ⋮
 ● Completed in 0.117s

Great! I've successfully started broadcasting a test stream to the 'demo-stage'. Here's what I did:
&lt;span class="p"&gt;
1.&lt;/span&gt; Found the stage named 'demo-stage' in your AWS IVS account
&lt;span class="p"&gt;2.&lt;/span&gt; Started broadcasting a test stream to the stage

The test stream is now running with process ID 55960.

To stop the test stream when you're done, let me know and I can help you terminate it.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this post, we examined a couple of ways to start a test broadcast on an Amazon IVS real-time stage to help you test your live streaming applications. &lt;/p&gt;

&lt;p&gt;Have you embraced generative-AI tools in your development workflow? Let me know in the comments below!&lt;/p&gt;

</description>
      <category>aws</category>
      <category>mcp</category>
      <category>amazonivs</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
