<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Robindeva</title>
    <description>The latest articles on Forem by Robindeva (@robindeva).</description>
    <link>https://forem.com/robindeva</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F790104%2Ff28183bd-5753-42b9-9f91-df36fc9e9e2e.png</url>
      <title>Forem: Robindeva</title>
      <link>https://forem.com/robindeva</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/robindeva"/>
    <language>en</language>
    <item>
      <title>How I Built a Recipe Extractor from YouTube Using AWS Transcribe</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Wed, 04 Mar 2026 09:18:50 +0000</pubDate>
      <link>https://forem.com/robindeva/how-i-built-a-recipe-extractor-from-youtube-using-aws-transcribe-1jj0</link>
      <guid>https://forem.com/robindeva/how-i-built-a-recipe-extractor-from-youtube-using-aws-transcribe-1jj0</guid>
      <description>&lt;p&gt;"Cooking videos are great, but following along in the kitchen is a pain. You're elbow-deep in dough and suddenly need to rewind for that one ingredient you missed."&lt;/p&gt;

&lt;p&gt;So I built a small pipeline that takes any YouTube cooking video, pulls the audio, sends it to Amazon Transcribe, and gives me a clean text file of the entire recipe.&lt;/p&gt;

&lt;p&gt;No paid tools. No complex setup. Just AWS services and a few Python scripts.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Pipeline Does
&lt;/h2&gt;

&lt;p&gt;YouTube Video&lt;br&gt;
       ↓&lt;br&gt;
  Download Audio (yt-dlp)&lt;br&gt;
       ↓&lt;br&gt;
  Upload to S3&lt;br&gt;
       ↓&lt;br&gt;
  Amazon Transcribe&lt;br&gt;
       ↓&lt;br&gt;
  recipe.txt&lt;/p&gt;

&lt;p&gt;Four steps. That's it.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 1 — Download the Audio
&lt;/h2&gt;

&lt;p&gt;I used yt-dlp to pull just the audio from the video. No need to download the full video.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;yt-dlp \&lt;br&gt;
    --extract-audio \&lt;br&gt;
    --audio-quality 0 \&lt;br&gt;
    --output "output/audio.%(ext)s" \&lt;br&gt;
    "https://youtu.be/YOUR_VIDEO_ID"&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;One thing I ran into — ffmpeg was not installed on my machine, so the mp3 conversion failed. But Amazon Transcribe supports webm format natively, so I skipped the conversion entirely and uploaded the raw .webm file. Saved time.&lt;/p&gt;
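&lt;p&gt;Since the exact format you end up uploading depends on what yt-dlp grabs, a small guard before the upload helps. This is a sketch of my own (the helper name and format set are assumptions based on the input formats Transcribe documents, not code from the pipeline):&lt;/p&gt;

```python
from pathlib import Path

# Container formats Amazon Transcribe accepts as MediaFormat values
SUPPORTED_FORMATS = {"mp3", "mp4", "wav", "flac", "ogg", "amr", "webm", "m4a"}

def media_format_for(audio_path: str) -> str:
    """Map a downloaded audio file to the MediaFormat string that
    start_transcription_job expects, or raise if unsupported."""
    ext = Path(audio_path).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Transcribe does not accept .{ext} files directly")
    return ext

# The raw yt-dlp download can be passed straight through:
print(media_format_for("output/audio.webm"))  # webm
```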
&lt;h2&gt;
  
  
  Step 2 — Create an S3 Bucket and Upload
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;BUCKET_NAME="recipe-transcribe-$(date +%s)"&lt;br&gt;
 aws s3 mb s3://$BUCKET_NAME --region us-east-1&lt;br&gt;
 aws s3 cp output/audio.webm s3://$BUCKET_NAME/audio.webm&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Using date +%s as a suffix keeps the bucket name unique without any extra thinking.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 3 — Start the Transcribe Job
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"
REGION      = "us-east-1"
MEDIA_URI   = f"s3://{BUCKET_NAME}/audio.webm"

client = boto3.client("transcribe", region_name=REGION)

client.start_transcription_job(
    TranscriptionJobName=JOB_NAME,
    Media={"MediaFileUri": MEDIA_URI},
    MediaFormat="webm",
    LanguageCode="en-US",
    OutputBucketName=BUCKET_NAME,
    OutputKey="transcript.json",
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Amazon Transcribe picks up the file from S3 and writes transcript.json back to the same bucket once done.&lt;/p&gt;
&lt;h2&gt;
  
  
  Step 4 — Poll the Job and Save the Recipe
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import time

import boto3

BUCKET_NAME = "your-bucket-name"
JOB_NAME    = "recipe-job-01"

transcribe = boto3.client("transcribe", region_name="us-east-1")
s3         = boto3.client("s3", region_name="us-east-1")

while True:
    response = transcribe.get_transcription_job(TranscriptionJobName=JOB_NAME)
    status   = response["TranscriptionJob"]["TranscriptionJobStatus"]
    print(f"Status: {status}")

    if status == "COMPLETED":
        break
    if status == "FAILED":
        raise RuntimeError("Transcription job failed")

    time.sleep(15)

# Download the result and pull out the plain transcript text
s3.download_file(BUCKET_NAME, "transcript.json", "output/transcript.json")

with open("output/transcript.json") as f:
    data = json.load(f)

text = data["results"]["transcripts"][0]["transcript"]

with open("output/recipe.txt", "w") as f:
    f.write(text)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The script checks every 15 seconds. For a 10-minute video, the job finished in about a minute. &lt;/p&gt;
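&lt;p&gt;For reference, the transcript.json that Transcribe writes back is nested a few levels deep. Here's a minimal sketch of the extraction step, using a hand-written sample that mirrors the real file's shape (real output also carries per-word timing data under an "items" key):&lt;/p&gt;

```python
import json

# Hand-made stand-in for the transcript.json Transcribe writes to S3
sample = json.loads("""
{
  "jobName": "recipe-job-01",
  "results": {
    "transcripts": [
      {"transcript": "Heat two tablespoons of oil. Add the onions."}
    ]
  }
}
""")

# The full text lives at results -> transcripts -> [0] -> transcript
text = sample["results"]["transcripts"][0]["transcript"]
print(text)
```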
&lt;h2&gt;
  
  
  The Output
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Here's what came out for a Guntur Chicken Masala video:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0n79ro6n68ewqkxy9rs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu0n79ro6n68ewqkxy9rs.png" alt=" " width="800" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Readable. Accurate. Ready to use in the kitchen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;IAM Permissions You Need&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  {
    "Effect": "Allow",
    "Action": [
      "s3:CreateBucket",
      "s3:PutObject",
      "s3:GetObject",
      "transcribe:StartTranscriptionJob",
      "transcribe:GetTranscriptionJob"
    ],
    "Resource": "*"
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What I'd Build Next&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Trigger the whole pipeline on S3 upload via Lambda&lt;/li&gt;
&lt;li&gt;Process a full YouTube playlist at once&lt;/li&gt;
&lt;li&gt;Add speaker labels for videos with multiple hosts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The full code is on GitHub: &lt;a href="https://github.com/robindeva/Extracting-a-Recipe" rel="noopener noreferrer"&gt;https://github.com/robindeva/Extracting-a-Recipe&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>python</category>
      <category>s3</category>
      <category>transcribe</category>
    </item>
    <item>
      <title>How We Cut AWS Costs by 65% in 3 Weeks Without Sacrificing Anything</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Wed, 04 Feb 2026 18:46:50 +0000</pubDate>
      <link>https://forem.com/robindeva/how-we-cut-aws-costs-by-65-in-3-weeks-without-sacrificing-anything-5dpl</link>
      <guid>https://forem.com/robindeva/how-we-cut-aws-costs-by-65-in-3-weeks-without-sacrificing-anything-5dpl</guid>
      <description>&lt;p&gt;Our dev environment bill was killing us. Every day was costing around $28, and no matter what we did, that number wouldn't budge. We'd optimize here, scale down there, but nothing seemed to stick. Then one afternoon, I actually looked at the bill line by line instead of just checking the dashboard.&lt;/p&gt;

&lt;p&gt;That's when I found the real problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two Leaks We Discovered
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The ECS Cluster Nobody Used&lt;/strong&gt;&lt;br&gt;
We had spun up a full ECS cluster for development workloads. Full cluster. For dev. The reasoning at the time was probably something like "production-grade infrastructure everywhere" which sounds good until you realize you're paying production prices for development work.&lt;/p&gt;

&lt;p&gt;The reality: most of these workloads didn't need that level of orchestration. We were running simple background jobs and web services that spent most of their time idle. The cluster was burning money just sitting there waiting for occasional traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Jump Boxes That Sat Empty&lt;/strong&gt;&lt;br&gt;
We had Windows jump hosts for database access. Two of them. Running 24/7. They were idle probably 90% of the time, but there they were on the bill every single day.&lt;/p&gt;

&lt;p&gt;On top of that, we were paying for NAT gateways to route traffic to these instances. Those aren't cheap when you're not actively using them.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Actually Changed
&lt;/h2&gt;

&lt;p&gt;We didn't reinvent the wheel. We just right-sized our architecture to match what we actually needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Moving to Lightsail&lt;/strong&gt;&lt;br&gt;
We migrated the non-critical workloads off ECS and onto Amazon Lightsail. This sounds like a bigger change than it was. In practice, we containerized the same code, pointed it at Lightsail's container service, and we were done.&lt;/p&gt;

&lt;p&gt;The difference was immediate. Lightsail is simpler, cheaper, and honestly overkill for what we were running anyway. The workloads didn't care about ECS's orchestration features. They just needed to run.&lt;/p&gt;

&lt;p&gt;Daily compute cost dropped hard after this move.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replacing Jump Boxes With SSH Tunneling&lt;/strong&gt;&lt;br&gt;
The jump boxes were there for one reason: secure access to databases. But a $150-300/month Windows instance is a sledgehammer solution for something that SSH tunneling solves better and cheaper.&lt;/p&gt;

&lt;p&gt;We set up a bastion host using a basic EC2 instance (the smallest one you can get), configured SSH tunneling, and removed the expensive jump boxes. Developers could tunnel securely to the database without maintaining dedicated gateway infrastructure.&lt;/p&gt;

&lt;p&gt;Cost went from paying for always-on jump hosts to paying for a tiny bastion that barely gets touched.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Before: $28/day&lt;br&gt;
After: under $10/day&lt;br&gt;
That projects to about $540/month in savings, or roughly 65% less.&lt;/p&gt;
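&lt;p&gt;The projection is simple arithmetic (30-day month assumed; at exactly $10/day the cut works out to about 64%, and "under $10" is what pushes it to the quoted 65%):&lt;/p&gt;

```python
# Back-of-envelope check on the numbers above
before_daily = 28.0
after_daily  = 10.0   # "under $10/day", so this is the conservative end

monthly_savings = (before_daily - after_daily) * 30
percent_saved   = (before_daily - after_daily) / before_daily * 100

print(f"~${monthly_savings:.0f}/month saved, ~{percent_saved:.0f}% less")
```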

&lt;p&gt;Did we lose any functionality? No. The dev environment works the same. Deployments work the same. The only difference is developers don't have to wait for infrastructure that was overbuilt for what they actually do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's this pattern in cloud architecture where "enterprise-grade" becomes the default. It makes sense for production—you want resilience, you want redundancy, you want things to survive failures gracefully.&lt;/p&gt;

&lt;p&gt;But development environments aren't production. They're where you build and test. They need to be reliable enough that your team isn't blocked, but they don't need to cost like they're running Netflix.&lt;/p&gt;

&lt;p&gt;The biggest optimization win isn't usually some clever trick. It's looking at what you're actually using versus what you're actually paying for, then doing something about the gap.&lt;br&gt;
In our case, the gap was huge.&lt;/p&gt;

&lt;h2&gt;
  
  
  If You're In a Similar Spot
&lt;/h2&gt;

&lt;p&gt;Take 30 minutes this week and break down your non-production AWS bill by service. Look for the same patterns we found—full-featured services handling simple workloads, infrastructure running idle most of the time, features you provisioned "just in case" but never use.&lt;/p&gt;

&lt;p&gt;You might find the same kinds of leaks. And fixing them is usually simpler than you'd think.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>infrastructure</category>
      <category>productivity</category>
      <category>devops</category>
    </item>
    <item>
      <title>NAT Gateway vs VPC Endpoints: Which One Should You Use?</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Wed, 07 Jan 2026 10:51:56 +0000</pubDate>
      <link>https://forem.com/robindeva/nat-gateway-vs-vpc-endpoints-which-one-should-you-use-5bg1</link>
      <guid>https://forem.com/robindeva/nat-gateway-vs-vpc-endpoints-which-one-should-you-use-5bg1</guid>
      <description>&lt;p&gt;NAT Gateways cost money. A lot of it if you're not careful. If you're using them to connect private resources to AWS services like S3 or DynamoDB, you're probably overpaying. VPC Endpoints do the same job for a fraction of the cost.&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem
&lt;/h2&gt;

&lt;p&gt;You have an EC2 instance in a private subnet. It needs to upload files to S3, write logs to CloudWatch, hit DynamoDB. Basic stuff. But it has no internet access.&lt;/p&gt;

&lt;p&gt;How does it talk to AWS services?&lt;/p&gt;

&lt;h2&gt;
  
  
  Using a NAT Gateway
&lt;/h2&gt;

&lt;p&gt;A NAT Gateway gives your private resources internet access.&lt;br&gt;
Your EC2 wants to upload to S3. The request goes: instance → NAT Gateway → Internet Gateway → actual internet → S3 public endpoint → all the way back.&lt;/p&gt;

&lt;p&gt;Even though S3 is an AWS service, your data goes out to the internet first. Feels backwards.&lt;/p&gt;

&lt;p&gt;Cost breakdown: $0.045/hour for the gateway (roughly $32/month) plus $0.045 per GB. Moving 800GB monthly? That's another $36. One NAT Gateway = $68/month. Need two for high availability? Double it.&lt;/p&gt;
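&lt;p&gt;The math above, spelled out (us-east-1 list prices; a 720-hour month assumed, which the article rounds to $32 and $68):&lt;/p&gt;

```python
# One NAT Gateway: hourly charge plus per-GB data processing
hourly_rate  = 0.045   # $/hour per NAT Gateway
per_gb_rate  = 0.045   # $/GB processed
hours        = 720     # ~one month
gb_processed = 800

gateway_cost = hourly_rate * hours          # ~$32.40/month
data_cost    = per_gb_rate * gb_processed   # $36.00/month
total        = gateway_cost + data_cost     # ~$68.40/month

print(f"${total:.2f}/month for one NAT Gateway, ${total * 2:.2f} for an HA pair")
```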

&lt;p&gt;Security-wise, your traffic is leaving your VPC. It's encrypted, sure, but it's still going through the public internet to reach AWS services that are... also in AWS. Never sat right with me.&lt;/p&gt;

&lt;h2&gt;
  
  
  Option 2: VPC Endpoints (wish I'd known about this sooner)
&lt;/h2&gt;

&lt;p&gt;This one's different. VPC Endpoints create a direct private connection from your VPC to AWS services. No internet involved.&lt;br&gt;
Same scenario with S3: your request goes from EC2 → straight to S3 through AWS's private network. Done. Traffic never leaves AWS infrastructure.&lt;/p&gt;

&lt;p&gt;Cost difference is huge. Gateway endpoints for S3 and DynamoDB? Free. Actually free. Interface endpoints for other services run about $0.01/hour (roughly $7/month).&lt;/p&gt;

&lt;p&gt;I replaced my NAT Gateway setup with VPC Endpoints and cut that $136/month down to around $15. No joke.&lt;/p&gt;

&lt;p&gt;Security is better because everything stays internal. You can also set policies on endpoints to control exactly which S3 buckets your instances can access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's a real situation from my project&lt;/strong&gt;&lt;br&gt;
I'm running a data processing pipeline. Lambda functions pull CSV files from S3, process them, dump results in DynamoDB, and send completion notifications through SNS. All this runs in private subnets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I was doing (NAT Gateway):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Two NAT Gateways (one per AZ) = $64/month&lt;/li&gt;
&lt;li&gt;Processing about 500GB = $22.50/month&lt;/li&gt;
&lt;li&gt;Total = $86.50/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What I switched to (VPC Endpoints):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;S3 Gateway Endpoint = $0&lt;/li&gt;
&lt;li&gt;DynamoDB Gateway Endpoint = $0&lt;/li&gt;
&lt;li&gt;SNS Interface Endpoint = $7/month&lt;/li&gt;
&lt;li&gt;Total = $7/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Saved almost $80 monthly on one project. Multiply that across environments and projects, and it adds up fast.&lt;br&gt;
So when should you actually use each?&lt;/p&gt;

&lt;h2&gt;
  
  
  NAT Gateway makes sense when:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Your app needs to call external APIs (Stripe, Twilio, whatever)&lt;/li&gt;
&lt;li&gt;You're pulling packages from npm, pip, or apt repositories&lt;/li&gt;
&lt;li&gt;You need actual internet access for patches and updates&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  VPC Endpoints make sense when:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You're talking to AWS services (and most support endpoints now)&lt;/li&gt;
&lt;li&gt;You care about keeping costs down&lt;/li&gt;
&lt;li&gt;Compliance requires traffic to stay off the public internet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of the time, you'll use both. I keep a NAT Gateway for external API calls and package downloads, but use VPC Endpoints for all AWS service communication.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick setup notes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Creating an S3 endpoint is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;VPC console → Endpoints → Create&lt;/li&gt;
&lt;li&gt;Pick com.amazonaws.your-region.s3&lt;/li&gt;
&lt;li&gt;Select your VPC and route tables&lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Your application code doesn't change. S3 SDK calls automatically route through the endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I made (so you don't have to)
&lt;/h2&gt;

&lt;p&gt;First mistake: I created Interface Endpoints for S3. Turns out Gateway Endpoints exist for S3 and they're free. Interface Endpoints cost money. Oops.&lt;/p&gt;

&lt;p&gt;Second mistake: Forgot to update security groups for Interface Endpoints. Spent 20 minutes debugging why my Lambda couldn't reach SNS. Security group wasn't allowing the traffic.&lt;/p&gt;

&lt;p&gt;Third mistake: Left my NAT Gateways running after setting up endpoints "just to be safe." Wasted $130 over two months before I actually verified I didn't need them anymore.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd recommend
&lt;/h2&gt;

&lt;p&gt;Look at your CloudTrail logs or VPC Flow Logs. See what services you're actually calling. If they support VPC Endpoints (S3, DynamoDB, Lambda, SQS, SNS, Secrets Manager, ECR, and tons more), switch to endpoints.&lt;/p&gt;

&lt;p&gt;Keep NAT Gateway only for actual internet access needs. Don't use it as a catch-all for AWS service communication.&lt;/p&gt;

&lt;p&gt;Check your bill after a month. You'll probably see the difference immediately.&lt;/p&gt;

&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Different endpoint types matter:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Gateway Endpoints (S3, DynamoDB): Free, route table based, no security groups needed.&lt;/p&gt;

&lt;p&gt;Interface Endpoints (everything else): Cost money, need security groups, ENI-based.&lt;/p&gt;

&lt;p&gt;Always check if a Gateway Endpoint exists before creating an Interface Endpoint.&lt;/p&gt;
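&lt;p&gt;That check is simple enough to encode. A tiny sketch (the helper is mine, not an AWS API — only S3 and DynamoDB have Gateway Endpoints):&lt;/p&gt;

```python
# Avoid my first mistake: S3 and DynamoDB get free Gateway Endpoints;
# every other service needs a paid Interface Endpoint.
GATEWAY_SERVICES = {"s3", "dynamodb"}

def endpoint_type(service: str) -> str:
    return "Gateway" if service.lower() in GATEWAY_SERVICES else "Interface"

for svc in ("s3", "dynamodb", "sns", "secretsmanager"):
    print(svc, "→", endpoint_type(svc))
```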

&lt;p&gt;That's pretty much it. This change alone dropped our AWS networking costs by 60%. Your mileage may vary, but it's worth looking into.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vpc</category>
      <category>nat</category>
      <category>finops</category>
    </item>
    <item>
      <title>Why Your EC2 is Taking the Long Way to S3</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Sun, 04 Jan 2026 08:32:45 +0000</pubDate>
      <link>https://forem.com/robindeva/why-your-ec2-is-taking-the-long-way-to-s3-2cf0</link>
      <guid>https://forem.com/robindeva/why-your-ec2-is-taking-the-long-way-to-s3-2cf0</guid>
      <description>&lt;p&gt;Last month, I was reviewing a client's AWS infrastructure when something caught my eye. Their monthly bill had a line item that didn't sit right with me - data transfer charges that seemed way too high for what they were doing.&lt;/p&gt;

&lt;p&gt;After digging through their architecture, I found the culprit. Their EC2 instances were talking to S3 buckets the long way around - through the public internet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait, Why Does That Matter?
&lt;/h2&gt;

&lt;p&gt;Here's the thing most people don't realize when they start with AWS. When your EC2 instance needs to grab a file from S3, it has to go somewhere to get it. Without any special configuration, that "somewhere" is out through your Internet Gateway, across the public internet, and back into AWS to reach S3.&lt;/p&gt;

&lt;p&gt;Think of it like this. Imagine you work in a large office building. Your colleague sits on the same floor, just around the corner. But instead of walking over to their desk, you leave the building, walk around the block, enter through the main lobby, go through security again, and then finally reach them. Sounds ridiculous, right?&lt;/p&gt;

&lt;p&gt;That's exactly what happens when EC2 talks to S3 without a VPC Endpoint.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfyqq8i14y3eljn4eh7r.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxfyqq8i14y3eljn4eh7r.jpg" alt=" " width="800" height="557"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Was Actually Happening
&lt;/h2&gt;

&lt;p&gt;The client had a pretty standard setup. Application servers running on EC2, storing and retrieving files from S3. Nothing fancy. But every single S3 request was taking the scenic route through the internet.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This created two problems.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;First, they were paying data transfer fees for traffic that didn't need to leave AWS at all. Every gigabyte going out through the Internet Gateway costs money. When you're moving terabytes of data to and from S3 daily, those charges add up fast.&lt;/p&gt;

&lt;p&gt;Second, their data was traveling across the public internet unnecessarily. Even though S3 connections are encrypted, why expose your traffic to the outside world when you don't have to?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix Was Surprisingly Simple
&lt;/h2&gt;

&lt;p&gt;We created a VPC Gateway Endpoint for S3. That's it. No agents to install, no complex networking changes, no application code modifications.&lt;/p&gt;

&lt;p&gt;A Gateway Endpoint is basically a private door between your VPC and S3. Once it's in place, traffic to S3 stays entirely within the AWS network. Your EC2 instance talks to S3 through this private connection instead of going out to the internet.&lt;/p&gt;

&lt;p&gt;Here's what the setup looks like in practice:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Using AWS CLI to create a Gateway Endpoint for S3
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-1234567890abcdef0 \
    --service-name com.amazonaws.ap-south-1.s3 \
    --route-table-ids rtb-1234567890abcdef0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll need to attach it to the route tables used by your subnets. After that, any traffic destined for S3 automatically uses the endpoint. The applications don't even know the difference - they keep using S3 the same way they always did.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed After Implementation
&lt;/h2&gt;

&lt;p&gt;The data transfer charges dropped noticeably in the next billing cycle. I won't throw around specific numbers because every environment is different, but the reduction was significant enough that the client asked me what else we could optimize.&lt;/p&gt;

&lt;p&gt;Beyond the cost savings, there's a security benefit that's harder to quantify. Traffic between EC2 and S3 no longer traverses the public internet. It stays on AWS's private backbone. For workloads dealing with sensitive data, this is a meaningful improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Few Things Worth Knowing
&lt;/h2&gt;

&lt;p&gt;Gateway Endpoints are free. AWS doesn't charge you for creating or using them. The only thing you pay for is the standard S3 request and storage costs you'd pay anyway.&lt;/p&gt;

&lt;p&gt;They work for S3 and DynamoDB. If you need private connectivity to other AWS services like SQS, SNS, or Secrets Manager, you'll want Interface Endpoints instead. Those do have an hourly charge, but they're still cheaper than routing everything through NAT Gateways.&lt;/p&gt;

&lt;p&gt;One gotcha I've seen trip people up - if your S3 bucket policy restricts access by source IP, you might need to update it. Traffic through a Gateway Endpoint doesn't come from your NAT Gateway's public IP anymore. You can use VPC Endpoint conditions in your bucket policy to handle this.&lt;/p&gt;
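&lt;p&gt;As a sketch, a bucket policy using that condition might look like the following (bucket name and endpoint ID are placeholders; test a policy like this carefully before relying on it, since the Deny applies to everyone outside the endpoint, including you in the console):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "AllowOnlyViaVpcEndpoint",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-bucket",
      "arn:aws:s3:::my-bucket/*"
    ],
    "Condition": {
      "StringNotEquals": {"aws:SourceVpce": "vpce-1234567890abcdef0"}
    }
  }]
}
```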

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Cost optimization in AWS isn't always about buying Reserved Instances or committing to Savings Plans. Sometimes the biggest wins come from architectural decisions that seem minor on the surface.&lt;/p&gt;

&lt;p&gt;I've seen teams spend hours negotiating enterprise discounts while ignoring networking patterns that waste hundreds of dollars every month. A quick review of your VPC configuration, endpoint usage, and data transfer patterns can reveal opportunities that are easier to capture and often have immediate impact.&lt;/p&gt;

&lt;p&gt;If you haven't looked at your data transfer charges lately, it might be worth a few minutes of your time. You might find your infrastructure is taking the long way around too.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>vpc</category>
    </item>
    <item>
      <title>Understanding AWS VPC: A Guide I Needed When I First Started</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Mon, 29 Dec 2025 09:45:55 +0000</pubDate>
      <link>https://forem.com/robindeva/understanding-aws-vpc-a-guide-i-needed-when-i-first-started-52l7</link>
      <guid>https://forem.com/robindeva/understanding-aws-vpc-a-guide-i-needed-when-i-first-started-52l7</guid>
      <description>&lt;p&gt;When I started studying AWS, VPC was the topic that made me want to close my laptop and go for a walk.&lt;/p&gt;

&lt;p&gt;Every tutorial I found was either too technical or too vague. Then one day, a senior engineer explained it to me using a house analogy, and suddenly it all made sense.&lt;/p&gt;

&lt;p&gt;So here's my attempt to pass that clarity forward.&lt;/p&gt;

&lt;h2&gt;
  
  
  So What Exactly is a VPC?
&lt;/h2&gt;

&lt;p&gt;VPC stands for Virtual Private Cloud. Fancy name, simple concept.&lt;/p&gt;

&lt;p&gt;It's your own private network inside AWS. Think of it as buying a plot of land with a boundary wall. Inside that wall, you decide everything. What buildings go where. Who can enter. Who can leave. What roads connect what.&lt;/p&gt;

&lt;p&gt;AWS gives you the land. You design the layout.&lt;/p&gt;

&lt;h2&gt;
  
  
  Subnets - Dividing Your Space
&lt;/h2&gt;

&lt;p&gt;Once you have your land (VPC), you need to organize it. That's where subnets come in.&lt;/p&gt;

&lt;p&gt;A subnet is just a smaller section of your VPC. Like dividing your property into a front yard and a backyard.&lt;/p&gt;

&lt;p&gt;Public Subnet - This is your front yard. It faces the main road. Visitors can see it and reach it. You put things here that need to interact with the outside world. Web servers, load balancers, that kind of stuff.&lt;/p&gt;

&lt;p&gt;Private Subnet - This is your backyard. Hidden from the road. No direct access from outside. You keep valuable things here. Databases, application servers, anything you don't want exposed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Internet Gateway - Your Main Door
&lt;/h2&gt;

&lt;p&gt;Here's something that tripped me up initially.&lt;/p&gt;

&lt;p&gt;Just because you call something a "public subnet" doesn't automatically make it public. You need an Internet Gateway attached to your VPC.&lt;/p&gt;

&lt;p&gt;The Internet Gateway is like the main door of your property. Without a door, nobody gets in or out. Doesn't matter how many rooms you have inside.&lt;/p&gt;

&lt;p&gt;So when you create a public subnet, you also need to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Attach an Internet Gateway to your VPC&lt;/li&gt;
&lt;li&gt;Update the route table so traffic knows to use that gateway&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Skip either step and your "public" subnet is just a private subnet with a misleading name.&lt;/p&gt;

&lt;h2&gt;
  
  
  NAT Gateway - The Interesting One
&lt;/h2&gt;

&lt;p&gt;This took me a while to understand. Why would private resources ever need internet access?&lt;br&gt;
Well, think about it. Your database server in a private subnet might need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Download security patches&lt;/li&gt;
&lt;li&gt;Pull updates from package repositories&lt;/li&gt;
&lt;li&gt;Send logs to an external monitoring service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But you don't want anyone from the internet connecting TO your database. That would defeat the purpose of keeping it private.&lt;br&gt;
NAT Gateway solves this. It lets your private resources reach out to the internet, but blocks any incoming connections. One way traffic.&lt;/p&gt;

&lt;p&gt;Like having a mail slot in your door. You can send letters out. But nobody can climb in through it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting It Together - A Real Example
&lt;/h2&gt;

&lt;p&gt;Let me walk through how I set up a recent project.&lt;/p&gt;

&lt;p&gt;I needed to deploy a simple web app with a database. Here's what I created:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;VPC: 10.0.0.0/16&lt;/strong&gt; (gives me plenty of IP addresses to work with)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Public Subnet: 10.0.1.0/24&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Placed my load balancer here&lt;/li&gt;
&lt;li&gt;Also added a bastion host for SSH access when I need to troubleshoot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Private Subnet: 10.0.2.0/24&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;My application servers live here&lt;/li&gt;
&lt;li&gt;They receive traffic from the load balancer&lt;/li&gt;
&lt;li&gt;They can reach the internet through NAT Gateway for updates&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Another Private Subnet: 10.0.3.0/24&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Database goes here&lt;/li&gt;
&lt;li&gt;Only the application servers can talk to it&lt;/li&gt;
&lt;li&gt;Completely isolated from the internet&lt;/li&gt;
&lt;/ul&gt;
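&lt;p&gt;The layout above is easy to sanity-check with Python's ipaddress module before touching the console (a quick sketch; note that AWS also reserves five addresses in every subnet, so a /24 gives 251 usable):&lt;/p&gt;

```python
import ipaddress

# Every subnet must fall inside the VPC's CIDR block
vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "public":      ipaddress.ip_network("10.0.1.0/24"),
    "private-app": ipaddress.ip_network("10.0.2.0/24"),
    "private-db":  ipaddress.ip_network("10.0.3.0/24"),
}

for name, net in subnets.items():
    assert net.subnet_of(vpc), f"{name} is outside the VPC"
    print(f"{name}: {net} ({net.num_addresses} addresses)")
```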

&lt;p&gt;&lt;strong&gt;Internet Gateway:&lt;/strong&gt; Attached to the VPC, route table configured for the public subnet&lt;br&gt;
&lt;strong&gt;NAT Gateway:&lt;/strong&gt; Placed in the public subnet (it needs internet access to work), private subnets route outbound traffic through it&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When a user visits my site:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Request comes through the Internet Gateway&lt;/li&gt;
&lt;li&gt;Hits the load balancer in public subnet&lt;/li&gt;
&lt;li&gt;Load balancer sends it to an app server in private subnet&lt;/li&gt;
&lt;li&gt;App server queries the database in the other private subnet&lt;/li&gt;
&lt;li&gt;Response goes back the same way&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The database never touches the internet. The app servers only receive traffic from the load balancer. Everything stays controlled.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mistakes I Made Along the Way
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Forgot to update route tables&lt;/strong&gt; - I created an Internet Gateway, attached it to my VPC, and wondered why my EC2 instance still couldn't reach the internet. Turns out the route table for that subnet was still pointing nowhere. Rookie mistake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Put RDS in a public subnet "temporarily"&lt;/strong&gt; - We all know how temporary solutions go. Don't do this. Just set it up properly from the start.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Used only one Availability Zone&lt;/strong&gt; - My NAT Gateway was in one AZ. When that AZ had issues, my private resources in other AZs lost internet access. Now I always create NAT Gateways in multiple AZs for production workloads.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overcomplicated the CIDR blocks&lt;/strong&gt; - Started with a /24 VPC because I thought I wouldn't need much space. Then had to recreate everything when I needed more subnets. Start with /16 and give yourself room to grow.&lt;/p&gt;

&lt;h2&gt;
  
  
  That's Really It
&lt;/h2&gt;

&lt;p&gt;VPC seems complex because AWS documentation throws every possible option at you. But the core concept is straightforward.&lt;br&gt;
You get an isolated network. You divide it into sections. Some sections face the internet, some don't. Gateways control what traffic flows where.&lt;/p&gt;

&lt;p&gt;Start with this mental model. Build a simple VPC manually through the console. Don't use the default VPC that AWS creates for you. Actually go through the steps yourself. Create subnets, attach gateways, configure route tables.&lt;/p&gt;

&lt;p&gt;Break something. Figure out why it broke. Fix it.&lt;/p&gt;

&lt;p&gt;That's how I learned. Probably how you'll learn too.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>vpc</category>
      <category>networking</category>
      <category>subnets</category>
    </item>
    <item>
      <title>Built a Cost Spike Alert System After AWS Charged Me $800 for a Forgotten EC2 Instance</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Fri, 26 Dec 2025 14:57:57 +0000</pubDate>
      <link>https://forem.com/robindeva/built-a-cost-spike-alert-system-after-aws-charged-me-800-for-a-forgotten-ec2-instance-2jd</link>
      <guid>https://forem.com/robindeva/built-a-cost-spike-alert-system-after-aws-charged-me-800-for-a-forgotten-ec2-instance-2jd</guid>
      <description>&lt;p&gt;So there I was, checking my AWS bill like I do every month, and boom - $800 more than expected. Turns out I left a beefy EC2 instance running in ap-south-1 after a client demo three weeks ago. Classic mistake. We've all done it.&lt;/p&gt;

&lt;p&gt;That night I decided enough was enough. I needed something that would ping me the moment costs started looking weird, not after the damage was done.&lt;/p&gt;

&lt;p&gt;Here's what I built and how you can set it up too.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with Regular Budget Alerts
&lt;/h2&gt;

&lt;p&gt;AWS Budgets exists, yeah. But here's the thing - it only tells you when you've crossed a line you drew yourself. If I set a $500 budget and I hit $499, silence. Hit $501, alert. But what if my normal spend is $200 and suddenly it jumps to $400? That's a 100% increase and Budgets won't say a word because I'm still "under budget."&lt;/p&gt;

&lt;p&gt;What I needed was something smarter. Something that knows what normal looks like and yells when things go sideways.&lt;/p&gt;
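&lt;p&gt;The logic I wanted looks roughly like this - a sketch with made-up thresholds, not what AWS runs internally:&lt;/p&gt;

```python
def is_spike(today, baseline, jump_pct=1.0, min_dollars=50.0):
    """Flag spend that exceeds the baseline by jump_pct (100% here)
    AND by at least min_dollars - independent of any fixed budget line."""
    increase = today - baseline
    return increase >= min_dollars and increase >= baseline * jump_pct

print(is_spike(400, 200))  # True: doubled, even though still "under budget"
print(is_spike(499, 480))  # False: normal wobble near a $500 budget
```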

&lt;h2&gt;
  
  
  Enter Cost Anomaly Detection
&lt;/h2&gt;

&lt;p&gt;AWS has this service called Cost Anomaly Detection that does exactly this. It uses ML to figure out your spending patterns and alerts you when something breaks the pattern. And get this - it's free. No extra charges for the detection itself.&lt;/p&gt;

&lt;p&gt;Let me walk you through setting this up.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 1: Basic Setup (10 minutes)&lt;/strong&gt;&lt;br&gt;
Go to Billing Console &amp;gt; Cost Anomaly Detection. If you created your AWS account recently, there might already be a default monitor there.&lt;br&gt;
Click Create monitor and pick AWS Services as the type. This watches all your services together. Name it something useful - I called mine all-services-prod.&lt;/p&gt;

&lt;p&gt;Now create an alert subscription:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Name: daily-cost-alerts&lt;/li&gt;
&lt;li&gt;Frequency: Individual alerts (I want to know immediately)&lt;/li&gt;
&lt;li&gt;Threshold: $50 (adjust this based on your typical spend)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For notification, you have two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Direct email - simple, works fine&lt;/li&gt;
&lt;li&gt;SNS topic - more flexible, what I recommend&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I chose SNS because I wanted to send alerts to Slack and also format the messages eventually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 2: Setting Up SNS&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Head to SNS Console and create a standard topic. I named mine cost-alerts.&lt;/li&gt;
&lt;li&gt;Add your email as a subscriber. You'll get a confirmation email - click the link or nothing will work. I've forgotten this step more times than I'd like to admit.&lt;/li&gt;
&lt;li&gt;Go back to Cost Anomaly Detection and update your subscription to use this SNS topic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, you've got a working system. AWS will detect weird spending and email you. Done. But the emails look terrible - just a blob of JSON. So I added a Lambda function to make them readable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 3: Making Alerts Actually Useful&lt;/strong&gt;&lt;br&gt;
The raw notification from Cost Anomaly Detection looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"anomalyId":"abc123","accountId":"1234567890","impact":{"totalImpact":127.45},"rootCauses":[{"service":"Amazon EC2","region":"us-east-1"}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not exactly scannable when you're checking your phone at dinner.&lt;/p&gt;

&lt;p&gt;Here's a Lambda function that turns this mess into something human:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import json
import boto3
import os

sns = boto3.client('sns')

def lambda_handler(event, context):
    for record in event.get('Records', []):
        try:
            msg = json.loads(record['Sns']['Message'])

            impact = msg.get('impact', {}).get('totalImpact', 0)
            causes = msg.get('rootCauses', [])

            # Build readable message
            text = f"Cost spike detected: ${impact:.2f}\n\n"
            text += "What's causing it:\n"

            for c in causes:
                svc = c.get('service', 'Unknown')
                region = c.get('region', 'Unknown')
                text += f"- {svc} in {region}\n"

            text += f"\nCheck it out: https://console.aws.amazon.com/cost-management/home#/anomaly-detection"

            sns.publish(
                TopicArn=os.environ['ALERT_TOPIC'],
                Subject=f'AWS Cost Alert: ${impact:.2f} spike',
                Message=text
            )

        except Exception as e:
            print(f"Error processing: {e}")
            raise

    return {'statusCode': 200}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create this function with Python 3.11 runtime. Add an environment variable ALERT_TOPIC with your SNS topic ARN.&lt;/p&gt;

&lt;p&gt;The function needs permission to publish to SNS. Add this to the execution role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "sns:Publish",
        "Resource": "arn:aws:sns:ap-south-1:YOUR_ACCOUNT:cost-alerts"
    }]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you need to wire it up. Create another SNS topic called cost-anomaly-raw. Set your Lambda as a subscriber to this topic. Then update Cost Anomaly Detection to send to cost-anomaly-raw instead of directly to cost-alerts.&lt;/p&gt;

&lt;p&gt;The flow is now:&lt;br&gt;
Anomaly detected → cost-anomaly-raw → Lambda formats it → cost-alerts → Your email&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Part 4: Catching Expensive Stuff in Real-Time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here's the thing about Cost Anomaly Detection - it relies on billing data, which can be delayed by hours. If someone spins up a p4d.24xlarge (those GPU monsters that cost $30+/hour), I don't want to find out 6 hours later.&lt;br&gt;
EventBridge lets us catch these events as they happen.&lt;/p&gt;

&lt;p&gt;Create an EventBridge rule with this pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": {
    "state": ["running"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Point it at this Lambda:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3
import os

sns = boto3.client('sns')
ec2 = boto3.client('ec2')

# Instance families that'll wreck your budget
EXPENSIVE = ['p4d', 'p3', 'p2', 'x2', 'u-', 'dl1', 'inf1', 'g5', 'g4dn']

def lambda_handler(event, context):
    instance_id = event['detail']['instance-id']

    resp = ec2.describe_instances(InstanceIds=[instance_id])
    instance_type = resp['Reservations'][0]['Instances'][0]['InstanceType']

    # Match on the instance family (the part before the dot) so a size
    # suffix can't trigger a false substring match
    family = instance_type.split('.')[0]
    if any(family.startswith(e) for e in EXPENSIVE):
        msg = f"Heads up: {instance_type} just launched\n"
        msg += f"Instance: {instance_id}\n"
        msg += f"Region: {event['region']}\n"
        msg += f"\nThese instances are expensive. Make sure this was intentional."

        sns.publish(
            TopicArn=os.environ['ALERT_TOPIC'],
            Subject=f'Expensive EC2 launched: {instance_type}',
            Message=msg
        )

    return {'statusCode': 200}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you get pinged within seconds when someone launches a GPU instance or any other budget-killer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Actually Catches
&lt;/h2&gt;

&lt;p&gt;After running this for about two months, here's what it's caught for me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Lambda function that got stuck in a retry loop (spotted within 2 hours instead of end of month)&lt;/li&gt;
&lt;li&gt;Teammate who launched m5.4xlarge instead of t3.medium for testing&lt;/li&gt;
&lt;li&gt;Unexpected data transfer spike when a client started hammering our API&lt;/li&gt;
&lt;li&gt;S3 request costs jumping 3x after a deployment (turned out we had a logging misconfiguration)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Few Things I Learned
&lt;/h2&gt;

&lt;p&gt;Start with a higher threshold. I initially set $20 and got way too many alerts for normal fluctuations. Bumped it to $50 and the noise dropped significantly.&lt;/p&gt;

&lt;p&gt;The ML needs time. Cost Anomaly Detection takes about 10 days to learn your patterns for new services. Don't expect accurate alerts on day one.&lt;/p&gt;

&lt;p&gt;Daily summaries exist. If individual alerts feel like too much, you can switch to daily digest emails. I use individual for production accounts and daily for dev.&lt;/p&gt;

&lt;p&gt;Tag your resources. You can create monitors based on cost allocation tags. If you tag by team or project, you can send alerts to the right people automatically.&lt;/p&gt;
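&lt;p&gt;If you go the tag route, the fan-out can be as simple as a lookup from tag value to SNS topic. The topic ARNs and the "team" tag key here are hypothetical:&lt;/p&gt;

```python
# Hypothetical routing: cost-allocation tag value -> that team's topic
TOPIC_BY_TEAM = {
    "data-eng": "arn:aws:sns:us-east-1:111122223333:cost-alerts-data-eng",
    "platform": "arn:aws:sns:us-east-1:111122223333:cost-alerts-platform",
}
DEFAULT_TOPIC = "arn:aws:sns:us-east-1:111122223333:cost-alerts"

def topic_for(tags):
    """Pick the alert topic for an anomaly based on its team tag."""
    return TOPIC_BY_TEAM.get(tags.get("team"), DEFAULT_TOPIC)

print(topic_for({"team": "data-eng"}))  # the data-eng topic
print(topic_for({}))                    # falls back to the shared topic
```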

&lt;h2&gt;
  
  
  Total Cost to Run This
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Cost Anomaly Detection: Free&lt;/li&gt;
&lt;li&gt;Lambda invocations: Maybe $0.10/month if that&lt;/li&gt;
&lt;li&gt;SNS notifications: Cents&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The whole thing costs practically nothing to run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Look, cloud cost management isn't glamorous. Nobody's going to pat you on the back for setting up billing alerts. But when you catch a runaway service before it racks up a four-figure bill, you'll be glad you spent the hour setting this up.&lt;/p&gt;

&lt;p&gt;The basic setup (Part 1 and 2) takes maybe 10 minutes. That alone will save you from most surprises. Add the Lambda formatting if you want nicer alerts. Add the EventBridge rule if you're paranoid about expensive instances like I am now.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudcomputing</category>
      <category>devops</category>
      <category>finops</category>
    </item>
    <item>
      <title>Finally! Amazon RDS Tells You What's Actually Happening During Snapshot Exports</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Tue, 23 Dec 2025 10:20:58 +0000</pubDate>
      <link>https://forem.com/robindeva/finally-amazon-rds-tells-you-whats-actually-happening-during-snapshot-exports-1m6j</link>
      <guid>https://forem.com/robindeva/finally-amazon-rds-tells-you-whats-actually-happening-during-snapshot-exports-1m6j</guid>
      <description>&lt;p&gt;If you've ever kicked off an RDS snapshot export to S3 and then sat there wondering "Is this thing even working?"—this one's for you.&lt;br&gt;
AWS quietly dropped an update on December 19, 2025 that solves one of those small but annoying pain points we've all dealt with: lack of visibility into snapshot exports.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem We've All Faced
&lt;/h2&gt;

&lt;p&gt;Picture this scenario. You're migrating a production database, and compliance requires you to archive snapshots to S3 in Parquet format for long-term analytics. You start the export, grab a coffee, come back... and the console just shows "In Progress."&lt;br&gt;
No idea how far along it is. No clue which tables are done. No visibility into whether that one massive table is going to take 10 minutes or 10 hours.&lt;br&gt;
I've been in situations where stakeholders ask "When will this be done?" and all I could say was "It's running." Not a great look.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Changed
&lt;/h2&gt;

&lt;p&gt;RDS now gives you actual insight into your snapshot exports. Four new event types tell you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Current export progress — percentage complete, tables exported vs pending&lt;/li&gt;
&lt;li&gt;Table-level notifications — especially useful for those massive tables that take forever&lt;/li&gt;
&lt;li&gt;Exported data sizes — helps you track throughput and plan storage&lt;/li&gt;
&lt;li&gt;Troubleshooting recommendations — when something fails, you get actionable guidance&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  A Real-World Example
&lt;/h2&gt;

&lt;p&gt;Say you're exporting a 500GB snapshot from your e-commerce RDS MySQL instance. The snapshot has 45 tables, but three of them—orders, order_items, and user_activity_logs—make up 80% of the data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before this update:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Export Status: In Progress
Started: 2 hours ago
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. You're flying blind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After this update:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You subscribe to SNS notifications and get events like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Export Progress: 67% complete
Tables exported: 42/45
Tables pending: orders, order_items, user_activity_logs
Data exported: 115 GB
Current table: orders (estimated 180 GB)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can tell your team: "The three big tables are still processing. Based on the throughput, we're looking at another 90 minutes."&lt;/p&gt;
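&lt;p&gt;That ETA is just throughput math - a back-of-the-envelope you can finally do because the events report exported sizes:&lt;/p&gt;

```python
def eta_minutes(exported_gb, total_gb, elapsed_min):
    """Rough remaining minutes, assuming throughput stays constant."""
    rate = exported_gb / elapsed_min  # GB per minute so far
    return round((total_gb - exported_gb) / rate)

# 300 GB done in 4.5 hours, 100 GB still pending
print(eta_minutes(300, 400, 270))  # 90 minutes to go
```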

&lt;h2&gt;
  
  
  Setting This Up
&lt;/h2&gt;

&lt;p&gt;The setup is straightforward:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to RDS in the AWS Console&lt;/li&gt;
&lt;li&gt;Navigate to Event subscriptions&lt;/li&gt;
&lt;li&gt;Create a new subscription targeting snapshot export events&lt;/li&gt;
&lt;li&gt;Point it to an SNS topic&lt;/li&gt;
&lt;li&gt;Subscribe your email, Slack webhook, or Lambda function to that topic&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From the CLI:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws rds create-event-subscription \
  --subscription-name snapshot-export-alerts \
  --sns-topic-arn arn:aws:sns:us-east-1:123456789:rds-alerts \
  --source-type db-snapshot \
  --event-categories "export"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Where This Fits in Your Architecture
&lt;/h2&gt;

&lt;p&gt;This feature becomes really valuable when you're building data pipelines. Consider this flow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;RDS Snapshot 
    → Export to S3 (Parquet)
    → SNS notifications track progress
    → Lambda triggers when export completes
    → Glue crawler catalogs new data
    → Athena/Redshift Spectrum queries available
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SNS events let you chain automation reliably. No more polling the API to check if the export finished.&lt;/p&gt;
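&lt;p&gt;The completion trigger can be a tiny Lambda that inspects the SNS payload. I'm guessing at the field names here - dump a real event from your account before relying on them:&lt;/p&gt;

```python
import json

def export_finished(sns_message):
    """Return the source ID if this event signals a completed export,
    else None. Field names are assumptions - verify against a real event."""
    msg = json.loads(sns_message)
    if "completed" in msg.get("Event Message", "").lower():
        return msg.get("Source ID")
    return None

sample = json.dumps({
    "Event Message": "Export task completed",
    "Source ID": "my-snapshot-export",
})
print(export_finished(sample))  # my-snapshot-export
```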

&lt;h2&gt;
  
  
  Supported Engines
&lt;/h2&gt;

&lt;p&gt;This works with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;RDS PostgreSQL&lt;/li&gt;
&lt;li&gt;RDS MySQL&lt;/li&gt;
&lt;li&gt;RDS MariaDB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Available in all commercial regions where RDS operates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Operations
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Three practical benefits I see:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Better capacity planning. When you know export throughput, you can schedule jobs during maintenance windows with confidence.&lt;/p&gt;

&lt;p&gt;Faster incident response. If an export fails at table 38 of 45, you know exactly where to look. The troubleshooting recommendations point you toward the issue.&lt;/p&gt;

&lt;p&gt;Stakeholder communication. You can give accurate ETAs instead of vague "it's running" responses. Small thing, but it builds trust.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;This isn't a flashy feature. It's a quality-of-life improvement that makes RDS snapshot exports actually manageable at scale.&lt;/p&gt;

&lt;p&gt;If you're doing regular exports for compliance archiving, disaster recovery prep, or feeding data lakes—set up the SNS subscription. The visibility alone is worth the five minutes of configuration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;References:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Announcement&lt;/li&gt;
&lt;li&gt;RDS Event Categories Documentation&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>rds</category>
      <category>snapshots</category>
      <category>s3</category>
    </item>
    <item>
      <title>I Tested AWS Regional NAT Gateway — Here's Why It Changes Everything</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Wed, 17 Dec 2025 10:11:50 +0000</pubDate>
      <link>https://forem.com/robindeva/i-tested-aws-regional-nat-gateway-heres-why-it-changes-everything-2ljk</link>
      <guid>https://forem.com/robindeva/i-tested-aws-regional-nat-gateway-heres-why-it-changes-everything-2ljk</guid>
      <description>&lt;p&gt;Last week, I got my hands on the new Regional NAT Gateway for Amazon VPC. After spending a few hours testing it in a real environment, I wanted to share what I learned and why this matters for anyone building on AWS.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem We've Been Living With
&lt;/h2&gt;

&lt;p&gt;Let me paint you a picture. You're running a three-tier application spread across three Availability Zones (us-east-1a, us-east-1b, us-east-1c). Your backend servers sit in private subnets because, well, that's the right thing to do. But these servers need to reach the internet — maybe to pull security patches, call external APIs, or push logs to a third-party service.&lt;/p&gt;

&lt;p&gt;Three NAT Gateways. Three public subnets. Three separate route tables to maintain. And if you forgot to set up one AZ properly? Your workloads in that zone lose internet access.&lt;/p&gt;

&lt;p&gt;I've seen this go wrong more times than I'd like to admit. Someone provisions a new private subnet, forgets to update the route table, and suddenly their application can't reach external services. Hours of debugging later, they find a missing route.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Regional NAT Gateway Actually Does
&lt;/h2&gt;

&lt;p&gt;The new Regional NAT Gateway flips this model. Instead of deploying NAT Gateways per Availability Zone, you deploy one at the VPC level. AWS handles the rest.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06q3fxt00wxyilc2cknf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F06q3fxt00wxyilc2cknf.jpg" alt=" " width="800" height="620"&gt;&lt;/a&gt;&lt;br&gt;
One NAT Gateway. One route table entry. Done.&lt;/p&gt;
&lt;h2&gt;
  
  
  Real-World Test: What I Actually Did
&lt;/h2&gt;

&lt;p&gt;I set up a test VPC with private subnets in three AZs. Each subnet had a few EC2 instances running a simple Python script that pings an external API every 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Traditional Setup (Before)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 NAT Gateways provisioned&lt;/li&gt;
&lt;li&gt;3 public subnets created to host these NAT Gateways&lt;/li&gt;
&lt;li&gt;3 route tables with individual NAT Gateway targets&lt;/li&gt;
&lt;li&gt;Total monthly cost estimate: ~$97 (NAT Gateway hourly charges alone)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Regional NAT Gateway (After)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Regional NAT Gateway&lt;/li&gt;
&lt;li&gt;0 public subnets needed for NAT&lt;/li&gt;
&lt;li&gt;1 route table entry pointing to the Regional NAT Gateway&lt;/li&gt;
&lt;li&gt;Total monthly cost estimate: ~$32&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The instances in all three AZs continued reaching the external API without interruption. I deliberately terminated and relaunched instances in different AZs to confirm the routing worked seamlessly. It did.&lt;/p&gt;
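&lt;p&gt;The estimates above are just the hourly charge multiplied out. I used ~$0.045/hour, the us-east-1 rate at the time I tested; data processing is billed the same in both setups, so I left it out, which is why the numbers differ by a dollar or two from my list:&lt;/p&gt;

```python
HOURLY_RATE = 0.045     # per NAT Gateway hour, us-east-1 (assumed rate)
HOURS_PER_MONTH = 730

def monthly_hourly_cost(gateway_count):
    """Hourly charges only - excludes per-GB data processing."""
    return gateway_count * HOURLY_RATE * HOURS_PER_MONTH

print(round(monthly_hourly_cost(3)))  # ~99/month for the zonal trio
print(round(monthly_hourly_cost(1)))  # ~33/month for one regional gateway
```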
&lt;h2&gt;
  
  
  The Security Angle Nobody's Talking About
&lt;/h2&gt;

&lt;p&gt;Here's something that doesn't get enough attention: public subnets are attack surface.&lt;/p&gt;

&lt;p&gt;Every public subnet you create is a potential misconfiguration waiting to happen. Someone attaches an Elastic IP to the wrong instance. A security group gets too permissive. An Internet Gateway route gets added where it shouldn't.&lt;/p&gt;

&lt;p&gt;With Regional NAT Gateway, you can architect VPCs where private subnets don't require companion public subnets. Your workloads stay private. The NAT Gateway handles outbound traffic without exposing your infrastructure.&lt;/p&gt;

&lt;p&gt;For regulated industries — banking, healthcare, government — this simplified model makes compliance audits less painful. Fewer components means fewer things to document, monitor, and justify.&lt;/p&gt;
&lt;h2&gt;
  
  
  When Should You Use This?
&lt;/h2&gt;

&lt;p&gt;Regional NAT Gateway makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your workloads span multiple Availability Zones&lt;/li&gt;
&lt;li&gt;You want to reduce operational overhead&lt;/li&gt;
&lt;li&gt;You're building new VPCs and want a cleaner architecture from day one&lt;/li&gt;
&lt;li&gt;Cost optimization matters (and when doesn't it?)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stick with zonal NAT Gateways if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You need granular control over which AZ handles outbound traffic&lt;/li&gt;
&lt;li&gt;Your compliance requirements mandate AZ-level isolation for network components&lt;/li&gt;
&lt;li&gt;You have existing automation that depends on per-AZ NAT Gateway ARNs&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How to Set It Up
&lt;/h2&gt;

&lt;p&gt;The setup is straightforward. When creating a NAT Gateway in the console, you'll now see an option for "Availability zone mode" — select "Regional" instead of the default zonal option.&lt;/p&gt;

&lt;p&gt;For Terraform users, the configuration looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_nat_gateway" "regional" {
  connectivity_type = "public"
  subnet_id         = aws_subnet.public.id
  allocation_id     = aws_eip.nat.id

  # Availability zone mode is chosen at creation time; check your
  # provider version for the exact argument that exposes it

  tags = {
    Name = "regional-nat-gateway"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then update your private subnet route tables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_route" "private_nat" {
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.regional.id
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single route table entry now covers all your private subnets across every AZ.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Takeaway
&lt;/h2&gt;

&lt;p&gt;After testing this feature, I'm convinced it belongs in most new VPC designs. The reduction in complexity is real. The cost savings are measurable. And the security posture improvement — while harder to quantify — matters.&lt;/p&gt;

&lt;p&gt;AWS doesn't always get credit for these incremental improvements. They're not flashy like a new AI service or a major database feature. But for those of us who spend our days building and maintaining cloud infrastructure, this is the stuff that makes our lives easier.&lt;/p&gt;

&lt;p&gt;If you're spinning up a new environment or revisiting your network architecture, give Regional NAT Gateway a look. It's one of those changes that seems small until you realize how much cleaner your diagrams become.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>networking</category>
      <category>nat</category>
    </item>
    <item>
      <title>Building an AI-Powered Contact Center Quality Monitoring System on AWS</title>
      <dc:creator>Robindeva</dc:creator>
      <pubDate>Tue, 02 Sep 2025 11:29:07 +0000</pubDate>
      <link>https://forem.com/robindeva/building-an-ai-powered-contact-center-quality-monitoring-system-on-aws-5409</link>
      <guid>https://forem.com/robindeva/building-an-ai-powered-contact-center-quality-monitoring-system-on-aws-5409</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Introduction&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Contact centers generate thousands of hours of customer calls every day. Reviewing them manually is not only time-consuming but also inconsistent. Supervisors often struggle to answer questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How satisfied are customers during calls?&lt;/li&gt;
&lt;li&gt;Are agents following compliance scripts?&lt;/li&gt;
&lt;li&gt;Can we quickly summarize what happened in each call?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In this blog, I’ll walk you through how I built a &lt;strong&gt;serverless, AI-powered pipeline&lt;/strong&gt; on AWS that automates all of this using:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon S3&lt;/strong&gt; → for storing audio recordings&lt;br&gt;
&lt;strong&gt;Amazon Transcribe&lt;/strong&gt; → to convert speech to text&lt;br&gt;
&lt;strong&gt;Amazon Bedrock (Titan)&lt;/strong&gt; → to analyze sentiment, compliance, and summaries&lt;br&gt;
&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt; → to store structured insights&lt;br&gt;
&lt;strong&gt;Amazon QuickSight&lt;/strong&gt; → to visualize the results in dashboards&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;strong&gt;Solution Architecture&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq02yjj8wnm5fjpsp5uep.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq02yjj8wnm5fjpsp5uep.jpg" alt="Architecture" width="800" height="184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here’s the high-level flow of the system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Upload Call Recording →&lt;/strong&gt; stored in Amazon S3.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Function 1 (TranscribeLambda) →&lt;/strong&gt; triggers Amazon Transcribe to convert speech → text.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda Function 2 (BedrockAnalysisLambda) →&lt;/strong&gt; processes the transcript using Amazon Bedrock (Titan):
&lt;ul&gt;
&lt;li&gt;Detects Sentiment (Positive/Negative/Neutral).&lt;/li&gt;
&lt;li&gt;Runs a Compliance Check (Pass/Fail).&lt;/li&gt;
&lt;li&gt;Generates a short summary of the call.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Results are saved in Amazon DynamoDB.&lt;/li&gt;
&lt;li&gt;Supervisors view real-time analytics in Amazon QuickSight dashboards.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This modular design ensures reliability and scalability — transcription and AI analysis scale independently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting Up the Pipeline&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;1. Create S3 Bucket&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 mb s3://contact-center-demo-bucket
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bucket will store:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Input audio recordings&lt;/li&gt;
&lt;li&gt;Transcribe-generated transcripts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. DynamoDB Table&lt;/strong&gt;&lt;br&gt;
We use DynamoDB to store insights per call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws dynamodb create-table \
  --table-name CallAnalysis \
  --attribute-definitions AttributeName=CallID,AttributeType=S \
  --key-schema AttributeName=CallID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Lambda #1 — TranscribeLambda&lt;/strong&gt;&lt;br&gt;
This function runs when an audio file is uploaded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3, os, time, re

transcribe = boto3.client('transcribe')
s3_bucket = os.environ['S3_BUCKET']

def lambda_handler(event, context):
    file_name = event['Records'][0]['s3']['object']['key']
    base_name = file_name.split("/")[-1]
    safe_name = re.sub(r'[^0-9a-zA-Z._-]', '_', base_name)
    job_name = safe_name + "-" + str(int(time.time()))

    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f"s3://{s3_bucket}/{file_name}"},
        MediaFormat='mp3',  # matches the .mp3 suffix filter; use 'wav' for .wav uploads
        LanguageCode='en-US',
        OutputBucketName=s3_bucket
    )

    return {"message": f"Started Transcription Job: {job_name}"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;4. Lambda #2 — BedrockAnalysisLambda (Titan)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This function analyzes transcripts via Amazon Titan and stores results in DynamoDB.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3, json, os, urllib.parse

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['DDB_TABLE'])
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])

    if not key.endswith(".json"):
        return {"message": "Not a transcript JSON file"}

    response = s3.get_object(Bucket=bucket, Key=key)
    transcript_json = json.loads(response['Body'].read())
    transcript_text = transcript_json['results']['transcripts'][0]['transcript']

    prompt = f"""
    Analyze the following customer service call transcript:
    Transcript: {transcript_text}

    Provide the following JSON output:
    {{
      "Sentiment": "Positive | Negative | Neutral | Mixed",
      "ComplianceCheck": "Pass | Fail",
      "Summary": "&amp;lt;short summary of the call in 2 sentences&amp;gt;"
    }}
    """

    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "inputText": prompt,
            "textGenerationConfig": {
                "maxTokenCount": 512,
                "temperature": 0.7,
                "topP": 0.9
            }
        })
    )

    result_str = response['body'].read().decode("utf-8")
    parsed = json.loads(result_str)
    output_text = parsed["results"][0]["outputText"]

    try:
        analysis = json.loads(output_text)
    except json.JSONDecodeError:
        # Titan sometimes wraps the JSON in extra prose; keep the raw text
        analysis = {"RawOutput": output_text}

    table.put_item(Item={
        "CallID": key,
        "Transcript": transcript_text,
        "Sentiment": analysis.get("Sentiment", "Unknown"),
        "ComplianceCheck": analysis.get("ComplianceCheck", "Unknown"),
        "Summary": analysis.get("Summary", output_text)
    })

    return {"message": f"Processed call {key}", "Analysis": analysis}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
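&lt;p&gt;The trickiest part of this handler is the JSON-inside-JSON parsing: Titan's body holds the generated text at results[0].outputText, and that text is itself supposed to be JSON. Here is a small local sketch of just that step, with a hand-made sample body (the values are invented for illustration):&lt;/p&gt;

```python
import json

# Hypothetical sample of the raw Bedrock response body for Titan Text:
# the model's text lives at results[0].outputText, and that text is
# itself expected to be JSON.
sample_body = json.dumps({
    "results": [{
        "outputText": json.dumps({
            "Sentiment": "Positive",
            "ComplianceCheck": "Pass",
            "Summary": "Customer issue resolved quickly."
        })
    }]
})

def parse_titan_analysis(body_str):
    """Extract the analysis dict, falling back to raw text if Titan
    returns prose instead of clean JSON."""
    parsed = json.loads(body_str)
    output_text = parsed["results"][0]["outputText"]
    try:
        return json.loads(output_text)
    except json.JSONDecodeError:
        return {"RawOutput": output_text}

analysis = parse_titan_analysis(sample_body)
```

&lt;p&gt;Running this locally before wiring up the S3 trigger saves a lot of Lambda redeploy cycles.&lt;/p&gt;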



&lt;p&gt;&lt;strong&gt;QuickSight Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Finally, connect QuickSight to the data to visualize it. QuickSight has no direct DynamoDB source, so the usual route is the Athena DynamoDB connector or a periodic export to S3.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sentiment Pie Chart → Positive/Negative/Neutral split.&lt;/li&gt;
&lt;li&gt;Compliance KPI → Percentage of compliant calls.&lt;/li&gt;
&lt;li&gt;Summary Table → Quick overview of each call.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives supervisors real-time visibility into call quality.&lt;/p&gt;
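&lt;p&gt;Since QuickSight cannot read DynamoDB directly, one common approach is to periodically export flattened rows as a CSV in S3 and point a QuickSight S3 dataset at it. A minimal sketch of the CSV step, with made-up scan results (the field names match what the Lambda stores; the upload call is left as a comment):&lt;/p&gt;

```python
import csv, io

# Hypothetical items as they would come back from a DynamoDB scan of the
# analysis table; field names match what the Lambda stores.
items = [
    {"CallID": "call-001.json", "Sentiment": "Positive",
     "ComplianceCheck": "Pass", "Summary": "Issue resolved."},
    {"CallID": "call-002.json", "Sentiment": "Negative",
     "ComplianceCheck": "Fail", "Summary": "Agent skipped greeting."},
]

def items_to_csv(items, fields=("CallID", "Sentiment", "ComplianceCheck", "Summary")):
    """Render scan results as a flat CSV string QuickSight can ingest from S3."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

report = items_to_csv(items)
# Upload with: boto3.client("s3").put_object(Bucket=..., Key="reports/calls.csv",
#                                            Body=report)
# then create a QuickSight S3 dataset pointing at that key.
```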

&lt;h2&gt;
  
  
  Step-by-Step: Add S3 Triggers for Lambda
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Open the S3 Bucket&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to AWS Console → S3.&lt;/li&gt;
&lt;li&gt;Find and click your bucket (contact-center-demo-bucket).&lt;/li&gt;
&lt;li&gt;Go to the Properties tab.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Add Event Notification for Audio → Transcribe&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scroll down to Event notifications → click Create event notification.&lt;/li&gt;
&lt;li&gt;Give it a name: AudioToTranscribe.&lt;/li&gt;
&lt;li&gt;Event types: PUT / All object create events.&lt;/li&gt;
&lt;li&gt;Prefix filter (optional but recommended): audio/ (this keeps recordings under s3://bucket/audio/...).&lt;/li&gt;
&lt;li&gt;Suffix filter: .mp3 (or .wav if needed).&lt;/li&gt;
&lt;li&gt;Destination: Lambda function.&lt;/li&gt;
&lt;li&gt;Choose TranscribeLambda.&lt;/li&gt;
&lt;li&gt;Save.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Add Event Notification for Transcript → Bedrock&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Still in the same bucket → Create event notification again.&lt;/li&gt;
&lt;li&gt;Name it: TranscriptToBedrock.&lt;/li&gt;
&lt;li&gt;Event types: PUT / All object create events.&lt;/li&gt;
&lt;li&gt;Prefix filter (optional): transcripts/&lt;/li&gt;
&lt;li&gt;Suffix filter: .json.&lt;/li&gt;
&lt;li&gt;Destination: Lambda function.&lt;/li&gt;
&lt;li&gt;Choose BedrockAnalysisLambda.&lt;/li&gt;
&lt;li&gt;Save.&lt;/li&gt;
&lt;/ol&gt;
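&lt;p&gt;The same two notifications can be scripted instead of clicked. Here is a sketch of the payload for boto3's put_bucket_notification_configuration (the account ID and function ARNs are placeholders, and S3 also needs lambda:InvokeFunction permission on each function before the call succeeds):&lt;/p&gt;

```python
# Builds the two event notifications from the steps above as a single
# boto3 NotificationConfiguration payload. ARNs below are placeholders.

def build_notification_config(transcribe_arn, bedrock_arn):
    def rule(name, arn, prefix, suffix):
        return {
            "Id": name,
            "LambdaFunctionArn": arn,
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [
                {"Name": "prefix", "Value": prefix},
                {"Name": "suffix", "Value": suffix},
            ]}},
        }
    return {"LambdaFunctionConfigurations": [
        rule("AudioToTranscribe", transcribe_arn, "audio/", ".mp3"),
        rule("TranscriptToBedrock", bedrock_arn, "transcripts/", ".json"),
    ]}

config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:TranscribeLambda",
    "arn:aws:lambda:us-east-1:123456789012:function:BedrockAnalysisLambda",
)
# Apply with:
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="contact-center-demo-bucket", NotificationConfiguration=config)
```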

&lt;p&gt;Sample audio: &lt;a href="https://github.com/aws-samples/amazon-transcribe-output-word-document/blob/main/sample-data/example-call.wav" rel="noopener noreferrer"&gt;https://github.com/aws-samples/amazon-transcribe-output-word-document/blob/main/sample-data/example-call.wav&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Workflow
&lt;/h2&gt;

&lt;p&gt;Here’s how the demo works in action:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload sample_call.mp3 to S3.&lt;/li&gt;
&lt;li&gt;Amazon Transcribe job runs → transcript JSON created.&lt;/li&gt;
&lt;li&gt;Transcript triggers Bedrock Lambda → Titan generates sentiment, compliance, and summary.&lt;/li&gt;
&lt;li&gt;DynamoDB stores structured results.&lt;/li&gt;
&lt;li&gt;QuickSight dashboard updates → insights appear instantly.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Business Impact
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Faster QA:&lt;/strong&gt; No need for manual call listening.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consistency:&lt;/strong&gt; AI applies rules the same way every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; Works for 100 or 100,000 calls.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actionable Insights:&lt;/strong&gt; Supervisors can track compliance and customer satisfaction in real time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Challenges &amp;amp; Lessons Learned
&lt;/h2&gt;

&lt;p&gt;I honestly thought this would be a quick build. It wasn’t. A few things went wrong, a few things surprised me, and I spent more time debugging than I expected. But that’s usually how these projects go.&lt;/p&gt;

&lt;p&gt;The first headache was with S3 triggers. I connected two triggers to the same bucket — one for audio files and one for transcripts. On paper it looked fine. In reality, events started stepping on each other and throwing errors. After some digging, I simplified it. One trigger, file-based filtering, and the problem disappeared.&lt;/p&gt;

&lt;p&gt;Then I hit a strange issue with Transcribe job names. Some jobs were failing for no obvious reason. Turns out Transcribe doesn’t like special characters or slashes in names. My file naming was the real culprit. I cleaned the names with a small regex and added timestamps so every job stayed unique. After that, no more failures.&lt;/p&gt;
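&lt;p&gt;For context, Transcribe job names must match the pattern [0-9a-zA-Z._-] and stay under 200 characters. A sketch of roughly the cleanup described above (the helper name is mine):&lt;/p&gt;

```python
import re
import time

def safe_job_name(raw_name):
    """Transcribe job names only allow letters, digits, '.', '_' and '-';
    replace everything else and append a timestamp for uniqueness."""
    cleaned = re.sub(r"[^0-9a-zA-Z._-]", "-", raw_name)
    # Job names are capped at 200 chars; leave room for the timestamp suffix.
    return cleaned[:180] + "-" + str(int(time.time()))

name = safe_job_name("audio/sample call #3.mp3")
```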

&lt;p&gt;Bedrock caused some confusion, too. Even with the right IAM permissions, I kept getting Access Denied errors. Everything looked correct, which made it frustrating. Later, I realised that model access needs to be enabled separately in the Bedrock console. Once I enabled it and moved to the Titan Text G1 Express model, things finally worked.&lt;/p&gt;

&lt;p&gt;Another lesson came from how Titan returns data. The response is deeply nested JSON. I first dumped everything straight into DynamoDB. Bad idea. Reading and querying that data became painful very quickly. I changed the logic to pull only what I actually needed and stored clean fields like sentiment and summary. That made life much easier.&lt;/p&gt;
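&lt;p&gt;The "store only what you need" change boils down to flattening. A generic sketch of the idea (the nested payload below is invented to show the shape):&lt;/p&gt;

```python
def flatten(d, parent_key="", sep="."):
    """Collapse nested dicts into a single level of dotted keys, which is
    much friendlier for DynamoDB queries and QuickSight fields."""
    items = {}
    for key, value in d.items():
        new_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

# Hypothetical nested analysis payload before flattening.
nested = {
    "Sentiment": "Positive",
    "Details": {"ComplianceCheck": "Pass",
                "Scores": {"confidence": 0.92}},
}
flat = flatten(nested)
```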

&lt;p&gt;I also tripped over some basic Lambda mistakes. Using inline code in CloudFormation, I messed up handler names and indentation more than once. Small errors, big waste of time. Now I always double-check the handler and formatting before deploying.&lt;/p&gt;

&lt;p&gt;QuickSight had its own personality. It doesn’t like nested DynamoDB structures much. Charts became harder than they needed to be. Flattening the data before storing it solved most of that.&lt;/p&gt;

&lt;p&gt;At the end of the day, the build worked, and the system does exactly what I wanted. More importantly, I walked away with a better understanding of where these services can surprise you and where you need to be careful. That hands-on learning was the real win from this project.&lt;/p&gt;

</description>
      <category>genai</category>
      <category>bedrock</category>
      <category>aws</category>
      <category>serverless</category>
    </item>
  </channel>
</rss>
