Voice-to-Insights with Amazon Bedrock, Comprehend & AWS Lambda - (Let's Build 🏗️ Series)

Let's build an application that processes audio uploads, such as podcast snippets, meeting recordings, or voice notes, by automatically transcribing the audio to text, summarizing the main points, analyzing sentiment and topics, and delivering the results through an API.

The main parts of this article:
1- 💡 Tech Stack (AWS Services)
2- 🎯 Technical Part (Code)
3- 📝 Conclusion

💡 Tech Stack (AWS Services)

We are going to use several AWS services: AWS Lambda, Amazon S3, Amazon Transcribe, Amazon Bedrock, and Amazon Comprehend.

Amazon S3: The bucket that will hold our .mp3 files.

Amazon Transcribe: Converts the uploaded audio into text.

AWS Lambda: Takes an uploaded .mp3 or .wav file, calls Amazon Transcribe to convert the audio to text, uses Claude (via Bedrock) to summarize and highlight key insights, uses Amazon Comprehend to extract sentiment and topics, and finally returns everything in a clean JSON response.

Amazon Bedrock: We will use Claude to summarize and highlight key insights.

Amazon Comprehend: Will help us extract sentiment and topics.
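
In a real deployment you would wire the bucket to the Lambda with an S3 event notification, so every new upload triggers the pipeline automatically. A minimal sketch with boto3 (the bucket name and function ARN are placeholders; S3 also needs permission to invoke the function, which the console trigger wizard sets up for you):

import boto3

s3 = boto3.client('s3')

s3.put_bucket_notification_configuration(
    Bucket='<YOUR_S3_BUCKET>',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': 'arn:aws:lambda:eu-west-1:123456789012:function:voice-to-insights',
            'Events': ['s3:ObjectCreated:*'],
            # Trigger only on audio uploads, so the transcript JSON written
            # back into the same bucket does not re-invoke the function
            'Filter': {'Key': {'FilterRules': [{'Name': 'suffix', 'Value': '.mp3'}]}}
        }]
    }
)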

🎯 Technical Part (Code)

import boto3
import json
import time
from urllib.parse import urlparse, unquote_plus

transcribe = boto3.client('transcribe')
s3 = boto3.client('s3')

def lambda_handler(event, context):
    print("Event:", json.dumps(event))

    bucket = event['Records'][0]['s3']['bucket']['name']
    audio_key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    job_name = f"transcription-{int(time.time())}"
    file_uri = f"s3://{bucket}/{audio_key}"

    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': file_uri},
        MediaFormat=audio_key.split('.')[-1].lower(),  # Transcribe expects lowercase formats like 'mp3' or 'wav'
        LanguageCode='en-US',
        OutputBucketName=bucket
    )

    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
        raise Exception("Transcription failed")

    transcript_uri = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
    parsed = urlparse(transcript_uri)
    transcript_bucket = parsed.path.split('/')[1]
    transcript_key = '/'.join(parsed.path.split('/')[2:])

    obj = s3.get_object(Bucket=transcript_bucket, Key=transcript_key)
    transcript_data = json.loads(obj['Body'].read())

    transcript_text = transcript_data['results']['transcripts'][0]['transcript']
    print("--> Transcribed Text:\n", transcript_text)


Once we call the start_transcription_job function, Amazon Transcribe converts the audio to text and saves the output files inside the S3 bucket.

You can also see that the transcript file is placed inside the same S3 bucket.
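
For reference, the transcript JSON that Transcribe writes looks roughly like this (abridged; the real file also contains account metadata and word-level items):

{
  "jobName": "transcription-1700000000",
  "status": "COMPLETED",
  "results": {
    "transcripts": [
      { "transcript": "Welcome everyone, today we will discuss..." }
    ]
  }
}

This shape is why the handler reads transcript_data['results']['transcripts'][0]['transcript'].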


Perfect! We've successfully converted our .mp3 files to text. Now let's take it further by adding Amazon Comprehend for sentiment and topic analysis, and Amazon Bedrock for intelligent summarization.

Let's create the two new clients, pick the Claude model, and run sentiment analysis plus summarization on the transcript:

comprehend = boto3.client('comprehend')
bedrock = boto3.client('bedrock-runtime')

BEDROCK_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

# DetectSentiment accepts up to 5 KB of text, so truncate the transcript
sentiment = comprehend.detect_sentiment(Text=transcript_text[:5000], LanguageCode='en')
summary = summarize_with_claude(transcript_text)
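
For context, detect_sentiment returns an overall label plus a confidence score per class, roughly like this (illustrative values):

{
  "Sentiment": "POSITIVE",
  "SentimentScore": {
    "Positive": 0.91,
    "Negative": 0.01,
    "Neutral": 0.07,
    "Mixed": 0.01
  }
}
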
And here is the summarize_with_claude helper. Note that Claude 3 models on Bedrock require the Messages API, so the request body uses the messages format rather than the older prompt/completion one:

def summarize_with_claude(text):
    prompt = ("Summarize this meeting or speech in clear bullet points, "
              "highlight any action items or topics discussed:\n\n" + text[:4000])

    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "temperature": 0.7,
        "top_k": 250,
        "top_p": 0.9,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ]
    }

    response = bedrock.invoke_model(
        modelId=BEDROCK_MODEL_ID,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json"
    )

    # The Messages API returns a list of content blocks; take the first text block
    response_body = json.loads(response['body'].read())
    content = response_body.get("content", [])
    return content[0]["text"] if content else "No summary generated."

Also make sure to enable access to the right model inside the AWS console (Amazon Bedrock > Model access).
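
If you are not sure which models your account can use, here is a quick sketch that lists the Anthropic models visible in the current region (note it uses the bedrock control-plane client, not bedrock-runtime):

import boto3

bedrock_admin = boto3.client('bedrock')

# Prints the model IDs your account can see in this region
for model in bedrock_admin.list_foundation_models(byProvider='Anthropic')['modelSummaries']:
    print(model['modelId'])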


Now, to test the function, input the following JSON:

{
  "Records": [
    {
      "s3": {
        "bucket": {
          "name": "<YOUR_S3_BUCKET>"
        },
        "object": {
          "key": "AD-CONAF-2019BC-MID-JPWiser-v2.mp3"
        }
      }
    }
  ]
}

Also make sure the Lambda has the required permissions, and increase the Lambda function's timeout (the handler waits for the transcription job to finish, so the default 3 seconds won't be enough):

{
    "Version": "2012-10-17",
    "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:GetObject",
            "s3:PutObject",
            "transcribe:StartTranscriptionJob",
            "transcribe:GetTranscriptionJob",
            "comprehend:DetectSentiment",
            "bedrock:InvokeModel"
          ],
          "Resource": "*"
        }
    ]
}
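
Because the handler polls Transcribe synchronously, transcription time counts against the Lambda timeout. A sketch of raising it with boto3 (the function name is hypothetical; the same setting lives in the console under Configuration > General configuration):

import boto3

lambda_client = boto3.client('lambda')

lambda_client.update_function_configuration(
    FunctionName='voice-to-insights',  # hypothetical function name
    Timeout=300,  # seconds; long recordings may need more (the hard limit is 900)
)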

Now let's test the code: go to the Test tab of your Lambda function, input the JSON event above, and run it.
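
If you prefer to test from a script instead of the console, here is a sketch using boto3 (it assumes the test event above is saved as event.json and the function is named voice-to-insights):

import boto3
import json

lambda_client = boto3.client('lambda')

# Load the S3-style test event shown above
with open('event.json') as f:
    payload = f.read()

response = lambda_client.invoke(
    FunctionName='voice-to-insights',  # hypothetical function name
    Payload=payload,
)
print(json.loads(response['Payload'].read()))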

Finally, here is the full code for our Lambda function:

import boto3
import json
import time
from urllib.parse import urlparse, unquote_plus

transcribe = boto3.client('transcribe')
comprehend = boto3.client('comprehend')
bedrock = boto3.client('bedrock-runtime')
s3 = boto3.client('s3')

# Cross-region inference profile ID (EU); swap in a model ID or profile enabled in your region
BEDROCK_MODEL_ID = "eu.anthropic.claude-3-7-sonnet-20250219-v1:0"

def lambda_handler(event, context):
    print("Event:", json.dumps(event))

    bucket = event['Records'][0]['s3']['bucket']['name']
    audio_key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    job_name = f"transcription-{int(time.time())}"
    file_uri = f"s3://{bucket}/{audio_key}"

    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': file_uri},
        MediaFormat=audio_key.split('.')[-1].lower(),  # Transcribe expects lowercase formats like 'mp3' or 'wav'
        LanguageCode='en-US',
        OutputBucketName=bucket
    )

    while True:
        status = transcribe.get_transcription_job(TranscriptionJobName=job_name)
        if status['TranscriptionJob']['TranscriptionJobStatus'] in ['COMPLETED', 'FAILED']:
            break
        time.sleep(5)

    if status['TranscriptionJob']['TranscriptionJobStatus'] == 'FAILED':
        raise Exception("Transcription failed")

    transcript_uri = status['TranscriptionJob']['Transcript']['TranscriptFileUri']
    parsed = urlparse(transcript_uri)
    transcript_bucket = parsed.path.split('/')[1]
    transcript_key = '/'.join(parsed.path.split('/')[2:])

    obj = s3.get_object(Bucket=transcript_bucket, Key=transcript_key)
    transcript_data = json.loads(obj['Body'].read())

    transcript_text = transcript_data['results']['transcripts'][0]['transcript']
    print("--> Transcribed Text:\n", transcript_text)

    sentiment = comprehend.detect_sentiment(Text=transcript_text[:5000], LanguageCode='en')
    summary = summarize_with_claude(transcript_text)
    return {
        "statusCode": 200,
        "body": {
            "summary": summary,
            "sentiment": sentiment,
            "transcription": transcript_text[:500],
        }
    }

def summarize_with_claude(text):
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 200,
        "top_k": 250,
        "stop_sequences": [],
        "temperature": 1,
        "top_p": 0.999,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": (
                            "Summarize this meeting or speech in clear bullet points, "
                            "highlight any action items or topics discussed:\n\n"
                            + text[:4000]
                        )
                    }
                ]
            }
        ]
    }

    response = bedrock.invoke_model(
        modelId=BEDROCK_MODEL_ID,
        body=json.dumps(body),
        contentType="application/json",
        accept="application/json",
    )

    response_body = json.loads(response["body"].read())
    # The Messages API returns a list of content blocks; take the first text block
    content = response_body.get("content", [])
    return content[0]["text"] if content else "No summary generated."



Perfect: the code returns the sentiment analysis data along with a bullet-point summary of our audio file.
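
For reference, the handler's response ends up looking roughly like this (abridged, with illustrative values):

{
  "statusCode": 200,
  "body": {
    "summary": "- Team discussed the product launch plan\n- Action item: finalize the demo script by Friday",
    "sentiment": { "Sentiment": "POSITIVE", "SentimentScore": { ... } },
    "transcription": "Welcome everyone, today we will discuss..."
  }
}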

๐Ÿ“ Conclusion

Combining multiple AWS services allows you to build truly innovative systems. In this example, we demonstrated how a few straightforward services can be composed into a production-ready application with real-world uses across various industries. I hope you found this article helpful!

Happy coding 👨🏻‍💻

💡 Enjoyed this? Let's connect and geek out some more on LinkedIn.
