<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Jhon Robert Quintero Hurtado</title>
    <description>The latest articles on Forem by Jhon Robert Quintero Hurtado (@jrquinte).</description>
    <link>https://forem.com/jrquinte</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1490259%2F2881a962-5053-4797-b441-fb89e1c2a667.jpeg</url>
      <title>Forem: Jhon Robert Quintero Hurtado</title>
      <link>https://forem.com/jrquinte</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/jrquinte"/>
    <language>en</language>
    <item>
      <title>How I Passed the AWS Machine Learning Associate Exam: Real Questions, Real Lessons</title>
      <dc:creator>Jhon Robert Quintero Hurtado</dc:creator>
      <pubDate>Sun, 01 Mar 2026 13:25:00 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-i-passed-the-aws-machine-learning-associate-exam-real-questions-real-lessons-13m9</link>
      <guid>https://forem.com/aws-builders/how-i-passed-the-aws-machine-learning-associate-exam-real-questions-real-lessons-13m9</guid>
      <description>&lt;p&gt;&lt;em&gt;Notes from someone who's been through it — what actually shows up, and what you need to know.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I recently cleared the &lt;a href="https://aws.amazon.com/certification/certified-machine-learning-engineer-associate/" rel="noopener noreferrer"&gt;&lt;strong&gt;MLA-C01: AWS Certified Machine Learning Engineer - Associate&lt;/strong&gt;&lt;/a&gt; exam, and I wanted to share everything I learned along the way. Not the sanitized "read the docs" advice — the real stuff. The patterns I saw repeated across questions, the services I kept confusing, and the mental models that finally made things click.&lt;/p&gt;

&lt;p&gt;This isn't a beginner exam. It expects you to know &lt;em&gt;when&lt;/em&gt; to use what, &lt;em&gt;why&lt;/em&gt; one approach beats another, and &lt;em&gt;how&lt;/em&gt; AWS services fit together in real ML workflows. If you're coming from the &lt;a href="https://aws.amazon.com/certification/certified-ai-practitioner/" rel="noopener noreferrer"&gt;AI Practitioner exam&lt;/a&gt; or diving directly into the Associate, buckle up — here's what I wish I'd known before sitting down.&lt;/p&gt;




&lt;h2&gt;How I Prepared&lt;/h2&gt;

&lt;p&gt;My preparation was hands-on and scenario-focused. The exam doesn't care if you can recite definitions — it wants to know if you can solve problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What worked for me:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;Practice questions with detailed explanations&lt;/strong&gt; — I went through scenario-based questions and studied &lt;em&gt;why&lt;/em&gt; answers were right or wrong, not just &lt;em&gt;what&lt;/em&gt; the answer was.&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;AWS Documentation deep-dives&lt;/strong&gt; — Especially for SageMaker built-in algorithms, Model Monitor, Clarify, and Data Wrangler. The docs tell you exactly what each service does and doesn't do.&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Building mental models&lt;/strong&gt; — I stopped memorizing and started asking "what problem does this solve?" for every service.&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Hands-on labs&lt;/strong&gt; — Actually deploying endpoints, running training jobs, and breaking things taught me more than any video.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;What Actually Showed Up on the Exam&lt;/h2&gt;

&lt;p&gt;The exam is &lt;em&gt;heavily&lt;/em&gt; scenario-based. You'll read a paragraph describing a company's situation, then pick the best solution. Here's what kept coming up:&lt;/p&gt;

&lt;h3&gt;Data Aggregation &amp;amp; Preparation&lt;/h3&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;AWS Lake Formation&lt;/strong&gt; for aggregating data from multiple sources (S3, on-premises databases) into a unified data lake&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;AWS Glue&lt;/strong&gt; for ETL pipelines, schema discovery, and the Data Catalog&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;AWS Glue FindMatches&lt;/strong&gt; for ML-powered deduplication with minimal code&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;AWS Glue DataBrew&lt;/strong&gt; for no-code data transformations like one-hot encoding&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;SageMaker Data Wrangler&lt;/strong&gt; for ML-specific data prep, anomaly detection, and visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt; DataBrew can't process mixed file types (CSV, JSON, Parquet) in the same folder. Separate them first.&lt;/p&gt;

&lt;h3&gt;Algorithms — Know When to Use What&lt;/h3&gt;

&lt;p&gt;This is where the exam gets tricky. You need to match algorithms to problem characteristics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Classification with class imbalance + feature interactions&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;LightGBM&lt;/strong&gt; or &lt;strong&gt;XGBoost&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recommendations with high-dimensional sparse data&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Factorization Machines&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Time series forecasting&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;DeepAR&lt;/strong&gt; (uses JSON Lines or Parquet, NOT RecordIO-Protobuf)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ranking customers by probability&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;XGBoost&lt;/strong&gt; (outputs probability scores)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Watch out:&lt;/strong&gt; The exam loves testing whether you know that DeepAR does NOT use RecordIO-Protobuf. That format is for Linear Learner, K-Means, and Factorization Machines.&lt;/p&gt;
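
&lt;p&gt;As a memory aid, here is roughly what DeepAR training input looks like: plain JSON Lines, one series per object (the field names follow the DeepAR input format; the values are made up):&lt;/p&gt;

```python
import json

# Each line is one time series: a start timestamp, the target values,
# and optional categorical features. No RecordIO-Protobuf anywhere.
series = [
    {"start": "2025-01-01 00:00:00", "target": [12.0, 15.0, 14.0], "cat": [0]},
    {"start": "2025-01-01 00:00:00", "target": [3.0, 4.0, 7.0], "cat": [1]},
]

# DeepAR reads one JSON object per line (e.g., a train.json uploaded to S3).
jsonl = "\n".join(json.dumps(s) for s in series)
print(len(jsonl.splitlines()))
```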

&lt;h3&gt;Bias Detection &amp;amp; Fairness&lt;/h3&gt;

&lt;p&gt;SageMaker Clarify came up repeatedly:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;Pre-training bias metrics&lt;/strong&gt; like DPL (Difference in Proportions of Labels)&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Post-deployment bias monitoring&lt;/strong&gt; via Lambda + Clarify jobs&lt;/li&gt;
    &lt;li&gt;If DPL is +0.9 for a facet → that facet is overrepresented → &lt;strong&gt;undersample&lt;/strong&gt; that group&lt;/li&gt;
&lt;/ul&gt;
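
&lt;p&gt;DPL itself is simple arithmetic: the positive-label proportion of one facet minus that of the other. A stdlib-only sketch with made-up counts:&lt;/p&gt;

```python
# Difference in Proportions of Labels (DPL): the positive-outcome rate of
# the advantaged facet minus that of the disadvantaged facet.
def dpl(pos_a, total_a, pos_d, total_d):
    return pos_a / total_a - pos_d / total_d

# Made-up counts: facet A receives positive labels far more often than D,
# so DPL comes out strongly positive and facet A should be undersampled.
value = dpl(pos_a=95, total_a=100, pos_d=5, total_d=100)
print(round(value, 2))
```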

&lt;p&gt;&lt;strong&gt;Remember:&lt;/strong&gt; Clarify = bias and explainability. Model Monitor = data quality and drift. Don't confuse them.&lt;/p&gt;

&lt;h3&gt;Model Monitoring &amp;amp; Drift&lt;/h3&gt;

&lt;p&gt;This was a big theme:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;"Model worked for months, suddenly degraded"&lt;/strong&gt; → Think &lt;strong&gt;data drift&lt;/strong&gt;
&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Baseline violations after model update&lt;/strong&gt; → Create a &lt;strong&gt;new baseline&lt;/strong&gt; from new training data&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;ModelSetupTime metric&lt;/strong&gt; → For diagnosing serverless endpoint cold starts&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Overfitting Questions&lt;/h3&gt;

&lt;p&gt;Classic pattern: "Training accuracy 99%, validation accuracy 82%"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Answer:&lt;/strong&gt; Dropout + L1/L2 regularization + cross-validation&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never:&lt;/strong&gt; Add more layers (extra capacity only makes overfitting worse)&lt;/p&gt;
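
&lt;p&gt;For intuition on why regularization helps, here is a toy, stdlib-only sketch of how an L2 penalty shrinks a fitted weight (closed-form one-dimensional ridge regression on made-up data):&lt;/p&gt;

```python
# One-dimensional ridge regression in closed form:
#   w = sum(x*y) / (sum(x*x) + lam)
# A larger lam pulls the weight toward zero, which limits how hard
# the model can chase noise in the training data.
def ridge_weight(xs, ys, lam):
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x plus noise (made-up data)

w_plain = ridge_weight(xs, ys, lam=0.0)
w_ridge = ridge_weight(xs, ys, lam=10.0)
print(w_plain, w_ridge)   # the regularized weight is smaller in magnitude
```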

&lt;h3&gt;Deployment Strategies&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Limited instances + zero downtime&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Rolling deployment&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Different ML frameworks in one endpoint&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Multi-container endpoint&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Variable/unpredictable traffic&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Serverless inference&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Testing new model on live traffic&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Shadow variant&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;Evaluation Metrics&lt;/h3&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;"Catch as many fraud cases as possible"&lt;/strong&gt; → &lt;strong&gt;Recall&lt;/strong&gt;
&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;"Minimize false alarms"&lt;/strong&gt; → &lt;strong&gt;Precision&lt;/strong&gt;
&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Continuous numeric predictions&lt;/strong&gt; → &lt;strong&gt;RMSE&lt;/strong&gt; (not accuracy — that's classification)&lt;/li&gt;
&lt;/ul&gt;
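
&lt;p&gt;These metrics are worth being able to compute by hand. A quick stdlib-only refresher with made-up confusion counts:&lt;/p&gt;

```python
import math

# Made-up fraud-detection confusion counts.
tp, fp, fn = 80, 10, 20

precision = tp / (tp + fp)   # of the cases we flagged, how many were fraud
recall = tp / (tp + fn)      # of the actual fraud, how much we caught

# RMSE belongs to regression (continuous predictions), not classification.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.5, 5.5, 7.0]
rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

print(round(precision, 3), round(recall, 3), round(rmse, 3))
```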




&lt;h2&gt;Common Pitfalls to Avoid&lt;/h2&gt;

&lt;p&gt;These confused me during practice, but don't let them confuse you:&lt;/p&gt;

&lt;h3&gt;Confusing Similar Services&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transcribe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Speech → Text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Comprehend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NLP on text (sentiment, entities)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rekognition&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Image/video analysis (faces, objects, eye gaze)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Textract&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Extract text from documents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Macie&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discover sensitive data in S3&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; "Convert audio to text" is Transcribe, not Rekognition or Comprehend.&lt;/p&gt;

&lt;h3&gt;Security Groups vs Network ACLs&lt;/h3&gt;

&lt;ul&gt;
    &lt;li&gt;Security groups &lt;strong&gt;only allow&lt;/strong&gt; traffic&lt;/li&gt;
    &lt;li&gt;Network ACLs can &lt;strong&gt;explicitly deny&lt;/strong&gt; traffic&lt;/li&gt;
    &lt;li&gt;Need to block a specific IP? → &lt;strong&gt;Network ACL&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
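
&lt;p&gt;As a sketch of what the NACL answer looks like in practice, these are the parameters you would hand to EC2's &lt;code&gt;create_network_acl_entry&lt;/code&gt; call (the IDs and IP are placeholders; the dict is only assembled, not sent to AWS):&lt;/p&gt;

```python
# Network ACLs are the layer that can explicitly DENY; security groups
# only have allow rules. All IDs and the IP below are placeholders.
deny_entry = {
    "NetworkAclId": "acl-0123456789abcdef0",   # hypothetical NACL ID
    "RuleNumber": 90,           # evaluated before higher-numbered rules
    "Protocol": "-1",           # all protocols
    "RuleAction": "deny",
    "Egress": False,            # inbound rule
    "CidrBlock": "203.0.113.25/32",  # the single IP to block
}

# With boto3 you would pass these straight through:
#   boto3.client("ec2").create_network_acl_entry(**deny_entry)
print(deny_entry["RuleAction"])
```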

&lt;h3&gt;"Least Operational Overhead" Questions&lt;/h3&gt;

&lt;p&gt;When you see this phrase, pick the &lt;strong&gt;managed service&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Data quality checks → AWS Glue Data Quality (declarative rules, no code)&lt;/li&gt;
    &lt;li&gt;Sensitive data detection → Amazon Macie&lt;/li&gt;
    &lt;li&gt;Model deployment → SageMaker JumpStart&lt;/li&gt;
    &lt;li&gt;Data labeling → SageMaker Ground Truth (automated labeling)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Model Registry Concepts&lt;/h3&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;Model Groups&lt;/strong&gt; → Versions of the same model&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Collections&lt;/strong&gt; → Organize model groups by category (without affecting existing groupings)&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Tags&lt;/strong&gt; → Metadata, but don't provide hierarchical structure&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;Key Topics to Focus On&lt;/h2&gt;

&lt;h3&gt;SageMaker Services Deep Dive&lt;/h3&gt;


&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Wrangler&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data prep, imputation, anomaly detection, visualization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Clarify&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bias detection, explainability, fairness metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Monitor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production monitoring (data quality, model quality, drift)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Debugger&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Training job debugging (tensors, gradients)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ground Truth&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data labeling with automated labeling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;JumpStart&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pre-trained models, low-code/no-code (LCNC) fine-tuning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pipelines&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ML workflow orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Registry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Version management, approval workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h3&gt;Feature Engineering&lt;/h3&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;One-hot encoding&lt;/strong&gt; → Nominal categorical data to binary&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Mode imputation&lt;/strong&gt; → Missing categorical values&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Mean imputation&lt;/strong&gt; → Missing numerical values&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Data augmentation with noise&lt;/strong&gt; → When training works but production fails due to image quality variations&lt;/li&gt;
&lt;/ul&gt;
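
&lt;p&gt;All four transforms are easy to sketch without any ML library (stdlib only; the tiny dataset is made up):&lt;/p&gt;

```python
from statistics import mean, mode

colors = ["red", "blue", None, "red"]   # nominal categorical, one missing
ages = [25.0, None, 31.0, 28.0]         # numerical, one missing

# Mode imputation for the categorical column, mean for the numerical one.
fill_color = mode([c for c in colors if c is not None])
fill_age = mean([a for a in ages if a is not None])
colors_filled = [c if c is not None else fill_color for c in colors]
ages_filled = [a if a is not None else fill_age for a in ages]

# One-hot encode the nominal column into binary indicator vectors.
categories = sorted(set(colors_filled))
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors_filled]

print(categories, one_hot[0])
```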

&lt;h3&gt;Auto Scaling for Endpoints&lt;/h3&gt;

&lt;p&gt;For maximum responsiveness to sudden traffic:&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;
&lt;strong&gt;High-resolution metrics&lt;/strong&gt; (10-second intervals) → Faster detection&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Longer scale-in cooldown&lt;/strong&gt; (600 seconds) → Maintains capacity, prevents yo-yo effect&lt;/li&gt;
&lt;/ul&gt;
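
&lt;p&gt;Here is a sketch of those settings as an Application Auto Scaling target-tracking policy for a SageMaker endpoint (the resource names are placeholders; only the request parameters are assembled, not applied):&lt;/p&gt;

```python
# Target-tracking scaling for a SageMaker endpoint variant. The endpoint
# and variant names are hypothetical; only the request dict is built here.
policy = {
    "PolicyName": "invocations-target-tracking",
    "ServiceNamespace": "sagemaker",
    "ResourceId": "endpoint/my-endpoint/variant/AllTraffic",
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleOutCooldown": 60,   # scale out fast when traffic spikes
        "ScaleInCooldown": 600,   # hold capacity to avoid the yo-yo effect
    },
}

# With boto3: boto3.client("application-autoscaling").put_scaling_policy(**policy)
print(policy["TargetTrackingScalingPolicyConfiguration"]["ScaleInCooldown"])
```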




&lt;h2&gt;Mental Models That Saved Me&lt;/h2&gt;

&lt;h3&gt;The "What Problem Does It Solve?" Framework&lt;/h3&gt;

&lt;p&gt;Instead of memorizing services, I asked: &lt;em&gt;What problem is this solving?&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
    &lt;li&gt;Need to &lt;strong&gt;block an IP&lt;/strong&gt;? → Network ACL (only thing that can deny)&lt;/li&gt;
    &lt;li&gt;Need &lt;strong&gt;always up-to-date data&lt;/strong&gt;? → Direct connections (real-time query)&lt;/li&gt;
    &lt;li&gt;Need to &lt;strong&gt;reduce labeling time&lt;/strong&gt;? → Ground Truth with automated labeling&lt;/li&gt;
    &lt;li&gt;Need &lt;strong&gt;private connectivity&lt;/strong&gt; to S3? → VPC endpoints&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;The "Least Overhead" Hierarchy&lt;/h3&gt;

&lt;p&gt;When AWS asks for "least operational overhead," they usually want:&lt;/p&gt;

&lt;ol&gt;
    &lt;li&gt;Fully managed service with built-in feature&lt;/li&gt;
    &lt;li&gt;Serverless solution&lt;/li&gt;
    &lt;li&gt;Managed service with some configuration&lt;/li&gt;
    &lt;li&gt;Custom code on managed compute&lt;/li&gt;
    &lt;li&gt;Custom code on EC2 (almost never the answer)&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;The Drift Detection Flow&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2026%2F02%2Fdrift_detection_flow_v2.drawio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2026%2F02%2Fdrift_detection_flow_v2.drawio.png" alt="The Drift Detection Flow" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;br&gt;









&lt;h2&gt;Tips That Helped Me Pass&lt;/h2&gt;

&lt;ol&gt;
    &lt;li&gt;
&lt;strong&gt;Understand services by use case, not definitions&lt;/strong&gt; — The exam gives you scenarios, not vocabulary tests&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Learn the common mistakes&lt;/strong&gt; — DeepAR doesn't use RecordIO-Protobuf, security groups can't deny traffic, DataBrew needs homogeneous file types&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Practice elimination&lt;/strong&gt; — Most questions have two obviously wrong answers and two possible ones. Learn why the possible wrong answer is incorrect.&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Read for keywords&lt;/strong&gt; — "Least operational overhead," "most cost-effective," "LEAST amount of time" all point to different answers&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Don't overthink&lt;/strong&gt; — If a question mentions a specific service capability (like "document attribute filter"), that's usually the answer&lt;/li&gt;
    &lt;li&gt;
&lt;strong&gt;Time management&lt;/strong&gt; — Flag tough questions and move on. Some questions are intentionally time-consuming.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;Final Thoughts&lt;/h2&gt;

&lt;p&gt;The ML Associate exam is challenging, but it's passable with focused preparation. It's not about memorizing every SageMaker feature — it's about understanding how AWS ML services work together to solve real problems.&lt;/p&gt;

&lt;p&gt;The exam rewards practical thinking. When you read a scenario, ask yourself: "What would I actually do here?" Usually, that instinct (backed by solid knowledge of what each service does) will guide you to the right answer.&lt;/p&gt;

&lt;p&gt;With 6-8 weeks of focused study and lots of practice questions, you can do this.&lt;/p&gt;

&lt;p&gt;Good luck — and feel free to reach out if you have questions!&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Did this help? Have questions about specific topics? Drop a comment below or connect with me on &lt;a href="https://www.linkedin.com/in/jrquinte/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Related Post &lt;a href="https://analisys.co/2024/05/20/generative-ai-using-amazon-bedrock-api-gateway-lambda-and-s3/" rel="noopener noreferrer"&gt;GenAI: Using Amazon Bedrock, API Gateway, Lambda and S3 - Analisys.co&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>aws</category>
      <category>sagemaker</category>
    </item>
    <item>
      <title>GenAI: Using Amazon Bedrock, API Gateway, Lambda and S3</title>
      <dc:creator>Jhon Robert Quintero Hurtado</dc:creator>
      <pubDate>Mon, 20 May 2024 13:42:33 +0000</pubDate>
      <link>https://forem.com/aws-builders/genai-using-amazon-bedrock-api-gateway-lambda-and-s3-j6e</link>
      <guid>https://forem.com/aws-builders/genai-using-amazon-bedrock-api-gateway-lambda-and-s3-j6e</guid>
      <description>&lt;p&gt;About Me: &lt;a href="http://analisys.co/acerca-de/" rel="noopener noreferrer"&gt;Jhon Robert Quintero H.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2FAmazon-Bedrock2.drawio-2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2FAmazon-Bedrock2.drawio-2.png" alt="Bedrock"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;

&lt;p&gt;I was thrilled to be a speaker at the &lt;a href="https://www.facebook.com/reel/473847328435703" rel="noopener noreferrer"&gt;XIII Meeting SOFTHARD 2024&lt;/a&gt;, hosted by the &lt;a href="https://www.facebook.com/IngenieriasUNIAJC" rel="noopener noreferrer"&gt;Faculty of Engineering&lt;/a&gt; of the &lt;a href="https://www.linkedin.com/company/uniajc/" rel="noopener noreferrer"&gt;Institución Universitaria Antonio José Camacho&lt;/a&gt;. Here's a snapshot of the key takeaways from my presentation on Generative AI, focusing on Amazon Bedrock, API Gateway, Lambda, and S3. Buckle up for an exciting ride!&lt;/p&gt;

&lt;h3&gt;Amazon Bedrock&lt;/h3&gt;

&lt;p&gt;Picture this: Amazon Bedrock is your magic wand for building and scaling generative AI applications with foundation models. It's like having a fully managed AI factory at your fingertips!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fully Managed Service&lt;/strong&gt;: Say goodbye to the nitty-gritty of managing infrastructure.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Choose Your Model&lt;/strong&gt;: It’s like a buffet of the best AI models from AI21 Labs, Anthropic, Cohere, Meta, and Stability AI. Just pick what you need!&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Customize with Your Data&lt;/strong&gt;: Make your model truly yours.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Fbrandchoice-1024x357.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Fbrandchoice-1024x357.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For instance, Anthropic’s Claude model is the hotshot for tasks like summarization and complex reasoning, while Stability AI’s Stable Diffusion model is your go-to for generating unique images, art, and designs. Need copywriting help? Cohere Command's got your back!&lt;/p&gt;

&lt;h4&gt;Anthropic’s Claude 3 Model&lt;/h4&gt;

&lt;p&gt;Meet Claude 3, the genius in the room! It's smarter, faster, and cheaper than GPT-3.5, and it’s got vision, literally!&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://www.anthropic.com/news/claude-3-family" rel="noopener noreferrer"&gt;Claude 3&lt;/a&gt;, you can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dialogue and role-play&lt;/li&gt;



&lt;li&gt;Summarization and Q&amp;amp;A&lt;/li&gt;



&lt;li&gt;Translation&lt;/li&gt;



&lt;li&gt;Database querying and retrieval&lt;/li&gt;



&lt;li&gt;Coding-related tasks&lt;/li&gt;



&lt;li&gt;Classification, metadata extraction, and analysis&lt;/li&gt;



&lt;li&gt;Text and content generation&lt;/li&gt;



&lt;li&gt;Content moderation&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Amazon’s Titan Model&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://aws.amazon.com/bedrock/titan/" rel="noopener noreferrer"&gt;Titan&lt;/a&gt; model is another powerhouse. Here’s what it can do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text Embeddings&lt;/strong&gt;: Converts text into numerical form for searching, retrieving, and clustering. Think of it as translating words into a secret numerical language.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Image Generator&lt;/strong&gt;: Create stunning images with just a few words. It’s like having a personal artist at your service!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Customizing Amazon Titan models&lt;/h3&gt;

&lt;p&gt;You can now &lt;a href="https://aws.amazon.com/blogs/aws/customize-models-in-amazon-bedrock-with-your-own-data-using-fine-tuning-and-continued-pre-training/" rel="noopener noreferrer"&gt;customize foundation models&lt;/a&gt; (FMs) with your own data in &lt;a href="https://aws.amazon.com/bedrock/" rel="noopener noreferrer"&gt;Amazon Bedrock&lt;/a&gt; to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Ffinetunning-1024x399.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Ffinetunning-1024x399.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html" rel="noopener noreferrer"&gt;fine-tuning&lt;/a&gt;, you can increase model accuracy by providing your own task-specific labeled training dataset and further specialize your FMs. You can train models using your own data in a secure environment with customer-managed keys. &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/custom-models.html" rel="noopener noreferrer"&gt;Continued pre-training&lt;/a&gt; helps models become more specific to your domain.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Fpretraining-1024x337.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2Fpretraining-1024x337.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;Knowledge Bases for Amazon Bedrock&lt;/h3&gt;

&lt;p&gt;Knowledge Bases let you load text documents, such as articles or reports, into a knowledge base, and they automatically convert those documents into vector representations called embeddings. Those embeddings power retrieval-augmented generation (RAG), a key Bedrock feature that lets you ground foundation models in your own data.&lt;/p&gt;

&lt;p&gt;Bedrock stores the embeddings in your vector database (such as &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-setup.html" rel="noopener noreferrer"&gt;Amazon OpenSearch&lt;/a&gt;), retrieves the most relevant ones at query time, and uses them to augment your prompts.&lt;/p&gt;
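
&lt;p&gt;As a rough sketch, this is the shape of a retrieval-augmented query against a knowledge base via the &lt;code&gt;bedrock-agent-runtime&lt;/code&gt; API (the knowledge base ID and model ARN are placeholders; only the request is assembled, not sent):&lt;/p&gt;

```python
# RAG request shape for the Bedrock retrieve_and_generate API.
# The knowledge base ID and model ARN below are placeholders.
request = {
    "input": {"text": "Summarize our incident reports."},
    "retrieveAndGenerateConfiguration": {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1",
        },
    },
}

# With boto3:
#   boto3.client("bedrock-agent-runtime").retrieve_and_generate(**request)
print(request["retrieveAndGenerateConfiguration"]["type"])
```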

&lt;h3&gt;Solution Design: Text Summarization&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2FAmazon-Bedrock2.drawio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fanalisys.co%2Fwp-content%2Fuploads%2F2024%2F05%2FAmazon-Bedrock2.drawio.png" alt="Bedrock"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s dive into the cool stuff! Here’s how I designed a text summarization solution using Bedrock, API Gateway, Lambda, and S3.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock&lt;/strong&gt;: Our star player.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;API Gateway&lt;/strong&gt;: Acts as the doorman, directing client requests to the right Lambda function.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;Lambda Function&lt;/strong&gt;: Processes the text and sends it to Bedrock for summarization.&lt;/li&gt;



&lt;li&gt;
&lt;strong&gt;S3 Bucket&lt;/strong&gt;: Stores the summarized text.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Check out the source code below. It’s like a recipe for your favorite dish – just follow the steps and enjoy the result!&lt;/p&gt;

&lt;h4&gt;Python Source Code&lt;/h4&gt;

&lt;pre&gt;&lt;code&gt;import boto3
import botocore.config
import json
import base64
from datetime import datetime
from email import message_from_bytes


def extract_text_from_multipart(data):
    msg = message_from_bytes(data)

    text_content = ''

    if msg.is_multipart():
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                text_content += part.get_payload(decode=True).decode('utf-8') + "\n"

    else:
        if msg.get_content_type() == "text/plain":
            text_content = msg.get_payload(decode=True).decode('utf-8')

    return text_content.strip() if text_content else None


def generate_summary_from_bedrock(content: str) -&amp;gt; str:
    prompt = f"""Summarize the following meeting notes: {content} """

    body = json.dumps({"inputText": prompt, 
                       "textGenerationConfig":{
                           "maxTokenCount":4096,
                           "stopSequences":[],
                           "temperature":0,
                           "topP":1
                       },
                      }) 

    modelId = 'amazon.titan-tg1-large' # change this to use a different version from the model provider
    accept = 'application/json'
    contentType = 'application/json'
        
    try:
        boto3_bedrock = boto3.client('bedrock-runtime',
                                     config=botocore.config.Config(read_timeout=300, retries={'max_attempts': 3}))
        response = boto3_bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
        response_body = json.loads(response.get('body').read())
    
        summary = response_body.get('results')[0].get('outputText')
        return summary

    except Exception as e:
        print(f"Error generating the summary: {e}")
        return ""

def save_summary_to_s3_bucket(summary, s3_bucket, s3_key):

    s3 = boto3.client('s3')

    try:
        s3.put_object(Bucket = s3_bucket, Key = s3_key, Body = summary)
        print("Summary saved to s3")

    except Exception as e:
        print(f"Error when saving the summary to s3: {e}")


def lambda_handler(event,context):

    decoded_body = base64.b64decode(event['body'])

    text_content = extract_text_from_multipart(decoded_body)

    if not text_content:
        return {
            'statusCode':400,
            'body':json.dumps("Failed to extract content")
        }


    summary = generate_summary_from_bedrock(text_content)

    if summary:
        current_time = datetime.now().strftime('%H%M%S')  # UTC time on Lambda, not necessarily your timezone
        s3_key = f'summary-output/{current_time}.txt'
        s3_bucket = 'bedrock-analisys-co'

        save_summary_to_s3_bucket(summary, s3_bucket, s3_key)

    else:
        print("No summary was generated")


    return {
        'statusCode':200,
        'body':json.dumps("Summary generation finished")
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;h4&gt;Text Generation Configuration&lt;/h4&gt;

&lt;p&gt;Understanding the &lt;code&gt;textGenerationConfig&lt;/code&gt; parameters is key for tweaking the model's behavior:&lt;/p&gt;

&lt;h4&gt;
&lt;strong&gt;maxTokenCount&lt;/strong&gt;:&lt;/h4&gt;

&lt;p&gt;This sets the maximum number of tokens (words or subwords) that the text generation model can produce. It helps control the length of the generated text.&lt;/p&gt;

&lt;p&gt;Think of this as setting the maximum length for a speech. Just like a speechwriter might say, "Your speech should be no longer than 500 words," the maxTokenCount controls the length of the generated text, ensuring it doesn’t run on indefinitely.&lt;/p&gt;

&lt;h4&gt;
&lt;strong&gt;stopSequences&lt;/strong&gt;:&lt;/h4&gt;

&lt;p&gt;This is a list of sequences (e.g., newline characters) that, if encountered during text generation, will cause the generation to stop.&lt;/p&gt;

&lt;p&gt;Imagine you’re listening to a song that has a specific note that signals the end, like a grand finale. Similarly, stopSequences act like these finishing notes; they are predefined sequences that, when detected, tell the model to stop generating more text.&lt;/p&gt;

&lt;h4&gt;
&lt;strong&gt;temperature&lt;/strong&gt;:&lt;/h4&gt;

&lt;p&gt;This parameter controls the "creativity" or "randomness" of the text generation. A lower temperature (e.g., 0) will result in more conservative, predictable text, while a higher temperature (e.g., 1) will produce more diverse and potentially more creative text.&lt;/p&gt;

&lt;p&gt;Picture a painter with a palette of colors. A lower temperature is like having only a few colors to choose from, resulting in a more predictable and uniform painting. A higher temperature is like having a wide variety of colors, leading to a more vibrant and unexpected artwork.&lt;/p&gt;

&lt;h4&gt;
&lt;strong&gt;topP&lt;/strong&gt;:&lt;/h4&gt;

&lt;p&gt;This is a technique called "nucleus sampling" that limits text generation to the most likely tokens, based on the model's probability distribution. A value of 1 means no filtering, while a lower value (e.g., 0.9) restricts sampling to the smallest set of tokens whose cumulative probability reaches 90%.&lt;/p&gt;

&lt;p&gt;Imagine a buffet where you want to sample the most popular dishes. A topP value of 1 means you can choose from the entire spread. A lower topP value, like 0.9, means you’re only selecting from the top 90% of the most popular dishes, skipping the least likely choices to ensure a satisfying and high-quality meal.&lt;/p&gt;

&lt;p&gt;These parameters let you fine-tune the model’s behavior to suit your needs. Feel free to experiment and find the perfect balance!&lt;/p&gt;
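
&lt;p&gt;To make the trade-offs concrete, here are the deterministic configuration used in the Lambda above and a more exploratory variant side by side (the exploratory values are illustrative, not a recommendation):&lt;/p&gt;

```python
# Deterministic settings, as used in the Lambda above: best for summarization,
# where faithfulness to the source text matters more than variety.
summarize_config = {"maxTokenCount": 4096, "stopSequences": [], "temperature": 0, "topP": 1}

# A more exploratory variant (illustrative values): more varied wording,
# better suited to creative drafting than to summaries.
creative_config = {"maxTokenCount": 1024, "stopSequences": [], "temperature": 0.8, "topP": 0.9}

print(summarize_config["temperature"], creative_config["temperature"])
```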

&lt;h3&gt;Video Demo&lt;/h3&gt;

&lt;p&gt;Stay tuned for the video demo where I'll walk you through the entire process. &lt;a href="https://youtu.be/MXbXWGBBOzU" rel="noopener noreferrer"&gt;Seeing is believing, right?&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Related Post: &lt;a href="https://analisys.co/2023/07/11/designing-a-highly-scalable-system-on-aws/" rel="noopener noreferrer"&gt;Designing a Highly Scalable System on AWS (analisys.co)&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>ai</category>
      <category>bedrock</category>
    </item>
  </channel>
</rss>
