<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alexander Bolaño</title>
    <description>The latest articles on Forem by Alexander Bolaño (@datexland).</description>
    <link>https://forem.com/datexland</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F837199%2F901df96b-e698-4553-a32c-d0246c752986.jpeg</url>
      <title>Forem: Alexander Bolaño</title>
      <link>https://forem.com/datexland</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/datexland"/>
    <language>en</language>
    <item>
      <title>Stop Digging Logs: How to Turn Airflow Failures into Contextual Learning (with Bedrock &amp; S3 Vectors)</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Tue, 16 Dec 2025 22:22:06 +0000</pubDate>
      <link>https://forem.com/aws-builders/stop-digging-logs-how-to-turn-airflow-failures-into-contextual-learning-with-bedrock-s3-vectors-485k</link>
      <guid>https://forem.com/aws-builders/stop-digging-logs-how-to-turn-airflow-failures-into-contextual-learning-with-bedrock-s3-vectors-485k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical demonstration showing how to leverage S3 Vectors (Vector Store) and Cohere Embeddings to provide data teams with contextual, historical fixes directly within failed Airflow task logs&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As data engineers, we spend too much time hunting through logs when a workflow fails, especially if you are the newest member of the team. What if orchestration were not only about automation, but also about learning from failures and guiding your team through them?&lt;/p&gt;

&lt;p&gt;In this article, I'll show you how to build an intelligent Airflow DAG that uses Amazon S3 Vectors, embeddings, and vector search to capture historical failure wisdom and surface actionable fix hints, reducing the need for manual log debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Airflow&lt;/li&gt;
&lt;li&gt;S3 Vector &lt;/li&gt;
&lt;li&gt;AWS Bedrock&lt;/li&gt;
&lt;li&gt;cohere.embed-v4:0&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Problem Are We Solving?
&lt;/h2&gt;

&lt;p&gt;Airflow workflows inevitably fail at some point, for all sorts of reasons: data quality issues, dependency conflicts, permission errors, timeouts, and so on. Traditionally, engineers dig through the logs as a first step, but what if there were a better option?&lt;/p&gt;

&lt;p&gt;This project uses &lt;strong&gt;Cohere&lt;/strong&gt; &lt;em&gt;(AWS Bedrock)&lt;/em&gt; embeddings and S3 Vectors to index past errors and search for similar failure patterns. This means that once a task fails, we:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture the error&lt;/li&gt;
&lt;li&gt;Create an error summary dict&lt;/li&gt;
&lt;li&gt;Generate a semantic vector embedding &lt;/li&gt;
&lt;li&gt;Query a vector index stored in S3 vectors&lt;/li&gt;
&lt;li&gt;Retrieve and suggest the most relevant solution&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How it Works
&lt;/h2&gt;

&lt;p&gt;At a high level, I built a simulated DAG that generates common errors on purpose, so the value is easy to see. In a real project you will face your own unique, varied, and countless failures; here, the Airflow DAG simulates failures for these common problem types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Division by Zero&lt;/li&gt;
&lt;li&gt;Data validation failures &lt;em&gt;(syntax)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;S3 permissions&lt;/li&gt;
&lt;li&gt;Database connections issues &lt;em&gt;(Timeout)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That being said, when a task fails, the error message is captured and embedded using a model hosted on AWS Bedrock. That embedding is used to query a vector index in S3 Vectors, which stores previously seen errors and their solutions &lt;em&gt;(read the NOTE section)&lt;/em&gt;; S3 Vectors lets you perform similarity search directly on S3 without managing a separate vector database. Finally, in the task called &lt;em&gt;&lt;strong&gt;'hint_to_solve'&lt;/strong&gt;&lt;/em&gt;, the system returns the closest match and suggests the corresponding solution right in the Airflow logs. Here is an example of the DAG functionality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Data ingestion into the S3 Vector Index is out of scope for this article, as it is straightforward and well covered in the 👉🏻 &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
For reference, the simulated error records ingested into the index are available here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/alexbonella/Airflow-S3-Vector-Guide/blob/main/airflow_simulation_error.json" rel="noopener noreferrer"&gt;https://github.com/alexbonella/Airflow-S3-Vector-Guide/blob/main/airflow_simulation_error.json&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
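&lt;p&gt;One natural place to wire this lookup into a DAG is an &lt;code&gt;on_failure_callback&lt;/code&gt;. A minimal sketch, assuming a &lt;code&gt;hint_to_solve&lt;/code&gt; helper that performs the embedding and vector search; the task names are illustrative, not from the repo:&lt;/p&gt;

```python
import logging

log = logging.getLogger("hint_callback")


def hint_on_failure(context, search_fn):
    """Airflow on_failure_callback: surface the closest historical fix.

    `context` is the dict Airflow passes to failure callbacks; `search_fn`
    is the embedding + vector-search helper (injected to keep this testable).
    """
    error_text = str(context.get("exception"))
    hint = search_fn(error_text)
    log.info("💡 How to Solve this error: 👇🏻 %s", hint)
    return hint


# In the DAG definition (illustrative):
# PythonOperator(
#     task_id="transform",
#     python_callable=transform,
#     on_failure_callback=lambda ctx: hint_on_failure(ctx, hint_to_solve),
# )
```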

&lt;h3&gt;
  
  
  What does a hint DAG look like? 👇🏻
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38d1ri93msrf9gr5eijm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38d1ri93msrf9gr5eijm.png" alt="Hint to solve from DAG" width="800" height="1000"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2025-12-16, 19:02:53 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2025-12-16, 19:02:54 UTC] {smart_airflow_dag.py:206} INFO - INFO: Generating embedding for error: RuntimeError: Dependency 'requests' too old: 2.25.0 &amp;lt; 2.32.0 (at smart_airflow_dag.py, line 163)
[2025-12-16, 19:02:56 UTC] {smart_airflow_dag.py:71} INFO - ✅ Embedding successfully generated. Dimension: 1536
[2025-12-16, 19:02:56 UTC] {smart_airflow_dag.py:215} INFO - ⏳: Querying vector database for similar errors...
[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:241} INFO - 💡 How to Solve this error: 👇🏻

[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:242} INFO - {
  "suggestion": "Update the 'requests' package in the `requirements.txt` file to a version greater than or equal to 2.32.0 and redeploy the environment.",
  "similarity_score": 0.0162
}

[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:243} INFO - 
[2025-12-16, 19:02:57 UTC] {python.py:240} INFO - Done. Returned value was: None
[2025-12-16, 19:02:57 UTC] {taskinstance.py:349} ▶ Post task execution logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Traditional Airflow error handling is reactive and manual. With semantic search over historical errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data team saves time on debugging, especially new members&lt;/li&gt;
&lt;li&gt;Organizational knowledge about errors is codified and reusable&lt;/li&gt;
&lt;li&gt;Workflows become self-aware and proactive instead of running in orchestration zombie mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Turning Failures into Knowledge
&lt;/h2&gt;

&lt;p&gt;This guide demonstrates a real-world use case for embedding searchable failure knowledge directly into Airflow. So if you're leading or scaling a data team, imagine the impact of pipelines (DAGs) that don't just report errors, but guide you to solve them.&lt;/p&gt;

&lt;p&gt;Feel free to check out the complete code and adapt it to your environment or models!&lt;/p&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/alexbonella/Airflow-S3-Vector-Guide" rel="noopener noreferrer"&gt;https://github.com/alexbonella/Airflow-S3-Vector-Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>s3vectors</category>
      <category>airflow</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Twilio Challenge: Tweet Magic App</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Thu, 20 Jun 2024 11:42:43 +0000</pubDate>
      <link>https://forem.com/datexland/twilio-challenge-tweet-magic-app-gdk</link>
      <guid>https://forem.com/datexland/twilio-challenge-tweet-magic-app-gdk</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/twilio"&gt;Twilio Challenge &lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;As someone fascinated by the power of data, I want to share how AI and Twilio can be used to help social media enthusiasts add a creative twist to their online communication. That's why I built &lt;strong&gt;&lt;code&gt;Twilio Tweet Magic&lt;/code&gt;&lt;/strong&gt; 🪄, a unique app designed to help you generate captivating tweets from URLs and emotions. Whether you want to summarize an article, express a feeling, or just have some fun, Twilio Tweet Magic makes it effortless.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate Tweets:&lt;/strong&gt; Simply input a URL of a news item and select an emotion. Twilio Tweet Magic uses the Gemini AI power to craft a tweet that captures the essence of the content and your chosen mood.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Stunning Images:&lt;/strong&gt; Alongside your tweet, TwilioTweetMagic can generate visually appealing images that complement the message, adding an extra layer of creativity to your social media posts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Seamless Integration with Twilio:&lt;/strong&gt; Leveraging Twilio's robust messaging service, you can send these unique tweets and images directly to WhatsApp. Instantly share your thoughts, feelings, and creative expressions with friends, family, or followers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Services used
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;AWS Bedrock&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Gemini&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/vrVCNIXE0_0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Twilio and AI
&lt;/h2&gt;

&lt;p&gt;To create this app, I utilized the Gemini API to generate tweets based on news URL and user-specified emotions, bringing a personalized touch to each tweet. Then, I harnessed the power of AWS Bedrock to build realistic images associated with these tweets, enhancing their visual appeal. However, this innovative functionality wouldn't be possible without Twilio's robust services.&lt;/p&gt;

&lt;p&gt;In today's digital age, verifying user authenticity is crucial, and Twilio Verify ensures that only real users gain access to our app. Once verified, users can effortlessly send their custom tweets and images directly to a WhatsApp number, thanks to Twilio's WhatsApp Sandbox.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21wdt710fheolfzseffa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21wdt710fheolfzseffa.png" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraf48p2huo8c2gthu0el.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraf48p2huo8c2gthu0el.png" alt=" " width="288" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My submission qualifies for the following additional prize categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Twilio Times Two:&lt;/code&gt; The project uses Twilio Programmable Messaging (WhatsApp Sandbox) and Twilio Verify.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Entertaining Endeavors:&lt;/code&gt; TwilioTweetMagic is perfect for social media enthusiasts seeking to enhance their posts, content creators aiming to share engaging summaries of articles, and anyone looking to add a creative twist to their online communication.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Source code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/alexbonella" rel="noopener noreferrer"&gt;
        alexbonella
      &lt;/a&gt; / &lt;a href="https://github.com/alexbonella/challenge-twilio-tweet-magic-app" rel="noopener noreferrer"&gt;
        challenge-twilio-tweet-magic-app
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Generate and share tweets with emotion and images. Powered by Twilio.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/d87720e5eabc3488d3b4cef0f4b28b9ed0701db6d2ac609b79ab7cfdc776f765/68747470733a2f2f6d656469612e6465762e746f2f63646e2d6367692f696d6167652f77696474683d313030302c6865696768743d3432302c6669743d636f7665722c677261766974793d6175746f2c666f726d61743d6175746f2f68747470732533412532462532466465762d746f2d75706c6f6164732e73332e616d617a6f6e6177732e636f6d25324675706c6f61647325324661727469636c657325324674766d32643964386764703337717876306c75392e706e67"&gt;&lt;img width="400" alt="image" src="https://camo.githubusercontent.com/d87720e5eabc3488d3b4cef0f4b28b9ed0701db6d2ac609b79ab7cfdc776f765/68747470733a2f2f6d656469612e6465762e746f2f63646e2d6367692f696d6167652f77696474683d313030302c6865696768743d3432302c6669743d636f7665722c677261766974793d6175746f2c666f726d61743d6175746f2f68747470732533412532462532466465762d746f2d75706c6f6164732e73332e616d617a6f6e6177732e636f6d25324675706c6f61647325324661727469636c657325324674766d32643964386764703337717876306c75392e706e67"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;App Name:&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Twilio Tweet Magic&lt;/code&gt; 🪄 : Generate and share tweets with emotion and images. &lt;code&gt;Powered by Twilio&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Twilio Services :&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Twilio Verify - SMS - OTP&lt;/li&gt;
&lt;li&gt;Twilio Programmable Messaging &lt;em&gt;(WhatsApp Sandbox)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Hit the Start! ⭐&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;If you plan to use this repo for learning or find this content helpful, please hit the star. Thanks! 🙌🏻&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Description:&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;Welcome to TwilioTweetMagic! This app allows you to generate tweets from URLs and feelings, combining the power of natural language processing with the creativity of image generation. Leveraging Twilio's robust messaging service, you can easily send these unique tweets and images directly to WhatsApp. Whether you're looking to share a moment, express a mood, or simply create something fun, TwilioTweetMagic has got you covered.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Features:&lt;/h2&gt;

&lt;/div&gt;


&lt;ul&gt;

&lt;li&gt;Generate tweets based on URLs and specified feelings using &lt;strong&gt;&lt;code&gt;Gemini&lt;/code&gt;&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;Create and send images associated with the generated tweets using &lt;strong&gt;&lt;code&gt;AWS Bedrock&lt;/code&gt;&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;Seamlessly send your…&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/alexbonella/challenge-twilio-tweet-magic-app" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;



</description>
      <category>devchallenge</category>
      <category>twiliochallenge</category>
      <category>ai</category>
      <category>twilio</category>
    </item>
    <item>
      <title>How to deploy Apache Druid on AWS EC2 Instance</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Wed, 07 Dec 2022 18:48:09 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-to-deploy-apache-druid-on-aws-ec2-instance-5hib</link>
      <guid>https://forem.com/aws-builders/how-to-deploy-apache-druid-on-aws-ec2-instance-5hib</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;An easy way to deploy Apache Druid on EC2 in order to load data from any source.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Real-time analysis now plays a large role in the technology sector and is a mark of competitiveness, because the amount of data grows exponentially and so does the variety of tools. For this reason, I want to show you one of those tools, Apache Druid, and how you can deploy it on EC2 instances quickly and easily.&lt;/p&gt;




&lt;h2&gt;
  
  
  Apache Druid
&lt;/h2&gt;

&lt;p&gt;Druid is a high-performance real-time analytics database. Druid’s main value add is to reduce time to insight and action.&lt;/p&gt;

&lt;p&gt;Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open-source alternative to data warehouses for a variety of use cases. The &lt;a href="https://druid.apache.org/docs/latest/design/architecture.html" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;design documentation&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; explains the key concepts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step by step for deploying:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to the AWS EC2 console&lt;/li&gt;
&lt;li&gt;Create a new EC2 instance&lt;/li&gt;
&lt;li&gt;Install Apache Druid&lt;/li&gt;
&lt;li&gt;Run &amp;amp; Open Druid on your browser&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Here we go!
&lt;/h2&gt;

&lt;p&gt;Before launching an EC2 instance, keep in mind the &lt;a href="https://druid.apache.org/docs/latest/tutorials/index.html" rel="noopener noreferrer"&gt;&lt;em&gt;&lt;strong&gt;Quickstart documentation&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;, which calls for a virtual server with 16 GiB of RAM. For this reason we are going to choose a &lt;em&gt;t2.xlarge&lt;/em&gt; with &lt;em&gt;4 vCPUs &amp;amp; 16 GiB of RAM&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a new EC2 instance
&lt;/h2&gt;

&lt;p&gt;We are ready to create an EC2 instance, as follows :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS 👉 Ubuntu 22.04&lt;/li&gt;
&lt;li&gt;Instance Type 👉 t2.xlarge&lt;/li&gt;
&lt;li&gt;Create a Security Group with the Inbound rules indicated in the image&lt;/li&gt;
&lt;li&gt;Launch instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3d5c4dorpuyeixdggnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3d5c4dorpuyeixdggnw.png" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopx9nfuswhiwrirt5l6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopx9nfuswhiwrirt5l6e.png" alt="Choose OS &amp;amp; Instance Type" width="640" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddpn04rexbkiwu8ikyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddpn04rexbkiwu8ikyj.png" alt="Inbound Rules" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Apache Druid
&lt;/h2&gt;

&lt;p&gt;Now we are going to connect to the newly created instance over SSH and configure it with this little step-by-step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) sudo apt update -y
2) sudo apt install openjdk-8-jdk -y
3) wget https://dlcdn.apache.org/druid/29.0.1/apache-druid-29.0.1-bin.tar.gz (Last updated version)
4) tar -xzf apache-druid-29.0.1-bin.tar.gz
5) cd apache-druid-29.0.1
6) export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
7) export DRUID_HOME=/home/ubuntu/apache-druid-29.0.1
8) PATH=$JAVA_HOME/bin:$DRUID_HOME/bin:$PATH

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run Apache Druid
&lt;/h2&gt;

&lt;p&gt;Finally, we can run Apache Druid from the EC2 instance with the command&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./bin/start-micro-quickstart&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcer9ix48p1olxtzr3bdq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcer9ix48p1olxtzr3bdq.gif" alt="Run Druid on EC2 Instances" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Druid in action 🚀
&lt;/h2&gt;

&lt;p&gt;Now you can open your browser to see the web console at 👉 &lt;em&gt;&lt;strong&gt;http://&amp;lt;AWS Public IPv4 address&amp;gt;:8888&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1smetgn6p2gjc2dnwew3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1smetgn6p2gjc2dnwew3.gif" alt=" " width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;As you can see, deploying Apache Druid on an EC2 instance is easy. It is also one of the best ways to analyze data in real time from Kafka topics with simple SQL queries, free of charge, because it is open source.&lt;/p&gt;
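&lt;p&gt;As a taste of those SQL queries, the router serves a SQL API over plain HTTP. A minimal sketch; the host and the &lt;code&gt;wikipedia&lt;/code&gt; datasource are placeholders:&lt;/p&gt;

```python
import json

# Druid's SQL API is served by the router at /druid/v2/sql (port 8888)
DRUID_SQL_URL = "http://YOUR-PUBLIC-IPV4:8888/druid/v2/sql"  # placeholder host


def sql_request(query: str) -> dict:
    """Druid accepts a JSON body with a 'query' field; 'object' returns row dicts."""
    return {"query": query, "resultFormat": "object"}


payload = sql_request("SELECT COUNT(*) AS total FROM wikipedia")  # hypothetical datasource
print(json.dumps(payload))
# Send it with e.g. requests.post(DRUID_SQL_URL, json=payload).json()
```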

&lt;p&gt;Thank you for reading this far. If you found this article useful, like and share it; someone else could find it useful too. And why not invite me for a coffee?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.paypal.com/donate/?hosted_button_id=GBVXVLXMETRHE" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsinestesiaradio.net%2Fimages%2Fpaypal-donate-button-high-quality-png-1_orig.png" alt="Sponsor 💵" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>How to Send a CSV File from S3 into Redshift with an AWS Lambda Function</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Sat, 26 Mar 2022 15:24:00 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-to-send-a-csv-file-from-s3-into-redshift-with-an-aws-lambda-function-4534</link>
      <guid>https://forem.com/aws-builders/how-to-send-a-csv-file-from-s3-into-redshift-with-an-aws-lambda-function-4534</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Nowadays it is a must to automate everything, and cloud jobs are no exception. As data engineers we need the skill of moving data wherever it is needed. If you want to know how to start using AWS tools in your daily routine like a data professional, this post is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step By Step
&lt;/h2&gt;

&lt;p&gt;After collecting data, the next step is to design an ETL to extract, transform, and load it before moving it into an analytics platform like Amazon Redshift. In this case, though, we are only going to move data from S3 into a Redshift cluster using the AWS free tier.&lt;/p&gt;

&lt;p&gt;To do that, I’ve tried to approach the study case as follows :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an S3 bucket.&lt;/li&gt;
&lt;li&gt;Create a Redshift cluster.&lt;/li&gt;
&lt;li&gt;Connect to Redshift from DBeaver or whatever you want.&lt;/li&gt;
&lt;li&gt;Create a table in your database.&lt;/li&gt;
&lt;li&gt;Create a virtual environment in Python with dependencies needed.&lt;/li&gt;
&lt;li&gt;Create your Lambda Function.&lt;/li&gt;
&lt;li&gt;Someone uploads data to S3.&lt;/li&gt;
&lt;li&gt;Query your data.&lt;/li&gt;
&lt;/ol&gt;
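&lt;p&gt;Step 7 is where the trigger fires: the Lambda function receives an S3 event whose records carry the bucket and key of the uploaded file. A minimal sketch of the fields the handler reads (abridged; real events contain much more metadata, and the names here are placeholders):&lt;/p&gt;

```python
# Abridged shape of the S3 event an upload-triggered Lambda receives
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},        # placeholder bucket
                "object": {"key": "uploads/data.csv"},  # placeholder key
            }
        }
    ]
}

for record in sample_event["Records"]:
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    print(f"s3://{bucket}/{key}")  # → s3://my-bucket/uploads/data.csv
```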

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7w48pfzf9b9g61jnj92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7w48pfzf9b9g61jnj92.png" alt="Infraestructure" width="629" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let’s get started!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you have finished steps 1 and 2, let’s connect to our database with the help of an SQL client such as &lt;strong&gt;&lt;a href="https://dbeaver.io/" rel="noopener noreferrer"&gt;DBeaver&lt;/a&gt;&lt;/strong&gt; (or whichever you prefer). For this we need to remember the following data from the Redshift cluster configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HOST = "xyz.redshift.amazonaws.com"
PORT = "5439"
DATABASE = "mydatabase"
USERNAME = "myadmin"
PASSWORD = "XYZ"
TABLE = "mytable"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7qslkz5v3cjg3qaa25y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7qslkz5v3cjg3qaa25y.png" alt="Connect to database" width="700" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we are connected to our database, let’s create a new table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE mytable (
id      INT4 distkey sortkey,
col 1     VARCHAR (30) NOT NULL,
col 2         VARCHAR(100) NOT NULL,
col 3 VARCHAR(100) NOT NULL,
col 4        INTEGER NOT NULL,
col 5  INTEGER NOT NULL,
col 6           INTEGER NOT NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this tutorial, our Lambda function will need some Python libraries such as &lt;strong&gt;&lt;a href="https://pypi.org/project/SQLAlchemy/" rel="noopener noreferrer"&gt;SQLAlchemy&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://pypi.org/project/psycopg2/" rel="noopener noreferrer"&gt;Psycopg2&lt;/a&gt;&lt;/strong&gt;, so you need to create a Python virtual environment with these dependencies, plus the Lambda script, before compressing the .zip file that you’ll upload into AWS.&lt;/p&gt;

&lt;p&gt;At this point, all you need is a Python script for your Lambda function and a trigger that fires each time someone uploads a new object to the S3 bucket. You need to configure the following resources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload your lambda_function.zip (&lt;em&gt;&lt;strong&gt;Python script and dependencies, or you can add an AWS custom layer&lt;/strong&gt;&lt;/em&gt;) and use the example code below to send data into Redshift: &lt;code&gt;lambda_function.py&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Attach an IAM role to the Lambda function that grants the &lt;code&gt;AWSLambdaVPCAccessExecutionRole&lt;/code&gt; policy
&lt;/li&gt;
&lt;li&gt;For this case, attach the default VPC to the Lambda function, or any other VPC you have.&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html" rel="noopener noreferrer"&gt;environment variables &lt;/a&gt;&lt;/strong&gt; &lt;em&gt;“CON”&lt;/em&gt; and &lt;em&gt;“Table”&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CON = "postgresql://USERNAME:PASSWORD@clustername.xyz.redshift.amazonaws.com:5439/DATABASE"
Table = "mytable"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create an &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html" rel="noopener noreferrer"&gt;S3 Event Notification&lt;/a&gt;&lt;/strong&gt; that invokes the Lambda function each time someone uploads an object to your S3 bucket.&lt;/li&gt;
&lt;li&gt;You can configure a timeout ≥ 3 min.&lt;/li&gt;
&lt;/ol&gt;
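&lt;p&gt;The S3 Event Notification above can also be applied programmatically. A minimal sketch of the configuration as boto3 would accept it; the function name and ARN are placeholders:&lt;/p&gt;

```python
def csv_trigger_config(lambda_arn: str) -> dict:
    """Notification configuration that invokes the Lambda for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


# Apply it with boto3 (placeholder names; needs s3:PutBucketNotification rights):
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket",
#     NotificationConfiguration=csv_trigger_config(
#         "arn:aws:lambda:us-east-1:111111111111:function:csv-to-redshift"),
# )
```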

&lt;p&gt;Let's go to the code 👇&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlalchemy 
import psycopg2
from sqlalchemy import create_engine 
from sqlalchemy.orm import scoped_session, sessionmaker
from datetime import datetime,timedelta
import os

def handler(event, context):
   for record in event['Records']:

      S3_BUCKET = record['s3']['bucket']['name']
      S3_OBJECT = record['s3']['object']['key']


    # Arguments
    DBC= os.environ["CON"]
    RS_TABLE = os.environ["Table"]
    RS_PORT = "5439"
    DELIMITER = "','"
    REGION = "'us-east-1' "
    # Connection
    engine = create_engine(DBC)
    db = scoped_session(sessionmaker(bind=engine))
    # Send files from S3 into redshift
    copy_query = "COPY "+RS_TABLE+" from 's3://"+   S3_BUCKET+'/'+S3_OBJECT+"' iam_role 'arn:aws:iam::11111111111:role/youroleredshift' delimiter "+DELIMITER+" IGNOREHEADER 1 REGION " + REGION
    # Execute querie
    db.execute(copy_query)
    db.commit()
    db.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before uploading a CSV file to your S3 bucket, make sure you have created the table first. Once your Lambda function is implemented and configured correctly, you can upload data to S3 and go to DBeaver to query the data in your table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7voyjgsafsg375wyffna.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7voyjgsafsg375wyffna.jpeg" alt="data uploaded later of executing Lambda function" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AWS Lambda is an easy way to automate your processes, but we need to understand when not to use it. For example, AWS Lambda has a 6 MB payload limit, so it is not practical to migrate very large tables this way.&lt;br&gt;
On the other hand, the main advantage of this service is that the whole solution is serverless, so there is no need to manage any EC2 instances.&lt;/p&gt;




&lt;p&gt;Thank you for reading this far. If you found this article useful, like and share it; someone else could find it useful too. And why not invite me for a coffee?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.paypal.com/donate/?hosted_button_id=GBVXVLXMETRHE" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsinestesiaradio.net%2Fimages%2Fpaypal-donate-button-high-quality-png-1_orig.png" alt="Sponsor 💵" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Follow me 👉 &lt;a href="https://www.linkedin.com/in/alexanderbolano/" rel="noopener noreferrer"&gt;&lt;strong&gt;LinkedIn&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
Follow me 👉 &lt;a href="https://twitter.com/Alex_bonella" rel="noopener noreferrer"&gt;&lt;strong&gt;Twitter&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
Contact: &lt;strong&gt;&lt;a href="mailto:alexbonella2806@gmail.com"&gt;alexbonella2806@gmail.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
