<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Alexander Bolaño</title>
    <description>The latest articles on Forem by Alexander Bolaño (@datexland).</description>
    <link>https://forem.com/datexland</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F837199%2F901df96b-e698-4553-a32c-d0246c752986.jpeg</url>
      <title>Forem: Alexander Bolaño</title>
      <link>https://forem.com/datexland</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/datexland"/>
    <language>en</language>
    <item>
      <title>Stop Digging Logs: How to Turn Airflow Failures into Contextual Learning (with Bedrock &amp; S3 Vectors)</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Tue, 16 Dec 2025 22:22:06 +0000</pubDate>
      <link>https://forem.com/aws-builders/stop-digging-logs-how-to-turn-airflow-failures-into-contextual-learning-with-bedrock-s3-vectors-485k</link>
      <guid>https://forem.com/aws-builders/stop-digging-logs-how-to-turn-airflow-failures-into-contextual-learning-with-bedrock-s3-vectors-485k</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A practical demonstration showing how to leverage S3 Vectors (Vector Store) and Cohere Embeddings to provide data teams with contextual, historical fixes directly within failed Airflow task logs&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As data engineers, we spend too much time hunting through logs when a workflow fails, especially if you are the newest member of the team. What if orchestration were not only about automation, but also about learning from failures and guiding your team through them?&lt;/p&gt;

&lt;p&gt;In this article, I'll show you how to build an intelligent Airflow DAG that uses Amazon S3 Vectors, embeddings, and vector search to capture historical failure wisdom and surface actionable fix hints, reducing the need for manual log debugging.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Airflow&lt;/li&gt;
&lt;li&gt;S3 Vector &lt;/li&gt;
&lt;li&gt;AWS Bedrock&lt;/li&gt;
&lt;li&gt;cohere.embed-v4:0&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Problem Are We Solving?
&lt;/h2&gt;

&lt;p&gt;Airflow workflows inevitably fail at some point, for all sorts of reasons: data quality issues, dependency conflicts, permission errors, timeouts, and so on. Traditionally, engineers dig through the logs as a first step, but what if there were a better option?&lt;/p&gt;

&lt;p&gt;This project uses &lt;strong&gt;Cohere&lt;/strong&gt; &lt;em&gt;(AWS Bedrock)&lt;/em&gt; embeddings and S3 Vectors to index past errors and search for similar failure patterns. This means that once a task fails, we:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Capture the error&lt;/li&gt;
&lt;li&gt;Create an error summary dict&lt;/li&gt;
&lt;li&gt;Generate a semantic vector embedding &lt;/li&gt;
&lt;li&gt;Query a vector index stored in S3 vectors&lt;/li&gt;
&lt;li&gt;Retrieve and suggest the most relevant solution&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How it Works
&lt;/h2&gt;

&lt;p&gt;At a high level, I built a simulated DAG that generates common errors on purpose, so the value is easy to see. In a real project you will face your own unique, varied, and countless failures; here, the Airflow DAG simulates failures for these common problem types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Division by Zero&lt;/li&gt;
&lt;li&gt;Data validation failures &lt;em&gt;(syntax)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;S3 permissions&lt;/li&gt;
&lt;li&gt;Database connections issues &lt;em&gt;(Timeout)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That being said, when a task fails, the error message is captured and embedded using a model hosted on AWS Bedrock. That embedding is used to query a vector index in S3 Vectors, which stores previously seen errors and their solutions &lt;em&gt;(read the NOTE section)&lt;/em&gt;; S3 Vectors lets you perform similarity search directly on S3 without managing a separate vector database. Finally, in the task called &lt;em&gt;&lt;strong&gt;'hint_to_solve'&lt;/strong&gt;&lt;/em&gt;, the system returns the closest match and suggests the corresponding solution right in the Airflow logs. Here is an example of the DAG functionality:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;NOTE:&lt;/strong&gt; Data ingestion into the S3 Vector Index is out of scope for this article, as it is straightforward and well covered in the 👉🏻 &lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-vectors.html" rel="noopener noreferrer"&gt;AWS documentation&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
For reference, the simulated error records ingested into the index are available here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/alexbonella/Airflow-S3-Vector-Guide/blob/main/airflow_simulation_error.json" rel="noopener noreferrer"&gt;https://github.com/alexbonella/Airflow-S3-Vector-Guide/blob/main/airflow_simulation_error.json&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
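&lt;p&gt;One natural place to wire this lookup into a DAG is an &lt;code&gt;on_failure_callback&lt;/code&gt;. A minimal sketch, assuming a &lt;code&gt;hint_to_solve&lt;/code&gt; helper that performs the embedding and vector search; the task names are illustrative, not from the repo:&lt;/p&gt;

```python
import logging

log = logging.getLogger("hint_callback")


def hint_on_failure(context, search_fn):
    """Airflow on_failure_callback: surface the closest historical fix.

    `context` is the dict Airflow passes to failure callbacks; `search_fn`
    is the embedding + vector-search helper (injected to keep this testable).
    """
    error_text = str(context.get("exception"))
    hint = search_fn(error_text)
    log.info("💡 How to Solve this error: 👇🏻 %s", hint)
    return hint


# In the DAG definition (illustrative):
# PythonOperator(
#     task_id="transform",
#     python_callable=transform,
#     on_failure_callback=lambda ctx: hint_on_failure(ctx, hint_to_solve),
# )
```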

&lt;h3&gt;
  
  
  What does a hint DAG look like? 👇🏻
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38d1ri93msrf9gr5eijm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38d1ri93msrf9gr5eijm.png" alt="Hint to solve from DAG" width="800" height="1000"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2025-12-16, 19:02:53 UTC] {local_task_job_runner.py:123} ▶ Pre task execution logs
[2025-12-16, 19:02:54 UTC] {smart_airflow_dag.py:206} INFO - INFO: Generating embedding for error: RuntimeError: Dependency 'requests' too old: 2.25.0 &amp;lt; 2.32.0 (at smart_airflow_dag.py, line 163)
[2025-12-16, 19:02:56 UTC] {smart_airflow_dag.py:71} INFO - ✅ Embedding successfully generated. Dimension: 1536
[2025-12-16, 19:02:56 UTC] {smart_airflow_dag.py:215} INFO - ⏳: Querying vector database for similar errors...
[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:241} INFO - 💡 How to Solve this error: 👇🏻

[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:242} INFO - {
  "suggestion": "Update the 'requests' package in the `requirements.txt` file to a version greater than or equal to 2.32.0 and redeploy the environment.",
  "similarity_score": 0.0162
}

[2025-12-16, 19:02:57 UTC] {smart_airflow_dag.py:243} INFO - 
[2025-12-16, 19:02:57 UTC] {python.py:240} INFO - Done. Returned value was: None
[2025-12-16, 19:02:57 UTC] {taskinstance.py:349} ▶ Post task execution logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Traditional Airflow error handling is reactive and manual. With semantic search over historical errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your data team saves time on debugging, especially new members&lt;/li&gt;
&lt;li&gt;Organizational knowledge about errors is codified and reusable&lt;/li&gt;
&lt;li&gt;Workflows become self-aware and proactive instead of running in orchestration zombie mode&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Turning Failures into Knowledge
&lt;/h2&gt;

&lt;p&gt;This guide demonstrates a real-world use case for embedding searchable failure knowledge directly into Airflow. So if you're leading or scaling a data team, imagine the impact of pipelines (DAGs) that don't just report errors, but guide you to solve them.&lt;/p&gt;

&lt;p&gt;Feel free to check out the complete code and adapt it to your environment or models!&lt;/p&gt;

&lt;p&gt;👉 GitHub: &lt;a href="https://github.com/alexbonella/Airflow-S3-Vector-Guide" rel="noopener noreferrer"&gt;https://github.com/alexbonella/Airflow-S3-Vector-Guide&lt;/a&gt;&lt;/p&gt;

</description>
      <category>s3vectors</category>
      <category>airflow</category>
      <category>aws</category>
      <category>ai</category>
    </item>
    <item>
      <title>Twilio Challenge: Tweet Magic App</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Thu, 20 Jun 2024 11:42:43 +0000</pubDate>
      <link>https://forem.com/datexland/twilio-challenge-tweet-magic-app-gdk</link>
      <guid>https://forem.com/datexland/twilio-challenge-tweet-magic-app-gdk</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/twilio"&gt;Twilio Challenge &lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;As someone fascinated by the power of data, I want to share how AI and Twilio can be used to help social media enthusiasts add a creative twist to their online communication. That's why I built &lt;strong&gt;&lt;code&gt;Twilio Tweet Magic&lt;/code&gt;&lt;/strong&gt; 🪄, a unique app designed to help you generate captivating tweets from URLs and emotions. Whether you want to summarize an article, express a feeling, or just have some fun, Twilio Tweet Magic makes it effortless.&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Generate Tweets:&lt;/strong&gt; Simply input a URL of a news item and select an emotion. Twilio Tweet Magic uses the Gemini AI power to craft a tweet that captures the essence of the content and your chosen mood.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create Stunning Images:&lt;/strong&gt; Alongside your tweet, TwilioTweetMagic can generate visually appealing images that complement the message, adding an extra layer of creativity to your social media posts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Seamless Integration with Twilio:&lt;/strong&gt; Leveraging Twilio's robust messaging service, you can send these unique tweets and images directly to WhatsApp. Instantly share your thoughts, feelings, and creative expressions with friends, family, or followers.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  AI Services used
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;AWS Bedrock&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Gemini&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/vrVCNIXE0_0"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Twilio and AI
&lt;/h2&gt;

&lt;p&gt;To create this app, I utilized the Gemini API to generate tweets based on news URL and user-specified emotions, bringing a personalized touch to each tweet. Then, I harnessed the power of AWS Bedrock to build realistic images associated with these tweets, enhancing their visual appeal. However, this innovative functionality wouldn't be possible without Twilio's robust services.&lt;/p&gt;

&lt;p&gt;In today's digital age, verifying user authenticity is crucial, and Twilio Verify ensures that only real users gain access to our app. Once verified, users can effortlessly send their custom tweets and images directly to a WhatsApp number, thanks to Twilio's WhatsApp Sandbox.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21wdt710fheolfzseffa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F21wdt710fheolfzseffa.png" alt=" " width="800" height="547"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraf48p2huo8c2gthu0el.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fraf48p2huo8c2gthu0el.png" alt=" " width="288" height="283"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;My submission qualifies for the following additional prize categories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Twilio Times Two:&lt;/code&gt; The project uses Twilio Programmable Messaging (WhatsApp Sandbox) and Twilio Verify.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;Entertaining Endeavors:&lt;/code&gt; TwilioTweetMagic is perfect for social media enthusiasts seeking to enhance their posts, content creators aiming to share engaging summaries of articles, and anyone looking to add a creative twist to their online communication.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Source code
&lt;/h2&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fassets.dev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/alexbonella" rel="noopener noreferrer"&gt;
        alexbonella
      &lt;/a&gt; / &lt;a href="https://github.com/alexbonella/challenge-twilio-tweet-magic-app" rel="noopener noreferrer"&gt;
        challenge-twilio-tweet-magic-app
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Generate and share tweets with emotion and images. Powered by Twilio.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/d87720e5eabc3488d3b4cef0f4b28b9ed0701db6d2ac609b79ab7cfdc776f765/68747470733a2f2f6d656469612e6465762e746f2f63646e2d6367692f696d6167652f77696474683d313030302c6865696768743d3432302c6669743d636f7665722c677261766974793d6175746f2c666f726d61743d6175746f2f68747470732533412532462532466465762d746f2d75706c6f6164732e73332e616d617a6f6e6177732e636f6d25324675706c6f61647325324661727469636c657325324674766d32643964386764703337717876306c75392e706e67"&gt;&lt;img width="400" alt="image" src="https://camo.githubusercontent.com/d87720e5eabc3488d3b4cef0f4b28b9ed0701db6d2ac609b79ab7cfdc776f765/68747470733a2f2f6d656469612e6465762e746f2f63646e2d6367692f696d6167652f77696474683d313030302c6865696768743d3432302c6669743d636f7665722c677261766974793d6175746f2c666f726d61743d6175746f2f68747470732533412532462532466465762d746f2d75706c6f6164732e73332e616d617a6f6e6177732e636f6d25324675706c6f61647325324661727469636c657325324674766d32643964386764703337717876306c75392e706e67"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;App Name:&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;Twilio Tweet Magic&lt;/code&gt; 🪄 : Generate and share tweets with emotion and images. &lt;code&gt;Powered by Twilio&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Twilio Services :&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Twilio Verify - SMS - OTP&lt;/li&gt;
&lt;li&gt;Twilio Programmable Messaging &lt;em&gt;(WhatsApp Sandbox)&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Hit the Start! ⭐&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;If you plan to use this repo for learning or find this content helpful, please hit the star. Thanks! 🙌🏻&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Description:&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;Welcome to TwilioTweetMagic! This app allows you to generate tweets from URLs and feelings, combining the power of natural language processing with the creativity of image generation. Leveraging Twilio's robust messaging service, you can easily send these unique tweets and images directly to WhatsApp. Whether you're looking to share a moment, express a mood, or simply create something fun, TwilioTweetMagic has got you covered.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Features:&lt;/h2&gt;

&lt;/div&gt;


&lt;ul&gt;

&lt;li&gt;Generate tweets based on URLs and specified feelings using &lt;strong&gt;&lt;code&gt;Gemini&lt;/code&gt;&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;Create and send images associated with the generated tweets using &lt;strong&gt;&lt;code&gt;AWS Bedrock&lt;/code&gt;&lt;/strong&gt;.&lt;/li&gt;

&lt;li&gt;Seamlessly send your…&lt;/li&gt;

&lt;/ul&gt;
&lt;/div&gt;
&lt;br&gt;
  &lt;/div&gt;
&lt;br&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/alexbonella/challenge-twilio-tweet-magic-app" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;br&gt;
&lt;/div&gt;
&lt;br&gt;



</description>
      <category>devchallenge</category>
      <category>twiliochallenge</category>
      <category>ai</category>
      <category>twilio</category>
    </item>
    <item>
      <title>How to deploy Apache Druid on AWS EC2 Instance</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Wed, 07 Dec 2022 18:48:09 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-to-deploy-apache-druid-on-aws-ec2-instance-5hib</link>
      <guid>https://forem.com/aws-builders/how-to-deploy-apache-druid-on-aws-ec2-instance-5hib</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;An easy way to deploy Apache Druid on EC2 in order to load data from any source.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Real-time analysis now plays a large role in the technology sector and is a mark of competitiveness, because the amount of data grows exponentially and so does the variety of tools. For this reason, I want to show you one of those tools, Apache Druid, and how you can deploy it on EC2 instances quickly and easily.&lt;/p&gt;




&lt;h2&gt;
  
  
  Apache Druid
&lt;/h2&gt;

&lt;p&gt;Druid is a high-performance real-time analytics database. Druid’s main value add is to reduce time to insight and action.&lt;/p&gt;

&lt;p&gt;Druid is designed for workflows where fast queries and ingest really matter. Druid excels at powering UIs, running operational (ad-hoc) queries, or handling high concurrency. Consider Druid as an open-source alternative to data warehouses for a variety of use cases. The &lt;a href="https://druid.apache.org/docs/latest/design/architecture.html" rel="noopener noreferrer"&gt;&lt;strong&gt;&lt;em&gt;design documentation&lt;/em&gt;&lt;/strong&gt;&lt;/a&gt; explains the key concepts.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step by step for deploying:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Go to the AWS EC2 console&lt;/li&gt;
&lt;li&gt;Create a new EC2 instance&lt;/li&gt;
&lt;li&gt;Install Apache Druid&lt;/li&gt;
&lt;li&gt;Run &amp;amp; Open Druid on your browser&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Here we go!
&lt;/h2&gt;

&lt;p&gt;Before launching an EC2 instance, keep in mind the &lt;a href="https://druid.apache.org/docs/latest/tutorials/index.html" rel="noopener noreferrer"&gt;&lt;em&gt;&lt;strong&gt;Quickstart documentation&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;, which calls for a virtual server with 16 GiB of RAM. For this reason we are going to choose a &lt;em&gt;t2.xlarge&lt;/em&gt; with &lt;em&gt;4 vCPUs &amp;amp; 16 GiB of RAM&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a new EC2 instance
&lt;/h2&gt;

&lt;p&gt;We are ready to create an EC2 instance, as follows :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OS 👉 Ubuntu 22.04&lt;/li&gt;
&lt;li&gt;Instance Type 👉 t2.xlarge&lt;/li&gt;
&lt;li&gt;Create a Security Group with the Inbound rules indicated in the image&lt;/li&gt;
&lt;li&gt;Launch instance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3d5c4dorpuyeixdggnw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn3d5c4dorpuyeixdggnw.png" alt=" " width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopx9nfuswhiwrirt5l6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopx9nfuswhiwrirt5l6e.png" alt="Choose OS &amp;amp; Instance Type" width="640" height="501"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddpn04rexbkiwu8ikyj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feddpn04rexbkiwu8ikyj.png" alt="Inbound Rules" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Install Apache Druid
&lt;/h2&gt;

&lt;p&gt;Now we are going to connect to the newly created instance over SSH and configure it with this little step-by-step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) sudo apt update -y
2) sudo apt install openjdk-8-jdk -y
3) wget https://dlcdn.apache.org/druid/29.0.1/apache-druid-29.0.1-bin.tar.gz (Last updated version)
4) tar -xzf apache-druid-29.0.1-bin.tar.gz
5) cd apache-druid-29.0.1
6) export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
7) export DRUID_HOME=/home/ubuntu/apache-druid-29.0.1
8) PATH=$JAVA_HOME/bin:$DRUID_HOME/bin:$PATH

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run Apache Druid
&lt;/h2&gt;

&lt;p&gt;Finally, we can run Apache Druid from the EC2 instance with the command&lt;/p&gt;

&lt;p&gt;&lt;code&gt;./bin/start-micro-quickstart&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcer9ix48p1olxtzr3bdq.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcer9ix48p1olxtzr3bdq.gif" alt="Run Druid on EC2 Instances" width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache Druid in action 🚀
&lt;/h2&gt;

&lt;p&gt;Now you can open your browser to see the web console at 👉 &lt;em&gt;&lt;strong&gt;http://&amp;lt;AWS Public IPv4 address&amp;gt;:8888&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1smetgn6p2gjc2dnwew3.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1smetgn6p2gjc2dnwew3.gif" alt=" " width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;As you can see, deploying Apache Druid on an EC2 instance is easy. It is also one of the best ways to analyze data in real time from Kafka topics with simple SQL queries, free of charge, because it is open source.&lt;/p&gt;
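&lt;p&gt;As a taste of those SQL queries, the router serves a SQL API over plain HTTP. A minimal sketch; the host and the &lt;code&gt;wikipedia&lt;/code&gt; datasource are placeholders:&lt;/p&gt;

```python
import json

# Druid's SQL API is served by the router at /druid/v2/sql (port 8888)
DRUID_SQL_URL = "http://YOUR-PUBLIC-IPV4:8888/druid/v2/sql"  # placeholder host


def sql_request(query: str) -> dict:
    """Druid accepts a JSON body with a 'query' field; 'object' returns row dicts."""
    return {"query": query, "resultFormat": "object"}


payload = sql_request("SELECT COUNT(*) AS total FROM wikipedia")  # hypothetical datasource
print(json.dumps(payload))
# Send it with e.g. requests.post(DRUID_SQL_URL, json=payload).json()
```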

&lt;p&gt;Thank you for reading this far. If you found this article useful, like and share it; someone else could find it useful too. And why not invite me for a coffee?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.paypal.com/donate/?hosted_button_id=GBVXVLXMETRHE" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsinestesiaradio.net%2Fimages%2Fpaypal-donate-button-high-quality-png-1_orig.png" alt="Sponsor 💵" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>watercooler</category>
    </item>
    <item>
      <title>How to Send a CSV File from S3 into Redshift with an AWS Lambda Function</title>
      <dc:creator>Alexander Bolaño</dc:creator>
      <pubDate>Sat, 26 Mar 2022 15:24:00 +0000</pubDate>
      <link>https://forem.com/aws-builders/how-to-send-a-csv-file-from-s3-into-redshift-with-an-aws-lambda-function-4534</link>
      <guid>https://forem.com/aws-builders/how-to-send-a-csv-file-from-s3-into-redshift-with-an-aws-lambda-function-4534</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Nowadays it is a must to automate everything, and cloud jobs are no exception. As data engineers we need the skill of moving data wherever it is needed. If you want to know how to start using AWS tools in your daily routine like a data professional, this post is for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step By Step
&lt;/h2&gt;

&lt;p&gt;After collecting data, the next step is to design an ETL to extract, transform, and load it before moving it into an analytics platform like Amazon Redshift. In this case, though, we are only going to move data from S3 into a Redshift cluster using the AWS free tier.&lt;/p&gt;

&lt;p&gt;To do that, I’ve tried to approach the study case as follows :&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create an S3 bucket.&lt;/li&gt;
&lt;li&gt;Create a Redshift cluster.&lt;/li&gt;
&lt;li&gt;Connect to Redshift from DBeaver or whatever you want.&lt;/li&gt;
&lt;li&gt;Create a table in your database.&lt;/li&gt;
&lt;li&gt;Create a virtual environment in Python with dependencies needed.&lt;/li&gt;
&lt;li&gt;Create your Lambda Function.&lt;/li&gt;
&lt;li&gt;Someone uploads data to S3.&lt;/li&gt;
&lt;li&gt;Query your data.&lt;/li&gt;
&lt;/ol&gt;
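&lt;p&gt;Step 7 is where the trigger fires: the Lambda function receives an S3 event whose records carry the bucket and key of the uploaded file. A minimal sketch of the fields the handler reads (abridged; real events contain much more metadata, and the names here are placeholders):&lt;/p&gt;

```python
# Abridged shape of the S3 event an upload-triggered Lambda receives
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-bucket"},        # placeholder bucket
                "object": {"key": "uploads/data.csv"},  # placeholder key
            }
        }
    ]
}

for record in sample_event["Records"]:
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    print(f"s3://{bucket}/{key}")  # → s3://my-bucket/uploads/data.csv
```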

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7w48pfzf9b9g61jnj92.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz7w48pfzf9b9g61jnj92.png" alt="Infraestructure" width="629" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let’s get started!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once you have finished steps 1 and 2, let’s connect to our database with the help of an SQL client such as &lt;strong&gt;&lt;a href="https://dbeaver.io/" rel="noopener noreferrer"&gt;DBeaver&lt;/a&gt;&lt;/strong&gt; (or whichever you prefer). For this we need to remember the following data from the Redshift cluster configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HOST = "xyz.redshift.amazonaws.com"
PORT = "5439"
DATABASE = "mydatabase"
USERNAME = "myadmin"
PASSWORD = "XYZ"
TABLE = "mytable"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7qslkz5v3cjg3qaa25y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo7qslkz5v3cjg3qaa25y.png" alt="Connect to database" width="700" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now that we are connected to our database, let’s create a new table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE mytable (
id      INT4 distkey sortkey,
col 1     VARCHAR (30) NOT NULL,
col 2         VARCHAR(100) NOT NULL,
col 3 VARCHAR(100) NOT NULL,
col 4        INTEGER NOT NULL,
col 5  INTEGER NOT NULL,
col 6           INTEGER NOT NULL);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For this tutorial, our Lambda function will need some Python libraries such as &lt;strong&gt;&lt;a href="https://pypi.org/project/SQLAlchemy/" rel="noopener noreferrer"&gt;SQLAlchemy&lt;/a&gt;&lt;/strong&gt; and &lt;strong&gt;&lt;a href="https://pypi.org/project/psycopg2/" rel="noopener noreferrer"&gt;Psycopg2&lt;/a&gt;&lt;/strong&gt;, so you need to create a Python virtual environment with these dependencies, plus the Lambda script, before compressing the .zip file that you’ll upload into AWS.&lt;/p&gt;

&lt;p&gt;At this point, all you need is a Python script for your Lambda function and a trigger that fires each time someone uploads a new object to the S3 bucket. You need to configure the following resources:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Upload your lambda_function.zip (&lt;em&gt;&lt;strong&gt;Python script and dependencies, or you can add an AWS custom layer&lt;/strong&gt;&lt;/em&gt;) and use the example code below to send data into Redshift: &lt;code&gt;lambda_function.py&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Attach an IAM role to the Lambda function that grants the &lt;code&gt;AWSLambdaVPCAccessExecutionRole&lt;/code&gt; policy
&lt;/li&gt;
&lt;li&gt;For this case, attach the default VPC to the Lambda function, or any other VPC you have.&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html" rel="noopener noreferrer"&gt;environment variables &lt;/a&gt;&lt;/strong&gt; &lt;em&gt;“CON”&lt;/em&gt; and &lt;em&gt;“Table”&lt;/em&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CON = "postgresql://USERNAME:PASSWORD@clustername.xyz.redshift.amazonaws.com:5439/DATABASE"
Table = "mytable"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Create an &lt;strong&gt;&lt;a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/NotificationHowTo.html" rel="noopener noreferrer"&gt;S3 Event Notification&lt;/a&gt;&lt;/strong&gt; that invokes the Lambda function each time someone uploads an object to your S3 bucket.&lt;/li&gt;
&lt;li&gt;You can configure a timeout ≥ 3 min.&lt;/li&gt;
&lt;/ol&gt;
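&lt;p&gt;The S3 Event Notification above can also be applied programmatically. A minimal sketch of the configuration as boto3 would accept it; the function name and ARN are placeholders:&lt;/p&gt;

```python
def csv_trigger_config(lambda_arn: str) -> dict:
    """Notification configuration that invokes the Lambda for new .csv objects."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


# Apply it with boto3 (placeholder names; needs s3:PutBucketNotification rights):
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket",
#     NotificationConfiguration=csv_trigger_config(
#         "arn:aws:lambda:us-east-1:111111111111:function:csv-to-redshift"),
# )
```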

&lt;p&gt;Let's go to the code 👇&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import sqlalchemy 
import psycopg2
from sqlalchemy import create_engine 
from sqlalchemy.orm import scoped_session, sessionmaker
from datetime import datetime,timedelta
import os

def handler(event, context):
   for record in event['Records']:

      S3_BUCKET = record['s3']['bucket']['name']
      S3_OBJECT = record['s3']['object']['key']


    # Arguments
    DBC= os.environ["CON"]
    RS_TABLE = os.environ["Table"]
    RS_PORT = "5439"
    DELIMITER = "','"
    REGION = "'us-east-1' "
    # Connection
    engine = create_engine(DBC)
    db = scoped_session(sessionmaker(bind=engine))
    # Send files from S3 into redshift
    copy_query = "COPY "+RS_TABLE+" from 's3://"+   S3_BUCKET+'/'+S3_OBJECT+"' iam_role 'arn:aws:iam::11111111111:role/youroleredshift' delimiter "+DELIMITER+" IGNOREHEADER 1 REGION " + REGION
    # Execute querie
    db.execute(copy_query)
    db.commit()
    db.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before uploading a CSV file to your S3 bucket, make sure you have created the table first. Once your Lambda function is implemented and configured correctly, you can upload data to S3 and go to DBeaver to query the data in your table.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7voyjgsafsg375wyffna.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7voyjgsafsg375wyffna.jpeg" alt="data uploaded later of executing Lambda function" width="700" height="700"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AWS Lambda is an easy way to automate your processes, but we need to understand when not to use it. For example, AWS Lambda has a 6 MB payload limit, so it is not practical to migrate very large tables this way.&lt;br&gt;
On the other hand, the main advantage of this service is that the whole solution is serverless, so there is no need to manage any EC2 instances.&lt;/p&gt;




&lt;p&gt;Thank you for reading this far. If you found this article useful, like and share it; someone else could find it useful too. And why not invite me for a coffee?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.paypal.com/donate/?hosted_button_id=GBVXVLXMETRHE" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fsinestesiaradio.net%2Fimages%2Fpaypal-donate-button-high-quality-png-1_orig.png" alt="Sponsor 💵" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Follow me 👉 &lt;a href="https://www.linkedin.com/in/alexanderbolano/" rel="noopener noreferrer"&gt;&lt;strong&gt;LinkedIn&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
Follow me 👉 &lt;a href="https://twitter.com/Alex_bonella" rel="noopener noreferrer"&gt;&lt;strong&gt;Twitter&lt;/strong&gt;&lt;/a&gt;&lt;br&gt;
Contact: &lt;strong&gt;&lt;a href="mailto:alexbonella2806@gmail.com"&gt;alexbonella2806@gmail.com&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
