
My First Data Pipeline Project Using Airflow, Docker & Postgres (COVID API Edition)

Hey Devs 👋,

If you’re starting out in data engineering or curious how real-world data pipelines work, this post is for you.

As an Associate Data Engineer Intern, I wanted to go beyond watching tutorials and actually build a working pipeline — one that pulls real-world data daily, processes it, stores it, and is fully containerized.
So I picked something simple but meaningful: global COVID-19 stats.

Here’s a breakdown of what I built, how it works, and what I learned.


📊 What This Pipeline Does

This mini-project automates the following:

✅ Pulls daily global COVID-19 stats from a public API
✅ Uses Airflow to schedule and monitor the task
✅ Stores the results in a PostgreSQL database
✅ Runs everything inside Docker containers

It's a beginner-friendly, end-to-end project to get your hands dirty with core data engineering tools.


🧰 The Tech Stack

  • Python — for the main fetch/store logic
  • Airflow — to orchestrate and schedule tasks
  • PostgreSQL — for storing daily data
  • Docker — to containerize and simplify setup
  • disease.sh API — open-source COVID-19 stats API


⚙️ How It Works (Behind the Scenes)

  1. An Airflow DAG triggers once per day
  2. A Python script sends a request to the COVID-19 API
  3. The script parses the JSON response
  4. It inserts the cleaned data into a PostgreSQL table
  5. Airflow logs each run (success/failure) in its UI

Everything runs locally via docker-compose — one command and you're up and running.
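
Here's a minimal sketch of the DAG itself (the DAG ID, task ID, and helper import are illustrative, not the exact names from my repo):

```python
# Minimal daily DAG sketch; names here are illustrative
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from scripts.fetch_covid import fetch_and_store  # hypothetical helper module

with DAG(
    dag_id="covid_daily_pipeline",
    start_date=datetime(2025, 6, 1),
    schedule="@daily",  # step 1: trigger once per day
    catchup=False,      # don't backfill missed days
) as dag:
    PythonOperator(
        task_id="fetch_and_store_covid_stats",
        python_callable=fetch_and_store,  # steps 2-4 live in this callable
    )
```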


🗂️ Project Structure

```
airflow-docker/
├── dags/               # Airflow DAG (main logic)
├── scripts/            # Python file to fetch + insert data
├── docker-compose.yaml # Setup for Airflow + Postgres
├── logs/               # Logs generated by Airflow
└── plugins/            # (Optional) Airflow plugins
```
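
And here's a stripped-down idea of what the docker-compose.yaml wires together (image tags and credentials below are placeholder assumptions; the real file also sets up the Airflow webserver, scheduler, and volumes):

```yaml
# Sketch only: a Postgres service plus an Airflow service pointed at it
services:
  postgres:
    image: postgres:15            # assumed tag
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
  airflow:
    image: apache/airflow:2.9.0   # assumed tag
    depends_on:
      - postgres
    environment:
      # Point Airflow's metadata DB at the postgres service above
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
```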

You can check the full repo here:
👉 GitHub: mohhddhassan/covid-data-pipeline


🧠 Key Learnings

✅ How to build and run a simple Airflow DAG
✅ Using Docker to spin up services like Postgres & Airflow
✅ How Python connects to a DB and inserts structured data
✅ Observing how tasks are logged, retried, and managed in Airflow

This small project gave me confidence in how the core parts of a pipeline talk to each other.


🔍 Sample Output from API

Here’s a snippet of the JSON response from the API:

```json
{
  "cases": 708128930,
  "deaths": 7138904,
  "recovered": 0,
  "updated": 1717689600000
}
```
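
The fetch step that produces this payload looks roughly like the sketch below (it assumes the disease.sh global endpoint; the function name is mine, not the repo's):

```python
# Sketch of the fetch step, assuming the disease.sh "all" endpoint
import requests

def fetch_global_stats() -> dict:
    resp = requests.get("https://disease.sh/v3/covid-19/all", timeout=10)
    resp.raise_for_status()  # surface HTTP errors so Airflow marks the task failed
    data = resp.json()
    # Keep only the fields the pipeline stores
    return {
        "total_cases": data["cases"],
        "total_deaths": data["deaths"],
        "recovered": data["recovered"],
    }
```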

And here’s a sample SQL insert triggered via Python:

```sql
INSERT INTO covid_stats (date, total_cases, total_deaths, recovered)
VALUES ('2025-06-06', 708128930, 7138904, 0);
```
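
In Python, that insert looks roughly like this with psycopg2 (the connection values and table schema are assumptions based on the statement above):

```python
# Sketch of the insert step; connection values are hypothetical
from datetime import date

import psycopg2

def insert_stats(stats: dict) -> None:
    conn = psycopg2.connect(
        host="postgres", dbname="airflow", user="airflow", password="airflow"
    )
    try:
        # "with conn" commits on success and rolls back on error
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO covid_stats (date, total_cases, total_deaths, recovered) "
                "VALUES (%s, %s, %s, %s)",
                (date.today(), stats["total_cases"],
                 stats["total_deaths"], stats["recovered"]),
            )
    finally:
        conn.close()
```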

🔧 What’s Next?

I’m planning to:

🚧 Add deduplication logic so the same day's stats aren't inserted twice (see the sketch after this list)
📊 Maybe create a Streamlit dashboard on top of the database
⚙️ Play with sensors, templates, and XComs in Airflow
⚡ Extend the pipeline with ClickHouse for OLAP-style analytics
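
For the deduplication item, one simple route is a unique constraint on the date plus Postgres's ON CONFLICT clause (a sketch, assuming the covid_stats table from earlier):

```sql
-- Make the date a natural key, then silently skip rows that already exist
ALTER TABLE covid_stats ADD CONSTRAINT covid_stats_date_key UNIQUE (date);

INSERT INTO covid_stats (date, total_cases, total_deaths, recovered)
VALUES ('2025-06-06', 708128930, 7138904, 0)
ON CONFLICT (date) DO NOTHING;
```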


📌 Why You Should Try Something Like This

If you're learning data engineering:

  • Start small, but make it real
  • Use public APIs to practice fetching and storing data
  • Wrap it with orchestration + containerization — it’s closer to the real thing

This project taught me way more than passively following courses ever could.


🙋‍♂️ About Me

Mohamed Hussain S
Associate Data Engineer Intern
LinkedIn | GitHub


🚀 Learning in public, one pipeline at a time.
