<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Shivangi Varshney</title>
    <description>The latest articles on Forem by Shivangi Varshney (@shivangi_varshney_4dfe1c5).</description>
    <link>https://forem.com/shivangi_varshney_4dfe1c5</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3591140%2Fb3f921f2-296e-438d-a5b9-66f4577b13df.jpg</url>
      <title>Forem: Shivangi Varshney</title>
      <link>https://forem.com/shivangi_varshney_4dfe1c5</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/shivangi_varshney_4dfe1c5"/>
    <language>en</language>
    <item>
      <title>Integrating Airflow, dbt, Postgres and Docker: Building a Modern, Scalable Data Workflow</title>
      <dc:creator>Shivangi Varshney</dc:creator>
      <pubDate>Fri, 31 Oct 2025 11:43:48 +0000</pubDate>
      <link>https://forem.com/shivangi_varshney_4dfe1c5/integrating-airflow-dbt-postgres-and-docker-building-a-modern-scalable-data-workflow-168a</link>
      <guid>https://forem.com/shivangi_varshney_4dfe1c5/integrating-airflow-dbt-postgres-and-docker-building-a-modern-scalable-data-workflow-168a</guid>
      <description>&lt;h2&gt;
  
  
  How to effectively orchestrate transformation pipelines using open-source tools, containerization, and engineering best practices.
&lt;/h2&gt;




&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Building reliable, repeatable, and scalable data pipelines is now foundational to analytics engineering. By combining Apache Airflow for orchestration, dbt for SQL-based transformations, PostgreSQL as the warehouse, and Docker for reproducible environments, data teams can design workflows that are modular, maintainable, and cloud-ready.&lt;/p&gt;

&lt;p&gt;This guide covers how each tool fits together and walks through a complete example that loads, transforms, and tests data automatically with Airflow and dbt inside Docker.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tooling Foundations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Airflow: Orchestration Engine
&lt;/h3&gt;

&lt;p&gt;Airflow is an open-source workflow manager that lets you author, schedule, and monitor data pipelines as Python code. It handles task dependencies, retries, and scheduling, making it the control plane that connects ingestion, transformation, and reporting.&lt;/p&gt;

&lt;h3&gt;
  
  
  dbt: Transformations as Code
&lt;/h3&gt;

&lt;p&gt;dbt brings the principles of software engineering—modularity, testing, and version control—to analytics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core concepts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Models:&lt;/strong&gt; SQL files that define how raw data should be shaped into usable tables or views.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tests:&lt;/strong&gt; Defined in YAML, they validate your data for quality. Common types include &lt;code&gt;unique&lt;/code&gt;, &lt;code&gt;not_null&lt;/code&gt;, &lt;code&gt;accepted_values&lt;/code&gt;, and &lt;code&gt;relationships&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Materializations:&lt;/strong&gt; Control how dbt models are built and stored in the warehouse. Common materializations include:

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;view:&lt;/strong&gt; The model is saved as a database view (default). It always reflects the latest data but might be slower for complex queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;table:&lt;/strong&gt; The model is built as a physical table, which can improve performance but requires refreshing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;incremental:&lt;/strong&gt; Only new or updated data is processed and added to an existing table.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ephemeral:&lt;/strong&gt; The model is not stored in the database but is embedded as a subquery in downstream models.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;Choosing the right materialization depends on your data freshness requirements and warehouse performance; a concrete example follows below.&lt;/p&gt;
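&lt;p&gt;Here is a minimal sketch of an incremental model. The source name (&lt;code&gt;raw_events&lt;/code&gt;), its columns, and the &lt;code&gt;updated_at&lt;/code&gt; watermark are hypothetical; adapt them to your own schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    updated_at
from {{ ref('raw_events') }}

{% if is_incremental() %}
-- on incremental runs, only process rows newer than what the table already holds
where updated_at &amp;gt; (select max(updated_at) from {{ this }})
{% endif %}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;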

&lt;h3&gt;
  
  
PostgreSQL: The Warehouse
&lt;/h3&gt;

&lt;p&gt;PostgreSQL is an open-source, highly extensible relational database that can serve as a capable warehouse for small-to-medium analytical workloads. It handles substantial data volumes, supports complex queries, and integrates well with analytics tools, which makes it a practical backbone for this tutorial’s workflow.&lt;/p&gt;
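&lt;p&gt;For example, window functions let you answer common analytics questions in a single query. Assuming the &lt;code&gt;orders&lt;/code&gt; table built in Step 4 below, a running revenue total per customer might look like this (a sketch for illustration, not part of the pipeline):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- cumulative revenue per customer, ordered by purchase date
select
    customer_id,
    order_date,
    total_amount,
    sum(total_amount) over (
        partition by customer_id
        order by order_date
    ) as running_revenue
from public.orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;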

&lt;h3&gt;
  
  
  Docker: Consistent Environments
&lt;/h3&gt;

&lt;p&gt;Docker packages applications, dependencies, and configurations together—ensuring the same setup runs identically on local machines, CI/CD pipelines, or cloud VMs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Hands-On Example: Airflow, dbt, and Docker Pipeline
&lt;/h2&gt;

&lt;p&gt;In this example, you’ll set up a local workflow with Airflow orchestrating dbt commands within Docker. The pipeline will load seed data, run a dbt model, test the results, and display everything in Airflow’s UI.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Set Up Project Structure
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Set up the folders that organize both dbt and Airflow files.&lt;/p&gt;

&lt;p&gt;Create your working directory and required folders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mkdir dbt_sample_project
cd dbt_sample_project
mkdir airflow dbt_project
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your directory layout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dbt_sample_project/
├── airflow/
└── dbt_project/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Configure dbt
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Connect dbt to your Postgres database and define the base project structure.&lt;/p&gt;

&lt;p&gt;Inside the &lt;code&gt;dbt_project&lt;/code&gt; directory, create the following configuration files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;profiles.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;default&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dev&lt;/span&gt;
  &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;dev&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5432&lt;/span&gt;
      &lt;span class="na"&gt;dbname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sampledb&lt;/span&gt;
      &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;
Defines connection details so dbt knows which database and schema to write models into. Note that &lt;code&gt;host&lt;/code&gt; is &lt;code&gt;postgres&lt;/code&gt;, the Docker Compose service name defined in Step 7, not &lt;code&gt;localhost&lt;/code&gt;, because dbt runs inside the Airflow containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;dbt_project.yml&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt_project'&lt;/span&gt;
&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.0'&lt;/span&gt;

&lt;span class="na"&gt;config-version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;

&lt;span class="na"&gt;profile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;default'&lt;/span&gt;

&lt;span class="na"&gt;model-paths&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;
Defines project-level settings: the project name and version, where models live, and which profile from &lt;code&gt;profiles.yml&lt;/code&gt; to connect with.&lt;/p&gt;


&lt;h2&gt;
  
  
  Step 3: Seed Sample Data
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Create a source dataset to load into Postgres using dbt’s seeding feature.&lt;/p&gt;

&lt;p&gt;Create the &lt;code&gt;dbt_project/seeds/&lt;/code&gt; folder and, inside it, add a CSV file named &lt;code&gt;raw_orders.csv&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;order_id,customer_id,order_date,total_amount,order_status
1,101,2025-10-01,250,completed
2,102,2025-10-03,450,pending
3,103,2025-10-05,100,completed
4,104,2025-10-10,300,completed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This CSV simulates your “raw orders” dataset. Running &lt;code&gt;dbt seed&lt;/code&gt; will upload it into the database as a base table for transformations.&lt;/p&gt;
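&lt;p&gt;Once the seed has run, you can sanity-check it directly in Postgres; with the &lt;code&gt;profiles.yml&lt;/code&gt; above, the table lands in the &lt;code&gt;public&lt;/code&gt; schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- should return 4, one row per line in raw_orders.csv
select count(*) from public.raw_orders;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;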


&lt;h2&gt;
  
  
  Step 4: Create a dbt Model and Tests
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Transform data and define validation tests to ensure accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model:&lt;/strong&gt; &lt;code&gt;dbt_project/models/orders.sql&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'table'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;order_date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;total_amount&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw_orders'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;order_status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'completed'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Transforms the raw orders into a curated table containing only completed orders.&lt;/p&gt;
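&lt;p&gt;Under the hood, &lt;code&gt;{{ ref('raw_orders') }}&lt;/code&gt; resolves to the fully qualified relation of the seed, so the compiled SQL dbt sends to Postgres is roughly:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- compiled form (approximate): ref() replaced with the concrete relation
select
    order_id,
    customer_id,
    order_date,
    total_amount
from "sampledb"."public"."raw_orders"
where order_status = 'completed'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;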

&lt;p&gt;&lt;strong&gt;Schema Tests:&lt;/strong&gt; &lt;code&gt;dbt_project/models/orders.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;

&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;orders&lt;/span&gt;
    &lt;span class="na"&gt;columns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;order_id&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unique&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;total_amount&lt;/span&gt;
        &lt;span class="na"&gt;tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;not_null&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Defines tests that run after the transformation to verify data integrity.&lt;/p&gt;
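&lt;p&gt;Each test compiles to a query that selects failing rows; a non-empty result means the test fails. The &lt;code&gt;unique&lt;/code&gt; test on &lt;code&gt;order_id&lt;/code&gt;, for instance, compiles to something close to:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;-- any row returned here is a duplicated order_id
select
    order_id as unique_field,
    count(*) as n_records
from "sampledb"."public"."orders"
where order_id is not null
group by order_id
having count(*) &amp;gt; 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;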


&lt;h2&gt;
  
  
  Step 5: Add the Airflow DAG
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Schedule and manage your dbt steps (seed, run, and test) through Airflow.&lt;/p&gt;

&lt;p&gt;Create the &lt;code&gt;airflow/dags/&lt;/code&gt; folder, then add the following DAG definition in &lt;code&gt;airflow/dags/dbt_dag.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.bash&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BashOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timedelta&lt;/span&gt;

&lt;span class="n"&gt;default_args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;owner&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;airflow&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;start_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2025&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;31&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;depends_on_past&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retries&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;retry_delay&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;minutes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt_workflow&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;default_args&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;default_args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;schedule_interval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;@daily&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="n"&gt;catchup&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;

    &lt;span class="n"&gt;run_dbt_seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt_seed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt seed --profiles-dir /dbt --project-dir /dbt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run_dbt_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt_run&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt run --profiles-dir /dbt --project-dir /dbt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;run_dbt_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BashOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt_test&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;bash_command&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dbt test --profiles-dir /dbt --project-dir /dbt&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;run_dbt_seed&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;run_dbt_run&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;run_dbt_test&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Defines a DAG (Directed Acyclic Graph) where:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;dbt_seed&lt;/code&gt; loads the CSVs into the database.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dbt_run&lt;/code&gt; executes your dbt models.
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;dbt_test&lt;/code&gt; validates data integrity.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;&amp;gt;&amp;gt;&lt;/code&gt; operators declare the dependencies: each task runs only after the previous one completes successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxx9s0zo3agiekhocndwc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxx9s0zo3agiekhocndwc.png" alt="DAG Graph" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Also, create an empty folder for storing Airflow logs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;airflow/logs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1p9tprf1mnis8jtf3n0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs1p9tprf1mnis8jtf3n0.png" alt="DAG Logs" width="800" height="428"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 6: Build the Airflow + dbt Docker Image
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Create a Docker image containing both Airflow and dbt for consistent execution environments.&lt;/p&gt;

&lt;p&gt;Inside &lt;code&gt;airflow&lt;/code&gt;, create a Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; apache/airflow:2.5.3-python3.8&lt;/span&gt;

&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="c"&gt;# Optional but handy for compiling deps&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; build-essential &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; airflow&lt;/span&gt;
&lt;span class="c"&gt;# Install dbt (pin versions that match your adapter)&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    dbt-core&lt;span class="o"&gt;==&lt;/span&gt;1.5.0 &lt;span class="se"&gt;\
&lt;/span&gt;    dbt-postgres&lt;span class="o"&gt;==&lt;/span&gt;1.5.0 &lt;span class="se"&gt;\
&lt;/span&gt;    psycopg2-binary

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Starts from an official Airflow base image, installs build tools for compiling Python dependencies, and adds pinned dbt packages.&lt;br&gt;&lt;br&gt;
This image will power both the Airflow webserver and the scheduler.&lt;/p&gt;
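&lt;p&gt;Docker Compose will build this image automatically in Step 7, but you can also build and spot-check it by hand (the &lt;code&gt;airflow-dbt&lt;/code&gt; tag is an arbitrary choice):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# build the image from the airflow/ folder
docker build -t airflow-dbt ./airflow
# confirm dbt is installed; run via bash so the image's entrypoint executes the command as-is
docker run --rm airflow-dbt bash -c "dbt --version"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;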


&lt;h2&gt;
  
  
  Step 7: Create the Docker Compose Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Define services (Airflow, Postgres, and supporting components) that run together.&lt;/p&gt;

&lt;p&gt;In your project root, create &lt;code&gt;docker-compose.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3'&lt;/span&gt;

&lt;span class="na"&gt;x-airflow-common&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="nl"&gt;&amp;amp;airflow-common&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./airflow&lt;/span&gt;
    &lt;span class="na"&gt;dockerfile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Dockerfile&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="nl"&gt;&amp;amp;airflow-common-env&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__EXECUTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;LocalExecutor&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__SQL_ALCHEMY_CONN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgresql+psycopg2://postgres:password@postgres/sampledb&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__FERNET_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__LOAD_EXAMPLES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__API__AUTH_BACKEND&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;airflow.api.auth.backend.basic_auth'&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./airflow/dags:/opt/airflow/dags&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./airflow/logs:/opt/airflow/logs&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./dbt_project:/dbt&lt;/span&gt;
  &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres:13&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sampledb&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5432:5432"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pg_isready"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-U"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;

  &lt;span class="na"&gt;airflow-init&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*airflow-common&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/bin/bash&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
        &lt;span class="s"&gt;airflow db upgrade&lt;/span&gt;
        &lt;span class="s"&gt;airflow users create \&lt;/span&gt;
          &lt;span class="s"&gt;--username ${_AIRFLOW_WWW_USER_USERNAME:-airflow} \&lt;/span&gt;
          &lt;span class="s"&gt;--firstname Admin \&lt;/span&gt;
          &lt;span class="s"&gt;--lastname User \&lt;/span&gt;
          &lt;span class="s"&gt;--role Admin \&lt;/span&gt;
          &lt;span class="s"&gt;--email admin@example.com \&lt;/span&gt;
          &lt;span class="s"&gt;--password ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*airflow-common-env&lt;/span&gt;
      &lt;span class="na"&gt;_AIRFLOW_WWW_USER_CREATE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;true'&lt;/span&gt;

  &lt;span class="na"&gt;airflow-webserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*airflow-common&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--fail"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/health"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;

  &lt;span class="na"&gt;airflow-scheduler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*airflow-common&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scheduler&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pgdata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defines &lt;code&gt;postgres&lt;/code&gt; as the database used by both dbt and Airflow.
&lt;/li&gt;
&lt;li&gt;Configures Airflow’s webserver, scheduler, and initialization steps.
&lt;/li&gt;
&lt;li&gt;Mounts your project folders into the containers for live development.
&lt;/li&gt;
&lt;/ul&gt;
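&lt;p&gt;Before starting anything, it is worth validating the file: &lt;code&gt;docker-compose config&lt;/code&gt; parses the YAML, resolves the &lt;code&gt;x-airflow-common&lt;/code&gt; anchors, and prints the merged configuration (or an error if something is malformed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;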




&lt;h2&gt;
  
  
  Step 8: Run Everything
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Goal:&lt;/strong&gt; Start the entire environment and trigger your workflow.&lt;/p&gt;

&lt;p&gt;Initialize the Airflow metadata database and create the admin user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up airflow-init
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then start all services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When it's running, open your browser to &lt;a href="http://localhost:8080" rel="noopener noreferrer"&gt;http://localhost:8080&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
Log in using:&lt;br&gt;
Username: airflow&lt;br&gt;
Password: airflow&lt;/p&gt;

&lt;p&gt;From the Airflow UI, unpause and trigger the &lt;strong&gt;dbt_workflow&lt;/strong&gt; DAG; new DAGs start paused because &lt;code&gt;DAGS_ARE_PAUSED_AT_CREATION&lt;/code&gt; is set to &lt;code&gt;true&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
You’ll see three tasks: seed, run, and test, each turning green as it succeeds.&lt;/p&gt;
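&lt;p&gt;If you prefer the command line, the same DAG can be unpaused and triggered from inside the webserver container:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose exec airflow-webserver airflow dags unpause dbt_workflow
docker-compose exec airflow-webserver airflow dags trigger dbt_workflow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;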

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gxst1r06111seqp649u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2gxst1r06111seqp649u.png" alt="DAG Successful Completion" width="800" height="425"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optional Verification:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Run this command to confirm the transformed data in Postgres:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;password psql &lt;span class="nt"&gt;-h&lt;/span&gt; localhost &lt;span class="nt"&gt;-p&lt;/span&gt; 5432 &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; sampledb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
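&lt;p&gt;With the seed data above, the output should look roughly like this (three of the four orders are &lt;code&gt;completed&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; order_id | customer_id | order_date | total_amount
----------+-------------+------------+--------------
        1 |         101 | 2025-10-01 |          250
        3 |         103 | 2025-10-05 |          100
        4 |         104 | 2025-10-10 |          300
(3 rows)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;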



&lt;p&gt;You should see rows only for completed orders — proving your dbt model worked end to end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Advanced: Connecting to Cloud Warehouses
&lt;/h2&gt;

&lt;p&gt;Extend the same setup to use BigQuery, Snowflake, or Microsoft Fabric.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Update &lt;code&gt;profiles.yml&lt;/code&gt; with the warehouse type and credentials (see the sketch after this list).
&lt;/li&gt;
&lt;li&gt;Use Airflow Connections or a secrets backend to manage credentials instead of hard-coding them.
&lt;/li&gt;
&lt;li&gt;Run your workflows through managed Airflow services such as MWAA or Cloud Composer to reduce infrastructure maintenance.&lt;/li&gt;
&lt;/ul&gt;
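&lt;p&gt;For example, a Snowflake target in &lt;code&gt;profiles.yml&lt;/code&gt; might look like the sketch below. All values are placeholders; check the dbt-snowflake documentation for the full set of options.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;default:
  target: prod
  outputs:
    prod:
      type: snowflake
      account: your_account_id        # placeholder
      user: your_user                 # placeholder
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: TRANSFORMER
      database: ANALYTICS
      warehouse: TRANSFORMING
      schema: PUBLIC
      threads: 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;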




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By combining dbt, Airflow, and Docker, you’ve created a fully automated, tested, and reproducible analytics pipeline.&lt;/p&gt;

&lt;p&gt;Together they deliver:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data transformations as version-controlled code
&lt;/li&gt;
&lt;li&gt;Reliable orchestration and scheduling
&lt;/li&gt;
&lt;li&gt;Consistent environments for every developer or CI/CD job
&lt;/li&gt;
&lt;li&gt;Easy scalability to cloud setups
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This setup forms a strong foundation for production ETL pipelines and analytics workflows — one that scales effortlessly as your data and team grow.&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>docker</category>
      <category>beginners</category>
      <category>databuildtool</category>
    </item>
  </channel>
</rss>
