<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: De' Clerke</title>
    <description>The latest articles on Forem by De' Clerke (@de_clerke).</description>
    <link>https://forem.com/de_clerke</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3506183%2F46467aed-cbcf-426d-95d8-160e51bc66f9.jpg</url>
      <title>Forem: De' Clerke</title>
      <link>https://forem.com/de_clerke</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/de_clerke"/>
    <language>en</language>
    <item>
      <title>Apache Airflow 2 vs 3: A Deep Technical Comparison for Data Engineers</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Wed, 15 Apr 2026 09:10:31 +0000</pubDate>
      <link>https://forem.com/de_clerke/apache-airflow-2-vs-3-a-deep-technical-comparison-for-data-engineers-2on5</link>
      <guid>https://forem.com/de_clerke/apache-airflow-2-vs-3-a-deep-technical-comparison-for-data-engineers-2on5</guid>
      <description>&lt;h2&gt;
  
  
  Apache Airflow 2 vs 3: A Deep Technical Comparison for Data Engineers 🚀
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — Airflow 3 dissolves the monolithic webserver into three independent&lt;br&gt;
services, strips direct database access from task code, ships a fully stable&lt;br&gt;
Task SDK, and rewrites the entire UI in React. If you are running Airflow 2 in&lt;br&gt;
production, this article will tell you exactly what breaks, what improves, and&lt;br&gt;
how to migrate without losing a night's sleep. 😴&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Why This Comparison Matters ⚖️
&lt;/h3&gt;

&lt;p&gt;Every major Airflow release has nudged the architecture forward. Airflow 2 gave us&lt;br&gt;
the TaskFlow API, the Scheduler high-availability refactor, and provider packages.&lt;br&gt;
Airflow 3 is different in kind, not just degree.&lt;/p&gt;

&lt;p&gt;While migrating a production Docker Compose stack for a healthcare ML&lt;br&gt;
retraining pipeline from Airflow 2 patterns to Airflow 3, I hit every single one of&lt;br&gt;
the following in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPU spike to &lt;strong&gt;600%&lt;/strong&gt; caused by a silent breaking change in JWT key management 📈&lt;/li&gt;
&lt;li&gt;Tasks silently failing with &lt;code&gt;Connection refused&lt;/code&gt; because &lt;code&gt;localhost&lt;/code&gt; no longer
means what it used to 🔌&lt;/li&gt;
&lt;li&gt;A healthcheck that always reported &lt;em&gt;unhealthy&lt;/em&gt; because port 8974 no longer exists ❌&lt;/li&gt;
&lt;li&gt;A user creation step that silently did nothing because FAB is gone 👤&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these failures traces back to a deliberate, principled architectural&lt;br&gt;
decision in Airflow 3. Once you understand &lt;em&gt;why&lt;/em&gt; the changes were made, the fixes&lt;br&gt;
are obvious — but without that context, Airflow 3 can feel like it is actively&lt;br&gt;
working against you.&lt;/p&gt;

&lt;p&gt;This article is that context. 💡&lt;/p&gt;


&lt;h3&gt;
  
  
  The 30-Second Summary ⏱️
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Airflow 2&lt;/th&gt;
&lt;th&gt;Airflow 3&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UI framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Flask-AppBuilder (FAB)&lt;/td&gt;
&lt;td&gt;React (FastAPI backend)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Webserver&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;airflow webserver&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;airflow api-server&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;DAG Processor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Embedded in scheduler&lt;/td&gt;
&lt;td&gt;Mandatory separate service&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Task Execution&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Direct fork/subprocess&lt;/td&gt;
&lt;td&gt;Task Execution API (AIP-72)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Metadata DB access from tasks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Allowed&lt;/td&gt;
&lt;td&gt;Prohibited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Auth manager default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;FAB (full RBAC)&lt;/td&gt;
&lt;td&gt;SimpleAuthManager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;REST API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;v1 (Flask)&lt;/td&gt;
&lt;td&gt;v2 (FastAPI, stable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Default schedule&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@daily&lt;/code&gt; (cron)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;catchup&lt;/code&gt; default&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;True&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;False&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SequentialExecutor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;td&gt;Removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SubDAGs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;td&gt;Removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SLAs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Available&lt;/td&gt;
&lt;td&gt;Removed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Import path for &lt;code&gt;@dag&lt;/code&gt;/&lt;code&gt;@task&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;airflow.decorators&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;airflow.sdk&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;XCom pickling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enabled by default&lt;/td&gt;
&lt;td&gt;Disabled by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Python minimum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3.8&lt;/td&gt;
&lt;td&gt;3.9&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;PostgreSQL minimum&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;


&lt;h2&gt;
  
  
  🏗️ Part 1 — The Architectural Paradigm Shift
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Airflow 2: One Webserver to Rule Them All 🏛️
&lt;/h3&gt;

&lt;p&gt;In Airflow 2, the mental model for a self-hosted deployment is relatively&lt;br&gt;
straightforward. You run four processes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow webserver       &lt;span class="c"&gt;# Flask-AppBuilder UI + REST API v1 + auth&lt;/span&gt;
airflow scheduler       &lt;span class="c"&gt;# parses DAGs + triggers task instances&lt;/span&gt;
airflow worker          &lt;span class="c"&gt;# (CeleryExecutor) executes tasks&lt;/span&gt;
postgres/mysql          &lt;span class="c"&gt;# metadata database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The webserver does double duty — it serves the browser UI &lt;em&gt;and&lt;/em&gt; exposes the REST&lt;br&gt;
API &lt;em&gt;and&lt;/em&gt; handles authentication, all from a single Flask application. The&lt;br&gt;
scheduler parses your &lt;code&gt;dags/&lt;/code&gt; directory inline, as part of its own main loop.&lt;/p&gt;

&lt;p&gt;This is simple to reason about. It is also a single point of failure for three&lt;br&gt;
completely separate concerns.🏚️&lt;/p&gt;
&lt;h3&gt;
  
  
  Airflow 3: Separation of Concerns as a First-Class Constraint
&lt;/h3&gt;

&lt;p&gt;Airflow 3 decomposes the monolith into discrete, independently scalable services:🧩&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;airflow api-server          &lt;span class="c"&gt;# FastAPI: UI + REST API v2 + auth (replaces webserver)&lt;/span&gt;
airflow scheduler           &lt;span class="c"&gt;# triggers task instances only; NO DAG parsing&lt;/span&gt;
airflow dag-processor       &lt;span class="c"&gt;# mandatory: parses DAGs, writes to serialized_dag table&lt;/span&gt;
airflow triggerer           &lt;span class="c"&gt;# manages deferrable operators&lt;/span&gt;
postgres/mysql              &lt;span class="c"&gt;# metadata database&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: the scheduler in Airflow 3 &lt;strong&gt;does not parse DAGs&lt;/strong&gt;. It reads the&lt;br&gt;
&lt;code&gt;serialized_dag&lt;/code&gt; table, which is populated exclusively by the dag-processor service.&lt;br&gt;
If you start a scheduler without a dag-processor, it will start cleanly — and then&lt;br&gt;
do nothing, because it has no serialized DAGs to schedule.🏜️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Airflow 2: single scheduler did everything
[Scheduler process]
  ├── Parses dags/ directory
  ├── Updates serialized_dag table
  ├── Checks heartbeats
  └── Triggers TaskInstances

# Airflow 3: responsibilities split
[dag-processor]               [scheduler]
  └── Parses dags/                 ├── Reads serialized_dag
      Updates serialized_dag       ├── Checks heartbeats
                                   └── Triggers TaskInstances via Execution API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This split unlocks horizontal scalability. The dag-processor can be scaled&lt;br&gt;
independently on compute-heavy deployments with thousands of DAG files, without&lt;br&gt;
touching the scheduler's scheduling loop latency.⚡&lt;/p&gt;
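&lt;p&gt;In Docker Compose terms, the decomposition can be sketched as the following skeleton (illustrative only: volumes, healthchecks, and the database service are omitted, and you should pin whichever 3.x image tag you actually run):&lt;br&gt;
&lt;/p&gt;

```yaml
# Illustrative skeleton only: volumes, healthchecks, and the Postgres
# service are omitted, and the image tag is a stand-in for your own.
x-airflow-common: &airflow-common
  image: apache/airflow:3.0.0
  environment:
    AIRFLOW__CORE__EXECUTOR: LocalExecutor

services:
  airflow-api-server:
    <<: *airflow-common
    command: api-server
    ports:
      - "8080:8080"
  airflow-scheduler:
    <<: *airflow-common
    command: scheduler
  airflow-dag-processor:
    <<: *airflow-common
    command: dag-processor
  airflow-triggerer:
    <<: *airflow-common
    command: triggerer
```

&lt;p&gt;All four Airflow services share one config block; only the &lt;code&gt;command&lt;/code&gt; differs.&lt;/p&gt;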


&lt;h2&gt;
  
  
  Part 2 — The Task Execution API (AIP-72): The Biggest Change You Haven't Heard Of 🤫
&lt;/h2&gt;
&lt;h3&gt;
  
  
  How Airflow 2 Ran Tasks
&lt;/h3&gt;

&lt;p&gt;In Airflow 2 with &lt;code&gt;LocalExecutor&lt;/code&gt;, task execution worked like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scheduler identifies a TaskInstance ready to run&lt;/li&gt;
&lt;li&gt;Scheduler forks a subprocess&lt;/li&gt;
&lt;li&gt;Subprocess imports your DAG file directly&lt;/li&gt;
&lt;li&gt;Subprocess calls &lt;code&gt;task.execute(context)&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Task code has unrestricted access to &lt;code&gt;settings.Session&lt;/code&gt;, &lt;code&gt;DagRun&lt;/code&gt;, &lt;code&gt;TaskInstance&lt;/code&gt;
models — the entire Airflow metadata database 🗄️&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Step 5 is a footgun. Task code could accidentally (or intentionally) query, modify,&lt;br&gt;
or drop metadata. It tightly coupled your business logic to Airflow internals.💣&lt;/p&gt;
&lt;h3&gt;
  
  
  How Airflow 3 Runs Tasks
&lt;/h3&gt;

&lt;p&gt;Airflow 3 introduces a &lt;strong&gt;Task Execution API&lt;/strong&gt; — a lightweight HTTP interface that&lt;br&gt;
sits between the task subprocess and the metadata database:🛡️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Scheduler] ──triggers──► [Task Subprocess]
                                 │
                                 │ HTTP (JWT-authenticated)
                                 ▼
                          [API Server /execution/]
                                 │
                                 ▼
                          [Metadata Database]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Task code no longer talks to the database. It talks to the Execution API, which&lt;br&gt;
enforces a controlled, auditable surface for every metadata operation. Direct&lt;br&gt;
metadata access through imports like &lt;code&gt;from airflow.models import DagRun&lt;/code&gt; inside&lt;br&gt;
task code fails at runtime in Airflow 3.🚫&lt;/p&gt;
&lt;h3&gt;
  
  
  The JWT Problem (and Why It Caused a 600% CPU Spike)💥
&lt;/h3&gt;

&lt;p&gt;The Execution API authenticates requests with JWT tokens. The scheduler &lt;em&gt;signs&lt;/em&gt; each&lt;br&gt;
task's token; the api-server &lt;em&gt;verifies&lt;/em&gt; it. Both must use the &lt;strong&gt;same secret key&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In Airflow 3, if &lt;code&gt;AIRFLOW__API_AUTH__JWT_SECRET&lt;/code&gt; is not explicitly set, each service&lt;br&gt;
calls &lt;code&gt;get_signing_key()&lt;/code&gt; and generates a &lt;strong&gt;random in-memory key&lt;/strong&gt;. The scheduler's&lt;br&gt;
random key ≠ the api-server's random key. Every task fails immediately with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Invalid auth token: Signature verification failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
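&lt;p&gt;The mismatch is easy to reproduce in miniature with plain HMAC signing, a simplified stand-in for the real JWT machinery rather than Airflow's actual implementation:&lt;br&gt;
&lt;/p&gt;

```python
import hashlib
import hmac
import secrets

def sign(payload: bytes, key: bytes) -> bytes:
    # Scheduler side: sign the task's token with its signing key
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify(payload: bytes, signature: bytes, key: bytes) -> bool:
    # API-server side: recompute the signature and compare
    return hmac.compare_digest(sign(payload, key), signature)

payload = b'{"task_id": "extract"}'

# Shared static key: verification succeeds
shared = secrets.token_bytes(32)
assert verify(payload, sign(payload, shared), shared)

# Each service minting its own random key: verification always fails
scheduler_key = secrets.token_bytes(32)
api_server_key = secrets.token_bytes(32)
assert not verify(payload, sign(payload, scheduler_key), api_server_key)
```

&lt;p&gt;Two independently generated keys never agree, so every token from the scheduler is rejected by the api-server.&lt;/p&gt;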



&lt;p&gt;The fix is one environment variable, shared across all containers:🛠️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml — x-airflow-common environment block&lt;/span&gt;
&lt;span class="na"&gt;AIRFLOW__API_AUTH__JWT_SECRET&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-static-secret-change-in-prod"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
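&lt;p&gt;Any sufficiently long random string works as the secret; one way to mint one:&lt;br&gt;
&lt;/p&gt;

```python
import secrets

# Generate a URL-safe random string to share across all Airflow containers
jwt_secret = secrets.token_urlsafe(32)  # 32 bytes of entropy, ~43 characters
print(jwt_secret)
```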



&lt;p&gt;The 600% CPU spike came from a related issue: the api-server, when launched with&lt;br&gt;
&lt;code&gt;--workers&lt;/code&gt; greater than 1, spawns worker processes via&lt;br&gt;
&lt;code&gt;multiprocessing.spawn&lt;/code&gt;. Each spawned process re-initialises its own random JWT key&lt;br&gt;
and immediately crashes when it receives a token signed by the master process. The&lt;br&gt;
crash loop runs at full speed:🏎️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[api-server] Waiting for child process [12]...
[api-server] Child process [12] died unexpectedly
[api-server] Waiting for child process [13]...
[api-server] Child process [13] died unexpectedly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Fix: enforce a single worker until this is resolved upstream.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-server --workers &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The &lt;code&gt;EXECUTION_API_SERVER_URL&lt;/code&gt; Problem📍
&lt;/h3&gt;

&lt;p&gt;Every scheduler container needs to know where the Execution API lives. The default&lt;br&gt;
is &lt;code&gt;http://localhost:8080/execution/&lt;/code&gt;. In a Docker Compose deployment, &lt;code&gt;localhost&lt;/code&gt;&lt;br&gt;
inside the scheduler container is &lt;em&gt;the scheduler container's own loopback interface&lt;/em&gt;.&lt;br&gt;
The api-server is a different container on a different network namespace.🌐&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Airflow 2: localhost was fine (single process model)
# Airflow 3 Docker: localhost = wrong container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result: every task fails with &lt;code&gt;httpx.ConnectError: [Errno 111] Connection refused&lt;/code&gt;,&lt;br&gt;
even when the api-server is perfectly healthy.🛑&lt;/p&gt;

&lt;p&gt;Fix:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;AIRFLOW__CORE__EXECUTION_API_SERVER_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://airflow-api-server:8080/execution/"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
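&lt;p&gt;A tiny preflight check, for example in CI, can catch the &lt;code&gt;localhost&lt;/code&gt; default before it reaches a multi-container deployment. The environment variable name matches the snippet above; everything else in this sketch is illustrative:&lt;br&gt;
&lt;/p&gt;

```python
from urllib.parse import urlparse

DEFAULT_URL = "http://localhost:8080/execution/"

def check_execution_api_url(env) -> str:
    """Return the configured Execution API URL, rejecting loopback
    hosts that would point a containerized scheduler at itself."""
    url = env.get("AIRFLOW__CORE__EXECUTION_API_SERVER_URL", DEFAULT_URL)
    host = urlparse(url).hostname
    if host in ("localhost", "127.0.0.1"):
        raise ValueError(
            f"Execution API host {host!r} is loopback; "
            "use the api-server's service name instead"
        )
    return url

# The implicit default falls back to localhost and is rejected
try:
    check_execution_api_url({})
    raise AssertionError("expected ValueError")
except ValueError:
    pass

# An explicit service-name URL passes through unchanged
ok = check_execution_api_url({
    "AIRFLOW__CORE__EXECUTION_API_SERVER_URL":
        "http://airflow-api-server:8080/execution/",
})
```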






&lt;h2&gt;
  
  
  Part 3 — Authentication: FAB Out, SimpleAuthManager In🔐
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Flask-AppBuilder in Airflow 2
&lt;/h3&gt;

&lt;p&gt;Airflow 2 used Flask-AppBuilder (FAB) for authentication. FAB gave you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full RBAC with built-in roles (Admin, Op, User, Viewer, Public)&lt;/li&gt;
&lt;li&gt;OAuth integrations (Google, GitHub, LDAP, etc.)&lt;/li&gt;
&lt;li&gt;A complete user management UI&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;_AIRFLOW_WWW_USER_CREATE&lt;/code&gt; environment variable for bootstrapping admin users🛠️
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2: works as expected&lt;/span&gt;
&lt;span class="na"&gt;_AIRFLOW_WWW_USER_CREATE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;span class="na"&gt;_AIRFLOW_WWW_USER_USERNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin"&lt;/span&gt;
&lt;span class="na"&gt;_AIRFLOW_WWW_USER_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin"&lt;/span&gt;
&lt;span class="na"&gt;_AIRFLOW_WWW_USER_ROLE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Admin"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  SimpleAuthManager in Airflow 3
&lt;/h3&gt;

&lt;p&gt;Airflow 3 ships &lt;code&gt;SimpleAuthManager&lt;/code&gt; as the default. It stores users and passwords in a plain-text JSON file:📁&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"admin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"my_secure_password"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FAB is not gone — it is available as an explicit provider — but it is no longer the default. The &lt;code&gt;_AIRFLOW_WWW_USER_CREATE&lt;/code&gt; variable is silently ignored when &lt;code&gt;SimpleAuthManager&lt;/code&gt; is active. You will see this in your init logs:📝&lt;/p&gt;

&lt;p&gt;&lt;code&gt;Skipping user creation as auth manager different from Fab is used&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;There is no warning that your carefully configured user variables did nothing.⚠️&lt;/p&gt;

&lt;p&gt;To bootstrap a user with SimpleAuthManager in Docker Compose:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Step 1: configure the users list and passwords file location&lt;/span&gt;
&lt;span class="na"&gt;AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin:Admin"&lt;/span&gt;
&lt;span class="na"&gt;AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_PASSWORDS_FILE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/opt/airflow/project/simple_auth_manager_passwords.json"&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: write the passwords file in your init container&lt;/span&gt;
&lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;python3 -c "&lt;/span&gt;
    &lt;span class="s"&gt;import json&lt;/span&gt;
    &lt;span class="s"&gt;open('/opt/airflow/project/simple_auth_manager_passwords.json','w').write(&lt;/span&gt;
        &lt;span class="s"&gt;json.dumps({'admin': 'your_password'})&lt;/span&gt;
    &lt;span class="s"&gt;)"&lt;/span&gt;
    &lt;span class="s"&gt;exec /entrypoint airflow version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The passwords file must be accessible to all containers — use a shared bind mount.🔗&lt;/p&gt;
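&lt;p&gt;If you would rather not inline Python in the compose command, the same bootstrap fits in a small script. This sketch writes to a local path (substitute the mounted path from the snippet above) and renames atomically, so other containers never observe a half-written file:&lt;br&gt;
&lt;/p&gt;

```python
import json
import os
import tempfile

def write_passwords_file(path: str, users: dict) -> None:
    """Write the SimpleAuthManager passwords file atomically: dump to a
    temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(users, f)
    os.replace(tmp, path)  # atomic on POSIX filesystems

write_passwords_file("simple_auth_manager_passwords.json",
                     {"admin": "your_password"})
```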

&lt;h3&gt;
  
  
  Choosing Between SimpleAuthManager and FAB
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Recommendation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Local dev / CI / demos&lt;/td&gt;
&lt;td&gt;SimpleAuthManager — fast, zero config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Small team, basic username/password&lt;/td&gt;
&lt;td&gt;SimpleAuthManager&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise SSO (LDAP, OAuth, SAML)&lt;/td&gt;
&lt;td&gt;FAB provider (&lt;code&gt;apache-airflow-providers-fab&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-team RBAC with fine-grained permissions&lt;/td&gt;
&lt;td&gt;FAB provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes deployments&lt;/td&gt;
&lt;td&gt;FAB provider or custom &lt;code&gt;AuthManager&lt;/code&gt; implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Part 4 — Breaking Changes Catalogue📑
&lt;/h2&gt;

&lt;h3&gt;
  
  
  4.1 SubDAGs → TaskGroups and Assets📦
&lt;/h3&gt;

&lt;p&gt;SubDAGs are removed in Airflow 3. They were always problematic — they introduced deadlock risks with pool management, made the graph view confusing, and performed poorly at scale.📉&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2 (SubDAG pattern — do not migrate this verbatim)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.operators.subdag&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SubDagOperator&lt;/span&gt;

&lt;span class="n"&gt;process_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SubDagOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;subdag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;create_subdag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dag_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;dag&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 3 migration: TaskGroups for visual grouping
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.utils.task_group&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TaskGroup&lt;/span&gt;

&lt;span class="nd"&gt;@dag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_pipeline&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;TaskGroup&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;process_data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;process_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nd"&gt;@task&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

        &lt;span class="nd"&gt;@task&lt;/span&gt;
        &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;

        &lt;span class="nf"&gt;validate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For cross-DAG dependencies that SubDAGs were sometimes used for, the preferred Airflow 3 pattern is &lt;strong&gt;Asset-based scheduling&lt;/strong&gt;:💎&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;

&lt;span class="n"&gt;raw_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://my-bucket/raw/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@dag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# this DAG runs when raw_data is updated
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;downstream_pipeline&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4.2 SequentialExecutor Removed🚫
&lt;/h3&gt;

&lt;p&gt;SequentialExecutor (runs one task at a time, no parallelism) is gone. The replacement for local development is &lt;code&gt;LocalExecutor&lt;/code&gt; with a PostgreSQL or SQLite backend.🗃️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2: SequentialExecutor was the default for fresh installs
&lt;/span&gt;&lt;span class="n"&gt;AIRFLOW__CORE__EXECUTOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;SequentialExecutor&lt;/span&gt;

&lt;span class="c1"&gt;# Airflow 3: use LocalExecutor
&lt;/span&gt;&lt;span class="n"&gt;AIRFLOW__CORE__EXECUTOR&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;LocalExecutor&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: &lt;code&gt;LocalExecutor&lt;/code&gt; requires a real database backend (PostgreSQL recommended). SQLite with &lt;code&gt;LocalExecutor&lt;/code&gt; is technically functional but unsupported for production.⚠️&lt;/p&gt;

&lt;h3&gt;
  
  
  4.3 SLA Misses Removed⏰
&lt;/h3&gt;

&lt;p&gt;The SLA miss feature is gone. It was notoriously unreliable — callbacks fired inconsistently depending on scheduler restart timing, and the implementation was tightly coupled to the old execution model.🏚️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2 (no longer works in Airflow 3)
&lt;/span&gt;&lt;span class="nd"&gt;@dag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;sla_miss_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_sla_handler&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_dag&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;slow_task&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PythonOperator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;task_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;slow_task&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;python_callable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_slow_thing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sla&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;timedelta&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# removed
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Migration options:🛠️&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Airflow 3.2+&lt;/strong&gt;: Use Deadline Alerts (scheduler-native, much more reliable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;External monitoring&lt;/strong&gt;: Instrument task duration in your observability stack (Prometheus, Datadog, etc.) and alert from there&lt;/li&gt;
&lt;/ul&gt;
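&lt;p&gt;The external-monitoring route can be as lightweight as a timing wrapper around the callable. &lt;code&gt;emit_duration&lt;/code&gt; below is a placeholder for your real metrics client, not an Airflow or Prometheus API:&lt;br&gt;
&lt;/p&gt;

```python
import functools
import time

def emit_duration(metric: str, seconds: float) -> None:
    # Placeholder: swap in your Prometheus/Datadog/StatsD client here
    print(f"{metric}: {seconds:.3f}s")

def timed(metric: str):
    """Decorator that reports how long the wrapped callable took, so the
    observability stack can alert on slow tasks instead of Airflow SLAs."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            try:
                return fn(*args, **kwargs)
            finally:
                emit_duration(metric, time.monotonic() - start)
        return inner
    return wrap

@timed("task_duration.slow_task")
def run_slow_thing():
    time.sleep(0.01)
    return "done"
```

&lt;p&gt;The wrapped callable behaves exactly as before; only the duration metric is added, and alerting thresholds live in the monitoring system rather than the DAG.&lt;/p&gt;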

&lt;h3&gt;
  
  
  4.4 REST API v1 Removed → FastAPI v2🔌
&lt;/h3&gt;

&lt;p&gt;The REST API v1 (Flask-based, under &lt;code&gt;/api/v1/&lt;/code&gt;) is completely removed. Airflow 3 ships a stable, FastAPI-backed REST API under &lt;code&gt;/api/v2/&lt;/code&gt;.🚀&lt;/p&gt;

&lt;p&gt;The v2 API is not backward-compatible. Common breakage points:🧨&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# v1 endpoint (broken in Airflow 3)
GET /api/v1/dags/{dag_id}/dagRuns

# v2 endpoint (Airflow 3)
GET /api/v2/dags/{dag_id}/dagRuns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Beyond the URL prefix change, the response schemas have also changed. Any custom integrations, CI scripts, or tooling that hit the Airflow API directly will require updates.🛠️&lt;/p&gt;
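&lt;p&gt;If client code builds its own endpoint strings, centralising the prefix makes the cut-over a one-line change (the helper and its names are illustrative, not part of any Airflow SDK):&lt;br&gt;
&lt;/p&gt;

```python
API_PREFIX = "/api/v2"  # was "/api/v1" against Airflow 2

def dag_runs_url(base: str, dag_id: str) -> str:
    """Build the dagRuns listing URL for a DAG against the given base URL."""
    return f"{base.rstrip('/')}{API_PREFIX}/dags/{dag_id}/dagRuns"

url = dag_runs_url("http://airflow-api-server:8080", "my_pipeline")
```

&lt;p&gt;Remember that the prefix is only half the migration; the changed response schemas still have to be handled per endpoint.&lt;/p&gt;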

&lt;p&gt;The new health endpoint is:🩺&lt;br&gt;
&lt;code&gt;GET /api/v2/monitor/health&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"metadatabase"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scheduler"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"triggerer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dag_processor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that &lt;code&gt;dag_processor&lt;/code&gt; is a new key — it did not exist in Airflow 2 health responses.📝&lt;/p&gt;
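&lt;p&gt;A readiness probe that iterates over whatever components the payload reports keeps working across both versions, including the new &lt;code&gt;dag_processor&lt;/code&gt; key. A small illustrative check (not an official Airflow utility):&lt;/p&gt;

```python
import json

def all_healthy(payload):
    """Return True only if every reported component is healthy.

    Iterating over the payload rather than a fixed key list keeps the
    check working across Airflow 2 and 3, where the set of components
    differs (e.g. the new dag_processor key).
    """
    return all(
        component.get("status") == "healthy"
        for component in payload.values()
    )

sample = json.loads("""
{
  "metadatabase": {"status": "healthy"},
  "scheduler": {"status": "healthy"},
  "triggerer": {"status": "healthy"},
  "dag_processor": {"status": "healthy"}
}
""")
print(all_healthy(sample))
# True
```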

&lt;h3&gt;
  
  
  4.5 Removed Context Variables🏷️
&lt;/h3&gt;

&lt;p&gt;Several context variables that were previously available in the task context (the &lt;code&gt;**context&lt;/code&gt; kwargs) have been removed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# These no longer exist in Airflow 3 task context
&lt;/span&gt;&lt;span class="n"&gt;execution_date&lt;/span&gt;      &lt;span class="c1"&gt;# use logical_date
&lt;/span&gt;&lt;span class="n"&gt;tomorrow_ds&lt;/span&gt;         &lt;span class="c1"&gt;# compute manually
&lt;/span&gt;&lt;span class="n"&gt;yesterday_ds&lt;/span&gt;        &lt;span class="c1"&gt;# compute manually
&lt;/span&gt;&lt;span class="n"&gt;prev_ds&lt;/span&gt;             &lt;span class="c1"&gt;# compute manually
&lt;/span&gt;&lt;span class="n"&gt;prev_execution_date&lt;/span&gt; &lt;span class="c1"&gt;# removed
&lt;/span&gt;&lt;span class="n"&gt;next_execution_date&lt;/span&gt; &lt;span class="c1"&gt;# removed
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;execution_date&lt;/code&gt; rename to &lt;code&gt;logical_date&lt;/code&gt; reflects a deeper semantic change: in Airflow 3, &lt;code&gt;logical_date&lt;/code&gt; represents &lt;code&gt;run_after&lt;/code&gt; (when the DAG should run) rather than &lt;code&gt;data_interval_start&lt;/code&gt; (the start of the data window). For event-driven and manual DAGs, this distinction matters.🧐&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;run_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# deprecated
&lt;/span&gt;
&lt;span class="c1"&gt;# Airflow 3
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;run_date&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logical_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# correct
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
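&lt;p&gt;The "compute manually" replacements are plain &lt;code&gt;datetime&lt;/code&gt; arithmetic on &lt;code&gt;logical_date&lt;/code&gt;. A sketch, assuming &lt;code&gt;logical_date&lt;/code&gt; is the timezone-aware datetime from the task context:&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def ds(dt):
    """Format a datetime the way Airflow's ds macros did (YYYY-MM-DD)."""
    return dt.strftime("%Y-%m-%d")

# In a real task, logical_date comes from the task context:
#     logical_date = context["logical_date"]
logical_date = datetime(2026, 4, 15, tzinfo=timezone.utc)

yesterday_ds = ds(logical_date - timedelta(days=1))
tomorrow_ds = ds(logical_date + timedelta(days=1))
print(yesterday_ds, tomorrow_ds)
# 2026-04-14 2026-04-16
```

&lt;p&gt;The same approach covers &lt;code&gt;prev_ds&lt;/code&gt; and the other removed convenience variables.&lt;/p&gt;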



&lt;h3&gt;
  
  
  4.6 XCom Pickling Disabled🥒
&lt;/h3&gt;

&lt;p&gt;XCom pickling is disabled by default in Airflow 3. In Airflow 2, Python objects were serialized via &lt;code&gt;pickle&lt;/code&gt; and stored in the metadata database. This allowed arbitrary Python objects to flow between tasks, but it introduced security risks (arbitrary code execution on deserialization) and ran into database payload size limits.🛡️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2: this worked silently
&lt;/span&gt;&lt;span class="nd"&gt;@task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;some_sklearn_model&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;# pickled into XCom
&lt;/span&gt;
&lt;span class="c1"&gt;# Airflow 3: raises an error with default XCom backend
# Use JSON-serializable return values or a custom XCom backend
&lt;/span&gt;&lt;span class="nd"&gt;@task&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rows&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://bucket/output.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="c1"&gt;# safe
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For large artifacts (models, DataFrames), the recommended pattern is to write to external storage (S3, GCS, local filesystem) and pass only the path as XCom.💾&lt;/p&gt;
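&lt;p&gt;That pattern looks like this in practice. A local temp file stands in for S3/GCS here, and the &lt;code&gt;@task&lt;/code&gt; decorator is omitted so the sketch runs standalone:&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path

def extract():
    """Write a large artifact to external storage; return only metadata.

    A local temp file stands in for S3/GCS here; in production you would
    upload the artifact and return the object-store URI instead.
    """
    out_dir = Path(tempfile.mkdtemp())
    artifact = out_dir / "output.parquet"
    artifact.write_bytes(b"...parquet bytes...")

    result = {"rows": 1000, "path": str(artifact)}
    json.dumps(result)  # fails fast if the value is not XCom-safe in Airflow 3
    return result

print(extract()["rows"])
# 1000
```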




&lt;h2&gt;
  
  
  Part 5 — What's New in Airflow 3✨
&lt;/h2&gt;

&lt;h3&gt;
  
  
  5.1 The &lt;code&gt;airflow.sdk&lt;/code&gt; Namespace🏗️
&lt;/h3&gt;

&lt;p&gt;Airflow 3 ships a stable, versioned Task SDK. All DAG authoring primitives now live under &lt;code&gt;airflow.sdk&lt;/code&gt;:📦&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2 import paths (still work in early Airflow 3, will be removed)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.decorators&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.models.dag&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sensors.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseSensorOperator&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.datasets&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dataset&lt;/span&gt;

&lt;span class="c1"&gt;# Airflow 3 canonical imports
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DAG&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseSensorOperator&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK is designed to have a stable interface across minor versions. The intent is that DAGs written against &lt;code&gt;airflow.sdk&lt;/code&gt; should be forward-compatible with future Airflow releases without import-path churn.🚀&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important for Docker deployments&lt;/strong&gt;: The &lt;code&gt;airflow.sdk&lt;/code&gt; import chain triggers a connection attempt to the Task Execution API at import time. If the api-server is unavailable or CPU-starved, the dag-processor will hang on this import and eventually be SIGKILL'd by its own parse timeout. Fix the api-server first; everything else follows.🚨&lt;/p&gt;

&lt;h3&gt;
  
  
  5.2 DAG Versioning (AIP-66)📑
&lt;/h3&gt;

&lt;p&gt;Airflow 3 introduces first-class DAG versioning. Multiple versions of the same DAG can exist simultaneously in the &lt;code&gt;serialized_dag&lt;/code&gt; table, and running DagRuns execute against the DAG version they were triggered with — not the latest version.🕰️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dag_id: "healthcare_retrain"
├── version 1: train → validate (runs triggered before 2026-04-10)
└── version 2: load_data → train → validate (runs triggered after 2026-04-10)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This solves a long-standing pain point: in Airflow 2, modifying a DAG while runs were in-flight could corrupt active DagRuns if the task structure changed.✅&lt;/p&gt;

&lt;h3&gt;
  
  
  5.3 Asset-Based Scheduling (AIP-74, AIP-75)💎
&lt;/h3&gt;

&lt;p&gt;The Airflow 2 &lt;code&gt;Dataset&lt;/code&gt; concept has been renamed to &lt;code&gt;Asset&lt;/code&gt; and significantly expanded. Assets replace cron-based scheduling for data-driven pipelines:🔄&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;airflow.sdk&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Asset&lt;/span&gt;

&lt;span class="c1"&gt;# Producer DAG
&lt;/span&gt;&lt;span class="n"&gt;raw_asset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Asset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3://my-datalake/raw/events.parquet&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@dag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@hourly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ingest_events&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nd"&gt;@task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outlets&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;raw_asset&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_and_write&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="c1"&gt;# ... write to S3
&lt;/span&gt;        &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="nf"&gt;fetch_and_write&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Consumer DAG — runs when raw_asset is updated, not on a clock
&lt;/span&gt;&lt;span class="nd"&gt;@dag&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schedule&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;raw_asset&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_events&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nd"&gt;@task&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
    &lt;span class="nf"&gt;transform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assets enable a &lt;strong&gt;push-driven&lt;/strong&gt; scheduling model where downstream DAGs run when their data dependencies are satisfied, not when a clock fires.🌊&lt;/p&gt;

&lt;h3&gt;
  
  
  5.4 Edge Executor (AIP-69)🌐
&lt;/h3&gt;

&lt;p&gt;The Edge Executor allows Airflow tasks to run on lightweight remote workers without CeleryExecutor's operational overhead. Workers register with the api-server via HTTP polling and execute tasks locally, making it viable for:🦾&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IoT / edge compute deployments&lt;/li&gt;
&lt;li&gt;Low-resource VMs that cannot run a Celery broker&lt;/li&gt;
&lt;li&gt;Multi-cloud task distribution without VPN tunnels
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# airflow.cfg / env var&lt;/span&gt;
&lt;span class="na"&gt;AIRFLOW__CORE__EXECUTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;EdgeExecutor&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.5 Scheduler-Managed Backfills (AIP-78)🔙
&lt;/h3&gt;

&lt;p&gt;Backfills in Airflow 2 were CLI-driven one-shot operations. Airflow 3 makes backfills first-class scheduler concepts:🗓️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Airflow 3: create a scheduler-managed backfill&lt;/span&gt;
airflow dags backfill create &lt;span class="nt"&gt;--dag-id&lt;/span&gt; my_dag &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-date&lt;/span&gt; 2024-01-01 &lt;span class="nt"&gt;--to-date&lt;/span&gt; 2024-12-31

&lt;span class="c"&gt;# Inspect backfill state&lt;/span&gt;
airflow dags backfill list &lt;span class="nt"&gt;--dag-id&lt;/span&gt; my_dag
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Scheduler-managed backfills respect pool limits, run in parallel with live DagRuns, and are visible in the UI — eliminating the "backfill is a black box" experience from Airflow 2.🖤&lt;/p&gt;

&lt;h3&gt;
  
  
  5.6 React UI (AIP-38, AIP-84)🎨
&lt;/h3&gt;

&lt;p&gt;The Airflow 3 UI is a full rewrite in React, backed by the FastAPI REST API v2. Practical implications:🖱️&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Significantly faster rendering for DAGs with hundreds of tasks⚡&lt;/li&gt;
&lt;li&gt;Grid view replaces the old Tree view as the primary timeline view📊&lt;/li&gt;
&lt;li&gt;The legacy Graph view (force-directed) is replaced with a cleaner task-level dependency graph🔗&lt;/li&gt;
&lt;li&gt;The UI now works correctly in all modern browsers without Flask session issues🌐&lt;/li&gt;
&lt;li&gt;Dark mode is available natively🌙&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Part 6 — Import Path Migration Guide🗺️
&lt;/h2&gt;

&lt;p&gt;This is the table you want bookmarked during a migration:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Airflow 2 import&lt;/th&gt;
&lt;th&gt;Airflow 3 import&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.decorators import dag, task&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import dag, task&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.models.dag import DAG&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import DAG&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.sensors.base import BaseSensorOperator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import BaseSensorOperator&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.datasets import Dataset&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import Asset&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.models import Variable&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import Variable&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.models import Connection&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;from airflow.sdk import Connection&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.operators.python import PythonOperator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;apache-airflow-providers-standard&lt;/code&gt; package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.operators.bash import BashOperator&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;apache-airflow-providers-standard&lt;/code&gt; package&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;from airflow.sensors.filesystem import FileSensor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;apache-airflow-providers-standard&lt;/code&gt; package&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Many common operators (Python, Bash, File sensors) have moved to &lt;code&gt;apache-airflow-providers-standard&lt;/code&gt;. Install this package explicitly:🛠️&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;apache-airflow-providers-standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
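&lt;p&gt;For a quick audit of how many DAG files are affected, the old-to-new module mapping can be scripted. The new paths below reflect my reading of the provider package layout; verify them against the &lt;code&gt;apache-airflow-providers-standard&lt;/code&gt; docs for your installed version:&lt;/p&gt;

```python
# Old core-module path mapped to its assumed providers-standard home.
STANDARD_PROVIDER_MOVES = {
    "airflow.operators.python": "airflow.providers.standard.operators.python",
    "airflow.operators.bash": "airflow.providers.standard.operators.bash",
    "airflow.sensors.filesystem": "airflow.providers.standard.sensors.filesystem",
}

def rewrite_import(line):
    """Rewrite a 'from X import Y' line if X moved to providers-standard."""
    for old, new in STANDARD_PROVIDER_MOVES.items():
        if line.startswith(f"from {old} import"):
            return line.replace(old, new, 1)
    return line

print(rewrite_import("from airflow.operators.bash import BashOperator"))
# from airflow.providers.standard.operators.bash import BashOperator
```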



&lt;h3&gt;
  
  
  Automated Migration with Ruff🐶
&lt;/h3&gt;

&lt;p&gt;Airflow 3 ships with &lt;a href="https://docs.astral.sh/ruff/" rel="noopener noreferrer"&gt;Ruff&lt;/a&gt; lint rules specifically for migration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"ruff&amp;gt;=0.13.1"&lt;/span&gt;

&lt;span class="c"&gt;# Check for mandatory breaking changes (AIR301)&lt;/span&gt;
ruff check dags/ &lt;span class="nt"&gt;--select&lt;/span&gt; AIR301 &lt;span class="nt"&gt;--preview&lt;/span&gt;

&lt;span class="c"&gt;# Auto-fix safe renames&lt;/span&gt;
ruff check dags/ &lt;span class="nt"&gt;--select&lt;/span&gt; AIR301 &lt;span class="nt"&gt;--fix&lt;/span&gt; &lt;span class="nt"&gt;--unsafe-fixes&lt;/span&gt; &lt;span class="nt"&gt;--preview&lt;/span&gt;

&lt;span class="c"&gt;# Check for recommended updates (AIR302: deprecated-but-not-yet-removed)&lt;/span&gt;
ruff check dags/ &lt;span class="nt"&gt;--select&lt;/span&gt; AIR302 &lt;span class="nt"&gt;--preview&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example output:📝&lt;br&gt;
&lt;code&gt;dags/retrain_dag.py:3:1: AIR301 airflow.decorators.dag is removed in Airflow 3.0. Use airflow.sdk.dag instead.&lt;/code&gt;&lt;br&gt;
&lt;code&gt;[*] AIR301 auto-fix available&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 7 — Docker Compose: What Breaks, What to Add🐳
&lt;/h2&gt;

&lt;p&gt;If you are running Airflow 2 via Docker Compose, here is a precise list of changes required for Airflow 3.&lt;/p&gt;

&lt;h3&gt;
  
  
  Services to Add➕
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;airflow-dag-processor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;*airflow-common&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dag-processor&lt;/span&gt;
  &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;airflow"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--job-type"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DagProcessorJob"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--local"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
    &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;
  &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;airflow-init&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_completed_successfully&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Services to Rename✏️
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2&lt;/span&gt;
&lt;span class="na"&gt;airflow-webserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;webserver&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;

&lt;span class="c1"&gt;# Airflow 3&lt;/span&gt;
&lt;span class="na"&gt;airflow-api-server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-server --workers &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="c1"&gt;# --workers 1 is critical (see Part 2)&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Variables to Add🌍
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;x-airflow-common&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="nl"&gt;&amp;amp;airflow-common&lt;/span&gt;
  &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Critical: prevents Connection refused in scheduler&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__EXECUTION_API_SERVER_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://airflow-api-server:8080/execution/"&lt;/span&gt;

    &lt;span class="c1"&gt;# Critical: prevents JWT Signature verification failed&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__API_AUTH__JWT_SECRET&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;change-this-in-production"&lt;/span&gt;

    &lt;span class="c1"&gt;# Required for SimpleAuthManager user configuration&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_USERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;admin:Admin"&lt;/span&gt;
    &lt;span class="na"&gt;AIRFLOW__CORE__SIMPLE_AUTH_MANAGER_PASSWORDS_FILE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/opt/airflow/project/simple_auth_manager_passwords.json"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Healthcheck Changes🩺
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Airflow 2 scheduler healthcheck (port 8974 no longer exists in Airflow 3)&lt;/span&gt;
&lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--fail"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:8974/health"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Airflow 3 scheduler healthcheck&lt;/span&gt;
&lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;airflow"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jobs"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;check"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--job-type"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SchedulerJob"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--local"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;
  &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt; &lt;span class="c1"&gt;# airflow jobs check takes ~42s (full Python + DB round-trip)&lt;/span&gt;
  &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt; &lt;span class="c1"&gt;# covers pip install time on first start&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--local&lt;/code&gt; flag is essential. &lt;code&gt;--hostname $(hostname)&lt;/code&gt; compares the container's &lt;code&gt;$HOSTNAME&lt;/code&gt; env var against the hostname Airflow registered in the database — these often differ (&lt;code&gt;9811c4ea8dec&lt;/code&gt; vs &lt;code&gt;airflow-scheduler.internal&lt;/code&gt;), causing perpetual &lt;em&gt;unhealthy&lt;/em&gt; status even when the service is running correctly.🔍&lt;/p&gt;




&lt;h2&gt;
  
  
  Part 8 — Configuration Migration⚙️
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Changed Defaults That Will Surprise You😮
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="err"&gt;catchup_by_default:&lt;/span&gt; &lt;span class="err"&gt;was&lt;/span&gt; &lt;span class="err"&gt;True&lt;/span&gt; &lt;span class="err"&gt;in&lt;/span&gt; &lt;span class="err"&gt;Airflow&lt;/span&gt; &lt;span class="err"&gt;2,&lt;/span&gt; &lt;span class="err"&gt;False&lt;/span&gt; &lt;span class="err"&gt;in&lt;/span&gt; &lt;span class="err"&gt;Airflow&lt;/span&gt; &lt;span class="err"&gt;3&lt;/span&gt;
&lt;span class="c"&gt;# If you have DAGs with start_date in the past and no explicit catchup=True,
# they will NOT backfill on first deploy — this is usually what you want,
# but verify before deploying
&lt;/span&gt;&lt;span class="nn"&gt;[scheduler]&lt;/span&gt;
&lt;span class="py"&gt;catchup_by_default&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;False # Airflow 3 default&lt;/span&gt;

&lt;span class="c"&gt;# Default schedule: was @daily implicit in some contexts, now None
# DAGs with no schedule parameter will not run automatically
&lt;/span&gt;&lt;span class="nn"&gt;[scheduler]&lt;/span&gt;
&lt;span class="c"&gt;# Use schedule=None explicitly if that's your intent
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
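&lt;p&gt;The practical effect of the flipped default can be illustrated with a toy model of run creation (greatly simplified; the real scheduler also respects data intervals, pools, and timetables):&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone

def runs_to_create(start_date, now, interval, catchup):
    """Toy model of which DagRuns get created on first deploy.

    Simplified, but it captures the catchup_by_default flip: with
    catchup=True every missed interval gets a run; with catchup=False
    (the Airflow 3 default) only the most recent one does.
    """
    n_missed = int((now - start_date) / interval)  # completed intervals
    missed = [start_date + i * interval for i in range(n_missed)]
    if catchup:
        return missed
    return missed[-1:]  # latest completed interval only

start = datetime(2026, 4, 10, tzinfo=timezone.utc)
now = datetime(2026, 4, 15, tzinfo=timezone.utc)
day = timedelta(days=1)

print(len(runs_to_create(start, now, day, catchup=True)))   # 5
print(len(runs_to_create(start, now, day, catchup=False)))  # 1
```

&lt;p&gt;If a DAG relies on backfilling historical intervals, set &lt;code&gt;catchup=True&lt;/code&gt; explicitly on the DAG before upgrading.&lt;/p&gt;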



&lt;h3&gt;
  
  
  Renamed Configuration Keys✏️
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="c"&gt;# Airflow 2 → Airflow 3 config key mapping
&lt;/span&gt;&lt;span class="nn"&gt;[webserver]&lt;/span&gt;
&lt;span class="py"&gt;web_server_host&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0 → [api]&lt;/span&gt;
&lt;span class="py"&gt;host&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;0.0.0.0&lt;/span&gt;

&lt;span class="nn"&gt;[webserver]&lt;/span&gt;
&lt;span class="py"&gt;error_logfile&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;... → REMOVED (no replacement)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Automated Config Migration🛠️
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check your airflow.cfg for deprecated/invalid keys&lt;/span&gt;
airflow config lint

&lt;span class="c"&gt;# Apply automatic fixes&lt;/span&gt;
airflow config update &lt;span class="nt"&gt;--fix&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 9 — Migration Path🛤️
&lt;/h2&gt;

&lt;p&gt;If you are upgrading a production Airflow 2 deployment, follow this sequence:&lt;/p&gt;

&lt;h3&gt;
  
  
  Phase 1 — Prepare (Still on Airflow 2)🏗️
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Upgrade to Airflow 2.7+&lt;/strong&gt; — the schema migration from earlier versions significantly increases &lt;code&gt;airflow db migrate&lt;/code&gt; time; get that done first.⏳&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean the metadata database&lt;/strong&gt; — &lt;code&gt;airflow db clean&lt;/code&gt; removes old DagRun/TaskInstance records and dramatically speeds up the schema migration.🧹&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run Ruff AIR301 checks&lt;/strong&gt; — &lt;code&gt;ruff check dags/ --select AIR301 --preview&lt;/code&gt;.🐶&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fix all deprecation warnings&lt;/strong&gt; — zero warnings in Airflow 2.9 means fewer surprises in Airflow 3.⚠️&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit direct database access&lt;/strong&gt; — grep your task code for &lt;code&gt;from airflow.models&lt;/code&gt; imports; these will break.🔍
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find tasks using direct metadata DB access&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"from airflow.models"&lt;/span&gt; dags/ &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.py"&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"settings.Session"&lt;/span&gt; dags/ &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.py"&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"DagRun|TaskInstance|Variable"&lt;/span&gt; dags/ &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.py"&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"import"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Phase 2 — Upgrade ⬆️
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Back up your metadata database&lt;/strong&gt; — non-negotiable.💾&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update your Docker image&lt;/strong&gt; to &lt;code&gt;apache/airflow:3.0.0&lt;/code&gt;.🐳&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add dag-processor service&lt;/strong&gt; to your Compose/Kubernetes manifests.🧩&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rename webserver → api-server&lt;/strong&gt; in service definitions.✏️&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set the three critical env vars&lt;/strong&gt;:🌍

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;AIRFLOW__CORE__EXECUTION_API_SERVER_URL&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AIRFLOW__API_AUTH__JWT_SECRET&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;SimpleAuthManager password config&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run &lt;code&gt;airflow db migrate&lt;/code&gt;&lt;/strong&gt;.🔄&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update all import paths&lt;/strong&gt; (use Ruff auto-fix first, then manual review).🛠️&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update healthchecks&lt;/strong&gt; to &lt;code&gt;airflow jobs check --local&lt;/code&gt;.🩺&lt;/li&gt;
&lt;/ol&gt;
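&lt;p&gt;Steps 2–5 translate into a Compose change roughly like the following. This is a sketch, not a drop-in file: the service layout, the &lt;code&gt;command&lt;/code&gt; values, and the URL path are assumptions to verify against your own deployment and the official upgrade guide:&lt;/p&gt;

```yaml
services:
  api-server:                 # renamed from "webserver"
    image: apache/airflow:3.0.0
    command: api-server
    environment:
      AIRFLOW__API_AUTH__JWT_SECRET: ${JWT_SECRET}   # must match on every service
      # URL path is an assumption; confirm against your deployment:
      AIRFLOW__CORE__EXECUTION_API_SERVER_URL: http://api-server:8080/execution/
      # plus your SimpleAuthManager password configuration (step 5)
  scheduler:
    image: apache/airflow:3.0.0
    command: scheduler
    # ...same environment block as api-server...
  dag-processor:              # new standalone service in Airflow 3
    image: apache/airflow:3.0.0
    command: dag-processor
    # ...same environment block as api-server...
```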

&lt;h3&gt;
  
  
  Phase 3 — Validate ✅
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check all services healthy&lt;/span&gt;
curl http://localhost:8080/api/v2/monitor/health | python3 &lt;span class="nt"&gt;-m&lt;/span&gt; json.tool

&lt;span class="c"&gt;# Expected output&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="s2"&gt;"metadatabase"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;: &lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"scheduler"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;: &lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"triggerer"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;: &lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;,
  &lt;span class="s2"&gt;"dag_processor"&lt;/span&gt;: &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;: &lt;span class="s2"&gt;"healthy"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# Trigger a test DAG&lt;/span&gt;
airflow dags trigger your_test_dag

&lt;span class="c"&gt;# Check task state&lt;/span&gt;
airflow tasks states-for-dag-run your_test_dag &amp;lt;run_id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Part 10 — Should You Upgrade? 🤔
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Upgrade Now If: 🚀
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You are &lt;strong&gt;starting a new project&lt;/strong&gt; — there is no reason to build on Airflow 2.✨&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;simple DAGs&lt;/strong&gt; (PythonOperator, BashOperator, standard providers) — the migration is mostly find-and-replace on import paths.🛠️&lt;/li&gt;
&lt;li&gt;You want &lt;strong&gt;DAG versioning&lt;/strong&gt; — this solves real operational pain.🕰️&lt;/li&gt;
&lt;li&gt;You are running on &lt;strong&gt;Kubernetes&lt;/strong&gt; — the separation of concerns maps cleanly to individual pod scaling.🏗️&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Wait If: 🛑
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You depend heavily on &lt;strong&gt;FAB's OAuth/LDAP integrations&lt;/strong&gt; and have not tested the FAB provider on Airflow 3.🔐&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;extensive SLA miss callback logic&lt;/strong&gt; and no monitoring alternative ready.⏰&lt;/li&gt;
&lt;li&gt;Your codebase has &lt;strong&gt;heavy direct metadata database access&lt;/strong&gt; in task code — refactoring that to the Python Client is non-trivial.🗃️&lt;/li&gt;
&lt;li&gt;You use &lt;strong&gt;CeleryKubernetesExecutor or LocalKubernetesExecutor&lt;/strong&gt; — both are removed; you need to evaluate the Multiple Executor Configuration feature instead.🧩&lt;/li&gt;
&lt;li&gt;You have &lt;strong&gt;custom Flask-AppBuilder views or blueprints&lt;/strong&gt; — these require porting to FastAPI.🎨&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Honest Assessment ⚖️
&lt;/h3&gt;

&lt;p&gt;Airflow 3 is the version the project should have been architecturally from the beginning. The separation of the dag-processor, the Task Execution API, and the prohibition on direct metadata access are the right engineering decisions. They make Airflow significantly more secure, more scalable, and more maintainable at the cost of a one-time migration investment.🦾&lt;/p&gt;

&lt;p&gt;The upgrade complexity is proportional to how much your codebase relied on Airflow 2's leaky abstractions: direct database access, FAB internals, SLA callbacks, and SubDAGs. If you followed Airflow 2 best practices (TaskFlow API, provider operators, no direct DB access), the migration is a half-day of import path updates and Docker Compose additions.🛠️&lt;/p&gt;

&lt;p&gt;If you did not, this upgrade is the forcing function to do it properly.🚀&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion 🏁
&lt;/h2&gt;

&lt;p&gt;The jump from Airflow 2 to Airflow 3 is the most significant change in the project's history. The webserver is gone. The scheduler no longer parses DAGs. Tasks no longer touch the metadata database. The JWT-authenticated Execution API connects them all.🔗&lt;/p&gt;

&lt;p&gt;Each of these changes surfaces as a concrete failure mode in the first deployment: CPU spikes from JWT key divergence, &lt;code&gt;Connection refused&lt;/code&gt; from wrong service URLs, silent healthcheck failures from removed ports, and user creation that silently no-ops under the replacement auth manager.🧨&lt;/p&gt;

&lt;p&gt;Understanding the &lt;em&gt;why&lt;/em&gt; behind the architecture — isolation, security, scalability — converts each failure from mysterious to obvious. The fixes are not workarounds; they are the intended configuration patterns for a distributed, multi-service orchestration system.🧠&lt;/p&gt;

&lt;p&gt;Airflow 3 is what a modern data orchestrator should look like. Migrate when you are ready, migrate properly, and you will not look back.🚀&lt;/p&gt;




&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/release_notes.html" rel="noopener noreferrer"&gt;Apache Airflow 3.0 Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading_to_airflow3.html" rel="noopener noreferrer"&gt;Official Upgrade to Airflow 3 Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.astronomer.io/docs/learn/airflow-upgrade-2-3" rel="noopener noreferrer"&gt;Astronomer: Upgrading from Airflow 2 to 3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72+Task+Execution+Interface" rel="noopener noreferrer"&gt;AIP-72: Task Execution Interface&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-66+DAG+Versioning" rel="noopener noreferrer"&gt;AIP-66: DAG Versioning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.astral.sh/ruff/rules/#airflow-air" rel="noopener noreferrer"&gt;Ruff AIR linting rules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Written from direct production experience migrating a healthcare ML retraining pipeline from Airflow 2 patterns to Airflow 3.0.0 on Docker Compose, April 2026 📝.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>airflow</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>docker</category>
    </item>
    <item>
      <title>⚡ High-Performance Warehousing: Partitioning &amp; Clustering</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Wed, 04 Feb 2026 16:52:03 +0000</pubDate>
      <link>https://forem.com/de_clerke/high-performance-warehousing-partitioning-clustering-4om3</link>
      <guid>https://forem.com/de_clerke/high-performance-warehousing-partitioning-clustering-4om3</guid>
      <description>&lt;p&gt;In my previous posts, we discussed how to structure a Data Warehouse. But as your data grows from thousands to billions of rows, even a perfect Star Schema can become slow. To keep queries lightning-fast, we use two primary optimization techniques: &lt;strong&gt;Partitioning&lt;/strong&gt; and &lt;strong&gt;Clustering&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Partitioning: Divide and Conquer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Partitioning&lt;/strong&gt; is a technique that divides large tables into smaller, more manageable segments based on a specific column, like a date or a region.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Analogy
&lt;/h3&gt;

&lt;p&gt;Imagine a library with millions of exam papers. If they are all in one giant pile, finding a specific paper is impossible. But if you divide them into separate boxes by &lt;strong&gt;Subject&lt;/strong&gt;, you only need to search the "Math" box to find a math paper.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strategies
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Horizontal Partitioning&lt;/strong&gt;: Divides tables based on row values (e.g., separating sales by month).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vertical Partitioning&lt;/strong&gt;: Divides tables based on columns, separating frequently accessed data from rarely used or sensitive information (like moving Social Security Numbers to a separate, restricted segment).&lt;/li&gt;
&lt;/ul&gt;
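&lt;p&gt;To make horizontal partitioning concrete, here is a minimal sketch using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt;. SQLite has no native partitioning, so the per-month tables and the routing function are a hand-rolled illustration of the idea, not a production pattern; all table and column names are invented:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One "partition" table per month -- the engine only scans the table we name.
for month in ("2024_01", "2024_02"):
    conn.execute(f"CREATE TABLE sales_{month} (sale_date TEXT, amount REAL)")

def insert_sale(sale_date: str, amount: float) -> None:
    # Route each row to its partition based on the date column.
    partition = sale_date[:7].replace("-", "_")
    conn.execute(f"INSERT INTO sales_{partition} VALUES (?, ?)", (sale_date, amount))

insert_sale("2024-01-15", 100.0)
insert_sale("2024-02-03", 250.0)

# A query for January touches only the January partition.
jan_total = conn.execute("SELECT SUM(amount) FROM sales_2024_01").fetchone()[0]
print(jan_total)  # 100.0
```

&lt;p&gt;Real warehouses do this routing automatically from a declared partition key; the win is the same: irrelevant segments are never read.&lt;/p&gt;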

&lt;h3&gt;
  
  
  Why use it?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query Performance&lt;/strong&gt;: The database engine only scans the relevant partitions, which significantly reduces I/O operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance Efficiency&lt;/strong&gt;: You can back up or archive specific partitions without touching the entire table.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. Clustering: Keeping Neighbors Close
&lt;/h2&gt;

&lt;p&gt;While partitioning splits data into "boxes," &lt;strong&gt;Clustering&lt;/strong&gt; organizes how the data is physically stored on the disk within those boxes.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Analogy
&lt;/h3&gt;

&lt;p&gt;Think of a library again. Inside the "History" box, you group books by &lt;strong&gt;Author&lt;/strong&gt;. If a student wants all books by a specific author, they are all sitting right next to each other on the shelf, so the student doesn't have to walk back and forth.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;I/O Reduction&lt;/strong&gt;: Related records are read in a single disk operation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache Efficiency&lt;/strong&gt;: Accessing one record automatically brings its "neighbors" into the cache.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compression&lt;/strong&gt;: Similar values cluster together, which allows the database to compress the data more effectively.&lt;/li&gt;
&lt;/ul&gt;
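&lt;p&gt;The compression benefit is easy to demonstrate in plain Python. The sketch below compresses the same low-cardinality column twice: once physically clustered (sorted, so similar values sit together) and once scattered. The clustered layout compresses to a small fraction of the size:&lt;/p&gt;

```python
import random
import zlib

random.seed(42)
# A low-cardinality column, as found in a typical fact table.
categories = [random.choice(["books", "games", "music", "tools"]) for _ in range(10_000)]

scattered = ",".join(categories).encode()           # values interleaved on "disk"
clustered = ",".join(sorted(categories)).encode()   # similar values stored adjacently

print(len(zlib.compress(scattered)))  # thousands of bytes
print(len(zlib.compress(clustered)))  # far smaller: long runs of identical values
```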




&lt;h2&gt;
  
  
  Partitioning vs. Clustering: When to Use Which?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Partitioning&lt;/th&gt;
&lt;th&gt;Clustering&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Logical division into segments&lt;/td&gt;
&lt;td&gt;Physical organization on disk&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Common Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Date, Year, or Region&lt;/td&gt;
&lt;td&gt;ID, Category, or frequent filter keys&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Impact&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Great for "skipping" huge amounts of data&lt;/td&gt;
&lt;td&gt;Great for speeding up searches within a dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Summary and Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building a successful data warehouse isn't just about storing data; it's about making it accessible. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OLTP vs OLAP&lt;/strong&gt;: Separate your "doing" from your "thinking".&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Star vs Snowflake&lt;/strong&gt;: Choose a schema that balances speed and storage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioning &amp;amp; Clustering&lt;/strong&gt;: Use these to ensure your warehouse scales as your business grows.&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>partitioning</category>
      <category>clustering</category>
      <category>datawarehouse</category>
      <category>database</category>
    </item>
    <item>
      <title>⭐ Star vs. ❄️ Snowflake: Designing the Data Warehouse</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Wed, 04 Feb 2026 16:33:02 +0000</pubDate>
      <link>https://forem.com/de_clerke/star-vs-snowflake-designing-the-data-warehouse-22ad</link>
      <guid>https://forem.com/de_clerke/star-vs-snowflake-designing-the-data-warehouse-22ad</guid>
      <description>&lt;p&gt;In my previous post, we explored why we use OLAP systems (Data Warehouses) for analytics. But once you have a warehouse, how do you organize the data inside it? This is where &lt;strong&gt;Data Modeling&lt;/strong&gt; comes in.&lt;/p&gt;

&lt;p&gt;To make data easy to query, we use &lt;strong&gt;Dimensional Modeling&lt;/strong&gt;, which organizes data into two types of tables: &lt;strong&gt;Facts&lt;/strong&gt; and &lt;strong&gt;Dimensions&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Building Blocks: Facts &amp;amp; Dimensions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Fact Tables
&lt;/h3&gt;

&lt;p&gt;These are the central repositories for measurable business metrics. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What they store&lt;/strong&gt;: Quantitative measurements (facts) like sales amounts, quantities, or durations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure&lt;/strong&gt;: Usually the largest tables, containing foreign keys that link to related dimension tables.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Example&lt;/strong&gt;: A &lt;code&gt;Sales_Fact&lt;/code&gt; table containing &lt;code&gt;revenue&lt;/code&gt;, &lt;code&gt;quantity&lt;/code&gt;, and &lt;code&gt;discount&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Dimension Tables
&lt;/h3&gt;

&lt;p&gt;These provide the descriptive context that makes fact table measurements meaningful.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What they store&lt;/strong&gt;: Attributes used for filtering and grouping, such as product names, customer demographics, or dates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structure&lt;/strong&gt;: Typically smaller in terms of row count and often denormalized for speed.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Star Schema
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Star Schema&lt;/strong&gt; is the most fundamental and widely used pattern. It looks like a star because the central fact table is surrounded by a single layer of dimension tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use a Star Schema?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Query Simplicity&lt;/strong&gt;: It requires fewer joins, making it easier for business users to understand and query.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: Because there are fewer joins, queries generally execute faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool Compatibility&lt;/strong&gt;: Most BI tools (like Tableau or Power BI) are optimized for this structure.&lt;/li&gt;
&lt;/ul&gt;
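&lt;p&gt;A tiny runnable example using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; (table and column names are made up) shows the "one join per dimension" property in action:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE sales_fact  (product_id INTEGER, revenue REAL);
    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO sales_fact  VALUES (1, 1200.0), (1, 800.0), (2, 600.0);
""")

# One join from the central fact table to the dimension answers the question.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.revenue)
    FROM sales_fact f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
print(rows)  # [('Laptop', 2000.0), ('Phone', 600.0)]
```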

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2qptgwqw1k186sbvs5j.jpg" alt=" " width="798" height="518"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Snowflake Schema
&lt;/h2&gt;

&lt;p&gt;The &lt;strong&gt;Snowflake Schema&lt;/strong&gt; is an extension of the star schema. In this model, the dimension tables are &lt;strong&gt;normalized&lt;/strong&gt; into multiple related tables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why use a Snowflake Schema?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage Efficiency&lt;/strong&gt;: Normalization reduces data redundancy, which is helpful if your dimension tables are massive.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Integrity&lt;/strong&gt;: It reduces the risk of inconsistencies because attributes are updated in only one place.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance&lt;/strong&gt;: Changes to hierarchical data (like a product category) are easier to manage.&lt;/li&gt;
&lt;/ul&gt;
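&lt;p&gt;Snowflaking a product dimension moves its category into a separate normalized table, so answering the same kind of question now costs two joins instead of one. A minimal &lt;code&gt;sqlite3&lt;/code&gt; sketch with invented names:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY,
                               product_name TEXT, category_id INTEGER);
    CREATE TABLE sales_fact   (product_id INTEGER, revenue REAL);
    INSERT INTO dim_category VALUES (10, 'Electronics');
    INSERT INTO dim_product  VALUES (1, 'Laptop', 10), (2, 'Phone', 10);
    INSERT INTO sales_fact   VALUES (1, 1200.0), (2, 600.0);
""")

# The category name now lives one join further away from the fact table.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM sales_fact f
    JOIN dim_product  p ON p.product_id  = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()
print(rows)  # [('Electronics', 1800.0)]
```

&lt;p&gt;The category name is stored exactly once, at the cost of the extra hop.&lt;/p&gt;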

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh8hkar89bxbcyq08zz0c.png" alt=" " width="800" height="565"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Side-by-Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Star Schema&lt;/th&gt;
&lt;th&gt;Snowflake Schema&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple (1 join per dimension)&lt;/td&gt;
&lt;td&gt;Complex (multiple joins)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Redundancy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher (Denormalized)&lt;/td&gt;
&lt;td&gt;Lower (Normalized)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Speed&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generally faster&lt;/td&gt;
&lt;td&gt;Potentially slower due to joins&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Experience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Intuitive for business users&lt;/td&gt;
&lt;td&gt;Less intuitive&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Which one should you choose?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choose Star Schema&lt;/strong&gt; if you prioritize query speed and want to make it easy for non-technical users to build their own reports.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Choose Snowflake Schema&lt;/strong&gt; if you have very large dimension tables where storage costs are a concern or if you need to strictly enforce data integrity.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; The Star Schema is built for &lt;strong&gt;speed and simplicity&lt;/strong&gt;, while the Snowflake Schema is built for &lt;strong&gt;storage efficiency and organization&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>datamodeling</category>
      <category>starchema</category>
      <category>snowflakeschema</category>
      <category>dimensionalmodeling</category>
    </item>
    <item>
      <title>🏦 OLTP vs. OLAP: Why One Database Isn't Enough</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Wed, 04 Feb 2026 10:57:08 +0000</pubDate>
      <link>https://forem.com/de_clerke/oltp-vs-olap-why-one-database-isnt-enough-4op0</link>
      <guid>https://forem.com/de_clerke/oltp-vs-olap-why-one-database-isnt-enough-4op0</guid>
      <description>&lt;p&gt;If you’ve ever wondered why companies don't just run their big data reports directly on their production database, you’re asking the right question. In the world of data engineering, we solve this by separating systems into two categories: &lt;strong&gt;OLTP&lt;/strong&gt; and &lt;strong&gt;OLAP&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Concept
&lt;/h2&gt;

&lt;p&gt;To understand the difference, think of a &lt;strong&gt;supermarket manager’s office&lt;/strong&gt;. The checkout counters handle individual transactions as they happen—that’s &lt;strong&gt;OLTP&lt;/strong&gt;. The manager’s office, however, stores years of sales records to analyze trends and plan for the future—that’s &lt;strong&gt;OLAP&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. OLTP (Online Transaction Processing)
&lt;/h2&gt;

&lt;p&gt;OLTP systems are the "workhorses" of day-to-day business. They are designed to handle real-time operations where data changes frequently.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Purpose&lt;/strong&gt;: Process individual transactions in real-time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Operations&lt;/strong&gt;: &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, and &lt;code&gt;DELETE&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Structure&lt;/strong&gt;: Highly &lt;strong&gt;normalized&lt;/strong&gt; (many small tables) to reduce redundancy and ensure fast writes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples&lt;/strong&gt;: Banking systems, e-commerce checkouts, and inventory tracking.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2. OLAP (Online Analytical Processing)
&lt;/h2&gt;

&lt;p&gt;OLAP systems (Data Warehouses) are built for the "big picture". They are optimized for complex analysis and reporting rather than processing single transactions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary Purpose&lt;/strong&gt;: Enable complex data mining and business intelligence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Operations&lt;/strong&gt;: Complex &lt;code&gt;SELECT&lt;/code&gt; statements with heavy &lt;code&gt;GROUP BY&lt;/code&gt; and aggregations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database Structure&lt;/strong&gt;: &lt;strong&gt;Denormalized&lt;/strong&gt; (fewer, larger tables) to reduce the need for joins during analysis.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples&lt;/strong&gt;: Business intelligence dashboards and market trend analysis tools.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Side-by-Side Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;OLTP (The "Doer")&lt;/th&gt;
&lt;th&gt;OLAP (The "Thinker")&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Focus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Current, live data&lt;/td&gt;
&lt;td&gt;Historical data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast write operations&lt;/td&gt;
&lt;td&gt;Fast read operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Pattern&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simple queries on few records&lt;/td&gt;
&lt;td&gt;Complex queries on massive datasets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Base&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Operational staff &amp;amp; Customers&lt;/td&gt;
&lt;td&gt;Analysts, Data Scientists, &amp;amp; Execs&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdqw061lhne6whftyq2rl.png" alt=" " width="800" height="329"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Why the Separation Matters
&lt;/h2&gt;

&lt;p&gt;Separating these workloads is critical for &lt;strong&gt;Performance&lt;/strong&gt; and &lt;strong&gt;Stability&lt;/strong&gt;. &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Workload Isolation&lt;/strong&gt;: You don't want a heavy "Year-over-Year Sales" report slowing down the checkout counter for a customer.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Data Quality&lt;/strong&gt;: Data warehouses use ETL/ELT processes to cleanse and standardize data from multiple sources before it reaches the OLAP system.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Example: Query Patterns
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;OLTP&lt;/strong&gt; query is surgical and fast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;customer_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;balance&lt;/span&gt; 
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;accounts&lt;/span&gt; 
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;account_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;98765&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An &lt;strong&gt;OLAP&lt;/strong&gt; query is broad and resource-intensive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="k"&gt;SUM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_revenue&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sale_amount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_transaction&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sales_fact&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;sale_date&lt;/span&gt; &lt;span class="k"&gt;BETWEEN&lt;/span&gt; &lt;span class="s1"&gt;'2023-01-01'&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="s1"&gt;'2023-12-31'&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;region&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Summary:&lt;/strong&gt; OLTP is about &lt;strong&gt;accuracy and speed&lt;/strong&gt; in the moment; OLAP is about &lt;strong&gt;insight and context&lt;/strong&gt; over time.&lt;/p&gt;
&lt;/blockquote&gt;




</description>
      <category>datawarehouse</category>
      <category>oltp</category>
      <category>olap</category>
      <category>database</category>
    </item>
    <item>
      <title>🔄 ETL vs. ELT: The Evolution of Data Integration</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Wed, 04 Feb 2026 10:15:35 +0000</pubDate>
      <link>https://forem.com/de_clerke/etl-vs-elt-the-evolution-of-data-integration-1ep</link>
      <guid>https://forem.com/de_clerke/etl-vs-elt-the-evolution-of-data-integration-1ep</guid>
      <description>&lt;p&gt;In my last post, we looked at how databases store information. But how does that data actually get there? As a data engineer, most of your time is spent designing the "pipelines" that move data from source to destination. &lt;/p&gt;

&lt;p&gt;Two main methodologies dominate this space: &lt;strong&gt;ETL&lt;/strong&gt; and &lt;strong&gt;ELT&lt;/strong&gt;. These define whether data is transformed before or after it hits your target system. Let's break down the evolution.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are ETL and ELT?
&lt;/h2&gt;

&lt;p&gt;Both acronyms represent three core steps: &lt;strong&gt;Extract&lt;/strong&gt;, &lt;strong&gt;Transform&lt;/strong&gt;, and &lt;strong&gt;Load&lt;/strong&gt;. The difference lies entirely in the &lt;strong&gt;sequence&lt;/strong&gt; and &lt;strong&gt;where&lt;/strong&gt; the heavy lifting happens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2998dlhz5f8lvlt3u590.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2998dlhz5f8lvlt3u590.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. ETL (Extract, Transform, Load)
&lt;/h3&gt;

&lt;p&gt;This is the traditional approach. Data is extracted, transformed in a separate processing layer, and then loaded into the target system.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; Data moves from Source → Transformation Engine → Target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strengths:&lt;/strong&gt; Ensures high data quality and security (masking) before the data is stored.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Complex transformations or when target systems have limited resources.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. ELT (Extract, Load, Transform)
&lt;/h3&gt;

&lt;p&gt;ELT is the modern, cloud-native approach. Raw data is loaded directly into the target system, and transformations are performed using the target's own computational power.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Workflow:&lt;/strong&gt; Data moves from Source → Target → Transformation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strengths:&lt;/strong&gt; Faster loading times and high scalability using cloud warehouses like Snowflake or BigQuery.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Best for:&lt;/strong&gt; Big Data scenarios and agile analytics where requirements change rapidly.&lt;/li&gt;
&lt;/ul&gt;
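&lt;p&gt;To make "transform inside the target" concrete, here is a toy ELT flow with Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; standing in for the cloud warehouse (table and column names are invented): the raw rows are loaded untouched, and the transformation is a SQL statement the target executes itself:&lt;/p&gt;

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in the target exactly as they arrived.
conn.execute("CREATE TABLE raw_sales (quantity INTEGER, unit_price REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", [(2, 9.99), (1, 24.50)])

# Transform: runs inside the target, using the target's own compute.
conn.execute("""
    CREATE TABLE sales AS
    SELECT quantity, unit_price, quantity * unit_price AS total_price
    FROM raw_sales
""")

totals = conn.execute("SELECT total_price FROM sales").fetchall()
print(totals)  # [(19.98,), (24.5,)]
```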

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw2p7m3rak2hbl2u7fxum.png" alt=" " width="800" height="457"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Comparison Matrix
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Aspect&lt;/th&gt;
&lt;th&gt;ETL&lt;/th&gt;
&lt;th&gt;ELT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Processing Location&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External transformation engine&lt;/td&gt;
&lt;td&gt;Within target system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (pre-loading validation)&lt;/td&gt;
&lt;td&gt;Variable (post-loading validation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lower (rigid schemas)&lt;/td&gt;
&lt;td&gt;Higher (on-demand views)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Maintenance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Complex schema management&lt;/td&gt;
&lt;td&gt;Easier to adapt to changes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Modern Architectures: The Medallion Approach
&lt;/h2&gt;

&lt;p&gt;Many modern data teams use a "Hybrid" or &lt;strong&gt;Medallion Architecture&lt;/strong&gt; to balance both worlds. This organizes data into layers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bronze (Raw):&lt;/strong&gt; The ELT starting point. Raw data is dumped here exactly as it came from the source.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silver (Filtered):&lt;/strong&gt; Data is cleaned, standardized, and joined.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gold (Business-Ready):&lt;/strong&gt; Highly transformed and aggregated data ready for analytics.&lt;/li&gt;
&lt;/ul&gt;
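&lt;p&gt;A minimal sketch of those three layers using pandas, with hypothetical order data (column names are illustrative, not from any real pipeline):&lt;/p&gt;

```python
import pandas as pd

# Bronze: raw data exactly as it arrived (built in-memory here for illustration)
bronze = pd.DataFrame({
    "order_id": [1, 2, 2, None],
    "quantity": [2, 1, 1, 3],
    "unit_price": [10.0, 99.5, 99.5, 5.0],
})

# Silver: cleaned and standardized -- drop incomplete rows and duplicates
silver = bronze.dropna(subset=["order_id"]).drop_duplicates().copy()

# Gold: business-ready aggregate
silver["total_price"] = silver["quantity"] * silver["unit_price"]
gold_revenue = silver["total_price"].sum()
print(gold_revenue)  # → 119.5
```

In a real lakehouse each layer would be persisted as its own table, but the shape of the flow is the same: raw in, progressively refined out.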




&lt;h2&gt;
  
  
  Example: Transformation in Action
&lt;/h2&gt;

&lt;p&gt;In an &lt;strong&gt;ETL&lt;/strong&gt; workflow, you might use &lt;strong&gt;Python&lt;/strong&gt; to clean data before loading it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raw_sales_data.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;quantity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unit_price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;order_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_datetime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;order_date&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;df_cleaned&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dropna&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_sql&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales_table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In an &lt;strong&gt;ELT&lt;/strong&gt; workflow, you load the raw data first and then use &lt;strong&gt;SQL&lt;/strong&gt; (often managed by tools like dbt) inside your warehouse:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;analytics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fact_sales&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; 
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantity&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;unit_price&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;total_price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;CAST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_date&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="nb"&gt;DATE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;order_date&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;raw_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sales_staging&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
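&lt;p&gt;The load-then-transform order is easy to demonstrate end to end. In this sketch, Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; stands in for the warehouse, and the schema-qualified names from the SQL above are flattened to plain table names:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Load: raw rows land in a staging table with no transformation at all
con.execute(
    "CREATE TABLE sales_staging "
    "(order_id INT, quantity INT, unit_price REAL, order_date TEXT)"
)
con.executemany(
    "INSERT INTO sales_staging VALUES (?, ?, ?, ?)",
    [(1, 2, 10.0, "2025-01-01"), (None, 1, 99.5, "2025-01-02")],
)

# Transform: the database does the work, after loading
con.execute("""
    CREATE TABLE fact_sales AS
    SELECT order_id,
           quantity * unit_price AS total_price,
           DATE(order_date)      AS order_date
    FROM sales_staging
    WHERE order_id IS NOT NULL
""")
rows = con.execute("SELECT order_id, total_price FROM fact_sales").fetchall()
print(rows)  # → [(1, 20.0)]
```

Note that the dirty row (missing `order_id`) still sits untouched in `sales_staging`; in ELT you keep the raw data and can always re-run or change the transformation later.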






&lt;h2&gt;
  
  
  Conclusion: Which should you choose?
&lt;/h2&gt;

&lt;p&gt;The choice depends on your infrastructure and speed requirements.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Use ETL&lt;/strong&gt; if you have strict regulatory compliance, need to mask data before storage, or have limited target resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use ELT&lt;/strong&gt; if you are working with cloud-native architectures (BigQuery, Redshift) and need to provide near real-time insights for big data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;strong&gt;Summary:&lt;/strong&gt; ETL = &lt;em&gt;Cleanliness at the gate.&lt;/em&gt; ELT = &lt;em&gt;Agility at scale.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>database</category>
      <category>etl</category>
      <category>elt</category>
    </item>
    <item>
      <title>Understanding Databases: SQL, NoSQL, Schemas, DDL, and DML</title>
      <dc:creator>De' Clerke</dc:creator>
      <pubDate>Mon, 29 Sep 2025 11:17:34 +0000</pubDate>
      <link>https://forem.com/de_clerke/understanding-databases-sql-nosql-ddl-and-dml-mok</link>
      <guid>https://forem.com/de_clerke/understanding-databases-sql-nosql-ddl-and-dml-mok</guid>
      <description>&lt;h1&gt;
  
  
  🗄️ Database Essentials
&lt;/h1&gt;

&lt;p&gt;Databases sit at the core of every modern application. Whether you're building a social media platform, an online store, or a data pipeline, you need a reliable way to store, organize, and access information. Let’s break down the essentials.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is a Database?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;database&lt;/strong&gt; is an organized collection of data that can be stored, managed, and retrieved efficiently. Instead of scattering data across files or spreadsheets, databases provide a structured system where data can be queried, updated, and maintained consistently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Types of Databases
&lt;/h2&gt;

&lt;p&gt;Databases come in many flavors, but the two most common categories are &lt;strong&gt;SQL (relational)&lt;/strong&gt; and &lt;strong&gt;NoSQL (non-relational)&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. SQL Databases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure:&lt;/strong&gt; Tables with rows and columns.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; MySQL, PostgreSQL, Oracle, SQL Server.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strengths:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Strong consistency and reliability.&lt;/li&gt;
&lt;li&gt;Support for complex queries and relationships (joins).&lt;/li&gt;
&lt;li&gt;Schema-based design ensures data integrity.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. NoSQL Databases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structure:&lt;/strong&gt; Can be document-based, key-value pairs, wide-column stores, or graph databases.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Examples:&lt;/strong&gt; MongoDB (document), Redis (key-value), Cassandra (wide-column), Neo4j (graph).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strengths:&lt;/strong&gt;

&lt;ul&gt;
&lt;li&gt;Flexible schema (data doesn’t need to fit into fixed tables).&lt;/li&gt;
&lt;li&gt;Handles unstructured or semi-structured data.&lt;/li&gt;
&lt;li&gt;Scales horizontally with ease; often preferred for big data and real-time apps.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Schemas
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;schema&lt;/strong&gt; is the blueprint of a database. It defines how data is organized, what data types are allowed, and how different entities relate to each other.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In SQL databases, schemas are strict and must be defined before data is added.&lt;/li&gt;
&lt;li&gt;In NoSQL databases, schemas are often flexible, allowing each document or record to have different fields.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  SQL Schema Example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Orders&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;FOREIGN&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;REFERENCES&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  NoSQL Schema Example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Alice"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"alice@mail.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"orders"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"product"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Laptop"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"product"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Smartphone"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Think of a schema as the rules of the game: SQL enforces strict rules, while NoSQL gives you room to improvise.&lt;/p&gt;
&lt;/blockquote&gt;
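&lt;p&gt;To make the flexible-schema point concrete, here is a small sketch with hypothetical documents: records with different shapes coexist in the same collection, and queries simply tolerate missing fields:&lt;/p&gt;

```python
# Two "documents" with different shapes -- fine in a document store,
# but they would not both fit a single fixed SQL row layout.
users = [
    {"id": "user_1", "name": "Alice", "orders": [{"product": "Laptop"}]},
    {"id": "user_2", "name": "Bob", "loyalty_tier": "gold"},  # extra field, no orders
]

# Queries must tolerate absent fields, e.g. via dict.get():
names_with_orders = [u["name"] for u in users if u.get("orders")]
print(names_with_orders)  # → ['Alice']
```

A relational table would force a decision up front (nullable `loyalty_tier` column, separate `orders` table); the document model defers that decision to query time.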




&lt;h2&gt;
  
  
  When to Use SQL vs NoSQL
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Use SQL when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is highly structured with clear relationships.&lt;/li&gt;
&lt;li&gt;You need ACID transactions (e.g., banking, e-commerce checkout).&lt;/li&gt;
&lt;li&gt;Queries involve complex joins and aggregations.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Use NoSQL when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is semi-structured, rapidly changing, or unstructured.&lt;/li&gt;
&lt;li&gt;Applications need high scalability and performance at massive scale.&lt;/li&gt;
&lt;li&gt;You’re working with big data, caching, or real-time analytics.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
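&lt;p&gt;The ACID point deserves a concrete sketch. Using Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for any transactional engine, a failed transfer rolls back as a unit, so balances never end up half-updated:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INT PRIMARY KEY, balance INT CHECK (balance >= 0))")
con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()

# Transfer 200 out of account 1: the CHECK constraint fails,
# so the whole transfer is rolled back -- not just one leg of it.
try:
    with con:  # the connection context manager wraps this block in a transaction
        con.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
        con.execute("UPDATE accounts SET balance = balance + 200 WHERE id = 2")
except sqlite3.IntegrityError:
    pass  # transaction rolled back

balances = dict(con.execute("SELECT id, balance FROM accounts"))
print(balances)  # → {1: 100, 2: 50}
```

This all-or-nothing behavior is exactly what checkout and banking flows rely on, and it is the strongest argument for SQL in those domains.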

&lt;p&gt;👉 &lt;strong&gt;Summary:&lt;/strong&gt; SQL = &lt;em&gt;consistency and structure.&lt;/em&gt; NoSQL = &lt;em&gt;flexibility and speed.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  DDL vs DML
&lt;/h2&gt;

&lt;p&gt;Within databases, two key categories of SQL commands are &lt;strong&gt;DDL&lt;/strong&gt; and &lt;strong&gt;DML&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DDL (Data Definition Language):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Defines and manages database structures like tables, schemas, indexes.&lt;/li&gt;
&lt;li&gt;Examples: &lt;code&gt;CREATE&lt;/code&gt;, &lt;code&gt;ALTER&lt;/code&gt;, &lt;code&gt;DROP&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Blueprint design.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;DML (Data Manipulation Language):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Works with the actual data inside the structures.&lt;/li&gt;
&lt;li&gt;Examples: &lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt;, &lt;code&gt;SELECT&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Content management.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h2&gt;
  
  
  Example: DDL and DML in Action
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'Alice'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'alice@example.com'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;Users&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
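&lt;p&gt;The same DDL and DML statements can be exercised end to end from Python, with the built-in &lt;code&gt;sqlite3&lt;/code&gt; module as a stand-in engine:&lt;/p&gt;

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: define the structure
con.execute("""
    CREATE TABLE Users (
        id INT PRIMARY KEY,
        name VARCHAR(100),
        email VARCHAR(100) UNIQUE
    )
""")

# DML: work with the data inside that structure
con.execute(
    "INSERT INTO Users (id, name, email) VALUES (?, ?, ?)",
    (1, "Alice", "alice@example.com"),
)
rows = con.execute("SELECT * FROM Users").fetchall()
print(rows)  # → [(1, 'Alice', 'alice@example.com')]
```

Note the separation: the `CREATE TABLE` shapes the container once, while `INSERT`/`SELECT` can run against it any number of times afterwards.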



</description>
      <category>database</category>
      <category>backend</category>
      <category>beginners</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
