<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Marko</title>
    <description>The latest articles on Forem by Marko (@markolekic).</description>
    <link>https://forem.com/markolekic</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3620330%2F1122d202-e3f4-4f9b-9bff-2cf6fcb4603d.jpg</url>
      <title>Forem: Marko</title>
      <link>https://forem.com/markolekic</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/markolekic"/>
    <language>en</language>
    <item>
      <title>Stop Re-running Everything: A Local Incremental Pipeline in DuckDB</title>
      <dc:creator>Marko</dc:creator>
      <pubDate>Sat, 10 Jan 2026 14:26:38 +0000</pubDate>
      <link>https://forem.com/markolekic/stop-re-running-everything-a-local-incremental-pipeline-in-duckdb-543p</link>
      <guid>https://forem.com/markolekic/stop-re-running-everything-a-local-incremental-pipeline-in-duckdb-543p</guid>
      <description>&lt;p&gt;I love local-first data work… until I catch myself doing the same thing for the 12th time:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I changed one model. Better rerun the whole pipeline.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This post is a light walkthrough of a tiny project that fixes that habit using &lt;strong&gt;incremental models + cached DAG runs&lt;/strong&gt; — all on your laptop with &lt;strong&gt;DuckDB&lt;/strong&gt;. The example is a simplified, DuckDB-only version of the existing &lt;code&gt;incremental_demo&lt;/code&gt; project. &lt;/p&gt;

&lt;p&gt;We’ll do three runs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;seed v1 → initial build&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;run again unchanged → mostly skipped&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;seed v2 (update + new row) → incremental merge/upsert&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. No cloud. No ceremony. &lt;/p&gt;

&lt;h2&gt;
  
  
  The whole demo in one sentence
&lt;/h2&gt;

&lt;p&gt;We seed a tiny &lt;code&gt;raw.events&lt;/code&gt; table (from CSV), build a staging model, then build incremental facts that only process rows whose &lt;code&gt;updated_at&lt;/code&gt; is newer than what the target already holds, upserting by &lt;code&gt;event_id&lt;/code&gt;.&lt;/p&gt;
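
&lt;p&gt;Before diving in, here is that whole contract as a pure-Python sketch (an illustration of the semantics only, not FFT’s actual merge code):&lt;/p&gt;

```python
# Sketch of the incremental contract: keep only rows newer than the target's
# watermark, then upsert them by key. Timestamps in this fixed
# "YYYY-MM-DD HH:MM:SS" format compare correctly as strings.
def incremental_upsert(target, new_rows):
    watermark = max((r["updated_at"] for r in target), default="1970-01-01 00:00:00")
    delta = [r for r in new_rows if r["updated_at"] > watermark]
    by_key = {r["event_id"]: r for r in target}
    for r in delta:
        by_key[r["event_id"]] = r  # replace on match, insert otherwise
    return sorted(by_key.values(), key=lambda r: r["event_id"])
```

&lt;p&gt;Everything below is this idea, expressed as SQL and Python models.&lt;/p&gt;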

&lt;h2&gt;
  
  
  What’s in the mini project
&lt;/h2&gt;

&lt;p&gt;There are three key pieces:&lt;/p&gt;

&lt;h3&gt;
  
  
  1) Two seed snapshots
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;v1&lt;/strong&gt; has 3 rows. &lt;br&gt;
&lt;strong&gt;v2&lt;/strong&gt; changes &lt;code&gt;event_id=2&lt;/code&gt; (newer &lt;code&gt;updated_at&lt;/code&gt;, different &lt;code&gt;value&lt;/code&gt;) and adds &lt;code&gt;event_id=4&lt;/code&gt;. &lt;/p&gt;
&lt;h3&gt;
  
  
  2) A source mapping
&lt;/h3&gt;

&lt;p&gt;The project defines a source &lt;code&gt;raw.events&lt;/code&gt; pointing at a seeded table called &lt;code&gt;seed_events&lt;/code&gt;. &lt;/p&gt;
&lt;h3&gt;
  
  
  3) A few models (SQL + Python)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;events_base&lt;/code&gt; (staging table)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fct_events_sql_inline&lt;/code&gt; (incremental SQL, config inline)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fct_events_sql_yaml&lt;/code&gt; (incremental SQL, config in &lt;code&gt;project.yml&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fct_events_py_incremental&lt;/code&gt; (incremental Python model for DuckDB)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All of these exist in the exported demo. &lt;/p&gt;
&lt;h2&gt;
  
  
  DuckDB-only setup
&lt;/h2&gt;

&lt;p&gt;The demo’s DuckDB profile is simple: it writes to a local DuckDB file. &lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;profiles.yml&lt;/code&gt; (DuckDB profile)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dev_duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duckdb&lt;/span&gt;
  &lt;span class="na"&gt;duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;env('FF_DUCKDB_PATH',&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'.local/incremental_demo.duckdb')&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;}}"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;.env.dev_duckdb&lt;/code&gt; (optional convenience)
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FF_DUCKDB_PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.local/incremental_demo.duckdb
&lt;span class="nv"&gt;FF_DUCKDB_SCHEMA&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;inc_demo_schema
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
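
&lt;p&gt;The &lt;code&gt;env()&lt;/code&gt; call in &lt;code&gt;profiles.yml&lt;/code&gt; behaves like an environment lookup with a default. Roughly (a sketch of the behavior, not FFT’s templating code):&lt;/p&gt;

```python
import os

# What env('FF_DUCKDB_PATH', '.local/incremental_demo.duckdb') resolves to:
# the environment variable if set, otherwise the fallback path.
os.environ.pop("FF_DUCKDB_PATH", None)  # pretend it is unset
path = os.environ.get("FF_DUCKDB_PATH", ".local/incremental_demo.duckdb")
```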

&lt;h2&gt;
  
  
  The models (the fun part)
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Staging: &lt;code&gt;events_base&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is intentionally boring: cast timestamps, keep columns tidy.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'table'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'events'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Incremental SQL (inline config): &lt;code&gt;fct_events_sql_inline&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This model declares:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;materialized='incremental'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unique_key='event_id'&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;watermark column: &lt;code&gt;updated_at&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And on incremental runs it only selects rows newer than the max &lt;code&gt;updated_at&lt;/code&gt; already in the target.&lt;/p&gt;

&lt;p&gt;This assumes &lt;code&gt;updated_at&lt;/code&gt; increases whenever a row changes (it’s a demo; real pipelines may need late-arrival handling).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'incremental'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;unique_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'event_id'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;incremental&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s1"&gt;'updated_at_column'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'updated_at'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'events_base.ff'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_incremental&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;updated_at&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="n"&gt;coalesce&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt; &lt;span class="s1"&gt;'1970-01-01 00:00:00'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;this&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endif&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
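
&lt;p&gt;To make the watermark predicate concrete: assuming the target already holds seed v1 (so its &lt;code&gt;max(updated_at)&lt;/code&gt; is 2024-01-03), here is which v2 rows the &lt;code&gt;where&lt;/code&gt; clause lets through:&lt;/p&gt;

```python
# v2 seed rows as (event_id, updated_at, value); timestamps in this
# fixed format compare correctly as strings.
v2 = [
    (1, "2024-01-01 00:00:00", 10),
    (2, "2024-01-05 00:00:00", 999),
    (3, "2024-01-03 00:00:00", 30),
    (4, "2024-01-06 00:00:00", 40),
]
watermark = "2024-01-03 00:00:00"  # max(updated_at) already in the target

# The is_incremental() branch: strictly newer rows only.
delta = [row for row in v2 if row[1] > watermark]
```

&lt;p&gt;Only &lt;code&gt;event_id=2&lt;/code&gt; and &lt;code&gt;event_id=4&lt;/code&gt; survive; rows 1 and 3 are never rescanned.&lt;/p&gt;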



&lt;h3&gt;
  
  
  Incremental SQL (YAML-config style): &lt;code&gt;fct_events_sql_yaml&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;Same idea, but the SQL file stays “clean”:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'incremental'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;
  &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;ref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'events_base.ff'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
  &lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;updated_at&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;value&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;…and the incremental knobs live in &lt;code&gt;project.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;incremental&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;fct_events_sql_yaml.ff&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;unique_key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id"&lt;/span&gt;
      &lt;span class="na"&gt;incremental&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;updated_at_column&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updated_at"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pick whichever style suits you better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Incremental Python (DuckDB): &lt;code&gt;fct_events_py_incremental&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This one just adds &lt;code&gt;value_x10&lt;/code&gt; in pandas and returns a delta frame. The incremental (merge/upsert) behavior is configured for this model in &lt;code&gt;project.yml&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastflowtransform&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;engine_model&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="nd"&gt;@engine_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duckdb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fct_events_py_incremental&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;events_base.ff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;events_df&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;events_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value_x10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;event_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;updated_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value_x10&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
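
&lt;p&gt;If you’re curious what “merge the returned delta frame” means in pandas terms, a rough equivalent is concat-then-dedupe on the unique key (an illustration only; the actual merge happens in DuckDB):&lt;/p&gt;

```python
import pandas as pd

# Target after the v1 build, and the delta the model would return for v2.
existing = pd.DataFrame({"event_id": [1, 2, 3], "value": [10, 20, 30],
                         "value_x10": [100, 200, 300]})
delta = pd.DataFrame({"event_id": [2, 4], "value": [999, 40],
                      "value_x10": [9990, 400]})

# Upsert by unique key: on duplicate event_id, the delta row wins.
merged = (
    pd.concat([existing, delta])
    .drop_duplicates(subset="event_id", keep="last")
    .sort_values("event_id")
    .reset_index(drop=True)
)
```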



&lt;h2&gt;
  
  
  The three-run walkthrough
&lt;/h2&gt;

&lt;p&gt;We’ll follow the demo’s exact “story arc”: first build, no-op build, then seed changes triggering incremental updates. &lt;/p&gt;

&lt;h3&gt;
  
  
  Step 0: pick a local seeds folder
&lt;/h3&gt;

&lt;p&gt;The Makefile uses a local seeds dir and swaps &lt;code&gt;seed_events.csv&lt;/code&gt; between v1 and v2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; .local/seeds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  A tiny dataset that still proves “incremental”
&lt;/h3&gt;

&lt;p&gt;This demo uses two versions of the same seed file. &lt;strong&gt;v2 updates one existing row and adds one new row&lt;/strong&gt; — so you can watch an incremental model do both an &lt;strong&gt;upsert&lt;/strong&gt; and an &lt;strong&gt;insert&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;seeds/seed_events_v1.csv&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event_id,updated_at,value
1,2024-01-01 00:00:00,10
2,2024-01-02 00:00:00,20
3,2024-01-03 00:00:00,30
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;seeds/seed_events_v2.csv&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event_id,updated_at,value
1,2024-01-01 00:00:00,10
2,2024-01-05 00:00:00,999
3,2024-01-03 00:00:00,30
4,2024-01-06 00:00:00,40
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After you switch from v1 → v2 and run again, you should end up with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;event_id=2&lt;/code&gt; updated (newer &lt;code&gt;updated_at&lt;/code&gt;, &lt;code&gt;value=999&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event_id=4&lt;/code&gt; inserted (brand new row)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  1) First run (seed v1 → initial build)
&lt;/h3&gt;

&lt;p&gt;Copy v1 into place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;seeds/seed_events_v1.csv .local/seeds/seed_events.csv
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seed + run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;FFT_SEEDS_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.local/seeds fft seed &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb &lt;span class="nt"&gt;--cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What you should expect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;events_base&lt;/code&gt; becomes a normal table&lt;/li&gt;
&lt;li&gt;incremental models create their target tables (the first run is effectively a full build)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2) No-op run (same seed v1; should be mostly skipped)
&lt;/h3&gt;

&lt;p&gt;Run again without changing anything:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb &lt;span class="nt"&gt;--cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The demo literally calls this the “no-op run… should be mostly skipped,” which is the best feeling in local data dev.&lt;/p&gt;

&lt;h3&gt;
  
  
  3) Change the seed (v2 snapshot) and run incremental
&lt;/h3&gt;

&lt;p&gt;Now swap to v2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;seeds/seed_events_v2.csv .local/seeds/seed_events.csv
&lt;span class="nv"&gt;FFT_SEEDS_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;.local/seeds fft seed &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb &lt;span class="nt"&gt;--cache&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;rw
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the punchline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;event_id=2&lt;/code&gt; comes in with a &lt;strong&gt;newer&lt;/strong&gt; &lt;code&gt;updated_at&lt;/code&gt; and &lt;code&gt;value=999&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;event_id=4&lt;/code&gt; shows up for the first time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So your incremental facts should &lt;strong&gt;update&lt;/strong&gt; the row for &lt;code&gt;event_id=2&lt;/code&gt; and &lt;strong&gt;insert&lt;/strong&gt; &lt;code&gt;event_id=4&lt;/code&gt;, based on &lt;code&gt;unique_key=event_id&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sanity check in DuckDB
&lt;/h2&gt;

&lt;p&gt;Query the incremental SQL table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;duckdb .local/incremental_demo.duckdb &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select * from inc_demo_schema.fct_events_sql_inline order by event_id;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After v2, you should see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;event_id=2&lt;/code&gt; with &lt;code&gt;updated_at = 2024-01-05&lt;/code&gt; and &lt;code&gt;value = 999&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a new row for &lt;code&gt;event_id=4&lt;/code&gt; with &lt;code&gt;updated_at = 2024-01-06&lt;/code&gt; and &lt;code&gt;value = 40&lt;/code&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you query the Python table:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;duckdb .local/incremental_demo.duckdb &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"select * from inc_demo_schema.fct_events_py_incremental order by event_id;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should also see &lt;code&gt;value_x10&lt;/code&gt; (e.g., &lt;code&gt;9990&lt;/code&gt; for the updated row).&lt;/p&gt;

&lt;h2&gt;
  
  
  Make the DAG visible
&lt;/h2&gt;

&lt;p&gt;You can see the DAG in the docs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft docs serve &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb &lt;span class="nt"&gt;--open&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhwk0xaty6vau5rh3txx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flhwk0xaty6vau5rh3txx.png" alt="DAG from local docs server"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Optional tiny “quality check”
&lt;/h2&gt;

&lt;p&gt;The demo includes simple not-null tests for the incremental outputs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb &lt;span class="nt"&gt;--select&lt;/span&gt; tag:incremental
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What you just bought yourself
&lt;/h2&gt;

&lt;p&gt;With this setup, your local dev loop becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Run once&lt;/strong&gt; to build everything&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run again&lt;/strong&gt; and skip most work&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Change input data&lt;/strong&gt; and update only what’s necessary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Update existing rows&lt;/strong&gt; safely (via &lt;code&gt;unique_key&lt;/code&gt;) instead of “append and pray”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And all of it works with a single local DuckDB file, which makes experimenting feel cheap again.&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>sql</category>
      <category>duckdb</category>
    </item>
    <item>
      <title>Stop Waiting for the Cloud: Building a Hybrid SQL+Python Data Pipeline Locally with DuckDB</title>
      <dc:creator>Marko</dc:creator>
      <pubDate>Fri, 28 Nov 2025 16:59:05 +0000</pubDate>
      <link>https://forem.com/markolekic/stop-waiting-for-the-cloud-building-a-hybrid-sqlpython-data-pipeline-locally-with-duckdb-438j</link>
      <guid>https://forem.com/markolekic/stop-waiting-for-the-cloud-building-a-hybrid-sqlpython-data-pipeline-locally-with-duckdb-438j</guid>
      <description>&lt;p&gt;&lt;strong&gt;Cloud data warehouses are amazing for production. They are terrible for development.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’re a Data Engineer, you know the pain of the “cloud feedback loop”:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You write a complex SQL query.&lt;/li&gt;
&lt;li&gt;You hit “Run” in your orchestrator.&lt;/li&gt;
&lt;li&gt;You wait 45 seconds for the warehouse to spin up or queue your job.&lt;/li&gt;
&lt;li&gt;It fails because of a syntax error.&lt;/li&gt;
&lt;li&gt;You fix it. You pay for the query slot. You wait again.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This latency kills flow.&lt;/p&gt;

&lt;p&gt;In software engineering, we run code locally on our laptops before shipping to production. Why can’t we do the same for data pipelines?&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;FastFlowTransform (FFT)&lt;/strong&gt; to solve this. It’s a framework that lets you build and test your pipeline locally using &lt;strong&gt;DuckDB&lt;/strong&gt; (for speed and free compute), and then deploy the &lt;strong&gt;same project&lt;/strong&gt; to &lt;strong&gt;Snowflake, BigQuery, or Databricks&lt;/strong&gt; for production.&lt;/p&gt;

&lt;p&gt;In this post we’ll build a tiny “Users” pipeline that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;runs locally on DuckDB in &lt;strong&gt;&amp;lt; 1s&lt;/strong&gt;, and&lt;/li&gt;
&lt;li&gt;can be deployed to BigQuery by changing a &lt;strong&gt;single CLI flag (--env)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  1. Initialize a local-first project (no cloud creds)
&lt;/h2&gt;

&lt;p&gt;You don’t need AWS keys or a Snowflake login. Just a laptop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastflowtransform

fft init building_locally_demo &lt;span class="nt"&gt;--engine&lt;/span&gt; duckdb
&lt;span class="nb"&gt;cd &lt;/span&gt;building_locally_demo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This creates a minimal FFT project with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;models/&lt;/code&gt; – SQL/Python models&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;seeds/&lt;/code&gt; – CSV/Parquet seeds&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;profiles.yml&lt;/code&gt; – connection profiles, including a local DuckDB one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’ll use DuckDB as our “dev warehouse”.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. Add some data (seeds + sources)
&lt;/h2&gt;

&lt;p&gt;We start with a simple &lt;code&gt;users&lt;/code&gt; CSV.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;seeds/seed_users.csv&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;id,email,signup_date
1,alice@example.com,2023-01-01
2,bob@example.com,2023-01-02
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we tell FFT how to reference this seed as a &lt;strong&gt;source&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sources.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;

&lt;span class="na"&gt;sources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;raw&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;staging&lt;/span&gt;
    &lt;span class="na"&gt;tables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;users&lt;/span&gt;
        &lt;span class="na"&gt;identifier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seed_users&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now &lt;code&gt;{{ source('raw', 'users') }}&lt;/code&gt; will resolve to the table created from &lt;code&gt;seed_users.csv&lt;/code&gt;.&lt;/p&gt;
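
&lt;p&gt;Conceptually, &lt;code&gt;source()&lt;/code&gt; is a lookup from a logical name to a physical identifier. A toy version (illustrative only; FFT’s real resolver also handles per-engine schemas and quoting):&lt;/p&gt;

```python
# Toy resolver for the sources.yml above: logical ('raw', 'users')
# maps to the physical table staging.seed_users.
SOURCES = {"raw": {"schema": "staging", "tables": {"users": "seed_users"}}}

def source(source_name, table_name):
    src = SOURCES[source_name]
    return src["schema"] + "." + src["tables"][table_name]
```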

&lt;p&gt;Load the seed into DuckDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft seed &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyly7wyu71twg50ky9ceq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyly7wyu71twg50ky9ceq.png" alt="Local DuckDB - Seed" width="800" height="64"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You should now have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a DuckDB file at &lt;code&gt;.local/dev.duckdb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;a table like &lt;code&gt;staging.seed_users&lt;/code&gt; available in your local engine&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  3. Write a transformation in SQL
&lt;/h2&gt;

&lt;p&gt;FFT uses standard SQL with Jinja templating (similar to dbt). It takes care of engine differences for you.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;models/staging/stg_users.ff.sql&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'table'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="k"&gt;select&lt;/span&gt; 
    &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="c1"&gt;-- Standardizing email casing&lt;/span&gt;
    &lt;span class="k"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="c1"&gt;-- Casting types explicitly&lt;/span&gt;
    &lt;span class="k"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;signup_date&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nb"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;signup_date&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="k"&gt;source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'raw'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;'users'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is just… SQL. No FFT-specific magic beyond &lt;code&gt;config()&lt;/code&gt; and &lt;code&gt;source()&lt;/code&gt;.&lt;/p&gt;
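&lt;p&gt;For DuckDB, the rendered SQL is roughly the sketch below: &lt;code&gt;config()&lt;/code&gt; drops out of the compiled text, and &lt;code&gt;source('raw', 'users')&lt;/code&gt; resolves via &lt;code&gt;sources.yml&lt;/code&gt; to &lt;code&gt;staging.seed_users&lt;/code&gt;. (Illustrative only; FFT’s exact DDL may differ.)&lt;/p&gt;

```sql
-- Approximate compiled form of stg_users on DuckDB (illustrative)
create or replace table staging.stg_users as
select
    id,
    lower(email) as email,
    cast(signup_date as date) as signup_date
from staging.seed_users;
```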




&lt;h2&gt;
  
  
  4. Run the pipeline locally (the “fast loop”)
&lt;/h2&gt;

&lt;p&gt;Now run the DAG on your machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm73hocx1n1wrl2z7aduf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm73hocx1n1wrl2z7aduf.png" alt="Local DuckDB - Run" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On my laptop, I see something like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time:&lt;/strong&gt; &lt;code&gt;18 ms&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; &lt;code&gt;$0.00&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure:&lt;/strong&gt; my CPU&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I can iterate on this loop hundreds of times per hour.&lt;/p&gt;

&lt;p&gt;Add data quality checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte4hsz8q9pfed8cmrms1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fte4hsz8q9pfed8cmrms1.png" alt="Local DuckDB - Test" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or even model-level unit tests (no real DB needed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft utest &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dyrwosfnmkwgq75zalr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dyrwosfnmkwgq75zalr.png" alt="Local DuckDB - uTest" width="800" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Local-first DX:&lt;/strong&gt; fast feedback, offline-friendly, and you only touch the cloud once your logic is solid.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  5. Point the same project at BigQuery (the “flow loop”)
&lt;/h2&gt;

&lt;p&gt;Once you’re happy with the logic, it’s time to push to production.&lt;/p&gt;

&lt;p&gt;In other frameworks you might:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;change connection strings,&lt;/li&gt;
&lt;li&gt;update environment variables manually,&lt;/li&gt;
&lt;li&gt;maybe even rewrite SQL if you used engine-specific functions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In FFT, you &lt;strong&gt;just switch the profile&lt;/strong&gt; (just remember to set up your BigQuery credentials first).&lt;/p&gt;

&lt;p&gt;&lt;code&gt;profiles.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# My Local Playground&lt;/span&gt;
&lt;span class="na"&gt;dev_duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duckdb&lt;/span&gt;
  &lt;span class="na"&gt;duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.local/dev.duckdb"&lt;/span&gt;

&lt;span class="c1"&gt;# My Local utest Overrides&lt;/span&gt;
&lt;span class="na"&gt;dev_duckdb_utest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;duckdb&lt;/span&gt;
  &lt;span class="na"&gt;duckdb&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:memory:"&lt;/span&gt;

&lt;span class="c1"&gt;# My Production Environment&lt;/span&gt;
&lt;span class="na"&gt;prod_bigquery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bigquery&lt;/span&gt;
  &lt;span class="na"&gt;bigquery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;project&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fft-basic-demo"&lt;/span&gt;
    &lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production_marts"&lt;/span&gt;
    &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EU"&lt;/span&gt;
    &lt;span class="c1"&gt;# Use the pandas backend here; FFT can also use BigFrames if you set this to true.&lt;/span&gt;
    &lt;span class="na"&gt;use_bigframes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;allow_create_dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# My Production utest Overrides&lt;/span&gt;
&lt;span class="na"&gt;prod_bigquery_utest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bigquery&lt;/span&gt;
  &lt;span class="na"&gt;bigquery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;production_marts_utest"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now run the &lt;strong&gt;same project&lt;/strong&gt;, different environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# exactly the same command, different --env&lt;/span&gt;
fft seed &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; prod_bigquery
fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; prod_bigquery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskz76sxm5t8nrmkd80zw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fskz76sxm5t8nrmkd80zw.png" alt="Remote BigQuery - Seed" width="800" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn7vmehuf9ijyced2otl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn7vmehuf9ijyced2otl.png" alt="Remote BigQuery - Run" width="800" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;FFT:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;builds the same DAG,&lt;/li&gt;
&lt;li&gt;compiles the same SQL models,&lt;/li&gt;
&lt;li&gt;authenticates with Google Cloud using your local creds,&lt;/li&gt;
&lt;li&gt;executes the transformations on BigQuery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We didn’t touch &lt;code&gt;stg_users.ff.sql&lt;/code&gt;. Only &lt;code&gt;--env&lt;/code&gt; changed.&lt;/p&gt;




&lt;h2&gt;
  
  
  6. Why this is a big deal for DX
&lt;/h2&gt;

&lt;p&gt;This isn’t just about saving money (though you do get that for free). It’s about &lt;strong&gt;Developer Experience&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Work offline.&lt;/strong&gt; Build complex DAGs on a train or in airplane mode with DuckDB.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unit test your models.&lt;/strong&gt; Use &lt;code&gt;fft utest&lt;/code&gt; with tiny fixtures to validate logic before hitting any real warehouse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid SQL+Python.&lt;/strong&gt; FFT supports &lt;strong&gt;Python models&lt;/strong&gt; alongside SQL. Use SQL for aggregations and joins, Python for custom logic or ML.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example Python model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# models/marts/mart_latest_signup.ff.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastflowtransform&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;engine_model&lt;/span&gt;


&lt;span class="nd"&gt;@engine_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# Register this model for both DuckDB (local) and BigQuery (pandas backend)
&lt;/span&gt;    &lt;span class="n"&gt;only&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;duckdb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bigquery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mart_latest_signup&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;materialized&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;table&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;deps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stg_users.ff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;  &lt;span class="c1"&gt;# SQL model from earlier in the article
&lt;/span&gt;    &lt;span class="n"&gt;tags&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scope:mart&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engine:duckdb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engine:bigquery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;requires&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;# Columns produced by stg_users.ff.sql:
&lt;/span&gt;        &lt;span class="c1"&gt;#   id, email, signup_date
&lt;/span&gt;        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stg_users.ff&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;signup_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stg_users&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Return the latest signup per email domain using pandas.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Derive an email_domain column in Python
&lt;/span&gt;    &lt;span class="n"&gt;users&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;stg_users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;latest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;users&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;signup_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# keep the newest per domain
&lt;/span&gt;        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;loc&lt;/span&gt;&lt;span class="p"&gt;[:,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email_domain&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;signup_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rename&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest_user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest_email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;signup_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latest_signup_date&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run this locally (DuckDB + pandas), then later switch to Spark or BigQuery with the same decorator.&lt;/p&gt;
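&lt;p&gt;Because the model body is plain pandas, you can sanity-check the logic with an in-memory frame before any engine is involved. A minimal sketch (the rows are made up; only the three columns from &lt;code&gt;stg_users&lt;/code&gt; are assumed):&lt;/p&gt;

```python
import pandas as pd

# Tiny stand-in for the stg_users table (illustrative rows)
stg_users = pd.DataFrame({
    "id": [1, 2, 3],
    "email": ["alice@example.com", "bob@example.com", "carol@example.org"],
    "signup_date": pd.to_datetime(["2025-11-02", "2025-11-05", "2025-11-07"]),
})

# Same core steps as the model: derive the domain, keep the newest signup per domain
users = stg_users.copy()
users["email_domain"] = users["email"].str.split("@").str[-1]
latest = (
    users.sort_values("signup_date", ascending=False)
    .drop_duplicates("email_domain")  # keeps the first (newest) row per domain
    .reset_index(drop=True)
)

print(latest[["email_domain", "id"]])
```

&lt;p&gt;Here &lt;code&gt;bob@example.com&lt;/code&gt; wins for &lt;code&gt;example.com&lt;/code&gt; because his signup date is the latest for that domain.&lt;/p&gt;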




&lt;h2&gt;
  
  
  7. Try it yourself
&lt;/h2&gt;

&lt;p&gt;Stop treating your laptop like a thin client. Your machine is powerful enough to build data pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastflowtransform

fft init building_locally_demo &lt;span class="nt"&gt;--engine&lt;/span&gt; duckdb
&lt;span class="nb"&gt;cd &lt;/span&gt;building_locally_demo

&lt;span class="c"&gt;# add the users seed + sources.yml from this article,&lt;/span&gt;
&lt;span class="c"&gt;# then:&lt;/span&gt;
fft seed &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--env&lt;/span&gt; dev_duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Repo:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/FFTLabs/FastFlowTransform" rel="noopener noreferrer"&gt;https://github.com/FFTLabs/FastFlowTransform&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you try this, I’d love to hear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What your workflow looks like&lt;/li&gt;
&lt;li&gt;Which warehouse you’re deploying to (BigQuery/Snowflake/Databricks/etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’m thinking about a follow-up post on &lt;strong&gt;incremental models&lt;/strong&gt; or &lt;strong&gt;data-quality tests&lt;/strong&gt; with FFT. If that sounds interesting, tell me in the comments.&lt;/p&gt;

</description>
      <category>python</category>
      <category>sql</category>
      <category>dataengineering</category>
      <category>tooling</category>
    </item>
    <item>
      <title>The Offline Data Engineer: Building Resilient API Pipelines that Work on an Airplane</title>
      <dc:creator>Marko</dc:creator>
      <pubDate>Fri, 21 Nov 2025 17:00:20 +0000</pubDate>
      <link>https://forem.com/markolekic/the-offline-data-engineer-building-resilient-api-pipelines-that-work-on-an-airplane-3pgf</link>
      <guid>https://forem.com/markolekic/the-offline-data-engineer-building-resilient-api-pipelines-that-work-on-an-airplane-3pgf</guid>
      <description>&lt;p&gt;&lt;strong&gt;Development loops for API integrations are usually painful.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ve all been there: You are building a data pipeline to ingest data from a third-party API (Salesforce, Stripe, or an internal microservice). You write your Python script, hit &lt;code&gt;run&lt;/code&gt;, and wait.&lt;/p&gt;

&lt;p&gt;It works. You change a column name in your transformation logic. You hit &lt;code&gt;run&lt;/code&gt; again. You wait again.&lt;/p&gt;

&lt;p&gt;Suddenly, you hit a rate limit. Or the API throws a 503 error. Or, worse, you are on a train or an airplane with spotty WiFi, and you can’t run your code at all because it depends on a live internet connection.&lt;/p&gt;

&lt;p&gt;In the world of SQL, we solved this with local databases (DuckDB) and seeds. But in the world of Python API ingestion, we are often still writing fragile &lt;code&gt;requests.get()&lt;/code&gt; loops that break the moment the internet does.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;FastFlowTransform (FFT)&lt;/strong&gt; to solve this. It’s a hybrid SQL+Python framework that treats HTTP responses like immutable artifacts, allowing you to build, test, and debug API pipelines completely offline.&lt;/p&gt;

&lt;p&gt;Here is how to build a pipeline that is "Airplane Mode" ready, using a real API example.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: The "Request Loop" Antipattern
&lt;/h2&gt;

&lt;p&gt;Typically, a Python extraction script looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Fragile: Depending on a live connection for every test run
&lt;/span&gt;    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://jsonplaceholder.typicode.com/todos?_page=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;page&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;_limit=20&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;json_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;json_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Stop if empty
&lt;/span&gt;        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="c1"&gt;# ... complex logic to clean and parse JSON ...
&lt;/span&gt;    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;page&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ... save to DB ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The issues with this approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Coupled Extraction &amp;amp; Logic:&lt;/strong&gt; If you mess up the parsing logic, you have to re-fetch &lt;em&gt;everything&lt;/em&gt; to test the fix.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;No State:&lt;/strong&gt; If the script crashes on page 99 of 100, you restart from page 1.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Online Only:&lt;/strong&gt; You cannot run this without a live connection.&lt;/li&gt;
&lt;/ol&gt;
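&lt;p&gt;The fix is conceptually simple: persist every raw response to disk keyed by the request, and let the transformation read from that cache. The next section shows how FFT packages this; the snippet below is just the bare principle in plain Python (all names are made up for illustration, and this is not FFT’s implementation):&lt;/p&gt;

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Throwaway cache dir for this demo; a real tool would use a stable project path
CACHE_DIR = Path(tempfile.mkdtemp()) / "http"

def cached_fetch(url: str, params: dict, fetcher) -> list:
    """Return the response for (url, params), hitting the network only on a cache miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(json.dumps([url, params], sort_keys=True).encode()).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():  # offline-friendly: replay the stored artifact
        return json.loads(path.read_text())
    data = fetcher(url, params)  # the only line that touches the network
    path.write_text(json.dumps(data))
    return data

# A fake fetcher stands in for requests.get(...).json() so this runs offline
calls = []
def fake_fetcher(url, params):
    calls.append(params)
    return [{"id": params["_page"], "title": "todo"}]

first = cached_fetch("https://example.com/todos", {"_page": 1}, fake_fetcher)
again = cached_fetch("https://example.com/todos", {"_page": 1}, fake_fetcher)
# The second call never reaches the fetcher: one network call, two usable results
```

&lt;p&gt;With extraction and transformation decoupled like this, fixing a parsing bug means re-running only the compute step against cached artifacts, which also solves the restart-from-page-1 problem.&lt;/p&gt;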

&lt;h2&gt;
  
  
  The Solution: FFT's Cached HTTP Module
&lt;/h2&gt;

&lt;p&gt;FastFlowTransform introduces a dedicated module &lt;code&gt;fastflowtransform.api.http&lt;/code&gt;. It separates the &lt;strong&gt;fetch&lt;/strong&gt; (IO) from the &lt;strong&gt;transformation&lt;/strong&gt; (Compute) using a file-backed cache.&lt;/p&gt;

&lt;p&gt;Let's build a model that pulls "To-Do" items from JSONPlaceholder.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The Setup
&lt;/h3&gt;

&lt;p&gt;First, we initialize a project. We'll use &lt;strong&gt;DuckDB&lt;/strong&gt; for local development so we don't need a cloud warehouse.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastflowtransform
fft init my_api_project &lt;span class="nt"&gt;--engine&lt;/span&gt; duckdb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The Model with Pagination
&lt;/h3&gt;

&lt;p&gt;In FFT, Python models are first-class citizens. We need to define how to talk to the API.&lt;/p&gt;

&lt;p&gt;Since JSONPlaceholder paginates via query parameters (&lt;code&gt;_page&lt;/code&gt; and &lt;code&gt;_limit&lt;/code&gt;), we write a small paginator function that decides when to stop (an empty response) and how to request the next page.&lt;/p&gt;

&lt;p&gt;Create &lt;code&gt;models/todos_ingest.ff.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastflowtransform&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastflowtransform.api.http&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;get_df&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define the Paginator
# This function runs after every request to determine what to do next.
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;offset_paginator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response_json&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# If the API returns an empty list, we are done.
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response_json&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

    &lt;span class="c1"&gt;# Otherwise, increment the page number
&lt;/span&gt;    &lt;span class="n"&gt;current_page&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;current_page&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;next_params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="n"&gt;next_params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_page&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;next_request&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;params&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;next_params&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;todos_ingest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_todos&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. get_df handles the HTTP calls, caching, and conversion
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_df&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://jsonplaceholder.typicode.com/todos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_page&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_limit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Start at page 1
&lt;/span&gt;        &lt;span class="n"&gt;paginator&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;offset_paginator&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# record_path is None because the root of the JSON is the list itself
&lt;/span&gt;        &lt;span class="n"&gt;record_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt; 
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Apply transformation logic
&lt;/span&gt;    &lt;span class="c1"&gt;# If we change THIS logic later, FFT won't re-fetch the API!
&lt;/span&gt;
    &lt;span class="c1"&gt;# Example: Mark high-priority items locally
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;priority&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;HIGH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;delectus&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NORMAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
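&lt;p&gt;Before running it, the paginator contract is worth sanity-checking by hand: it receives the last URL, params, and decoded JSON body, and returns either the next request or &lt;code&gt;None&lt;/code&gt; to stop. The same function (copied here so the snippet is self-contained) can be exercised with fake responses:&lt;/p&gt;

```python
def offset_paginator(url, params, response_json):
    # Same logic as the paginator in the model above.
    if not response_json:
        return None
    current_page = params.get("_page", 1)
    if current_page >= 2:
        return None
    next_params = dict(params or {})
    next_params["_page"] = current_page + 1
    return {"next_request": {"params": next_params}}

# Empty body: stop paginating entirely.
print(offset_paginator("https://example.com/todos", {"_page": 1}, []))
# → None

# Non-empty body on page 1: ask for page 2.
print(offset_paginator("https://example.com/todos", {"_page": 1}, [{"id": 1}]))
# → {'next_request': {'params': {'_page': 2}}}

# Page cap reached: stop even though the body is non-empty.
print(offset_paginator("https://example.com/todos", {"_page": 2}, [{"id": 2}]))
# → None
```

&lt;p&gt;No HTTP involved: because the paginator is a pure function of &lt;code&gt;(url, params, response_json)&lt;/code&gt;, you can unit-test the stopping conditions without touching the network.&lt;/p&gt;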



&lt;h3&gt;
  
  
  3. The First Run (Online)
&lt;/h3&gt;

&lt;p&gt;When we run this for the first time, FFT hits the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--select&lt;/span&gt; todos_ingest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fistsls4w8xvdda9yl3pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fistsls4w8xvdda9yl3pi.png" alt="Run logs after requesting the API" width="800" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens under the hood:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; FFT calculates a fingerprint for the model.&lt;/li&gt;
&lt;li&gt; It executes the requests: &lt;code&gt;_page=1&lt;/code&gt;, then &lt;code&gt;_page=2&lt;/code&gt;, stopping when the API returns &lt;code&gt;[]&lt;/code&gt; or when the paginator's hard-coded page cap (two pages, in this example) is reached.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Crucially:&lt;/strong&gt; It saves the raw JSON to a local cache directory (&lt;code&gt;.fastflowtransform/http_cache&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt; It transforms the data and materializes a table in DuckDB.&lt;/li&gt;
&lt;/ol&gt;
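&lt;p&gt;The article doesn't show FFT's internal cache layout, but the idea behind step 3 can be sketched as a content-addressed lookup: derive a stable key from the request, and reuse the stored body whenever that key already exists on disk. Everything below (function names, file layout, the default directory) is an illustrative assumption, not FFT's actual code:&lt;/p&gt;

```python
import hashlib
import json
from pathlib import Path

def cache_key(url: str, params: dict) -> str:
    # Deterministic key: the same URL + params always hash to the same name.
    payload = json.dumps({"url": url, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_fetch(url, params, fetch, cache_dir=Path(".fastflowtransform/http_cache")):
    # Serve from the cache directory when possible; otherwise fetch and store.
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(url, params)}.json"
    if path.exists():
        return json.loads(path.read_text())
    data = fetch(url, params)  # e.g. requests.get(url, params=params).json()
    path.write_text(json.dumps(data))
    return data
```

&lt;p&gt;On a second call with identical arguments, &lt;code&gt;fetch&lt;/code&gt; is never invoked — which is exactly the property the offline run in the next section relies on.&lt;/p&gt;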

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r71224f1nxj0nigjo5k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2r71224f1nxj0nigjo5k.png" alt="Preview of the resulting table - online run" width="800" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3t13rfki0tw446qficr6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3t13rfki0tw446qficr6.png" alt="Count of the resulting table - online run" width="800" height="88"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Second Run (Offline / "Airplane Mode")
&lt;/h3&gt;

&lt;p&gt;Now, imagine you are on a plane. You realize you made a mistake: you want to filter out any tasks that are already &lt;code&gt;completed&lt;/code&gt; (and, while you're at it, uppercase the titles).&lt;/p&gt;

&lt;p&gt;You update the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# ... inside the model function ...
&lt;/span&gt;
    &lt;span class="c1"&gt;# New Logic: Filter rows
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# New Logic: Uppercase titles
&lt;/span&gt;    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upper&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
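&lt;p&gt;On a toy frame (the rows below are made up, mirroring the todos schema), those two lines behave like this:&lt;/p&gt;

```python
import pandas as pd

# Minimal stand-in for the cached todos payload.
df = pd.DataFrame([
    {"id": 1, "title": "delectus aut autem", "completed": False},
    {"id": 2, "title": "quis ut nam facilis", "completed": True},
])

df = df[df["completed"] == False].copy()  # drop finished tasks
df["title"] = df["title"].str.upper()     # uppercase the remaining titles

print(df["title"].tolist())  # → ['DELECTUS AUT AUTEM']
```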



&lt;p&gt;You don't have internet. But you don't need it. Run with the &lt;code&gt;--offline&lt;/code&gt; flag:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fft run &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--select&lt;/span&gt; todos_ingest &lt;span class="nt"&gt;--offline&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86mq0kdokrvarylegigr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86mq0kdokrvarylegigr.png" alt="Run logs for cached run" width="800" height="180"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Result:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FFT sees the &lt;code&gt;--offline&lt;/code&gt; flag.&lt;/li&gt;
&lt;li&gt;It checks the cache. It finds the JSON from the previous run.&lt;/li&gt;
&lt;li&gt;It &lt;strong&gt;skips&lt;/strong&gt; the network request entirely.&lt;/li&gt;
&lt;li&gt;It feeds the cached JSON into your new logic.&lt;/li&gt;
&lt;li&gt;The run succeeds in milliseconds.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38mccfyeuf9a1gipk8js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F38mccfyeuf9a1gipk8js.png" alt="Preview of the resulting table using HTTP cache" width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwm3h5s5gjyf9jem867t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftwm3h5s5gjyf9jem867t.png" alt="Count of the resulting table using HTTP cache " width="800" height="96"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Telemetry and Observability
&lt;/h3&gt;

&lt;p&gt;How do you know whether you hit the cache? FFT generates a &lt;code&gt;run_results.json&lt;/code&gt; artifact after every run, recording per node how many requests were made, how many were served from the cache, and the content hashes of the responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"http"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"bytes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2273&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"cache_hits"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content_hashes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"sha256:110aa4d5dac630aa245ff3c3c53d7ea9bc4212df93f04d96f900ba9cb93f4622"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"sha256:27ea31b0b9bb05c4feba2951d2f0a5f9dde340f0d19cc45722386e8951b794b5"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"keys"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"7a8d720efd2b8afb319534d0d1f08b7937f666a14fea0952c3cbbe0c2442b6d9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"24850fd6c24df9ecd041d643331023d48d39b6c6bbf64080c76f86c95613a584"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"node"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"todos_ingest"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"requests"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"used_offline"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
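&lt;p&gt;Because these fields are machine-readable, a CI job can assert on them. A small guard like the one below fails the build if a supposedly offline run still went to the network — the field names come from the fragment above, but where exactly the &lt;code&gt;http&lt;/code&gt; block sits inside &lt;code&gt;run_results.json&lt;/code&gt; is an assumption you should verify against your own artifact:&lt;/p&gt;

```python
def assert_fully_cached(http_stats: dict) -> None:
    # Fail loudly if the run was not served entirely from the HTTP cache.
    if not http_stats.get("used_offline"):
        raise AssertionError("run did not use offline mode")
    if http_stats.get("cache_hits", 0) < http_stats.get("requests", 0):
        raise AssertionError("some requests missed the cache")

# The values from the run_results.json fragment above pass the check.
assert_fully_cached({"cache_hits": 2, "requests": 2, "used_offline": True})
```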



&lt;p&gt;This gives you confidence that your CI/CD pipeline is deterministic. You can even commit your cache to git (for small reference datasets) to ensure your tests never flake due to external API downtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Data engineering is moving toward software engineering best practices.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Reproducibility:&lt;/strong&gt; Your pipeline should produce the same result today as it did yesterday, regardless of the state of an external API.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Speed:&lt;/strong&gt; You shouldn't pay a latency penalty every time you test a logic change.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Cost:&lt;/strong&gt; If you are hitting a paid API, caching saves you money during development.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;FastFlowTransform brings the developer experience of "Localhost" to the messy world of Data Engineering.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Try it out:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/FFTLabs/FastFlowTransform" rel="noopener noreferrer"&gt;[FastFlowTransform GitHub]&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;fastflowtransform
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>api</category>
      <category>sql</category>
    </item>
  </channel>
</rss>
