<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Andrew Kalik</title>
    <description>The latest articles on Forem by Andrew Kalik (@geekusa33).</description>
    <link>https://forem.com/geekusa33</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F731391%2F4a09ef56-289b-4235-8fd1-05300a399594.jpeg</url>
      <title>Forem: Andrew Kalik</title>
      <link>https://forem.com/geekusa33</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/geekusa33"/>
    <language>en</language>
    <item>
      <title>cx_Oracle vs setuptools: A Dependency Fight Nobody Wanted</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Tue, 10 Feb 2026 16:14:21 +0000</pubDate>
      <link>https://forem.com/geekusa33/cxoracle-vs-setuptools-a-dependency-fight-nobody-wanted-11gm</link>
      <guid>https://forem.com/geekusa33/cxoracle-vs-setuptools-a-dependency-fight-nobody-wanted-11gm</guid>
      <description>&lt;p&gt;If you've recently tried installing &lt;code&gt;cx_Oracle&lt;/code&gt; and it blew up in your face…&lt;/p&gt;

&lt;p&gt;Yeah. Same.&lt;/p&gt;

&lt;p&gt;And if your first thought was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why the hell did this break? It worked fine a few days ago.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Welcome to the wonderful world of dependency drift, Python packaging ecosystem changes, and the uncomfortable reality that “cloud managed” does &lt;em&gt;not&lt;/em&gt; mean “someone else will keep your dependencies from rotting.”&lt;/p&gt;

&lt;p&gt;This post is part technical breakdown, part cloud PSA, and part friendly warning:&lt;/p&gt;

&lt;p&gt;✅ If you're still using &lt;code&gt;cx_Oracle&lt;/code&gt;, it’s time to migrate to &lt;code&gt;oracledb&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Breaking Change Nobody Asked For
&lt;/h2&gt;

&lt;p&gt;For years, Oracle connectivity in Python was basically muscle memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;cx_Oracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And it worked.&lt;/p&gt;

&lt;p&gt;It worked locally.&lt;br&gt;&lt;br&gt;
It worked in EC2.&lt;br&gt;&lt;br&gt;
It worked in containers.&lt;br&gt;&lt;br&gt;
It worked in Airflow.&lt;br&gt;&lt;br&gt;
It worked in MWAA.  &lt;/p&gt;

&lt;p&gt;And then suddenly it didn’t.&lt;/p&gt;

&lt;p&gt;If you’ve updated your Python build toolchain recently (pip / setuptools / wheel), you may have seen installs fail with errors like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;build failures during &lt;code&gt;pip install&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;metadata generation errors&lt;/li&gt;
&lt;li&gt;compilation failures depending on OS image&lt;/li&gt;
&lt;li&gt;dependency resolution errors that feel random&lt;/li&gt;
&lt;li&gt;“it worked yesterday” failures after rebuilding an image&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And here’s the important part:&lt;/p&gt;
&lt;h3&gt;
  
  
  This isn’t just a cloud problem anymore.
&lt;/h3&gt;

&lt;p&gt;If your laptop is running a modern Python toolchain, &lt;code&gt;cx_Oracle&lt;/code&gt; can fail locally too. So this isn't just some AWS runtime quirk.&lt;/p&gt;

&lt;p&gt;This is Python packaging evolution colliding with a legacy dependency.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why This Happens (And Why It Feels Random)
&lt;/h2&gt;

&lt;p&gt;The Python ecosystem has been steadily moving away from older build behaviors and pushing toward modern standards:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PEP 517 / PEP 518 build isolation&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pyproject.toml&lt;/code&gt; driven builds&lt;/li&gt;
&lt;li&gt;stricter build dependency enforcement&lt;/li&gt;
&lt;li&gt;fewer “legacy fallback” behaviors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a good thing overall.&lt;/p&gt;

&lt;p&gt;But it also means packages that depend on older assumptions can break as pip/setuptools evolve.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;cx_Oracle&lt;/code&gt; has been around forever, and a lot of codebases still depend on it, but it’s increasingly out of alignment with how Python packaging works today.&lt;/p&gt;
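&lt;p&gt;A quick way to ground the "it worked yesterday" feeling is to check which build-toolchain versions each environment is actually running. Here is a minimal, stdlib-only diagnostic sketch (the package list is just the usual suspects, not exhaustive):&lt;/p&gt;

```python
from importlib import metadata

def toolchain_versions(packages=("pip", "setuptools", "wheel")):
    """Report installed versions of the Python build toolchain.

    Comparing this output between a working and a broken environment
    usually explains a "nothing changed on my end" install failure.
    """
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

print(toolchain_versions())
```

&lt;p&gt;Run it in both environments and diff the output before blaming the cloud.&lt;/p&gt;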


&lt;h2&gt;
  
  
  MWAA Makes This Worse (Because You Don’t Control the Runtime)
&lt;/h2&gt;

&lt;p&gt;This is where the pain becomes operational instead of just annoying.&lt;/p&gt;

&lt;p&gt;In AWS MWAA (Managed Workflows for Apache Airflow), you are never fully in control of the runtime environment.&lt;/p&gt;

&lt;p&gt;Even if you never click "upgrade", AWS still applies platform patching and refreshes as part of operating a managed service.&lt;/p&gt;

&lt;p&gt;So you end up in the worst-case scenario:&lt;/p&gt;

&lt;p&gt;✅ your DAGs didn’t change&lt;br&gt;&lt;br&gt;
✅ your &lt;code&gt;requirements.txt&lt;/code&gt; didn’t change&lt;br&gt;&lt;br&gt;
❌ your environment breaks anyway  &lt;/p&gt;

&lt;p&gt;And now your production orchestration system is down because a dependency that used to install cleanly no longer does.&lt;/p&gt;


&lt;h2&gt;
  
  
  MWAA Maintenance Windows: The Silent Dependency Killer
&lt;/h2&gt;

&lt;p&gt;If you're running MWAA, you’ve probably seen the concept of &lt;strong&gt;maintenance windows&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;These are the scheduled windows where AWS can apply updates behind the scenes.&lt;/p&gt;

&lt;p&gt;During MWAA maintenance windows, AWS may patch or refresh things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the underlying base OS image&lt;/li&gt;
&lt;li&gt;Python runtime components&lt;/li&gt;
&lt;li&gt;pip&lt;/li&gt;
&lt;li&gt;setuptools&lt;/li&gt;
&lt;li&gt;wheel&lt;/li&gt;
&lt;li&gt;OpenSSL and other system libraries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which is great for security.&lt;/p&gt;

&lt;p&gt;But it also means your environment can shift underneath you without you explicitly touching your deployment pipeline.&lt;/p&gt;

&lt;p&gt;And that means a MWAA maintenance window can quietly turn into:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Congrats, your production scheduler is now a science experiment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because dependency installation behavior changes, and brittle packages fall apart.&lt;/p&gt;


&lt;h2&gt;
  
  
  This Is the Shared Responsibility Model in Real Life
&lt;/h2&gt;

&lt;p&gt;This is the part people don’t like to hear.&lt;/p&gt;

&lt;p&gt;AWS owns securing the MWAA platform.&lt;/p&gt;

&lt;p&gt;You own your application dependencies.&lt;/p&gt;

&lt;p&gt;That’s the cloud shared responsibility model, whether we like it or not.&lt;/p&gt;

&lt;p&gt;AWS will keep patching MWAA. They should. They have to.&lt;/p&gt;

&lt;p&gt;But if your workloads depend on fragile dependencies, you’re effectively betting your data platform stability on old packaging assumptions staying frozen in time.&lt;/p&gt;

&lt;p&gt;And they won’t.&lt;/p&gt;


&lt;h2&gt;
  
  
  The Tempting Fix: Pin setuptools (Not Recommended)
&lt;/h2&gt;

&lt;p&gt;One quick fix is to pin setuptools back to an older version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"setuptools&amp;lt;XX"&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;cx_Oracle
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And yes, this can work.&lt;/p&gt;

&lt;p&gt;But this is duct tape.&lt;/p&gt;

&lt;p&gt;Now you’re freezing foundational build tooling, which can introduce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;future dependency conflicts&lt;/li&gt;
&lt;li&gt;security risk&lt;/li&gt;
&lt;li&gt;unpredictable behavior across environments&lt;/li&gt;
&lt;li&gt;even more painful breakage later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a cloud environment, pinning ancient build tooling is basically saying:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Let’s solve this by refusing to update ever again.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s not a strategy. That’s denial.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Fix: Stop Using cx_Oracle
&lt;/h2&gt;

&lt;p&gt;Here’s the PSA:&lt;/p&gt;

&lt;h3&gt;
  
  
  If you’re still using cx_Oracle, migrate to oracledb.
&lt;/h3&gt;

&lt;p&gt;Oracle’s supported modern replacement is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;oracledb
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn’t just a rename.&lt;/p&gt;

&lt;p&gt;It’s the forward path.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why oracledb Is Better
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;oracledb&lt;/code&gt; package is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;actively maintained&lt;/li&gt;
&lt;li&gt;Oracle’s official successor to cx_Oracle&lt;/li&gt;
&lt;li&gt;more compatible with modern Python packaging standards&lt;/li&gt;
&lt;li&gt;built to survive modern CI/CD and container workflows&lt;/li&gt;
&lt;li&gt;capable of running in both &lt;strong&gt;Thin mode&lt;/strong&gt; and &lt;strong&gt;Thick mode&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Translation: it behaves better in modern cloud runtimes and managed services.&lt;/p&gt;




&lt;h2&gt;
  
  
  Migration Is Usually Easier Than You Think
&lt;/h2&gt;

&lt;p&gt;In many codebases, migration is minimal.&lt;/p&gt;

&lt;p&gt;Often it’s just:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;cx_Oracle&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;becoming:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;oracledb&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The API is intentionally similar because &lt;code&gt;oracledb&lt;/code&gt; was designed as the successor.&lt;/p&gt;

&lt;p&gt;Yes, you should test it.&lt;br&gt;
Yes, there are edge cases.&lt;/p&gt;

&lt;p&gt;But compared to debugging broken Airflow deployments and chasing packaging failures across ephemeral compute fleets?&lt;/p&gt;

&lt;p&gt;This is the easier problem.&lt;/p&gt;
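&lt;p&gt;If you need to migrate incrementally, one common stopgap is a driver shim: prefer &lt;code&gt;oracledb&lt;/code&gt; and fall back to &lt;code&gt;cx_Oracle&lt;/code&gt; only where it is still installed. A minimal sketch (the single shim module is an assumption about how your codebase is organized, not a library feature):&lt;/p&gt;

```python
# db_driver.py -- the one place the rest of the codebase imports from.
try:
    import oracledb as oracle_driver  # preferred, actively maintained
    USING_LEGACY_DRIVER = False
except ImportError:
    try:
        import cx_Oracle as oracle_driver  # legacy fallback during migration
        USING_LEGACY_DRIVER = True
    except ImportError:
        oracle_driver = None
        USING_LEGACY_DRIVER = False

def driver_name():
    """Name of the Oracle driver this environment resolved to."""
    return getattr(oracle_driver, "__name__", "none installed")
```

&lt;p&gt;The point of the shim is that deleting the fallback branch later is a one-line change, not a codebase-wide search and replace.&lt;/p&gt;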




&lt;h2&gt;
  
  
  If You're Running MWAA + Oracle, This Is a Real Risk
&lt;/h2&gt;

&lt;p&gt;In MWAA, dependencies are installed during environment creation or update.&lt;/p&gt;

&lt;p&gt;If your &lt;code&gt;requirements.txt&lt;/code&gt; fails to install cleanly, your environment becomes unstable fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;workers fail to start&lt;/li&gt;
&lt;li&gt;schedulers fail&lt;/li&gt;
&lt;li&gt;tasks stop running&lt;/li&gt;
&lt;li&gt;CloudWatch logs become a wall of stack traces&lt;/li&gt;
&lt;li&gt;your “simple DAG deployment” becomes a multi-hour incident&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And when the root cause is dependency ecosystem changes, it’s even worse because it feels random.&lt;/p&gt;

&lt;p&gt;It’s not random.&lt;/p&gt;

&lt;p&gt;It’s just the ecosystem moving forward.&lt;/p&gt;




&lt;h2&gt;
  
  
  PSA: Do This Before Your Next Maintenance Window
&lt;/h2&gt;

&lt;p&gt;If you’re using Oracle connectivity in Python:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;stop building new pipelines on &lt;code&gt;cx_Oracle&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;migrate existing workloads to &lt;code&gt;oracledb&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;test the migration before the next MWAA maintenance window forces your hand&lt;/li&gt;
&lt;/ul&gt;
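&lt;p&gt;You can even automate the first step. This is a small, stdlib-only sketch that flags deprecated Oracle drivers in a &lt;code&gt;requirements.txt&lt;/code&gt;; the parsing is deliberately naive (it stops at the first version specifier and ignores markers), and the deny-list is a hypothetical one for this check:&lt;/p&gt;

```python
DEPRECATED = {"cx_oracle": "oracledb"}  # hypothetical deny-list for this check

def package_name(requirement_line):
    """Extract the bare package name from a requirements.txt line.

    Reads characters until the first one that cannot be part of a
    package name (version specifiers, extras brackets, markers).
    """
    name = ""
    for ch in requirement_line.strip():
        if ch.isalnum() or ch in "-_.":
            name += ch
        else:
            break
    return name.lower().replace("-", "_")

def flag_deprecated(requirements_text):
    """Return one warning per deprecated package found in the text."""
    warnings = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        name = package_name(line)
        if name in DEPRECATED:
            warnings.append(name + ": migrate to " + DEPRECATED[name])
    return warnings
```

&lt;p&gt;Wire a check like this into CI and the deprecation stops being tribal knowledge.&lt;/p&gt;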

&lt;p&gt;Because MWAA is going to keep evolving.&lt;/p&gt;

&lt;p&gt;Your codebase needs to keep up.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thought
&lt;/h2&gt;

&lt;p&gt;A lot of outages don’t happen because AWS went down.&lt;/p&gt;

&lt;p&gt;They happen because your dependency tree quietly shifted under your feet.&lt;/p&gt;

&lt;p&gt;And that’s exactly why patching and upgrades aren’t optional in cloud engineering.&lt;/p&gt;

&lt;p&gt;They’re operational survival.&lt;/p&gt;

&lt;p&gt;So yeah…&lt;/p&gt;

&lt;p&gt;Update your stuff.&lt;br&gt;&lt;br&gt;
Test your stuff.&lt;br&gt;&lt;br&gt;
And stop using &lt;code&gt;cx_Oracle&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Your future self will thank you.&lt;/p&gt;

</description>
      <category>python</category>
      <category>aws</category>
    </item>
    <item>
      <title>Designing a Cross-Cloud Data Plane with Apache Iceberg</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Mon, 26 Jan 2026 19:50:36 +0000</pubDate>
      <link>https://forem.com/geekusa33/designing-a-cross-cloud-data-plane-with-apache-iceberg-3n83</link>
      <guid>https://forem.com/geekusa33/designing-a-cross-cloud-data-plane-with-apache-iceberg-3n83</guid>
      <description>&lt;h1&gt;
  
  
  Designing a Cross-Cloud Data Plane with Apache Iceberg
&lt;/h1&gt;

&lt;p&gt;Most organizations don’t deliberately choose to build multi-cloud data platforms.&lt;/p&gt;

&lt;p&gt;They arrive there gradually — through acquisitions, organizational boundaries, and the reality that different teams and workloads gravitate toward different platforms. Over time, AWS and GCP both become part of the picture, whether that was the original plan or not.&lt;/p&gt;

&lt;p&gt;The challenge isn’t the presence of multiple clouds.&lt;br&gt;&lt;br&gt;
The challenge is what happens to data once multiple clouds are in play.&lt;/p&gt;

&lt;p&gt;Rather than focusing on specific tools or implementations, this post is meant to share a mental model for reasoning about cross-cloud data platforms — one that prioritizes cost discipline, simplicity, and long-term flexibility.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Multi-Cloud Is Often Unavoidable
&lt;/h2&gt;

&lt;p&gt;Multi-cloud is rarely ideological.&lt;/p&gt;

&lt;p&gt;It usually emerges from:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Independent teams choosing platforms that fit their needs&lt;/li&gt;
&lt;li&gt;Mergers and acquisitions that bring existing cloud footprints&lt;/li&gt;
&lt;li&gt;Organizational boundaries that resist forced standardization&lt;/li&gt;
&lt;li&gt;Analytics and AI capabilities evolving at different speeds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Most organizations are already multi-cloud long before they design for it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Trying to undo that reality often leads to brittle mandates and slow delivery. A more durable approach is to design around multi-cloud instead of fighting it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Cost of Multi-Cloud Is Data Duplication
&lt;/h2&gt;

&lt;p&gt;Where most multi-cloud data architectures struggle is not orchestration or tooling — it’s duplication.&lt;/p&gt;

&lt;p&gt;The same dataset is often:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ingested separately into AWS and GCP&lt;/li&gt;
&lt;li&gt;Transformed independently in each environment&lt;/li&gt;
&lt;li&gt;Stored in different formats&lt;/li&gt;
&lt;li&gt;Reprocessed for analytics, applications, and AI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each duplication multiplies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Storage cost&lt;/li&gt;
&lt;li&gt;Compute cost&lt;/li&gt;
&lt;li&gt;Pipeline complexity&lt;/li&gt;
&lt;li&gt;Operational risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At scale, this becomes compounding waste.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Multi-cloud becomes expensive only when data is duplicated.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The alternative is to process data once and reuse it everywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Three-Plane Model for Cross-Cloud Data Platforms
&lt;/h2&gt;

&lt;p&gt;To make this practical, it helps to step back and use a simple mental model that separates responsibilities into three planes, each with a distinct purpose.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Data Plane: The Source of Truth
&lt;/h2&gt;

&lt;p&gt;The data plane defines &lt;strong&gt;what the data is&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How data is stored&lt;/li&gt;
&lt;li&gt;How tables are structured&lt;/li&gt;
&lt;li&gt;How schemas evolve&lt;/li&gt;
&lt;li&gt;How versions and snapshots are managed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This plane should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Durable&lt;/li&gt;
&lt;li&gt;Engine-agnostic&lt;/li&gt;
&lt;li&gt;Slowly changing&lt;/li&gt;
&lt;li&gt;Written once and reused many times&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apache Iceberg fits naturally here. It provides a stable, open table contract that works across object storage and compute engines, without binding data to a specific cloud or execution model.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The data plane is not optimized for speed — it is optimized for correctness and reuse.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is what enables a true single source of truth — not by centralizing platforms, but by standardizing how data is defined and evolved.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Control Plane: Coordination Without Ownership
&lt;/h2&gt;

&lt;p&gt;The control plane defines &lt;strong&gt;when and why work happens&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestration&lt;/li&gt;
&lt;li&gt;Eventing&lt;/li&gt;
&lt;li&gt;Scheduling&lt;/li&gt;
&lt;li&gt;Governance hooks&lt;/li&gt;
&lt;li&gt;Policy enforcement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each cloud can have its own control plane. AWS and GCP do not need to share orchestration logic or operational workflows.&lt;/p&gt;

&lt;p&gt;The critical constraint is this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Control planes coordinate access to data, but they do not own it.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This keeps orchestration stateless, replaceable, and cloud-native, while the data plane remains stable.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Consumption Plane: Execution and Experience
&lt;/h2&gt;

&lt;p&gt;The consumption plane defines &lt;strong&gt;how data is used&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Analytics and querying&lt;/li&gt;
&lt;li&gt;Applications&lt;/li&gt;
&lt;li&gt;Feature extraction&lt;/li&gt;
&lt;li&gt;Machine learning and AI workloads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This plane is intentionally:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ephemeral&lt;/li&gt;
&lt;li&gt;Cost-variable&lt;/li&gt;
&lt;li&gt;Optimized for workload needs&lt;/li&gt;
&lt;li&gt;Free to evolve independently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Serverless execution fits naturally here. Compute spins up only when needed, processes a slice of data, and shuts down.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Compute should be temporary. Data should not be.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Apache Iceberg as a Shared Cross-Cloud Data Plane
&lt;/h2&gt;

&lt;p&gt;By using Apache Iceberg as the data plane, AWS and GCP can evolve independently while relying on the same underlying data contract.&lt;/p&gt;

&lt;p&gt;Iceberg allows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data to be processed once&lt;/li&gt;
&lt;li&gt;Schemas to evolve without rewrites&lt;/li&gt;
&lt;li&gt;Snapshots to support consistent reads&lt;/li&gt;
&lt;li&gt;Multiple consumers across clouds&lt;/li&gt;
&lt;li&gt;Object storage to remain the system of record&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The clouds don’t need shared pipelines.&lt;br&gt;&lt;br&gt;
They need &lt;strong&gt;shared tables&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Single Processing Is the Biggest Cost Reduction Lever
&lt;/h2&gt;

&lt;p&gt;Without a shared data plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each cloud processes the same raw data&lt;/li&gt;
&lt;li&gt;Each environment runs its own transformations&lt;/li&gt;
&lt;li&gt;Each platform retrains AI models independently&lt;/li&gt;
&lt;li&gt;Compute cost scales with the number of clouds&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With a shared data plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data is transformed once&lt;/li&gt;
&lt;li&gt;Snapshots are reused across consumers&lt;/li&gt;
&lt;li&gt;Incremental processing minimizes rework&lt;/li&gt;
&lt;li&gt;Serverless compute stays small and targeted&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Processing a dataset once and reusing it across analytics, applications, and AI workloads is one of the most effective ways to reduce cost in cross-cloud data platforms.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every additional cloud, engine, or workload that reuses that same processed dataset benefits from this decision without incurring proportional cost.&lt;/p&gt;

&lt;p&gt;This is architectural efficiency, not after-the-fact optimization.&lt;/p&gt;
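&lt;p&gt;The arithmetic behind this claim is simple enough to sketch. This toy model (illustrative numbers only, not a pricing calculator) shows why compute cost stops scaling with the number of clouds once processing happens once:&lt;/p&gt;

```python
def total_compute_cost(process_cost, read_cost, n_consumers, shared_plane):
    """Toy model: heavy transformation either runs once (shared data
    plane) or once per consuming cloud/engine (duplicated pipelines)."""
    if shared_plane:
        return process_cost + n_consumers * read_cost
    return n_consumers * (process_cost + read_cost)

# Processing dominates; reads of already-processed data are cheap.
duplicated = total_compute_cost(100, 5, 3, shared_plane=False)  # 315
shared = total_compute_cost(100, 5, 3, shared_plane=True)       # 115
```

&lt;p&gt;With these numbers, adding a fourth consumer costs 105 units in the duplicated model and 5 in the shared one. That is architectural efficiency expressed in arithmetic.&lt;/p&gt;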




&lt;h2&gt;
  
  
  Where AI Fits
&lt;/h2&gt;

&lt;p&gt;AI makes architectural efficiency non-negotiable, because the cost of duplicated data shows up fastest in training, retraining, and experimentation.&lt;/p&gt;

&lt;p&gt;AI does not require a separate plane.&lt;/p&gt;

&lt;p&gt;It spans all three:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data plane provides training data and historical snapshots&lt;/li&gt;
&lt;li&gt;The control plane governs training and retraining&lt;/li&gt;
&lt;li&gt;The consumption plane handles inference and interaction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Training the same data multiple times across clouds is expensive and unnecessary.&lt;br&gt;&lt;br&gt;
A shared data plane reduces that pressure by design.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tradeoffs and Reality Checks
&lt;/h2&gt;

&lt;p&gt;This approach does not eliminate complexity entirely.&lt;/p&gt;

&lt;p&gt;Teams still need to manage:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Catalog consistency&lt;/li&gt;
&lt;li&gt;Identity and access boundaries&lt;/li&gt;
&lt;li&gt;Feature differences across execution engines&lt;/li&gt;
&lt;li&gt;Cross-cloud networking considerations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are governance and coordination problems — not data duplication problems — and they scale far better than parallel pipelines.&lt;/p&gt;




&lt;h2&gt;
  
  
  When This Pattern Makes Sense
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strong fit&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Organizations operating in AWS and GCP&lt;/li&gt;
&lt;li&gt;Shared analytical and AI datasets&lt;/li&gt;
&lt;li&gt;Cost-sensitive platforms&lt;/li&gt;
&lt;li&gt;Serverless-first execution models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Less ideal&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ultra-low-latency streaming&lt;/li&gt;
&lt;li&gt;Workloads tightly coupled to proprietary execution features&lt;/li&gt;
&lt;li&gt;Single-cloud environments with no external consumers&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Looking Ahead: Cloud Interconnect as the Final Enabler
&lt;/h2&gt;

&lt;p&gt;One of the most exciting developments for cross-cloud data architectures is the continued maturation of private cloud interconnect between AWS and GCP.&lt;/p&gt;

&lt;p&gt;Interconnect transforms cross-cloud connectivity from a workaround into a first-class architectural feature. It provides a private, predictable network path that avoids the public internet entirely, improving not just performance, but security and control.&lt;/p&gt;

&lt;p&gt;As interconnect becomes more accessible:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cross-cloud data access becomes more deliberate and auditable&lt;/li&gt;
&lt;li&gt;Serverless consumption across clouds becomes more practical&lt;/li&gt;
&lt;li&gt;Data no longer needs to be duplicated simply to feel “close” or “safe”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the three-plane model fully comes together. A shared data plane backed by Iceberg, independent control planes per cloud, and ephemeral consumption planes can operate across platforms with confidence.&lt;/p&gt;

&lt;p&gt;Instead of copying data defensively, teams can design for access intentionally — reducing cost, tightening security boundaries, and simplifying how data moves between clouds.&lt;/p&gt;

&lt;p&gt;It’s one of the clearest signals that cross-cloud data architectures are moving from workaround to first-class design.&lt;/p&gt;

&lt;p&gt;Interconnect doesn’t change the need for good architecture.&lt;br&gt;&lt;br&gt;
It rewards it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Multi-cloud does not require identical architectures.&lt;/p&gt;

&lt;p&gt;It requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A shared data plane&lt;/li&gt;
&lt;li&gt;Independent control planes&lt;/li&gt;
&lt;li&gt;Ephemeral, serverless consumption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By treating Apache Iceberg as the data contract, teams can avoid duplicating data, minimize compute cost, and support analytics and AI across AWS and GCP without rebuilding their platform for each cloud.&lt;/p&gt;

&lt;p&gt;In practice, the most resilient architectures make the fewest assumptions about where compute runs — and the strongest assumptions about how data is defined.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>gcp</category>
      <category>aws</category>
      <category>bigdata</category>
    </item>
    <item>
      <title>A Pragmatic, Event-Driven Serverless Data Architecture</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Sat, 24 Jan 2026 16:28:43 +0000</pubDate>
      <link>https://forem.com/geekusa33/a-pragmatic-event-driven-serverless-data-architecture-52bp</link>
      <guid>https://forem.com/geekusa33/a-pragmatic-event-driven-serverless-data-architecture-52bp</guid>
      <description>&lt;h2&gt;
  
  
  MWAA + Glue + Iceberg + Snowflake
&lt;/h2&gt;

&lt;p&gt;Batch data pipelines are often far more expensive and complex than they need to be.&lt;/p&gt;

&lt;p&gt;Many teams still operate always-on schedulers, persistent Spark clusters, and long-running infrastructure for workloads that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run a few times per day&lt;/li&gt;
&lt;li&gt;Complete in minutes&lt;/li&gt;
&lt;li&gt;Are triggered by data arrival, not time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post walks through a pragmatic, event-driven serverless data architecture on AWS that focuses on &lt;strong&gt;real cost reduction and operational simplification&lt;/strong&gt;, not architectural theory.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Core Problem: Paying for Idle Data Infrastructure
&lt;/h2&gt;

&lt;p&gt;A traditional batch pipeline commonly includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always-running Airflow workers&lt;/li&gt;
&lt;li&gt;Persistent EMR or Spark clusters&lt;/li&gt;
&lt;li&gt;Cron-based scheduling for event-driven data&lt;/li&gt;
&lt;li&gt;Infrastructure sized for peak usage&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, this means teams pay for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idle CPU and memory&lt;/li&gt;
&lt;li&gt;Idle orchestration capacity&lt;/li&gt;
&lt;li&gt;Ongoing patching and operational overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For many pipelines, &lt;strong&gt;most of the cost is spent waiting&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Principles
&lt;/h2&gt;

&lt;p&gt;This architecture is built around a few non-negotiable principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Event-driven first, schedule only when necessary&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fully serverless wherever possible&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Task-level isolation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pay only when something executes&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Open storage formats to avoid lock-in&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system reacts to data. It does not sit idle waiting for a clock.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;High-level flow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Data arrives in Amazon S3 or an upstream system&lt;/li&gt;
&lt;li&gt;An event (for example, S3 object creation) triggers orchestration&lt;/li&gt;
&lt;li&gt;Amazon MWAA (Serverless) coordinates the workflow&lt;/li&gt;
&lt;li&gt;AWS Glue (Serverless) executes transformations&lt;/li&gt;
&lt;li&gt;Data is written as Apache Iceberg tables in Amazon S3&lt;/li&gt;
&lt;li&gt;Tables are registered in the AWS Glue Data Catalog&lt;/li&gt;
&lt;li&gt;Snowflake queries the data using external tables&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key shift is &lt;strong&gt;reactive execution&lt;/strong&gt; — pipelines run because data changed, not because time passed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Event-Driven Orchestration with MWAA Serverless
&lt;/h2&gt;

&lt;p&gt;Airflow is still used, but only for what it does best:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Dependency management&lt;/li&gt;
&lt;li&gt;Retry semantics&lt;/li&gt;
&lt;li&gt;Visibility and auditability&lt;/li&gt;
&lt;li&gt;Coordinating multiple services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With MWAA Serverless:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There are no always-on workers&lt;/li&gt;
&lt;li&gt;There is no capacity planning&lt;/li&gt;
&lt;li&gt;There is no idle orchestration cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Events (for example, S3 notifications via EventBridge) trigger DAG runs only when new data arrives. MWAA spins up to coordinate execution and scales back down afterward.&lt;/p&gt;

&lt;p&gt;Airflow becomes &lt;strong&gt;control flow&lt;/strong&gt;, not infrastructure.&lt;/p&gt;
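&lt;p&gt;One practical detail worth sketching: event deliveries can be duplicated, so it helps to derive a deterministic run identifier from the triggering event, letting a redelivered notification map to the same logical run. A minimal sketch, assuming the S3 "Object Created" event shape that EventBridge delivers (bucket name, object key, etag):&lt;/p&gt;

```python
import hashlib

def run_id_for_event(event):
    """Deterministic run id for an S3 "Object Created" EventBridge event.

    The same object version always hashes to the same id, so duplicate
    deliveries or retries coalesce into one logical pipeline run.
    """
    detail = event["detail"]
    key = "{}/{}@{}".format(
        detail["bucket"]["name"],
        detail["object"]["key"],
        detail["object"]["etag"],
    )
    return hashlib.sha256(key.encode()).hexdigest()[:16]
```

&lt;p&gt;Paired with idempotent tasks, this keeps "the event fired twice" from becoming "the data landed twice."&lt;/p&gt;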




&lt;h2&gt;
  
  
  Glue Serverless as Event-Driven Compute
&lt;/h2&gt;

&lt;p&gt;Each transformation step is implemented as a small, purpose-built Glue job:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One responsibility per job&lt;/li&gt;
&lt;li&gt;No shared cluster assumptions&lt;/li&gt;
&lt;li&gt;Independent scaling and retries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From a cost perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jobs run only when triggered&lt;/li&gt;
&lt;li&gt;There is no idle cluster time&lt;/li&gt;
&lt;li&gt;Failures are isolated and cheap to retry&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of paying for a Spark cluster all day, you pay &lt;strong&gt;per execution&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Apache Iceberg Enables Cost Reduction
&lt;/h2&gt;

&lt;p&gt;Apache Iceberg is foundational to making this architecture efficient.&lt;/p&gt;

&lt;p&gt;Iceberg enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Schema evolution without rewriting entire tables&lt;/li&gt;
&lt;li&gt;Partition evolution without backfills&lt;/li&gt;
&lt;li&gt;Snapshot-based time travel for recovery&lt;/li&gt;
&lt;li&gt;Multiple engines reading the same data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From a cost perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No duplicate datasets per consumer&lt;/li&gt;
&lt;li&gt;No full-table rewrites for small schema changes&lt;/li&gt;
&lt;li&gt;No tight coupling between producers and consumers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Iceberg supports &lt;strong&gt;incremental, event-driven writes&lt;/strong&gt; without downstream reprocessing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Surfacing Data to Snowflake Without Duplication
&lt;/h2&gt;

&lt;p&gt;Snowflake consumes Iceberg tables using external tables backed by Amazon S3 and the Glue Data Catalog.&lt;/p&gt;

&lt;p&gt;This approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avoids copying data into Snowflake-managed storage&lt;/li&gt;
&lt;li&gt;Makes data available immediately after it is written&lt;/li&gt;
&lt;li&gt;Keeps storage costs centralized in S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If performance requirements change later, data can still be materialized — but &lt;strong&gt;duplication becomes a deliberate choice&lt;/strong&gt;, not a default.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where the Cost Savings Actually Come From
&lt;/h2&gt;

&lt;p&gt;This architecture removes several major cost drivers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Traditional Pipeline Costs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;24/7 Airflow workers&lt;/li&gt;
&lt;li&gt;Always-on Spark or EMR clusters&lt;/li&gt;
&lt;li&gt;Idle compute between scheduled runs&lt;/li&gt;
&lt;li&gt;Operational effort maintaining infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Even small clusters add up over time.&lt;/p&gt;




&lt;h3&gt;
  
  
  Costs Removed by This Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Eliminated&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idle Airflow capacity&lt;/li&gt;
&lt;li&gt;Persistent Spark clusters&lt;/li&gt;
&lt;li&gt;Long-running EC2 instances&lt;/li&gt;
&lt;li&gt;Custom metastore infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Introduced&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Per-event MWAA execution cost&lt;/li&gt;
&lt;li&gt;Per-job Glue runtime cost&lt;/li&gt;
&lt;li&gt;Object storage costs in S3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In practice, teams often see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Near-zero idle compute spend&lt;/li&gt;
&lt;li&gt;Costs directly proportional to data volume&lt;/li&gt;
&lt;li&gt;Predictable per-run pricing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For short-lived batch workloads, this frequently results in &lt;strong&gt;meaningful cost reduction&lt;/strong&gt; without sacrificing capability.&lt;/p&gt;




&lt;h2&gt;
  
  
  Operational Simplification (The Hidden Savings)
&lt;/h2&gt;

&lt;p&gt;Cost is not just dollars.&lt;/p&gt;

&lt;p&gt;This architecture also reduces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;On-call surface area&lt;/li&gt;
&lt;li&gt;Patch and upgrade cycles&lt;/li&gt;
&lt;li&gt;Capacity planning work&lt;/li&gt;
&lt;li&gt;Failure blast radius&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Fewer always-on systems mean fewer things that can fail silently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tradeoffs to Be Aware Of
&lt;/h2&gt;

&lt;p&gt;This pattern does introduce responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-driven pipelines require idempotent design&lt;/li&gt;
&lt;li&gt;Iceberg requires schema and table discipline&lt;/li&gt;
&lt;li&gt;External tables may not suit all query patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are engineering tradeoffs, not infrastructure problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  When This Pattern Works Best
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strong fit&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Event-driven or near-real-time batch ingestion&lt;/li&gt;
&lt;li&gt;Teams optimizing for cost and simplicity&lt;/li&gt;
&lt;li&gt;Lakehouse or multi-engine environments&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Less ideal&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ultra-low-latency streaming&lt;/li&gt;
&lt;li&gt;Always-on interactive workloads&lt;/li&gt;
&lt;li&gt;Extremely large, tightly coupled transformations&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;Serverless data architectures are not about removing structure.&lt;/p&gt;

&lt;p&gt;They are about &lt;strong&gt;aligning cost and complexity with reality&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By combining MWAA Serverless, Glue Serverless, and Apache Iceberg, teams can build pipelines that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;React to data instead of schedules&lt;/li&gt;
&lt;li&gt;Eliminate idle compute&lt;/li&gt;
&lt;li&gt;Scale naturally&lt;/li&gt;
&lt;li&gt;Remain flexible as requirements evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In many cases, the simplest architecture is also the most cost-effective one.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>data</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>Building a Private Photo Sharing Platform on AWS</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Mon, 05 Jan 2026 05:17:31 +0000</pubDate>
      <link>https://forem.com/geekusa33/building-a-private-photo-sharing-platform-on-aws-1k7b</link>
      <guid>https://forem.com/geekusa33/building-a-private-photo-sharing-platform-on-aws-1k7b</guid>
      <description>&lt;p&gt;In July 2024, my dad had a massive stroke.&lt;/p&gt;

&lt;p&gt;In the weeks that followed, my sister, my wife, other family members, and I started going through my dad’s barn. We were not expecting much beyond tools and old boxes. What we found instead was almost &lt;strong&gt;3,000 photos&lt;/strong&gt;, handwritten letters, family recipes, notes, and other documents. Many of them had been sitting there for more than &lt;strong&gt;15 years&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Some were in decent shape. Others were dusty, faded, warped, or brittle. All of them felt irreplaceable.&lt;/p&gt;

&lt;p&gt;At some point, the question stopped being &lt;em&gt;what did we find&lt;/em&gt; and became &lt;em&gt;how do we make sure we do not lose this&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  This Was Not a Photo App Problem
&lt;/h2&gt;

&lt;p&gt;The emotional part came first. The technical problem came later.&lt;/p&gt;

&lt;p&gt;We needed a way to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scan&lt;/strong&gt; a large volume of photos and documents quickly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean&lt;/strong&gt; them up enough to be readable and usable&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid&lt;/strong&gt; passing fragile originals around&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Share&lt;/strong&gt; access privately with family across the United States&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ensure&lt;/strong&gt; nothing disappeared if a laptop died or a service changed direction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This was not about building a public gallery or a social feed.&lt;/p&gt;

&lt;p&gt;It was about &lt;strong&gt;preserving family history quietly and predictably&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Did Not Use Facebook, Instagram, or a Consumer Photo Platform
&lt;/h2&gt;

&lt;p&gt;I did not want this living on Facebook or Instagram.&lt;/p&gt;

&lt;p&gt;Not because those platforms cannot store photos, but because I wanted:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Private sharing&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No social feeds&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No resurfacing or reminders&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clear ownership of the data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predictable long-term access&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also looked at consumer photo services. Most of them technically worked, but they required giving up control in ways that did not feel right for this situation.&lt;/p&gt;

&lt;p&gt;I wanted something boring, understandable, and under my control.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Turned Into an AWS Architecture
&lt;/h2&gt;

&lt;p&gt;Eventually this stopped being a &lt;em&gt;where do we upload photos&lt;/em&gt; question and became an infrastructure problem.&lt;/p&gt;

&lt;p&gt;I deployed a private, self-hosted photo sharing platform on AWS and later presented it to my local AWS User Group as a real world case study.&lt;/p&gt;

&lt;p&gt;The goal was not polish. The goal was durability and access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deployment Architecture
&lt;/h2&gt;

&lt;p&gt;The platform runs on a simple AWS setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;EC2&lt;/strong&gt; runs the application&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EBS&lt;/strong&gt; provides primary attached storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt; stores original scans and long-term artifacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic Load Balancer&lt;/strong&gt; handles HTTPS access&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Family members access the system over HTTPS. Active data lives on EBS. Originals live in S3.&lt;/p&gt;

&lt;p&gt;Nothing exotic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I Chose to Self-Host
&lt;/h2&gt;

&lt;p&gt;Self-hosting was not about ideology.&lt;/p&gt;

&lt;p&gt;I needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Control over data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Independent backups&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Predictable costs&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A UI my family could use&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;p&gt;Two things stood out.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Emotional situations change technical priorities&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Simple systems are easier to trust and explain&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;I did not set out to build a platform.&lt;/p&gt;

&lt;p&gt;I set out to make sure we did not lose pieces of our family.&lt;/p&gt;

&lt;p&gt;The architecture mattered, but the mindset mattered more.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Own the data&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Keep the system understandable&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Optimize for recovery&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>opensource</category>
      <category>photoprism</category>
    </item>
    <item>
      <title>Event-Driven Data Pipelines - Real-Time Orchestration on AWS</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Sat, 03 Jan 2026 17:26:31 +0000</pubDate>
      <link>https://forem.com/geekusa33/event-driven-data-pipelines-real-time-orchestration-on-aws-510p</link>
      <guid>https://forem.com/geekusa33/event-driven-data-pipelines-real-time-orchestration-on-aws-510p</guid>
      <description>&lt;p&gt;For a long time, batch pipelines were “good enough.”&lt;br&gt;&lt;br&gt;
Nightly jobs ran. Dashboards updated the next morning. Everyone learned to live with the lag.&lt;/p&gt;

&lt;p&gt;But as data volumes grew — and expectations for freshness grew even faster — those tradeoffs stopped being acceptable.&lt;/p&gt;

&lt;p&gt;I originally developed this material while preparing a talk for &lt;strong&gt;AWS Summit Los Angeles&lt;/strong&gt;, and later refined it through conversations and feedback at the &lt;strong&gt;Portland AWS User Group&lt;/strong&gt;. This post is the expanded, written version of that work — focused on what actually breaks in real systems, and how event-driven architectures help fix it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Batch Pipelines Start to Break Down
&lt;/h2&gt;

&lt;p&gt;Most teams don’t &lt;em&gt;choose&lt;/em&gt; slow pipelines — they inherit them.&lt;/p&gt;

&lt;p&gt;Over time, the same failure modes show up again and again:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Slow feedback loops&lt;/strong&gt; – Nightly batch jobs mean yesterday’s data drives today’s decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual orchestration&lt;/strong&gt; – Scripts and human coordination introduce fragility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate or failed runs&lt;/strong&gt; – No idempotency leads to wasted compute and inconsistent results.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Missed or late events&lt;/strong&gt; – Downstream teams lose trust when data silently disappears.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-provisioned infrastructure&lt;/strong&gt; – Jobs sized “just in case” drive unnecessary cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Limited observability&lt;/strong&gt; – It’s difficult to answer a basic question: &lt;em&gt;Where is my data right now?&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These were the exact pain points that kept coming up in conversations after both talks — and they’re strong signals that schedule-driven pipelines are being pushed past what they were designed to do.&lt;/p&gt;




&lt;h2&gt;
  
  
  What “Event-Driven” Really Means
&lt;/h2&gt;

&lt;p&gt;At a high level, an event-driven pipeline reacts to &lt;em&gt;something happening&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A file lands in object storage
&lt;/li&gt;
&lt;li&gt;An API request is received
&lt;/li&gt;
&lt;li&gt;A message is published to a queue or stream
&lt;/li&gt;
&lt;li&gt;A record arrives from an upstream system
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of polling on a fixed schedule, the pipeline starts &lt;strong&gt;the moment the event occurs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This framing resonated strongly at both &lt;strong&gt;AWS Summit LA&lt;/strong&gt; and the &lt;strong&gt;Portland AWS User Group&lt;/strong&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop asking &lt;em&gt;“when should this run?”&lt;/em&gt;&lt;br&gt;&lt;br&gt;
Start asking &lt;em&gt;“what should trigger this?”&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That shift alone simplifies architecture decisions and reduces wasted compute.&lt;/p&gt;




&lt;h2&gt;
  
  
  Event Triggers &amp;amp; Routing: The Backbone
&lt;/h2&gt;

&lt;p&gt;Modern AWS architectures give you multiple ways to capture and route events:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Object storage events
&lt;/li&gt;
&lt;li&gt;API-driven ingestion
&lt;/li&gt;
&lt;li&gt;Message queues
&lt;/li&gt;
&lt;li&gt;Streaming platforms
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What matters most is &lt;strong&gt;decoupling producers from consumers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is where event routing becomes more than just plumbing. A centralized event bus allows you to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter noisy events
&lt;/li&gt;
&lt;li&gt;Transform payloads
&lt;/li&gt;
&lt;li&gt;Fan out to multiple consumers
&lt;/li&gt;
&lt;li&gt;Make data flow explicit and observable
&lt;/li&gt;
&lt;/ul&gt;
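&lt;p&gt;The filtering step can be reduced to a toy sketch. This is not EventBridge's actual matcher (the real service also supports prefix, numeric, and exists filters); it only shows the core idea of a rule pattern that lists the allowed values per field:&lt;/p&gt;

```python
# Toy model of EventBridge-style event filtering.
# A rule matches when, for every field in the pattern, the event's value
# is one of the allowed values. Field values here are illustrative.
RULE_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
}

def matches(pattern, event):
    """True if each patterned field's value appears in its allowed list."""
    return all(
        event.get(field) in allowed
        for field, allowed in pattern.items()
    )
```

&lt;p&gt;Because the pattern lives in the bus rather than in consumer code, new consumers can subscribe with their own filters without touching the producer.&lt;/p&gt;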

&lt;p&gt;One point I emphasized heavily in the Portland AWS User Group talk is that routing is an architectural boundary. When done well, teams can evolve independently without coordinating deployments or breaking downstream consumers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workflow Orchestration Without Schedule Glue
&lt;/h2&gt;

&lt;p&gt;Once an event is routed, something still needs to coordinate the work.&lt;/p&gt;

&lt;p&gt;Depending on complexity, orchestration might involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight coordination for simple pipelines
&lt;/li&gt;
&lt;li&gt;Stateful workflows for multi-step transformations
&lt;/li&gt;
&lt;li&gt;Long-running or dependency-heavy DAGs
&lt;/li&gt;
&lt;li&gt;Request-driven data products
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Airflow still plays an important role here — not as a &lt;em&gt;time-based scheduler&lt;/em&gt;, but as a &lt;strong&gt;state coordinator&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This distinction landed particularly well at &lt;strong&gt;AWS Summit Los Angeles&lt;/strong&gt;, where many teams were already using Airflow but struggling to move beyond cron-driven DAGs.&lt;/p&gt;




&lt;h2&gt;
  
  
  Transforming Data with Serverless ETL
&lt;/h2&gt;

&lt;p&gt;Once data is flowing, transformation is where value is created.&lt;/p&gt;

&lt;p&gt;A serverless ETL approach works especially well in event-driven systems because it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Scales automatically with demand
&lt;/li&gt;
&lt;li&gt;Eliminates idle infrastructure
&lt;/li&gt;
&lt;li&gt;Aligns cost with actual work performed
&lt;/li&gt;
&lt;li&gt;Integrates cleanly with cataloged datasets
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Common patterns include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Micro-batch processing as data lands
&lt;/li&gt;
&lt;li&gt;Small-file compaction and partition optimization
&lt;/li&gt;
&lt;li&gt;Deduplication and data quality enforcement
&lt;/li&gt;
&lt;li&gt;Normalizing raw inputs into analytics-ready formats
&lt;/li&gt;
&lt;/ul&gt;
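&lt;p&gt;As one concrete example, deduplication inside a micro-batch usually means keeping the latest version of each record before writing analytics-ready output. A minimal sketch, with illustrative field names:&lt;/p&gt;

```python
# Sketch: per-key deduplication within a micro-batch.
# Keeps the record with the highest version value for each key.
# The "id" and "updated_at" field names are illustrative.
def dedupe_latest(records, key="id", version="updated_at"):
    latest = {}
    for rec in records:
        k = rec[key]
        # Replace the stored record if this one is the same age or newer.
        if k not in latest or rec[version] >= latest[k][version]:
            latest[k] = rec
    return list(latest.values())
```

&lt;p&gt;In a serverless ETL job the same logic typically runs as a window-and-rank step over the incoming batch, but the invariant is identical: one surviving row per key.&lt;/p&gt;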

&lt;p&gt;These patterns consistently came up in follow-up discussions after both talks, especially from teams trying to reduce operational overhead without sacrificing data freshness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Resiliency Is Not Optional
&lt;/h2&gt;

&lt;p&gt;In event-driven systems, failures don’t disappear — they become more visible.&lt;/p&gt;

&lt;p&gt;That’s a good thing.&lt;/p&gt;

&lt;p&gt;Resilient pipelines are built with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retries at every execution boundary
&lt;/li&gt;
&lt;li&gt;Idempotent processing to avoid duplicates
&lt;/li&gt;
&lt;li&gt;Dead-letter queues for poison messages
&lt;/li&gt;
&lt;li&gt;Buffering to absorb traffic spikes
&lt;/li&gt;
&lt;li&gt;Clear failure paths instead of silent drops
&lt;/li&gt;
&lt;/ul&gt;
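&lt;p&gt;A minimal in-memory sketch of the retry-plus-dead-letter pattern (a plain list stands in for an SQS dead-letter queue here):&lt;/p&gt;

```python
# Sketch: retries with an explicit dead-letter path.
# process() is any message handler; designing it to be idempotent is what
# makes these retries safe. After max_attempts failures, the message is
# parked for inspection instead of being silently dropped.
def consume(messages, process, max_attempts=3):
    dead_letter = []
    for msg in messages:
        for attempt in range(1, max_attempts + 1):
            try:
                process(msg)
                break  # success: move on to the next message
            except Exception:
                if attempt == max_attempts:
                    dead_letter.append(msg)  # poison message: park it
    return dead_letter
```

&lt;p&gt;The point is the explicit failure path: nothing disappears, and replaying the dead-letter queue after a fix is a routine operation rather than an incident.&lt;/p&gt;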

&lt;p&gt;This section generated some of the best questions at the &lt;strong&gt;Portland AWS User Group&lt;/strong&gt;, particularly around how to design for failure without over-engineering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Observability: Knowing Where Your Data Is
&lt;/h2&gt;

&lt;p&gt;If you can’t answer &lt;em&gt;“what’s happening right now?”&lt;/em&gt;, the pipeline isn’t finished.&lt;/p&gt;

&lt;p&gt;Strong observability means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-to-end visibility into pipeline state
&lt;/li&gt;
&lt;li&gt;Metrics that surface lag and backlog
&lt;/li&gt;
&lt;li&gt;Clear lineage from source to output
&lt;/li&gt;
&lt;li&gt;The ability to trace a single event across services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Event-driven architectures make this easier — but only if observability is designed in from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This post reflects lessons learned not just from slides, but from real conversations — at &lt;strong&gt;AWS Summit Los Angeles&lt;/strong&gt;, at the &lt;strong&gt;Portland AWS User Group&lt;/strong&gt;, and with teams actively modernizing their data platforms.&lt;/p&gt;

&lt;p&gt;Event-driven pipelines aren’t about chasing trends.&lt;br&gt;&lt;br&gt;
They’re about aligning your data systems with how the business actually operates — &lt;strong&gt;in real time, not yesterday&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When done well, they are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster
&lt;/li&gt;
&lt;li&gt;More cost-efficient
&lt;/li&gt;
&lt;li&gt;More resilient
&lt;/li&gt;
&lt;li&gt;Easier to reason about at scale
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And most importantly: they restore trust in the data.&lt;/p&gt;

&lt;p&gt;If you attended either talk — or you’re tackling similar challenges — feel free to connect with me. I’m always happy to dig deeper into specific patterns, tradeoffs, or failure modes.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>aws</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Relearning How to Learn: Preparing for AWS Certifications with ADHD</title>
      <dc:creator>Andrew Kalik</dc:creator>
      <pubDate>Sat, 27 Dec 2025 18:34:04 +0000</pubDate>
      <link>https://forem.com/geekusa33/relearning-how-to-learn-preparing-for-aws-certifications-with-adhd-4ikp</link>
      <guid>https://forem.com/geekusa33/relearning-how-to-learn-preparing-for-aws-certifications-with-adhd-4ikp</guid>
      <description>&lt;p&gt;For as long as I can remember, I’ve not been a great test taker.&lt;/p&gt;

&lt;p&gt;Timed exams, dense wording, and second-guessing myself under pressure have never played to my strengths. Add ADHD and a learning disability on top of that, and standardized tests have always been something I approach with hesitation.&lt;/p&gt;

&lt;p&gt;Because of that, I put off AWS certifications for a long time. Not because I didn’t work with AWS, but because I already knew the testing format was something I struggled with.&lt;/p&gt;

&lt;p&gt;Eventually, I decided that avoiding it forever wasn’t helping either.&lt;/p&gt;

&lt;p&gt;So I studied for — and passed — the AWS Cloud Practitioner exam. Now that I’ve figured out what actually works for me when preparing, I plan to use the same approach as I work toward other cloud certifications.&lt;/p&gt;

&lt;p&gt;This post isn’t about exam hacks or speed-running certs. It’s about relearning how I learn — and what actually worked for me.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I Had to Stop Studying the “Right” Way&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most certification prep advice assumes you can sit down for long, structured study sessions and steadily grind through material. That has never worked for me. When I tried to force it, I just procrastinated harder.&lt;/p&gt;

&lt;p&gt;The problem wasn’t learning AWS.&lt;br&gt;
The problem was the test itself.&lt;/p&gt;

&lt;p&gt;Once I stopped pretending otherwise, my approach changed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Using Practice Exams Without Spiraling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Practice exams can be rough if you already struggle with testing. A bad score can quickly turn into “this is why I don’t do this.”&lt;/p&gt;

&lt;p&gt;So I stopped treating practice exams as pass/fail signals and started using them as feedback:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which questions did I misread?&lt;/li&gt;
&lt;li&gt;Where did wording trip me up?&lt;/li&gt;
&lt;li&gt;What concepts was I almost understanding?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That gave me direction without wrecking my confidence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Understanding Beats Memorization&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Memorization has never been reliable for me. Context is.&lt;/p&gt;

&lt;p&gt;Instead of trying to memorize services, I focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What problem a service solves&lt;/li&gt;
&lt;li&gt;When it’s the wrong choice&lt;/li&gt;
&lt;li&gt;What trade-offs it makes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made it possible to reason through questions instead of freezing when I couldn’t recall a specific detail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Short Sessions, Hard Stops&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn’t do marathon study sessions. I did short, focused bursts. Sometimes 20 minutes. Sometimes less.&lt;/p&gt;

&lt;p&gt;If my brain checked out, I stopped. Forcing it just made the next session worse.&lt;/p&gt;

&lt;p&gt;Progress wasn’t neat or linear, and that had to be okay.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Exam Day Was Still Uncomfortable&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn’t walk into the Cloud Practitioner exam feeling confident. I walked in nervous and fully expecting to overthink things.&lt;/p&gt;

&lt;p&gt;And I still passed.&lt;/p&gt;

&lt;p&gt;Not because I suddenly became a good test taker — but because I stopped fighting how my brain works and planned around it instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why I’m Sharing This&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you’ve been avoiding certifications because you’re bad at tests, have ADHD, or don’t learn well from traditional study methods — you’re not alone.&lt;/p&gt;

&lt;p&gt;You’re not broken.&lt;br&gt;
You’re not lazy.&lt;br&gt;
And you’re not imagining how hard this can be.&lt;/p&gt;

&lt;p&gt;You don’t need a perfect study plan.&lt;br&gt;
You need one that doesn’t work against you.&lt;/p&gt;

&lt;p&gt;Passing one exam doesn’t make everything easy — but it does make the next one feel possible.&lt;/p&gt;

&lt;p&gt;And that’s enough to keep going.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Actually Used to Study&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I didn’t rely on a single resource. I bounced between a few depending on what I needed that day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AWS Skill Builder for official context and terminology&lt;/li&gt;
&lt;li&gt;Tutorials Dojo for realistic practice questions and explanations&lt;/li&gt;
&lt;li&gt;QA Academy to reinforce fundamentals and fill in gaps&lt;/li&gt;
&lt;li&gt;Pluralsight when I needed a concept explained differently&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I didn’t follow any of them linearly. I treated them as a menu and pulled from whatever helped the concept click.&lt;/p&gt;

&lt;p&gt;That flexibility mattered more than the specific platform.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>certification</category>
      <category>cloudpractitioner</category>
    </item>
  </channel>
</rss>
