<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rose Wabere</title>
    <description>The latest articles on Forem by Rose Wabere (@rosewabere).</description>
    <link>https://forem.com/rosewabere</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2989146%2Fc72ecbdb-4eef-4e15-8f73-de4cb7d2086b.jpeg</url>
      <title>Forem: Rose Wabere</title>
      <link>https://forem.com/rosewabere</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/rosewabere"/>
    <language>en</language>
    <item>
      <title>REST vs GraphQL vs WebSockets vs Webhooks: A Real-World Decision Guide (With Code)</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Tue, 31 Mar 2026 20:34:03 +0000</pubDate>
      <link>https://forem.com/rosewabere/rest-vs-graphql-vs-websockets-vs-webhooks-a-real-world-decision-guide-with-code-2bem</link>
      <guid>https://forem.com/rosewabere/rest-vs-graphql-vs-websockets-vs-webhooks-a-real-world-decision-guide-with-code-2bem</guid>
      <description>&lt;p&gt;You have used all of these. But when someone asks you, maybe in an interview, or in a system design meeting, &lt;strong&gt;why&lt;/strong&gt; you chose WebSockets over polling, or webhooks over a queue, can you answer precisely?&lt;/p&gt;

&lt;p&gt;This isn't another definitions post. This article is about &lt;strong&gt;knowing which tool to reach for and why&lt;/strong&gt;, with code you can actually use.&lt;/p&gt;

&lt;p&gt;Quick mental model before we start:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Communication patterns  →  REST | GraphQL | WebSockets | Webhooks
Code execution model    →  async/await
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These live at different layers. Conflating them is the most common source of confusion.&lt;/p&gt;




&lt;h2&gt;
  
  
  async/await: The Foundation, Not the Feature
&lt;/h2&gt;

&lt;p&gt;Let's kill one myth immediately: &lt;strong&gt;async/await is not a communication pattern. It's how your server handles waiting.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every I/O operation — database queries, HTTP calls, file reads — makes your code wait. async/await ensures that waiting doesn't freeze every other user's request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# BAD: blocks the event loop - all other requests stall for 40ms
&lt;/span&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/order/{id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE id = %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# blocking
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;

&lt;span class="c1"&gt;# GOOD: yields control during the wait - other requests run while DB responds
&lt;/span&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/order/{id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;When it matters most:&lt;/strong&gt; High-concurrency services. A delivery platform handling 500 simultaneous drivers checking order status. An NGO dashboard pulling live survey data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The trap:&lt;/strong&gt; Calling a synchronous library inside an &lt;code&gt;async&lt;/code&gt; function. This blocks the entire event loop.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;  &lt;span class="c1"&gt;# synchronous - don't use this inside async functions
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;     &lt;span class="c1"&gt;# async-capable - use this instead
&lt;/span&gt;
&lt;span class="c1"&gt;# WRONG
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_rate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.exchangerate.host/latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# blocks event loop
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# CORRECT
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_rate&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AsyncClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.exchangerate.host/latest&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  REST: Default Choice for Good Reason
&lt;/h2&gt;

&lt;p&gt;Use REST when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The client initiates all interactions&lt;/li&gt;
&lt;li&gt;Data doesn't change faster than the user refreshes&lt;/li&gt;
&lt;li&gt;You're building standard CRUD
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders/{order_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch_one&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM orders WHERE id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;create_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INSERT INTO orders (customer_id, items, total) VALUES ($1, $2, $3) RETURNING id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="nd"&gt;@app.delete&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/orders/{order_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cancel_order&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE orders SET status = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;cancelled&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; WHERE id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cancelled&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world scenario:&lt;/strong&gt; A SACCO member portal. Members log in, check their loan balance, submit a loan application. All request/response. No data is changing while they're looking at a page. REST is perfect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Add proper HTTP caching headers (&lt;code&gt;Cache-Control&lt;/code&gt;, &lt;code&gt;ETag&lt;/code&gt;) - REST can be highly cacheable&lt;/li&gt;
&lt;li&gt;Version your APIs (&lt;code&gt;/v1/orders&lt;/code&gt;) from day one&lt;/li&gt;
&lt;li&gt;Return proper status codes (201 for created, 404 for not found, 422 for validation errors - FastAPI handles this automatically)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  GraphQL: REST With Client-Controlled Queries
&lt;/h2&gt;

&lt;p&gt;Use GraphQL when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple clients (mobile, web, third-party integrations) need different shapes of the same data&lt;/li&gt;
&lt;li&gt;You're constantly over-fetching or under-fetching with REST&lt;/li&gt;
&lt;li&gt;You have deeply nested, relational data
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Using Strawberry (best GraphQL library for FastAPI)
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;strawberry&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;strawberry.fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GraphQLRouter&lt;/span&gt;

&lt;span class="nd"&gt;@strawberry.type&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@strawberry.type&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Order&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="nd"&gt;@strawberry.type&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nd"&gt;@strawberry.field&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_user_with_orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;strawberry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;graphql_router&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GraphQLRouter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;include_router&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;graphql_router&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prefix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/graphql&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a mobile app can request exactly what it needs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="c"&gt;# Mobile - bandwidth-conscious, needs minimal data&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="k"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="c"&gt;# Web dashboard - needs full details&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="k"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="n"&gt;orders&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world scenario:&lt;/strong&gt; An NGO platform serving both field officers on 2G mobile and HQ analysts on desktops. Field officers need lightweight data. Analysts need full datasets. GraphQL lets one API serve both without maintaining separate endpoints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Implement depth limiting to prevent abusive nested queries&lt;/li&gt;
&lt;li&gt;Add query complexity analysis: prevent &lt;code&gt;user → orders → user → orders&lt;/code&gt; recursion&lt;/li&gt;
&lt;li&gt;GraphQL doesn't cache well at HTTP layer: use DataLoader for N+1 query prevention&lt;/li&gt;
&lt;li&gt;Don't default to GraphQL for simple services: it adds real overhead&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  WebSockets: When the Server Needs to Talk First
&lt;/h2&gt;

&lt;p&gt;Use WebSockets when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data changes continuously, and the user needs to see updates immediately&lt;/li&gt;
&lt;li&gt;Polling would generate unacceptable load or latency&lt;/li&gt;
&lt;li&gt;Both client and server need to send messages freely
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Connection manager for multiple clients
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConnectionManager&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;send_to_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;broadcast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;active&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ws&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;manager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConnectionManager&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nd"&gt;@app.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws/track/{driver_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;track_driver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;driver_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive_json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="c1"&gt;# Driver sent a location update - broadcast to assigned rider
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;location_update&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_to_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rider_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;driver_location&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lat&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lng&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lng&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;disconnect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world scenario:&lt;/strong&gt; A ride-hailing app showing a driver moving on the map in real time. The driver's app sends GPS coordinates every 3 seconds over a persistent WebSocket. The rider's app receives them without polling. This would require: 1,000 REST requests per ride if you used polling at 3-second intervals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Persistent connections consume server resources — plan for horizontal scaling early&lt;/li&gt;
&lt;li&gt;Use Redis Pub/Sub to share WebSocket state across multiple server instances&lt;/li&gt;
&lt;li&gt;Always handle &lt;code&gt;WebSocketDisconnect&lt;/code&gt; — clients drop off constantly (network, battery, background app)&lt;/li&gt;
&lt;li&gt;Heartbeats keep connections alive through load balancers that close idle connections after 60 seconds
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Heartbeat to keep connection alive
&lt;/span&gt;&lt;span class="nd"&gt;@app.websocket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/ws/live&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;live_feed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;WebSocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;accept&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Wait for message with 30s timeout
&lt;/span&gt;                &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive_json&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handle_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Send ping to keep connection alive
&lt;/span&gt;                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ping&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;WebSocketDisconnect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Webhooks: Event Notification Between Services
&lt;/h2&gt;

&lt;p&gt;Use webhooks when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An external system needs to notify your system that something happened&lt;/li&gt;
&lt;li&gt;You don't control the other system's push mechanism&lt;/li&gt;
&lt;li&gt;You want event-driven integration without maintaining a persistent connection
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;HTTPException&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;WEBHOOK_SECRET&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_flutterwave_webhook_secret&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_flutterwave_signature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;expected&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;WEBHOOK_SECRET&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sha256&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;hmac&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compare_digest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sha256=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/webhooks/payment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;payment_webhook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Verify the request is actually from Flutterwave
&lt;/span&gt;    &lt;span class="n"&gt;signature&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verif-hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;body&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="nf"&gt;verify_flutterwave_signature&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;401&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid signature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;event_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Idempotency - Flutterwave WILL retry on 500s
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;webhook_event_exists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;already_processed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Acknowledge immediately, process async
&lt;/span&gt;    &lt;span class="c1"&gt;# Don't do heavy work here -- return 200 fast, process in background
&lt;/span&gt;    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;background_tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;process_payment_event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mark_webhook_received&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;received&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_payment_event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;meta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;order_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;UPDATE orders SET status = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;paid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;, paid_at = NOW() WHERE id = $1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;order_id&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;send_confirmation_email&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Real-world scenario:&lt;/strong&gt; Your Kenyan e-commerce platform integrates Mpesa via Daraja API. When a customer pays, Safaricom calls your &lt;code&gt;/webhooks/mpesa&lt;/code&gt; endpoint with the transaction details. You mark the order paid and send a confirmation SMS. No polling. No persistent connection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production considerations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always return 200 fast; the webhook caller will retry if you're slow or error&lt;/li&gt;
&lt;li&gt;Never trust webhook data without verifying the signature&lt;/li&gt;
&lt;li&gt;Log every webhook payload for debugging; payment disputes will happen&lt;/li&gt;
&lt;li&gt;Queue heavy processing (emails, SMS, inventory updates) with a background worker&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Real-World Architecture: All Four Together
&lt;/h2&gt;

&lt;p&gt;Here's how a production fintech app in Kenya uses all of these together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;┌──────────────────────────────────────────────────────────┐
│               Mobile/Web Client                          │
└───────┬───────────────────┬─────────────────┬────────────┘
        │                   │                 │
   REST (CRUD)         WebSocket         GraphQL
   POST /loans         /ws/notifications   /graphql
   GET /balance        (real-time alerts)  (analytics)
        │                   │                 │
┌───────▼───────────────────▼─────────────────▼────────────-┐
│                  FastAPI Backend                          │
│              (all async/await internally)                 │
└───────┬───────────────────────────────────────────────────┘
        │
   Webhook receiver
   POST /webhooks/mpesa   ← Safaricom calls this
   POST /webhooks/credit  ← Credit bureau calls this
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each pattern handles exactly what it's good at. The async/await inside FastAPI makes sure none of them block each other.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use What: Decision Framework
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Use&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Client requests data on demand&lt;/td&gt;
&lt;td&gt;REST&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multiple clients need different data shapes&lt;/td&gt;
&lt;td&gt;GraphQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User needs to see live updates without refreshing&lt;/td&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;External service needs to notify your backend of events&lt;/td&gt;
&lt;td&gt;Webhook&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Your code waits on database, HTTP, or file I/O&lt;/td&gt;
&lt;td&gt;async/await&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Hard rules:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If polling interval &amp;lt; 5 seconds and data changes frequently → switch to WebSocket&lt;/li&gt;
&lt;li&gt;If you have &amp;gt; 3 different client types with different data needs → consider GraphQL&lt;/li&gt;
&lt;li&gt;If you're integrating a payment provider, shipping tracker, or auth service → expect webhooks&lt;/li&gt;
&lt;li&gt;If you're running FastAPI → use &lt;code&gt;async def&lt;/code&gt; everywhere there's I/O, no exceptions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Common Mistakes (Those That Hurt in Production)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. &lt;code&gt;requests&lt;/code&gt; inside &lt;code&gt;async def&lt;/code&gt;&lt;/strong&gt;: blocks the entire event loop. Use &lt;code&gt;httpx&lt;/code&gt; with &lt;code&gt;await&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. No idempotency on webhook handler&lt;/strong&gt;: payment events get retried. Without idempotency checks, you'll charge customers twice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. WebSocket without reconnection logic&lt;/strong&gt;: mobile networks drop. Your client-side WebSocket needs exponential backoff reconnection, or users see frozen data silently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Assuming GraphQL is real-time&lt;/strong&gt;: GraphQL subscriptions require a separate WebSocket-based setup. Standard queries/mutations are still request/response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. No signature verification on webhooks&lt;/strong&gt;: your endpoint is public. Anyone can POST to it. Always verify HMAC signatures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Keeping heavy processing in the webhook handler&lt;/strong&gt;: the caller expects a fast response. Queue everything with a task worker (Celery, ARQ) and return 200 immediately.&lt;/p&gt;




&lt;p&gt;If you found this helpful, please share it with others who it might help. And if you have questions, kindly drop them in the comments below!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;a href="https://rose-wabere.vercel.app" rel="noopener noreferrer"&gt;Rose Wabere&lt;/a&gt; - Data &amp;amp; Analytics Engineer, Nairobi. Building real-world data systems with Python, FastAPI, and whatever tool the problem actually needs.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>architecture</category>
      <category>backend</category>
    </item>
    <item>
      <title>Building a Production-Grade Async Backend with FastAPI, SQLAlchemy, PostgreSQL, and Alembic</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Sun, 22 Mar 2026 22:47:50 +0000</pubDate>
      <link>https://forem.com/rosewabere/building-a-production-grade-async-backend-with-fastapi-sqlalchemy-postgresql-and-alembic-2ca4</link>
      <guid>https://forem.com/rosewabere/building-a-production-grade-async-backend-with-fastapi-sqlalchemy-postgresql-and-alembic-2ca4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;A complete technical breakdown of how FastAPI, SQLAlchemy, asyncpg, PostgreSQL, Declarative Base, and Alembic work together as a production system - covering sessions, transactions, rollbacks, relationships, migrations, and architecture.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;p&gt;Most tutorials show you how to get something running. The focus of this article is on how it actually works and why each decision matters when you are building something that must survive real traffic, real failures, and real schema changes over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Are Actually Building
&lt;/h2&gt;

&lt;p&gt;A backend system is a coordinated pipeline. Every request travels through several layers before a response comes back.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Client → API Layer → Business Logic → Database Layer → Storage Engine → Response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each tool in the stack owns one layer of this pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Responsibility&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI&lt;/td&gt;
&lt;td&gt;Handles HTTP requests and routing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SQLAlchemy&lt;/td&gt;
&lt;td&gt;Translates Python objects to SQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;asyncpg&lt;/td&gt;
&lt;td&gt;Async driver for PostgreSQL communication&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PostgreSQL&lt;/td&gt;
&lt;td&gt;Persistent storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Declarative Base&lt;/td&gt;
&lt;td&gt;Defines schema as Python classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alembic&lt;/td&gt;
&lt;td&gt;Manages schema changes over time&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reason most backend bugs are hard to fix is not incorrect code. It is incorrect mental models of what each layer is doing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sync vs Async: Why the Model Matters
&lt;/h2&gt;

&lt;p&gt;Your system can handle requests in two ways: blocking one at a time, or handling many concurrently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Synchronous&lt;/strong&gt; means each request holds a worker thread until the database responds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Asynchronous&lt;/strong&gt; means the event loop handles other work while waiting for I/O:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scalars&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key insight: async does not make individual queries faster. It increases throughput by allowing hundreds of concurrent requests without spinning up hundreds of threads.&lt;/p&gt;

&lt;p&gt;FastAPI runs on Python's asyncio event loop. Everything else in this stack is chosen to stay compatible with that.&lt;/p&gt;




&lt;h2&gt;
  
  
  Declarative Base: Defining Schema in Python
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;declarative_base&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;

&lt;span class="n"&gt;Base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;declarative_base&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This class is three things at once: a Python object, a table definition, and a mapping layer. Alembic reads this to generate migrations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Async Engine and asyncpg
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.ext.asyncio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_async_engine&lt;/span&gt;

&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://user:password@localhost/db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Common mistake: using &lt;code&gt;postgresql://&lt;/code&gt; instead of &lt;code&gt;postgresql+asyncpg://&lt;/code&gt;. It will not error immediately, but it will block your event loop under load.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sessions: The Most Critical Concept
&lt;/h2&gt;

&lt;p&gt;A session is not a database connection. It is a unit of work manager. It tracks modified objects, manages the transaction lifecycle, and handles cleanup.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.ext.asyncio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sessionmaker&lt;/span&gt;

&lt;span class="n"&gt;SessionLocal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;bind&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AsyncSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;expire_on_commit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;expire_on_commit=False&lt;/code&gt; keeps object data accessible after commit. Without it, accessing attributes after commit in an async context can raise errors because the session may already be closed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Dependency Injection: Session Management at Scale
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AsyncSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;SessionLocal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FastAPI calls &lt;code&gt;get_db()&lt;/code&gt;, creates a session, injects it into your route, and closes it after the response. Even on exceptions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_users&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Depends&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;get_db&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
    &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;scalars&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without this pattern, connection leaks are almost inevitable as the codebase grows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Transactions and Rollbacks
&lt;/h2&gt;

&lt;p&gt;Every write operation in production must be treated as potentially failing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audit_log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rollback&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Without rollback: partial writes, inconsistent state, very hard to debug.&lt;br&gt;
With rollback: database stays in its previous clean state.&lt;/p&gt;


&lt;h2&gt;
  
  
  Relationships: Foreign Keys and Loading Strategies
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Post&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;__tablename__&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;posts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;users.id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;relationship&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;back_populates&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;posts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Loading relationships efficiently matters at scale. Use &lt;code&gt;selectinload&lt;/code&gt; to avoid the N+1 problem:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;selectinload&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;selectinload&lt;/code&gt; uses a second SELECT. &lt;code&gt;joinedload&lt;/code&gt; uses a JOIN. Understanding the difference affects query performance directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Alembic: Schema Evolution
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Generate migration after changing a model&lt;/span&gt;
alembic revision &lt;span class="nt"&gt;--autogenerate&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"add email to users"&lt;/span&gt;

&lt;span class="c"&gt;# Apply it&lt;/span&gt;
alembic upgrade &lt;span class="nb"&gt;head&lt;/span&gt;

&lt;span class="c"&gt;# Reverse it&lt;/span&gt;
alembic downgrade &lt;span class="nt"&gt;-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Critical setup: Alembic must import all your models in &lt;code&gt;env.py&lt;/code&gt; or it will not detect their changes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# alembic/env.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;app.db.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Base&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;app.db.models&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;
&lt;span class="n"&gt;target_metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Base&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Production Project Structure
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;app/
├── main.py
├── api/
│   └── v1/
│       └── routes/
├── core/
│   └── config.py
├── db/
│   ├── base.py
│   ├── session.py
│   └── models/
├── schemas/
├── services/
└── repositories/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Routes stay thin. Services hold business logic. Repositories handle data access. This separation keeps large codebases testable and maintainable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Mistakes That Break Production Systems
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Wrong: sync query in async context
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Wrong: missing await
&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;select&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Wrong: manual session (no lifecycle guarantee)
&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SessionLocal&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Wrong: sync URL with async engine
&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql://...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="c1"&gt;# Wrong: no rollback on failure
&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;commit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every one of these mistakes has a production impact. None of them cause immediate errors in development.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A modern Python backend is a layered system where each component has a specific responsibility. The most important things to get right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use async throughout. One sync database call can block your event loop and degrade performance for all concurrent requests.&lt;/li&gt;
&lt;li&gt;Use dependency injection for sessions. Manual session management leads to leaks.&lt;/li&gt;
&lt;li&gt;Always handle rollbacks on writes. Partial state is worse than no state.&lt;/li&gt;
&lt;li&gt;Use Alembic from day one. Manually altering production databases is how data gets corrupted.&lt;/li&gt;
&lt;li&gt;Structure your project into layers. Routes, services, repositories, and schemas each have a job.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding why this stack is designed the way it is separates engineers who can debug it from engineers who can only copy it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I build real-world data systems and write about what I learn. Follow for more backend engineering and data content. GitHub: &lt;a href="https://github.com/RoseWabere" rel="noopener noreferrer"&gt;https://github.com/RoseWabere&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>fastapi</category>
      <category>postgres</category>
      <category>backend</category>
    </item>
    <item>
      <title>Fix “invalid literal for int() with base 10: 'None'” in FastAPI (Asyncpg + Docker)</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Wed, 18 Mar 2026 03:59:00 +0000</pubDate>
      <link>https://forem.com/rosewabere/fix-invalid-literal-for-int-with-base-10-none-in-fastapi-asyncpg-docker-3e23</link>
      <guid>https://forem.com/rosewabere/fix-invalid-literal-for-int-with-base-10-none-in-fastapi-asyncpg-docker-3e23</guid>
      <description>&lt;p&gt;You're using Docker to deploy your FastAPI application with Aiven PostgreSQL, when all of a sudden you notice this traceback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: invalid literal for int() with base 10: 'None'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It refers to SQLAlchemy's URL parser and complains that the string "-None" is the port portion of your database URL. Your URL appears as &lt;code&gt;postgresql+asyncpg://user:pass@host:None/dbname&lt;/code&gt;. What caused that to occur? And above all, &lt;strong&gt;how do you fix it?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In this tutorial, I'll walk you through the cause of this error and provide a clean solution. We'll also discuss best practices for managing environment variables in containerized FastAPI applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Understanding the Error
&lt;/h2&gt;

&lt;p&gt;The error occurs when SQLAlchemy's &lt;code&gt;make_url()&lt;/code&gt; function tries to parse the database URL and convert the port component to an integer, but receives the literal string &lt;code&gt;"None"&lt;/code&gt;. This happens because somewhere in your code, you're building the URL with a variable that evaluates to &lt;code&gt;None&lt;/code&gt;, and when interpolated into an f-string, None becomes the string &lt;code&gt;"None"&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code That Breaks
&lt;/h2&gt;

&lt;p&gt;Consider this common pattern in database.py:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;DB_USER&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB_USER&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PASSWORD&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_HOST&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB_HOST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_PORT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB_PORT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;DB_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DB_NAME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;ASYNC_DB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_USER&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;ASYNC_DB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# attempt to override with full URL
&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ASYNC_DB_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;DB_PORT&lt;/code&gt; is not set in your environment (such as a Docker container), it becomes &lt;code&gt;None&lt;/code&gt;. A URL with &lt;code&gt;:None&lt;/code&gt; in the port position is generated by the f-string. The engine creation still uses the final value even if you later reassign &lt;code&gt;ASYNC_DB_URL&lt;/code&gt; with the correct DATABASE_URL. Why, then, does it still have "None"? Because the assignment order is deceptive or because &lt;code&gt;DATABASE_URL&lt;/code&gt; may also be missing. &lt;/p&gt;

&lt;p&gt;In my instance, &lt;code&gt;DATABASE_URL&lt;/code&gt; was set, but a string containing "-None" was already created in the first line and used. No, the engine actually makes use of the final value. Since the second line sets &lt;code&gt;ASYNC_DB_URL&lt;/code&gt; to &lt;code&gt;None&lt;/code&gt; but the engine attempts to use &lt;code&gt;None&lt;/code&gt;, the issue must be that &lt;code&gt;DATABASE_URL&lt;/code&gt; is not set. That would result in a different error. The port "None" error we observe indicates that the f-string URL is being used.&lt;/p&gt;

&lt;p&gt;This only occurs in the event that &lt;code&gt;DATABASE_URL&lt;/code&gt; is not set, in which case the code reverts to the constructed URL (possibly through a &lt;code&gt;or&lt;/code&gt; clause). For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;ASYNC_DB_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_USER&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_PASSWORD&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_HOST&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_PORT&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;DB_NAME&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;DATABASE_URL&lt;/code&gt; is empty, it uses the constructed URL, which has &lt;code&gt;None&lt;/code&gt; for port if any component is missing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step-by-Step Debugging
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Check your environment variables&lt;/strong&gt; inside the container:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &amp;lt;container_name&amp;gt; &lt;span class="nb"&gt;env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify that &lt;code&gt;DATABASE_URL&lt;/code&gt; is present and correct, and that individual component variables are not set (unless you intentionally set them).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Inspect the code&lt;/strong&gt; in &lt;code&gt;database.py&lt;/code&gt;. Look for any fallback logic that builds the URL from components. If you see something like the above, that's your culprit.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Simplify&lt;/strong&gt;: Remove the component-based construction entirely. Use only the full &lt;code&gt;DATABASE_URL&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL environment variable is not set&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ensures you're not accidentally creating a malformed URL.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Test&lt;/strong&gt; by running your app again. The error should disappear.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why the Single-Env-Var Approach is Better
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Consistency&lt;/strong&gt;: Most cloud database providers (Aiven, Heroku, etc.) give you a single &lt;code&gt;DATABASE_URL&lt;/code&gt;. Using it directly avoids mismatches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity&lt;/strong&gt;: Fewer variables to manage in your .env and Docker configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portability&lt;/strong&gt;: Your app can be deployed anywhere without changing how the URL is assembled.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: You avoid accidentally logging or exposing partial credentials.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Handling Async with asyncpg
&lt;/h3&gt;

&lt;p&gt;If you're using asyncpg with SQLAlchemy, your &lt;code&gt;DATABASE_URL&lt;/code&gt; must use the &lt;code&gt;postgresql+asyncpg://&lt;/code&gt; scheme. Aiven provides a URL like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres://avnadmin:password@host:port/defaultdb?sslmode=require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You need to change the scheme to &lt;code&gt;postgresql+asyncpg://&lt;/code&gt;. You can do this programmatically:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;raw_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;raw_url&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;raw_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;raw_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Full Working Example
&lt;/h3&gt;

&lt;p&gt;Here's a robust database.py for FastAPI with Aiven:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.ext.asyncio&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AsyncSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy.orm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;declarative_base&lt;/span&gt;

&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL environment variable is required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Ensure asyncpg scheme
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql+asyncpg://&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_async_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;echo&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;AsyncSessionLocal&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sessionmaker&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;class_&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AsyncSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expire_on_commit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Base&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;declarative_base&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Environment Variables in Docker Compose
&lt;/h3&gt;

&lt;p&gt;In your docker-compose.yml, pass the DATABASE_URL via environment or an env_file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;env_file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;.env&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in your .env file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DATABASE_URL=postgresql+asyncpg://avnadmin:password@host:port/defaultdb?sslmode=require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ValueError: invalid literal for int() with base 10: 'None'&lt;/code&gt; is a symptom of a configuration mismatch. By adopting a single &lt;code&gt;DATABASE_URL&lt;/code&gt; environment variable and eliminating component-based construction, you can avoid this and similar pitfalls. Remember to always validate required environment variables early and use defensive programming.&lt;/p&gt;

&lt;p&gt;If you found this helpful, please share it with others who might be struggling with the same issue. And if you have questions, kindly drop them in the comments below!&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>docker</category>
      <category>fastapi</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Deploying FastAPI with Aiven PostgreSQL and Docker (Production-Ready Guide)</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Tue, 17 Mar 2026 18:59:16 +0000</pubDate>
      <link>https://forem.com/rosewabere/deploying-fastapi-with-aiven-postgresql-and-docker-production-ready-guide-14e8</link>
      <guid>https://forem.com/rosewabere/deploying-fastapi-with-aiven-postgresql-and-docker-production-ready-guide-14e8</guid>
      <description>&lt;p&gt;PostgreSQL is a reliable database, and FastAPI has emerged as one of the most popular frameworks for creating high-performance APIs. However, maintaining your own Postgres instance can be a distraction when it comes time to deploy. This is where the managed cloud database service &lt;strong&gt;Aiven&lt;/strong&gt; comes into play. It takes away infrastructure issues and gives you complete access to PostgreSQL, making operations much less painful. &lt;/p&gt;

&lt;p&gt;In this tutorial, I'll walk you through setting up a FastAPI project, connecting it to Aiven PostgreSQL with asyncpg, and containerizing everything with Docker for both development and production. Along the way, you'll discover how each component fits into a contemporary backend stack and the reasons these tools are frequently used together.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Requirements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python 3.9+&lt;/li&gt;
&lt;li&gt;Docker &amp;amp; Docker Compose (for reproducible environments and easy deployment)&lt;/li&gt;
&lt;li&gt;An Aiven account (free tier works and is enough for testing)&lt;/li&gt;
&lt;li&gt;Basic familiarity with FastAPI and async/await (understanding non-blocking I/O will help)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: FastAPI Project Setup
&lt;/h2&gt;

&lt;p&gt;Create a new project folder and set up a virtual environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;myfastapi &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;myfastapi
python &lt;span class="nt"&gt;-m&lt;/span&gt; venv venv
&lt;span class="nb"&gt;source &lt;/span&gt;venv/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# requirements.txt
fastapi
uvicorn[standard]
asyncpg
python-dotenv
gunicorn
email-validator
python-multipart
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what these packages do:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt; - the web framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;uvicorn&lt;/strong&gt; - ASGI server for running FastAPI&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;asyncpg&lt;/strong&gt; - high-performance async PostgreSQL driver&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;python-dotenv&lt;/strong&gt; - loads environment variables from .env&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;gunicorn&lt;/strong&gt; - production-grade process manager&lt;/li&gt;
&lt;li&gt;others → useful utilities for forms and validation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, a minimal FastAPI app with a health check and a database test endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/main.py
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncpg&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;root&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello World&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/db-check&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;db_check&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;conn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncpg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetchval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT version()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;conn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgres_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This example demonstrates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A basic API endpoint (&lt;code&gt;/&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A real database connection (&lt;code&gt;/db-check&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Async database access using &lt;code&gt;await&lt;/code&gt;, which allows FastAPI to handle many requests efficiently without blocking&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 2: Configure Aiven PostgreSQL
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Log in to Aiven Console, create a new PostgreSQL service (choose your cloud provider and region).&lt;/li&gt;
&lt;li&gt;Once the service is running, go to the Overview tab and copy the Service URI. It will look like:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgres://avnadmin:password@host:port/defaultdb?sslmode=require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Important: For asyncpg, the URI must use the &lt;code&gt;postgresql+asyncpg://&lt;/code&gt; scheme. Adjust it to:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;postgresql+asyncpg://avnadmin:password@host:port/defaultdb?sslmode=require
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This tells your app to use the asyncpg driver instead of the default PostgreSQL driver.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Save this as &lt;code&gt;DATABASE_URL&lt;/code&gt; in a &lt;code&gt;.env&lt;/code&gt; file (but remember, never commit it!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Dockerize the App
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; python:3.11-slim&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; requirements.txt .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-cache-dir&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; requirements.txt
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker lets you package your app with all its dependencies into a container, ensuring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;consistent behavior across environments&lt;/li&gt;
&lt;li&gt;easy deployment to cloud platforms&lt;/li&gt;
&lt;li&gt;no “works on my machine” issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Build the image: &lt;code&gt;docker build -t myfastapi .&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 4: Docker Compose for Development
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;docker-compose.yml&lt;/code&gt; (dev version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.9"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${DATABASE_URL}&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./app:/app/app&lt;/span&gt;   &lt;span class="c1"&gt;# live reload for code changes&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key ideas here:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Volumes allow live code updates without rebuilding the container&lt;/li&gt;
&lt;li&gt;--reload enables hot-reloading for faster development&lt;/li&gt;
&lt;li&gt;Environment variables keep secrets out of your code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run with: docker compose --env-file .env up&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 5: Running SQL Scripts on Aiven
&lt;/h2&gt;

&lt;p&gt;Since we are not using a local database container, we execute scripts directly against Aiven using psql:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;psql &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DATABASE_URL&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &amp;lt; scripts/01_create_tables.sql
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this approach does&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;treats Aiven as your single source of truth&lt;/li&gt;
&lt;li&gt;avoids environment drift between local and production&lt;/li&gt;
&lt;li&gt;is closer to how real production systems are managed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Make sure your &lt;code&gt;DATABASE_URL&lt;/code&gt; in the shell uses the &lt;code&gt;postgresql://&lt;/code&gt;scheme (without &lt;code&gt;+asyncpg&lt;/code&gt;) for &lt;code&gt;psql&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 6: Docker Compose for Production
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;docker-compose.prod.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3.9"&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;.&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8000:8000"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;DATABASE_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${DATABASE_URL}&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Differences from development:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No volume mounts (immutable containers)&lt;/li&gt;
&lt;li&gt;Uses Gunicorn for better process management&lt;/li&gt;
&lt;li&gt;Multiple workers for handling concurrent traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run in detached mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env.production &lt;span class="nt"&gt;-f&lt;/span&gt; docker-compose.prod.yml up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 7: Environment Variables &amp;amp; Security
&lt;/h2&gt;

&lt;p&gt;Never commit &lt;code&gt;.env&lt;/code&gt; files. Use &lt;code&gt;.env.example&lt;/code&gt; as a template.&lt;/p&gt;

&lt;p&gt;In production, inject secrets via your CI/CD or orchestration tool.&lt;/p&gt;

&lt;p&gt;Always use &lt;code&gt;sslmode=require&lt;/code&gt; with Aiven to enforce encrypted connections.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You now have a FastAPI app running with a managed PostgreSQL database, fully containerized and ready for production. This setup lets you focus on writing code while Aiven handles the database heavy lifting.&lt;/p&gt;

&lt;p&gt;If you found this helpful, give it a ❤️ and share it with your network.&lt;/p&gt;

</description>
      <category>python</category>
      <category>docker</category>
      <category>postgres</category>
      <category>fastapi</category>
    </item>
    <item>
      <title>Nairobi Property Listings Scraping: 500+ Property Listings with Smart Size Extraction</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Wed, 18 Feb 2026 18:40:08 +0000</pubDate>
      <link>https://forem.com/rosewabere/nairobi-property-listings-scraping-500-property-listings-with-smart-size-extraction-5hno</link>
      <guid>https://forem.com/rosewabere/nairobi-property-listings-scraping-500-property-listings-with-smart-size-extraction-5hno</guid>
      <description>&lt;h3&gt;
  
  
  The Challenge
&lt;/h3&gt;

&lt;p&gt;Building a house price prediction model for Nairobi requires a dataset that simply doesn’t exist in the open. You have to build it yourself by scraping property portals. But real estate data is very messy, listings are inconsistent, poorly structured, and is often contradictory.&lt;/p&gt;

&lt;p&gt;Taking the size property for instance, one listing gives size as "4,350 sq. ft." Another says "Approx. 350 – 400 sqm." A third buries three villa sizes inside a paragraph.&lt;/p&gt;

&lt;p&gt;So, &lt;code&gt;how do you extract&lt;/code&gt; clean, usable data from this chaos?&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk you through the architecture of a scraper that collected 528 listings with 10 structured fields, focusing on the size extraction logic that handles ranges, commas, and mixed units.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Project Structure&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nairobi-house-price-prediction/
├── data/
│   └── raw_listings.csv          # raw csv output
├── notebooks/
│   └── extraction.ipynb          # scraping logic
├── src/
│   ├── scraper.py                 # Core scraping functions
│   └── utils.py                    # Helpers (parsers, cleaners)
├── data_dictionary.json            # Schema definition
└── requirements.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Core Scraping Logic
&lt;/h3&gt;

&lt;p&gt;The scraper iterates over pages, extracts needed data from each listing card, and then fetches more data from the detail page for richer information. I used &lt;code&gt;requests&lt;/code&gt; with retries and &lt;code&gt;BeautifulSoup&lt;/code&gt; for parsing.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Size Extraction Problem
&lt;/h3&gt;

&lt;p&gt;Property size appears in many forms on the site:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;250 m²&lt;/td&gt;
&lt;td&gt;Single, metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4,350 sq. ft.&lt;/td&gt;
&lt;td&gt;Imperial with comma&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Approx. 350 – 400 sqm&lt;/td&gt;
&lt;td&gt;Range, metric&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;465 to 476 square meters&lt;/td&gt;
&lt;td&gt;Range with “to”&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7,755 sq ft&lt;/td&gt;
&lt;td&gt;Imperial, no dot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;240 SQM&lt;/td&gt;
&lt;td&gt;Uppercase&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Some listings even contain multiple size mentions (built‑up area, terrace, garden). We need to extract the first meaningful built‑up size.&lt;/p&gt;

&lt;p&gt;With all that chaos, the &lt;code&gt;extract_size_from_text()&lt;/code&gt; function handles it all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_size_from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Extract built-up/property size from messy real estate descriptions.
    Returns original string of the most plausible size, or &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;  &lt;span class="c1"&gt;# (size_in_sqm, original_text)
&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Ranges in sqm
&lt;/span&gt;    &lt;span class="n"&gt;range_matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\d+(\.\d+)?)\s*(?:–|-|to)\s*(\d+(\.\d+)?)\s*(sqm|m²|square meters?)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;range_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;low&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;–&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; sqm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Single sqm
&lt;/span&gt;    &lt;span class="n"&gt;sqm_matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\d+(\.\d+)?)\s*(sqm|m²|square meters?)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sqm_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; sqm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Square feet (convert to sqm)
&lt;/span&gt;    &lt;span class="n"&gt;sqft_matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\d+(\.\d+)?)\s*(sq\.?\s*ft\.?|sqft)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;sqft_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;sqft&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;sqft&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sqm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sqft&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.092903&lt;/span&gt;
            &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;sqm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; sq ft&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Acres (including fractions)
&lt;/span&gt;    &lt;span class="n"&gt;acre_matches&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;(\d+/\d+|\d+(\.\d+)?)\s*-?\s*(acre)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;IGNORECASE&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;acre_matches&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;match&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;den&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;acres&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;den&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;acres&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;acres&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;sqm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acres&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;4046.86&lt;/span&gt;
            &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;sqm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; acre&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;plausible&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;plausible&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Return the largest (most likely built-up area)
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;plausible&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why This Approach?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Unit normalization&lt;/strong&gt; allows fair comparison across different measurement systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thresholds&lt;/strong&gt; (30 sqm, 300 sqft, 0.05 acre) eliminate noise from tiny areas that are clearly not the main house or are clearly not logical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Max selection&lt;/strong&gt; handles listings that describe multiple areas (e.g., terraces, gardens). This means that we take the largest, which typically corresponds to the built‑up area.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Regex patterns&lt;/strong&gt; are flexible enough to catch variations like "sqm", "m²", "square meters", "sq. ft.", "sqft", and acre fractions.&lt;/p&gt;

&lt;h3&gt;
  
  
  Integration with the Scraper
&lt;/h3&gt;

&lt;p&gt;The scraper collects size from two places:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Listing card&lt;/strong&gt; – quick size from swiper slides or truncated description.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Detail page&lt;/strong&gt; – full description that contains a more accurate size. If found, it overrides the card size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_bedrooms_bathrooms_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bedrooms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bathrooms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;size_from_swiper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# ... extract from swiper slides ...
&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract from description on the card
&lt;/span&gt;    &lt;span class="n"&gt;desc_div&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;truncatedDescription&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;desc_div&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;desc_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;desc_div&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_size_from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;size_from_swiper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bedrooms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bathrooms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The detail page&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;extract_bedrooms_bathrooms_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;bedrooms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;bathrooms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="n"&gt;size_from_swiper&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# ... extract from swiper slides ...
&lt;/span&gt;
    &lt;span class="c1"&gt;# Extract from description on the card
&lt;/span&gt;    &lt;span class="n"&gt;desc_div&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;listing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;div&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;truncatedDescription&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;desc_div&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;desc_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;desc_div&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;extract_size_from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;desc_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;size_from_desc&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;size_from_swiper&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;bedrooms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bathrooms&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;size&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in &lt;code&gt;scrape_listing&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;size_from_detail&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;size_from_detail&lt;/span&gt;  &lt;span class="c1"&gt;# override card size
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Error Handling &amp;amp; Resilience
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Retry logic&lt;/strong&gt; with &lt;code&gt;fetch_page&lt;/code&gt; – requests up to 3 times with a 2‑second delay between attempts.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fallback data&lt;/strong&gt;– if detail page fails, return basic info (size from card).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checkpointing&lt;/strong&gt; – main loop stops when max_listings stops at &lt;code&gt;max_listings&lt;/code&gt; (800) is reached to avoid over‑scraping.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Polite delays&lt;/strong&gt; – 1s between detail requests, 2s between pages to prevent IP blocking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Dictionary: Schema as Code
&lt;/h3&gt;

&lt;p&gt;I defined the schema upfront to ensure consistency:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;data_dictionary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Column&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;String&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Property name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Column&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Property Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;String&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Apartment, Townhouse, etc.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# ... etc.
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is saved as &lt;code&gt;data_dictionary.json&lt;/code&gt;, serving as documentation for collaborators and future‑me.&lt;/p&gt;

&lt;p&gt;Results after running pages 1–40:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;START_PAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;END_PAGE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;      
&lt;span class="n"&gt;MAX_LISTINGS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;800&lt;/span&gt;     &lt;span class="c1"&gt;# max
&lt;/span&gt;
&lt;span class="c1"&gt;# Scrape
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;scrape_pages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START_PAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END_PAGE&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MAX_LISTINGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# 800
&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;columns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ['Title', 'Property Type', 'Price', 'Location', 'Bedrooms', 'Bathrooms', 'Size', 'Amenities', 'Surroundings', 'Created At']
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;800 listings from pages 1–40&lt;/li&gt;
&lt;li&gt;10 fields per listing&lt;/li&gt;
&lt;li&gt;Size captured in listings if available&lt;/li&gt;
&lt;li&gt;0 critical failures – thanks to retries and fallbacks&lt;/li&gt;
&lt;li&gt;Raw CSV saved to data/raw_listings.csv&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Lessons for Production Scrapers
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Always normalise units&lt;/strong&gt; – you can't compare apples and oranges.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Filter implausible values&lt;/strong&gt; – they corrupt your model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have a fallback&lt;/strong&gt; – if detail page fails, keep the card data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document your schema&lt;/strong&gt; – you'll thank yourself later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Respect the source&lt;/strong&gt; – rate limiting isn't optional.&lt;/p&gt;

&lt;h4&gt;
  
  
  Next Steps: Data Cleaning
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Load raw_listings.csv&lt;/li&gt;
&lt;li&gt;Remove duplicates&lt;/li&gt;
&lt;li&gt;Standardize location strings&lt;/li&gt;
&lt;li&gt;Convert price to integer (remove “KSh”, commas)&lt;/li&gt;
&lt;li&gt;Convert size to numeric (extract first number from ranges, convert acres to sqm)&lt;/li&gt;
&lt;li&gt;Create features: price_per_sqft, amenity_score (count of amenities), month from Created At&lt;/li&gt;
&lt;li&gt;Basic EDA and save as clean_listings.csv&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Get the Code&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;All code is available on GitHub: &lt;a href="https://github.com/Rozieroz/Nairobi-House-Price-Prediction" rel="noopener noreferrer"&gt;github&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Contributions welcome! Feel free to open issues or PRs.&lt;/p&gt;

&lt;p&gt;Connect with Me&lt;/p&gt;

&lt;p&gt;LinkedIn: &lt;a href="https://www.linkedin.com/in/rosewabere/" rel="noopener noreferrer"&gt;linkedin&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Python #WebScraping #BeautifulSoup #DataScience #MachineLearning #Kenya #OpenSource #Regex
&lt;/h1&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>datascience</category>
      <category>regex</category>
    </item>
    <item>
      <title>Fixing Kali Linux's "Failed to Set Beacon Parameters" Hotspot Error: A Troubleshooting Tale</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Sat, 07 Feb 2026 20:22:38 +0000</pubDate>
      <link>https://forem.com/rosewabere/fixing-kali-linuxs-failed-to-set-beacon-parameters-hotspot-error-a-troubleshooting-tale-1ddc</link>
      <guid>https://forem.com/rosewabere/fixing-kali-linuxs-failed-to-set-beacon-parameters-hotspot-error-a-troubleshooting-tale-1ddc</guid>
      <description>&lt;p&gt;Have you ever needed to turn your Kali Linux machine into a Wi-Fi hotspot, only to be stopped cold by a cryptic error? You install &lt;code&gt;hostapd&lt;/code&gt;, write your config, and run the command, brimming with confidence. Then, the terminal spits back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Failed to set beacon parameters
Interface initialization failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your hotspot is dead on arrival. This isn't just a minor bug—it's a fundamental hardware limitation that most tutorials gloss over. I hit this wall myself while setting up a penetration testing lab. Through trial, error, and deep diving, I found the solution isn't just another &lt;code&gt;apt install&lt;/code&gt; command. It's about understanding the &lt;strong&gt;single-channel limitation&lt;/strong&gt; of your wireless card and using the right tool to bypass it.&lt;/p&gt;

&lt;p&gt;This is the story of that troubleshooting journey and the definitive guide to creating a reliable Wi-Fi-to-Wi-Fi hotspot on Kali Linux.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Scene of the Crime: A Standard Setup That Should Have Worked
&lt;/h2&gt;

&lt;p&gt;My goal was simple: share my laptop's existing Wi-Fi internet connection with another device through a new hotspot, all using Kali. The standard approach for this is using &lt;code&gt;hostapd&lt;/code&gt; (the host access point daemon). I followed the typical steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Installed the required packages: &lt;code&gt;hostapd&lt;/code&gt; and &lt;code&gt;dnsmasq&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Created a clean &lt;code&gt;/etc/hostapd/hostapd.conf&lt;/code&gt; file, setting my SSID, password, and channel.&lt;/li&gt;
&lt;li&gt; Configured &lt;code&gt;dnsmasq&lt;/code&gt; for DHCP and set up IP forwarding with &lt;code&gt;iptables&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I ran &lt;code&gt;sudo hostapd /etc/hostapd.conf&lt;/code&gt;, expecting to see &lt;code&gt;AP-ENABLED&lt;/code&gt;. Instead, I was greeted with the "Failed to set beacon parameters" error. The interface would flip between enabled and disabled states before failing completely. A search online showed I wasn't alone—this is a common, frustrating roadblock.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Investigation: Why Your Wi-Fi Card Refuses to Cooperate
&lt;/h2&gt;

&lt;p&gt;The core issue is physics, not software. Most internal Wi-Fi cards are designed to operate on &lt;strong&gt;one radio frequency at a time&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;The Problem:&lt;/strong&gt; If your Kali laptop is connected to a router on &lt;strong&gt;Channel 6&lt;/strong&gt;, the card's radio is tuned to that frequency. You cannot force the same physical radio to simultaneously broadcast a hotspot on &lt;strong&gt;Channel 11&lt;/strong&gt;. It's like asking a musician to play two different notes on the same instrument at the exact same moment—it's impossible for many consumer-grade cards.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Error:&lt;/strong&gt; This hardware conflict manifests as the &lt;code&gt;Failed to set beacon parameters&lt;/code&gt; error. The driver receives conflicting commands and fails.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;The Realization:&lt;/strong&gt; The standard &lt;code&gt;hostapd&lt;/code&gt; method works perfectly for creating a hotspot from a &lt;em&gt;wired&lt;/em&gt; Ethernet connection, because the Wi-Fi card has only one job: broadcasting. The "Wi-Fi to Wi-Fi" scenario is where it falls apart.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Solution: Introducing &lt;code&gt;linux-wifi-hotspot&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;The breakthrough came when I discovered the &lt;code&gt;linux-wifi-hotspot&lt;/code&gt; tool. Its key advantage is right in the description on its GitHub page: it is "able to create a hotspot using the same wifi card which is connected to an AP already". It does this by creating a virtual interface and handling the complex routing and channel coordination automatically.&lt;/p&gt;

&lt;p&gt;Here’s a comparison of the two primary methods:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;The Manual &lt;code&gt;hostapd&lt;/code&gt; Method&lt;/th&gt;
&lt;th&gt;The &lt;code&gt;linux-wifi-hotspot&lt;/code&gt; Method&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Purpose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Creates a standard access point (AP).&lt;/td&gt;
&lt;td&gt;Creates an AP &lt;strong&gt;specifically while connected to another network&lt;/strong&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;"Wi-Fi to Wi-Fi"&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Usually fails due to single-channel limitation.&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Designed to solve this problem.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ease of Use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Requires manual config of &lt;code&gt;hostapd&lt;/code&gt;, &lt;code&gt;dnsmasq&lt;/code&gt;, &lt;code&gt;iptables&lt;/code&gt;.&lt;/td&gt;
&lt;td&gt;Simple GUI or one-line terminal command.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best For&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sharing a wired Ethernet connection or advanced custom setups.&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Sharing your current Wi-Fi connection&lt;/strong&gt; quickly and reliably.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Step-by-Step: Building and Using Your Reliable Hotspot
&lt;/h3&gt;

&lt;p&gt;The tool isn't always in the default Kali repositories, so we build it from source. Here’s the process that finally worked:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Install the Dependencies:&lt;/strong&gt;&lt;br&gt;
First, we need the libraries for building the software and the engines that power the hotspot.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;apt update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; libgtk-3-dev build-essential gcc g++ pkg-config make hostapd dnsmasq libqrencode-dev libpng-dev libglib2.0-dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;2. Build and Install &lt;code&gt;linux-wifi-hotspot&lt;/code&gt; from Source:&lt;/strong&gt;&lt;br&gt;
Building from source ensures we get the latest version with all features intact.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/lakinduakash/linux-wifi-hotspot
&lt;span class="nb"&gt;cd &lt;/span&gt;linux-wifi-hotspot
make
&lt;span class="nb"&gt;sudo &lt;/span&gt;make &lt;span class="nb"&gt;install&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Launch the Hotspot (The Correct Way):&lt;/strong&gt;&lt;br&gt;
You can use the GUI (&lt;code&gt;wihotspot&lt;/code&gt;), but the command line gives us the precise control needed to solve the channel conflict.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First, identify your current Wi-Fi channel:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;iwlist wlan0 channel | &lt;span class="nb"&gt;grep &lt;/span&gt;Current
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the channel number (e.g., &lt;code&gt;Current Frequency:2.437 GHz (Channel 6)&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Now, launch the hotspot, forcing it to the SAME channel:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;create_ap &lt;span class="nt"&gt;-c&lt;/span&gt; 6 wlan0 wlan0 MySecureHotspot MyStrongPassword123
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The &lt;code&gt;-c 6&lt;/code&gt; flag is the magic key.&lt;/strong&gt; It forces the hotspot to broadcast on Channel 6, eliminating the channel conflict. Replace &lt;code&gt;6&lt;/code&gt; with your identified channel, and &lt;code&gt;wlan0&lt;/code&gt; with your interface name if different.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Expected Success Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Config dir: /tmp/create_ap.wlan0.conf.xyz
PID: 12345
...
AP-Enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Keep that terminal window open—closing it will turn off the hotspot. To run it in the background, you can add the &lt;code&gt;-m&lt;/code&gt; flag to the command.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Takeaway: More Than Just a Fixed Hotspot
&lt;/h2&gt;

&lt;p&gt;Solving this problem taught me a crucial lesson in troubleshooting: &lt;strong&gt;understand the tool's purpose&lt;/strong&gt;. I was trying to use a hammer (&lt;code&gt;hostapd&lt;/code&gt;) to drive a screw (Wi-Fi-to-Wi-Fi sharing). The real fix was finding the right screwdriver (&lt;code&gt;linux-wifi-hotspot&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;This process highlights essential skills for any security professional or Linux user:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Diagnosing Hardware Limitations:&lt;/strong&gt; Learning to use &lt;code&gt;iw list&lt;/code&gt; to check for AP mode support and understanding channel conflicts is foundational.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Systematic Troubleshooting:&lt;/strong&gt; Moving from a generic error message to identifying the single-channel limitation mirrors real-world diagnostic workflows.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Leveraging the Right Tool:&lt;/strong&gt; The open-source community often has specialized tools for niche problems. Finding and using them is a key competency.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By matching channels and using &lt;code&gt;create_ap&lt;/code&gt;, you're not just running a command; you're applying a deeper understanding of your system to bypass a hardware constraint. This transforms your Kali machine from a device stumped by a basic error into a powerful, flexible networking hub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Have you encountered other stubborn Kali Linux networking errors?&lt;/strong&gt; Share your story in the comments—let's build a repository of solutions for the community.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #KaliLinux #CyberSecurity #LinuxNetworking #WiFiHotspot #FailedToSetBeaconParameters #Troubleshooting #LinuxTips #InfoSec&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>linux</category>
      <category>networking</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building Production-Grade Data Analytics Pipelines: A Real-World Case Study in Government Data</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Wed, 21 Jan 2026 11:43:44 +0000</pubDate>
      <link>https://forem.com/rosewabere/building-production-grade-data-analytics-pipelines-a-real-world-case-study-in-government-data-22pf</link>
      <guid>https://forem.com/rosewabere/building-production-grade-data-analytics-pipelines-a-real-world-case-study-in-government-data-22pf</guid>
      <description>&lt;p&gt;&lt;em&gt;How I transformed suppressed government data into actionable intelligence for policy makers—and what it taught me about data engineering, stakeholder communication, and building systems that survive real-world scrutiny.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D800" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimages.unsplash.com%2Fphoto-1551288049-bebda4e38f71%3Fw%3D800" alt="Banner: Data visualization showing data quality assessment" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;A government agency tasked me with analyzing a multi-year, multi-dimensional dataset covering hierarchical performance metrics across thousands of locations and demographic subgroups. Think: sprawling Excel files with cryptic codes, privacy-protected values, and conflicting data quality requirements.&lt;/p&gt;

&lt;p&gt;The catch? About &lt;strong&gt;30-40% of the data was suppressed&lt;/strong&gt; using privacy-protection codes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;*&lt;/code&gt; = Small populations (suppressed for confidentiality)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;&lt;/code&gt; or &lt;code&gt;&amp;gt;&lt;/code&gt; = Boundary conditions (threshold effects)&lt;/li&gt;
&lt;li&gt;Numeric values = Reportable data&lt;/li&gt;
&lt;li&gt;Blank = Missing or unreported&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They needed actionable insights from incomplete data. &lt;strong&gt;Fast.&lt;/strong&gt; But I knew that garbage in = garbage out. I couldn't just drop this into a model and hope for the best.&lt;/p&gt;

&lt;p&gt;This is the story of how I built a production-grade analytics pipeline that turns suppressed, incomplete data into trustworthy intelligence—and what it taught me about working with real-world datasets.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge: Data as a Governance Problem
&lt;/h2&gt;

&lt;p&gt;Most tutorials show you clean, happy datasets. Real life? Real life shows up with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Real government data looks like this:
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Output:
&lt;/span&gt;   &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;Name&lt;/span&gt;         &lt;span class="n"&gt;Subgroup&lt;/span&gt;         &lt;span class="n"&gt;Metric&lt;/span&gt; &lt;span class="n"&gt;Rate&lt;/span&gt;
&lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;            &lt;span class="n"&gt;All&lt;/span&gt;              &lt;span class="mf"&gt;65.5&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;  &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;            &lt;span class="n"&gt;Subgroup&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;       &lt;span class="o"&gt;*&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;  &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;            &lt;span class="n"&gt;Subgroup&lt;/span&gt; &lt;span class="n"&gt;Y&lt;/span&gt;       &lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;95&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;  &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;A&lt;/span&gt;            &lt;span class="n"&gt;Subgroup&lt;/span&gt; &lt;span class="n"&gt;Z&lt;/span&gt;       &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;  &lt;span class="n"&gt;Location&lt;/span&gt; &lt;span class="n"&gt;B&lt;/span&gt;            &lt;span class="n"&gt;All&lt;/span&gt;              &lt;span class="mf"&gt;54.2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The core challenge wasn't technical—it was conceptual.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;How do you analyze data when some values are intentionally hidden for privacy? How do you communicate with confidence when 30-40% of your dataset is suppressed? How do you avoid making bad decisions based on incomplete information?&lt;/p&gt;

&lt;p&gt;Traditional approaches would either:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ignore suppressed data&lt;/strong&gt;   (lose 30-40% of context)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Estimate suppressed values&lt;/strong&gt;   (introduce bias and legal risk)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Include them as zeros or arbitrary values&lt;/strong&gt;   (create false conclusions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give up&lt;/strong&gt;   (deliver nothing)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classify and track systematically&lt;/strong&gt;  (the right way)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I needed approach #5.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Solution: A Multi-Layer Data Architecture
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Layer 1: Suppression Classification
&lt;/h3&gt;

&lt;p&gt;First, I built a classifier that understood what each code meant:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;identify_suppression_type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Classify suppression types in educational data.
    Returns classification for auditing and analysis purposes.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;missing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="n"&gt;str_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;complete_suppression&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;      &lt;span class="c1"&gt;# Privacy protection
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;threshold_high&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;             &lt;span class="c1"&gt;# &amp;gt;98% (ceiling)
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;threshold_low&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;              &lt;span class="c1"&gt;# &amp;lt;2% (floor)
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isdigit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)):&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;valid_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;              &lt;span class="c1"&gt;# Actual data
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;               &lt;span class="c1"&gt;# Data quality issue
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt; This function became the "source of truth" for understanding data quality. Every record now had a classification, enabling downstream decision-making.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: Dual-Column Architecture
&lt;/h3&gt;

&lt;p&gt;Here's where I made a deliberate architectural choice that surprised stakeholders but saved countless hours of debugging:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;clean_government_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_column&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Preserve original values while creating analytics-ready columns.

    Architecture:
    - Original column: Unchanged (audit trail for compliance)
    - Numeric column: Converted values (analytics calculations)
    - Flag column: Suppression type (governance tracking)
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;df_clean&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;# Create metadata column (governance and compliance)
&lt;/span&gt;    &lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target_column&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_flag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;target_column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;identify_suppression_type&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Create analytics column (separate from original)
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_to_numeric&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isna&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;

        &lt;span class="n"&gt;str_val&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Suppress non-numeric indicators
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;

        &lt;span class="c1"&gt;# Convert valid values
&lt;/span&gt;        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str_val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nan&lt;/span&gt;

    &lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;target_column&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;target_column&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;convert_to_numeric&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df_clean&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: Three versions of every datapoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Original:        65.5  (or *, &amp;lt;, &amp;gt;)
Numeric:         65.5  (or NaN)
Flag:            'valid_numeric' (or 'suppression_type')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why this architecture?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Audit trail (legal requirement for government work)&lt;/li&gt;
&lt;li&gt; Recalculation possible (if methodology changes)&lt;/li&gt;
&lt;li&gt; No information loss (original preserved)&lt;/li&gt;
&lt;li&gt; Type safety (NaN, not corrupted values)&lt;/li&gt;
&lt;li&gt; Compliance-ready (can demonstrate data handling integrity)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Layer 3: Intelligent Missing Data Handling
&lt;/h3&gt;

&lt;p&gt;Most analyses would dropna() and move on. I went deeper:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;verify_data_coverage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;column_name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Understand suppression patterns to assess analysis reliability.
    This determines confidence levels for any downstream insights.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;numeric_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;column_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
    &lt;span class="n"&gt;flag_col&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;column_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_flag&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;

    &lt;span class="n"&gt;total_rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;numeric_col&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;notna&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;suppressed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;flag_col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;complete_suppression&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;threshold_issues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;flag_col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;threshold_high&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; 
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df_cleaned&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;flag_col&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;threshold_low&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;coverage_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;valid_data&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Data Coverage for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;column_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Valid data: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;coverage_pct&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Privacy suppression: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suppressed&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;suppressed&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Threshold issues: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;threshold_issues&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;threshold_issues&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;total_rows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;valid&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;valid_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;suppressed&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;suppressed&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;coverage_pct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;coverage_pct&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Output:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Data Coverage for Performance Metric:
  Valid data: 15,234/20,700 (73.6%)
  Privacy suppression: 3,102 (15.0%)
  Threshold issues: 1,854 (8.9%)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This transparency gave stakeholders the confidence they needed: "We can analyze 73.6% of data with high confidence. The missing 26.4% is either privacy-protected or boundary conditions. Here's what that means for our conclusions..."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key insight&lt;/strong&gt;: By quantifying data quality upfront, I established credibility and set realistic expectations about what the analysis could deliver.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Analysis: From Data to Decisions
&lt;/h2&gt;

&lt;p&gt;With clean data, the analysis became straightforward. But the presentation? That's where the real work happened.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovery #1: Disparities Emerge When You Measure Correctly
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Demographic_Subgroup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;median&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;std&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;min&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;max&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# When analyzed across valid data only:
# Subgroup_A: 52.1% (below system average)
# Subgroup_B: 77.1% (above system average)
# System Average: 65.5%
# Disparity: ~25 percentage points
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key insight for stakeholders:&lt;/strong&gt; Once you properly handle suppressed data, disparities become visible and quantifiable. This is the foundation for evidence-based decision-making.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovery #2: Geography Is a Powerful Predictor
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Location-level analysis reveals dramatic variation
&lt;/span&gt;&lt;span class="n"&gt;location_rates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Location_ID&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Highest performing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location_rates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Lowest performing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location_rates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Range: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location_rates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;location_rates&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Individual locations vary dramatically. This variation tells you something important: &lt;strong&gt;context matters&lt;/strong&gt;. Different resources, different strategies, different outcomes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovery #3: Hierarchical Analysis Reveals System-Level Patterns
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Analyze at multiple levels:
&lt;/span&gt;&lt;span class="n"&gt;location_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Location&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;district_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;District&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;regional_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Region&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="n"&gt;system_level&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Insight&lt;/strong&gt;: What looks like variation at the location level often has district-level explanations. System-level patterns suggest policy impacts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Visualization: Making Data Speak
&lt;/h2&gt;

&lt;p&gt;Raw numbers don't move decision-makers. &lt;strong&gt;Color-coded, clearly labeled visualizations do.&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;seaborn&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;sns&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_risk_color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metric_value&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Color-code by performance tier for immediate visual understanding.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;metric_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#388e3c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;      &lt;span class="c1"&gt;# Green: Strong performance
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;metric_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#fbc02d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;      &lt;span class="c1"&gt;# Yellow: Acceptable
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;metric_value&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#f57c00&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;      &lt;span class="c1"&gt;# Orange: Needs attention
&lt;/span&gt;    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#d32f2f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;      &lt;span class="c1"&gt;# Red: Critical
&lt;/span&gt;
&lt;span class="c1"&gt;# Create multi-panel risk-stratified visualization
&lt;/span&gt;&lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;12&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# Panel 1: All subgroups ranked by performance
&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Demographic_Subgroup&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;colors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;get_risk_color&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;val&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;val&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;barh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;colors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_yticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_yticklabels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;axvline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; 
                   &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;purple&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;linestyle&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;System Average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Performance by Demographic Group (Ranked)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                     &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;legend&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Panel 2: Critical performance focus
&lt;/span&gt;&lt;span class="n"&gt;critical&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;barh&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                    &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#d32f2f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Groups Requiring Targeted Support (&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;critical&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; groups)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                         &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Panel 3: Disparity visualization
&lt;/span&gt;&lt;span class="n"&gt;highest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;lowest&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subgroup_stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;disparity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;highest&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lowest&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Lowest Performing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;System Average&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Highest Performing&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; 
              &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lowest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location_df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Metric_numeric&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;highest&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
              &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#d32f2f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#9c27b0&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#388e3c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Performance Disparity Gap: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;disparity&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                     &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Panel 4: Performance distribution
&lt;/span&gt;&lt;span class="n"&gt;performance_dist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Series&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Strong (&amp;gt;70%)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Acceptable (65-70%)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;70&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Needs Attention (60-65%)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Critical (&amp;lt;60%)&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subgroup_stats&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;bar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;performance_dist&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;performance_dist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
              &lt;span class="n"&gt;color&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#388e3c&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#fbc02d&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#f57c00&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;#d32f2f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_xticks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;performance_dist&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_xticklabels&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;performance_dist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rotation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ha&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;right&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Performance Distribution Across All Groups&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fontweight&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bold&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tight_layout&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Impact&lt;/strong&gt;: These visualizations communicate what statistics can't—the &lt;em&gt;scale&lt;/em&gt; of disparities and the &lt;em&gt;distribution&lt;/em&gt; of challenges. Decision-makers understand immediately which groups need attention.&lt;/p&gt;




&lt;h2&gt;
  
  
  Lessons Learned: What Real Government Clients Actually Care About
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Lesson #1: Data Quality &amp;gt; Data Quantity
&lt;/h3&gt;

&lt;p&gt;The agency had 20,000+ records. But 30-40% was suppressed. That's &lt;em&gt;not&lt;/em&gt; 20,000 usable datapoints—it's ~12,000-14,000.&lt;/p&gt;

&lt;p&gt;Most analysts would hide this fact. I led with it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here's what we can analyze with high confidence (73.6% of records), here's what's privacy-protected (15%), and here's what's boundary conditions (8.9%). Our recommendations are based on properly validated data across thousands of locations."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Result&lt;/strong&gt;: They trusted me &lt;em&gt;more&lt;/em&gt;, not less. Transparency about limitations builds credibility.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson #2: Preserve the Audit Trail
&lt;/h3&gt;

&lt;p&gt;When I showed them three columns (original → numeric → flag), they initially thought I was overcomplicating things.&lt;/p&gt;

&lt;p&gt;Months later, when budget and policy questions arose: "How did you calculate that disparity? Can you show the original data?"&lt;/p&gt;

&lt;p&gt;I could pull the exact original value, show the classification logic, show the calculation, and audit the entire pipeline.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's when they realized the real value.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Code that can be audited survives. Government agencies &lt;em&gt;demand&lt;/em&gt; auditability. It's not optional.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson #3: Stakeholders Care About Impact, Not Methods
&lt;/h3&gt;

&lt;p&gt;I spent weeks optimizing the data pipeline and architecture. The stakeholders spent 30 seconds looking at the visualizations.&lt;/p&gt;

&lt;p&gt;They didn't care that I used pandas groupby operations or matplotlib color schemes. They cared that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The data quality was clear&lt;/li&gt;
&lt;li&gt;The disparities were quantifiable&lt;/li&gt;
&lt;li&gt;They had actionable next steps&lt;/li&gt;
&lt;li&gt;The methodology could be audited&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Lesson&lt;/strong&gt;: Show 30 seconds of impact, not 3 hours of methodology. (Save the methodology for technical documentation and code comments.)&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson #4: One Number Changes Everything
&lt;/h3&gt;

&lt;p&gt;The single most powerful metric? &lt;strong&gt;The disparity magnitude.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "Group A is at 52%" and "Group B is at 77%"...&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"We have a 25-point performance disparity across demographic groups."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That number led directly to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identifying priority populations&lt;/li&gt;
&lt;li&gt;Budget allocation decisions&lt;/li&gt;
&lt;li&gt;Program creation&lt;/li&gt;
&lt;li&gt;Policy discussions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Find the one number that captures the essence of the problem.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Lesson #5: Government Work Requires Different Thinking
&lt;/h3&gt;

&lt;p&gt;This wasn't a startup where you ship fast and iterate. This was government where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everything must be documented&lt;/li&gt;
&lt;li&gt;All decisions need justification&lt;/li&gt;
&lt;li&gt;All calculations must be verifiable&lt;/li&gt;
&lt;li&gt;Privacy and compliance are non-negotiable&lt;/li&gt;
&lt;li&gt;Stakeholders need to understand &lt;em&gt;why&lt;/em&gt;, not just &lt;em&gt;what&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This requires a different approach than commercial data science.&lt;/strong&gt; You're not optimizing for speed—you're optimizing for integrity, transparency, and defensibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Skills Showcased
&lt;/h2&gt;

&lt;p&gt;Here's what this project demonstrated to potential clients and recruiters:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Engineering&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-source data consolidation (spreadsheets → Pandas)&lt;/li&gt;
&lt;li&gt;Schema design for compliance (original + numeric + flag columns)&lt;/li&gt;
&lt;li&gt;Data quality assessment (suppression classification)&lt;/li&gt;
&lt;li&gt;Hierarchical data processing (locations → districts → regions → system)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Descriptive statistics with missing data handling&lt;/li&gt;
&lt;li&gt;Comparative analysis across hierarchical groupings&lt;/li&gt;
&lt;li&gt;Disparity quantification and gap analysis&lt;/li&gt;
&lt;li&gt;Statistical confidence assessment based on coverage %&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Data Visualization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Matplotlib + Seaborn for publication-quality charts&lt;/li&gt;
&lt;li&gt;Performance-based color coding (green/yellow/orange/red)&lt;/li&gt;
&lt;li&gt;Multi-panel dashboards for comprehensive understanding&lt;/li&gt;
&lt;li&gt;Clear labeling and legend design for non-technical audiences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Problem-Solving&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Handling suppressed/sensitive data (privacy protection)&lt;/li&gt;
&lt;li&gt;Architectural decisions (dual-column approach for auditability)&lt;/li&gt;
&lt;li&gt;Stakeholder communication (translating data → insights)&lt;/li&gt;
&lt;li&gt;Government compliance thinking (documentation, verification)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Statistical Thinking&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sample size implications of data suppression&lt;/li&gt;
&lt;li&gt;Understanding confidence in partial datasets&lt;/li&gt;
&lt;li&gt;Appropriate analysis methods given data constraints&lt;/li&gt;
&lt;li&gt;Avoiding false precision and overstatement&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Python Proficiency&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pandas for data manipulation and groupby operations&lt;/li&gt;
&lt;li&gt;NumPy for numeric handling&lt;/li&gt;
&lt;li&gt;Matplotlib for visualization&lt;/li&gt;
&lt;li&gt;Jupyter notebooks for reproducible analysis&lt;/li&gt;
&lt;li&gt;Functions and classes for code organization&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  The Business Impact
&lt;/h2&gt;

&lt;p&gt;What started as a data cleaning exercise became a comprehensive analytics solution:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4 hierarchical levels processed&lt;/strong&gt; (Location, District, Region, System)&lt;br&gt;
&lt;strong&gt;20,000+ records analyzed&lt;/strong&gt; (with clear confidence levels)&lt;br&gt;
&lt;strong&gt;Disparities quantified&lt;/strong&gt; (specific gaps identified across demographic groups)&lt;br&gt;
&lt;strong&gt;Resource allocation decisions informed&lt;/strong&gt; (data-driven budget prioritization)&lt;br&gt;
&lt;strong&gt;5 evidence-based recommendations&lt;/strong&gt; delivered to leadership&lt;br&gt;
&lt;strong&gt;Reproducible pipeline&lt;/strong&gt; established (for ongoing monitoring)&lt;br&gt;
 &lt;strong&gt;Audit trail complete&lt;/strong&gt; (every calculation traceable to source data)&lt;/p&gt;




&lt;h2&gt;
  
  
  What I'd Do Differently
&lt;/h2&gt;

&lt;p&gt;If I could go back:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Involve stakeholders earlier&lt;/strong&gt; - I built the analysis framework in isolation. More collaborative design upfront would've ensured alignment with actual decision-making needs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create interactive dashboards&lt;/strong&gt; - Static visualizations are good; interactive tools are better. "What if we focus on X region?" dashboard would've been powerful.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qualitative context&lt;/strong&gt; - "Why" matters as much as "what." Interviews with field staff would've contextualized the numbers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Longer-term monitoring&lt;/strong&gt; - Set up quarterly reporting to track progress on recommendations and refine strategies.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Key Takeaway
&lt;/h2&gt;

&lt;p&gt;Real-world data work isn't about building the most sophisticated model. It's about:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Understanding the domain&lt;/strong&gt; (government operations, compliance, policy constraints)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handling messy reality&lt;/strong&gt; (suppressed data, missing values, privacy protection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building trust&lt;/strong&gt; (transparent methodology, auditable results, clear communication)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Communicating impact&lt;/strong&gt; (visualizations, clear narratives, actionable next steps)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Creating reproducibility&lt;/strong&gt; (clean code, documentation, maintainability)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you can do these five things, you can work with &lt;em&gt;any&lt;/em&gt; dataset for &lt;em&gt;any&lt;/em&gt; client—whether it's government, nonprofit, or private sector.&lt;/p&gt;




&lt;h2&gt;
  
  
  For Recruiters &amp;amp; Potential Clients
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;If you're looking for someone who:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; Transforms messy, suppressed data into actionable intelligence&lt;/li&gt;
&lt;li&gt; Communicates insights to non-technical stakeholders and decision-makers&lt;/li&gt;
&lt;li&gt; Handles data governance, privacy, and compliance requirements&lt;/li&gt;
&lt;li&gt; Builds production-grade analytics systems that survive scrutiny&lt;/li&gt;
&lt;li&gt; Thinks about impact and stakeholder value, not just technical elegance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;...I'm interested in talking.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The tools matter less than the thinking. Python, R, SQL, Excel—they're all just ways to implement sound analytical thinking. That's what I bring to every project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Specific strengths:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Government and public sector data projects&lt;/li&gt;
&lt;li&gt;Handling sensitive/suppressed data responsibly&lt;/li&gt;
&lt;li&gt;Complex hierarchical analysis&lt;/li&gt;
&lt;li&gt;Stakeholder communication and visualization&lt;/li&gt;
&lt;li&gt;Building trust through transparency&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Code &amp;amp; Documentation
&lt;/h2&gt;

&lt;p&gt;The technical approach used in this project is documented in detail at: &lt;code&gt;data_handling.md&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Key files demonstrate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Suppression classification logic&lt;/li&gt;
&lt;li&gt;Dual-column architecture pattern&lt;/li&gt;
&lt;li&gt;Data quality assessment functions&lt;/li&gt;
&lt;li&gt;Visualization frameworks&lt;/li&gt;
&lt;li&gt;Best practices for government data projects&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;What data challenges are you facing? What's preventing you from getting clear answers from your data? What would change if you could confidently analyze your complete dataset?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let me know in the comments—I'd love to help you think through it.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Tags
&lt;/h3&gt;

&lt;h1&gt;
  
  
  DataScience #DataAnalytics #Python #Government #DataQuality #RealWorldDataAnalysis #DataVisualization #DataEngineering #StakeholderCommunication
&lt;/h1&gt;

</description>
      <category>analytics</category>
      <category>dataengineering</category>
      <category>datascience</category>
      <category>privacy</category>
    </item>
    <item>
      <title>Building a Modern Data Platform to Track Kenya’s Food Prices — A Data Engineering Case Study</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Thu, 13 Nov 2025 14:36:25 +0000</pubDate>
      <link>https://forem.com/rosewabere/building-a-modern-data-platform-to-track-kenyas-food-prices-a-data-engineering-case-study-15m2</link>
      <guid>https://forem.com/rosewabere/building-a-modern-data-platform-to-track-kenyas-food-prices-a-data-engineering-case-study-15m2</guid>
      <description>&lt;p&gt;Food price volatility has always been a sensitive issue across Kenya. From urban households in Nairobi to smallholder farmers in Turkana, the ripple effects of fluctuating food prices are felt everywhere. In this article, I share my end-to-end journey designing and implementing a &lt;strong&gt;data platform&lt;/strong&gt; that tracks food prices across Kenyan counties, detects inflation patterns, and visualizes trends for policymakers, consumers, and researchers.&lt;/p&gt;

&lt;p&gt;This project is both a &lt;strong&gt;portfolio centerpiece&lt;/strong&gt; and a &lt;strong&gt;real-world solution&lt;/strong&gt; — demonstrating how modern data engineering practices can power impactful analytics for the public good.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goals &amp;amp; Context
&lt;/h2&gt;

&lt;p&gt;The main objective:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;To design a scalable, automated pipeline that cleans, aggregates, and visualizes food price data from multiple Kenyan data sources — delivering real-time, actionable insights.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Key outcomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Build a unified data model to harmonize market, commodity, and regional data.&lt;/li&gt;
&lt;li&gt;Automate the ETL process using PySpark.&lt;/li&gt;
&lt;li&gt;Store processed data in PostgreSQL for analytics.&lt;/li&gt;
&lt;li&gt;Build dashboards in Grafana for spatial and temporal insights.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  High-Level Architecture
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Raw Data Sources]
    |-- WFP HDX Food Prices
    |-- KNBS &amp;amp; FAOSTAT Datasets
    |-- Government Market Surveys
        ↓
[Ingestion Layer]
    pandas 
        ↓
[Processing Layer]
    PySpark Data Cleaning + Normalization
        ↓
[Storage Layer]
    PostgreSQL (Star Schema)
        ↓
[Visualization Layer]
    Grafana Dashboards / Streamlit Web App
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This modular setup ensures scalability — you can plug in additional data sources or visualization layers (like Power BI or Superset) without refactoring the pipeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Ingestion
&lt;/h2&gt;

&lt;p&gt;I used publicly available data from the &lt;strong&gt;World Food Programme (WFP)&lt;/strong&gt; via the Humanitarian Data Exchange (HDX).&lt;br&gt;&lt;br&gt;
Data was fetched dynamically from the remote CSV URL and processed in &lt;strong&gt;PySpark&lt;/strong&gt; for scalability.&lt;/p&gt;
&lt;h3&gt;
  
  
  Code Example: Reading the Data in PySpark
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize Spark Session
&lt;/span&gt;&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KenyaFoodPrices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.jars.packages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;org.postgresql:postgresql:42.7.3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Remote CSV from HDX
&lt;/span&gt;&lt;span class="n"&gt;csv_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://data.humdata.org/dataset/e0d3fba6-f9a2-45d7-b949-140c455197ff/resource/517ee1bf-2437-4f8c-aa1b-cb9925b9d437/download/wfp_food_prices_ken.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_and_load_csv&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;csv_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Split into header and data
&lt;/span&gt;    &lt;span class="n"&gt;lines&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;splitlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;rdd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sparkContext&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parallelize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rdd&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inferSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;

&lt;span class="c1"&gt;# Load into Spark DataFrame
&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_and_load_csv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;This approach avoids downloading files manually and scales well in scheduled jobs (for example, through Airflow or Databricks).&lt;/p&gt;


&lt;h2&gt;
  
  
  Data Transformation with PySpark
&lt;/h2&gt;

&lt;p&gt;PySpark handled the heavy ETL logic — particularly useful for large datasets spanning multiple years and counties.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lower&lt;/span&gt;

&lt;span class="n"&gt;spark&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;SparkSession&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;builder&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;appName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;KenyaFoodPriceETL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spark.jars&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postgresql-42.6.0.jar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getOrCreate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Load data
&lt;/span&gt;&lt;span class="n"&gt;raw_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;spark&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;combined_raw.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;header&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inferSchema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Clean commodity and location names
&lt;/span&gt;&lt;span class="n"&gt;clean_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raw_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;commodity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;commodity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt; \
                 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;region&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))))&lt;/span&gt; \
                 &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;isNotNull&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Normalize price units (example conversion)
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyspark.sql.functions&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt;

&lt;span class="n"&gt;clean_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;clean_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;withColumn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price_per_kg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nf"&gt;when&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;90kg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;when&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;unit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1kg&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;otherwise&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;col&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This logic ensures consistency — converting all prices to a comparable unit (per kilogram), filtering out nulls, and standardizing categorical columns.&lt;/p&gt;




&lt;h2&gt;
  
  
  Data Modeling: Star Schema Design
&lt;/h2&gt;

&lt;p&gt;To make the data warehouse efficient for analytics, I implemented a &lt;strong&gt;star schema&lt;/strong&gt; model in PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fact Table:&lt;/strong&gt; &lt;code&gt;fact_prices&lt;/code&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Column&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;date_id&lt;/td&gt;
&lt;td&gt;Foreign key to &lt;code&gt;date_dim&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;location_id&lt;/td&gt;
&lt;td&gt;Foreign key to &lt;code&gt;location_dim&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;product_id&lt;/td&gt;
&lt;td&gt;Foreign key to &lt;code&gt;product_dim&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;price_type_id&lt;/td&gt;
&lt;td&gt;Retail/Wholesale category&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;price&lt;/td&gt;
&lt;td&gt;Price per KG&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;quantity&lt;/td&gt;
&lt;td&gt;Quantity sold&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Dimension Tables:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;product_dim&lt;/code&gt;: commodity, category&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;location_dim&lt;/code&gt;: region, district, market, latitude, longitude&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;date_dim&lt;/code&gt;: day, month, year&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;price_type_dim&lt;/code&gt;: retail/wholesale indicator&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Table Creation via SQLAlchemy
&lt;/h2&gt;

&lt;p&gt;Before loading the data from Spark, I pre-created the tables using &lt;strong&gt;SQLAlchemy&lt;/strong&gt; — ensuring primary keys and constraints were properly set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sqlalchemy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MetaData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ForeignKey&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;dotenv&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;load_dotenv&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="nf"&gt;load_dotenv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;engine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;create_engine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_URL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MetaData&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;product_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_dim&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;commodity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;fact_prices&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fact_prices&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;primary_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;product_dim.product_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;location_dim.location_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Float&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Integer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ForeignKey&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;date_dim.date_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;engine&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This step ensures referential integrity and avoids schema drift.&lt;/p&gt;




&lt;h2&gt;
  
  
  Writing Data to PostgreSQL
&lt;/h2&gt;

&lt;p&gt;Once tables were set, PySpark’s JDBC writer handled efficient parallel writes to PostgreSQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;clean_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jdbc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_URL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dbtable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fact_prices&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;option&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;append&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; \
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With proper partitioning, this approach efficiently handled millions of rows.&lt;/p&gt;




&lt;h2&gt;
  
  
  Visualization with Grafana
&lt;/h2&gt;

&lt;p&gt;Grafana provided a clean, dynamic way to visualize the data — connected directly to PostgreSQL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Dashboards:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Price Trend Over Time&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monthly price lines by commodity&lt;/li&gt;
&lt;li&gt;Compare multiple items&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Latest Prices by Commodity and Region&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Price comparison by location&lt;/li&gt;
&lt;li&gt;Can be used to detect inflation by region&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;National Average Price Dashboard&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Avg price per commodity by latest month&lt;/li&gt;
&lt;li&gt;Used for cost-of-living analysis&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw8ayuuoyyb7oqk61upa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvw8ayuuoyyb7oqk61upa.png" alt=" " width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Grafana panels were linked to SQL queries like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;commodity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;AVG&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;avg_price&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;fact_prices&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;date_dim&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;date_id&lt;/span&gt;
&lt;span class="k"&gt;JOIN&lt;/span&gt; &lt;span class="n"&gt;product_dim&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;
&lt;span class="k"&gt;GROUP&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;commodity&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;year&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;month&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Challenges &amp;amp; Lessons Learned
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Inconsistent Units:&lt;/strong&gt; Prices were listed as 90kg bags, 1kg tins, or liters. Unit normalization required extensive mapping logic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate Commodity Names:&lt;/strong&gt; Variants like “Beans (Dry)” vs “Dry Beans” — solved with fuzzy matching.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spatial Mapping:&lt;/strong&gt; Used latitude/longitude data to create a geographic layer for Grafana maps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance Tuning:&lt;/strong&gt; Spark’s &lt;code&gt;.repartition()&lt;/code&gt; and broadcast joins improved load times.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Real-World Impact
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Enables WFP or local governments to monitor inflation regionally&lt;/li&gt;
&lt;li&gt;Helps farmers and consumers understand price dynamics&lt;/li&gt;
&lt;li&gt;Open to integrating other indicators (e.g., rainfall, NDVI)&lt;/li&gt;
&lt;li&gt;Ready for real-time (Kafka + Debezium integration possible)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Integrate &lt;strong&gt;Apache Airflow&lt;/strong&gt; for automated ETL scheduling.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;Prophet or ARIMA&lt;/strong&gt; models to forecast inflation.&lt;/li&gt;
&lt;li&gt;Add &lt;strong&gt;Kafka + Debezium CDC&lt;/strong&gt; for real-time updates.&lt;/li&gt;
&lt;li&gt;Expand visualizations with weather and trade data overlays.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Takeaways
&lt;/h2&gt;

&lt;p&gt;This project blends &lt;strong&gt;engineering rigor&lt;/strong&gt; with &lt;strong&gt;social impact&lt;/strong&gt; — using open data to empower informed decisions. It demonstrates proficiency in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Distributed data processing (PySpark)&lt;/li&gt;
&lt;li&gt;SQL and data modeling best practices&lt;/li&gt;
&lt;li&gt;Workflow orchestration (Airflow-ready design)&lt;/li&gt;
&lt;li&gt;Visualization and analytics integration (Grafana)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s a strong addition to any data engineering portfolio — especially for roles focusing on public data, infrastructure, or analytics engineering.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Links&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub: &lt;a href="https://github.com/Rozieroz/Kenyan-Food-Price-Tracker-and-Inflation-Analysis" rel="noopener noreferrer"&gt;github.com/RoseWabere/Kenyan-Food-Price-Tracker-and-Inflation-Analysis&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Keywords:&lt;/strong&gt; Kenya food prices, PySpark ETL, PostgreSQL data warehouse, Grafana dashboard, data engineering portfolio, Airflow pipeline, public data analytics, inflation analysis Kenya&lt;/p&gt;

</description>
      <category>spark</category>
      <category>pyspark</category>
      <category>grafana</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>What is Data Engineering?</title>
      <dc:creator>Rose Wabere</dc:creator>
      <pubDate>Thu, 01 May 2025 23:34:04 +0000</pubDate>
      <link>https://forem.com/rosewabere/what-is-data-engineering-5f1h</link>
      <guid>https://forem.com/rosewabere/what-is-data-engineering-5f1h</guid>
      <description>&lt;p&gt;With data being a highly valuable asset in today’s data-driven world, roles requiring expertise and knowledge on data have been on the rise as institutions hire data experts to help make the most out of the data obtained.&lt;/p&gt;

&lt;p&gt;This article looks into data engineering, its roles and responsibilities, differences between data engineers and data scientists, skills needed, why one would consider a data engineering career path, where data engineers can work, and why companies need to hire data engineers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining data engineering
&lt;/h2&gt;

&lt;p&gt;Data engineering is a field focused on designing and building systems used in the collection, organization, and processing of data. When the data is collected, it is often in a raw and unstructured format that requires the input of a data engineer to transform it so it can be used in analysis and operations.&lt;/p&gt;

&lt;p&gt;In an era where data is considered to be a valuable asset, data engineers are needed to help ensure data is available whenever and wherever it is needed. With businesses generating a lot of data, these professionals are tasked with preparing and optimizing the data for analysis and operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the roles and responsibilities of data engineers?
&lt;/h2&gt;

&lt;p&gt;Some of the roles and responsibilities of data engineers include:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data pipeline management and designing&lt;/strong&gt;: Data engineers are tasked with designing and building automated ETL pipelines, which help ensure that data flows smoothly from sources to data warehouses or lakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managing data infrastructure&lt;/strong&gt;: As organizations increasingly accumulate more data, engineers help select and manage storage architectures that can handle the growing datasets efficiently and maintain performance and scalability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data modelling&lt;/strong&gt;: Involves creating data models and schemas that make storing and retrieving data efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Optimizing performance&lt;/strong&gt;: This is achieved by improving query speeds, cost management, and streamlining the infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ensuring data security and compliance&lt;/strong&gt;: It is the responsibility of a data engineer to ensure that policies on data access and usage are implemented.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data engineering vs data science
&lt;/h2&gt;

&lt;p&gt;The roles of data engineers and data scientists are often confused. Although they work together, they have differences in aspects such as goals, tasks, skills, and the tools they use.&lt;/p&gt;

&lt;p&gt;Before delving into the differences, let’s look at what data science is. Data science is a multi-disciplinary field that integrates maths, artificial intelligence, analytics, and machine learning.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Goals:&lt;/strong&gt; While data engineers aim to build infrastructure for data collection and make data accessible and usable, data scientists aim to use data to extract insights and predict outcomes while utilizing machine learning, thus enhancing and improving the decision-making processes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tasks:&lt;/strong&gt; Data engineers are mainly tasked with building ETL pipelines, warehousing, and modelling, while data scientists are tasked with analytics, machine learning, and visualization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Skills:&lt;/strong&gt; Data engineers need skills in programming, databases, ETL, and architecture, while data scientists need skills like mathematics, statistics, and machine learning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tools used&lt;/strong&gt;: Data engineers mainly use tools such as Python, Apache Spark, Kafka, Airflow, and SQL, while the main tools for data scientists are Python, R, TensorFlow, scikit-learn, Jupyter, and Notebooks.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Data engineering is the backbone of data science, analytics, and machine learning, as it helps make life easier for other data specialists. Data engineers will build the infrastructure to handle data, and the data scientists and analysts will use that data to gain insights crucial for making informed decisions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data engineering skills
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Programming skills.&lt;/strong&gt; Knowledge of programming languages such as Python, Java, or Scala is needed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Database knowledge.&lt;/strong&gt; Data engineers need to understand relational databases like SQL for structured data and NOSQL for unstructured data, which are essential in storing and managing data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Familiarity with ETL (extract, transform, load).&lt;/strong&gt; Data engineers deal with different data sources from which they have to clean the data and load it to the required destinations, thus they need to know how to use tools such as Apache Airflow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Cloud computing knowledge.&lt;/strong&gt; The expansive use and adoption of cloud platforms like AWS and Azure to store data necessitates that data engineers have cloud computing expertise.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Big data technologies.&lt;/strong&gt; To handle huge data sets, it is essential to be familiar with frameworks such as Apache Kafka, Hadoop, and Apache Spark, which largely help examine complex data sets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Problem-solving and analytical skills.&lt;/strong&gt; Data engineers often work with complicated systems where they encounter problems that require them to be quick and effective.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why consider a data engineering career?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High demand:&lt;/strong&gt; Opportunities for data engineering continue to be on the rise as organizations increasingly become data-driven.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Lucrative opportunities:&lt;/strong&gt; Data engineers are among the highest-paid professionals in the tech field for their expertise and skills.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Diverse opportunities:&lt;/strong&gt; Work opportunities are not limited since the profession is spread across fields such as finance, retail, technology, and health care.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Diverse and transferable skillset:&lt;/strong&gt; since data engineering combines skillsets in database management, cloud technologies, and systems architecture, it opens doors across different industries such as data science, data analytics, and software engineering, among others.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Basis for AI:&lt;/strong&gt; Data engineers create pipelines and the infrastructure that powers AI and predictive analysis, without which analytics and machine learning would not be possible&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Where do data engineers work?
&lt;/h2&gt;

&lt;p&gt;Data engineers work in different industries such as media, finance, technology, education, healthcare, retail, and government. Let’s dive into some of their responsibilities in these sectors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;E-commerce and retail&lt;/strong&gt;  - A data engineer can build recommendation systems by processing a client’s behaviour, transactions, and product data. Most importantly, the data engineer will help streamline operations by enabling real-time analytics for inventory management, logistics, and forecasting.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Healthcare&lt;/strong&gt;  - He/she may integrate and process patient data to support diagnostics and personalized treatment, which helps to improve patient care and research.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Finance&lt;/strong&gt;  - A data engineer may create pipelines that can help in fraud detection, risk assessment, and customer analytics. The engineer will also play the critical role of ensuring data quality and integrity are maintained, which is critical for performing audits, filings, and making strategic decisions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Government&lt;/strong&gt;  - With the government, data engineers undertake responsibilities such as ensuring data privacy and compliance, and integrating census, tax, and public service data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Media&lt;/strong&gt;  - The data engineer can optimize content delivery, analyze viewer engagement, personalize user experiences, and support recommendation systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why should companies hire data engineers?
&lt;/h2&gt;

&lt;p&gt;Companies and institutions certainly need data engineers, as they help enable businesses to have a strategic advantage as well as utilize the full potential of their data in the following ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Data security. With data being a valuable asset and in an age where data breaches and privacy concerns are dominant, a data engineer will implement secure and reliable data systems, ensuring data safety and accessibility.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Cost saving. Data engineers help reduce costs by providing solutions tailored to the business needs. By optimizing data storage and processing, they also help reduce cost and improve resource utilization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scalability. Companies increasingly face large data volumes, and data engineers help ensure the data infrastructure remains flexible with the business needs. Scalability is especially crucial for businesses facing high growth and needing data solutions to keep up with the growth.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved analytics and insights. Data engineers create well-structured data pipelines and storage solutions that help data scientists and analysts access and analyze data easily, leading to quicker insights and timely decision-making.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Data quality: High data quality is crucial since inaccuracies can easily lead to flawed analysis and misguided business decisions. Data engineers will validate and cleanse the data and monitor systems to identify and fix irregularities in the datasets.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Data engineers are beyond technical specialists; they are business enablers, as their expertise in data plays a crucial role. By ensuring the required data is available at the right time and in the right format, they help institutions move faster, make better decisions, and remain competitive.&lt;/p&gt;

&lt;p&gt;Data volumes will undoubtedly continue to grow, and so will advancements in technologies like AI, cloud computing, and big data, which further fuels the need for data engineers. It is the companies that invest in building strong data engineering teams that will be positioned for long-term success.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>programming</category>
      <category>database</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
