<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: maureen chepkirui</title>
    <description>The latest articles on Forem by maureen chepkirui (@maureen_chepkirui_03c48a2).</description>
    <link>https://forem.com/maureen_chepkirui_03c48a2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3723983%2F7300bc2c-0e57-4227-ae80-d023f33cf053.png</url>
      <title>Forem: maureen chepkirui</title>
      <link>https://forem.com/maureen_chepkirui_03c48a2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/maureen_chepkirui_03c48a2"/>
    <language>en</language>
    <item>
      <title>Building an Automated Data Pipeline</title>
      <dc:creator>maureen chepkirui</dc:creator>
      <pubDate>Wed, 21 Jan 2026 15:19:46 +0000</pubDate>
      <link>https://forem.com/maureen_chepkirui_03c48a2/building-an-automated-data-pipeline-o77</link>
      <guid>https://forem.com/maureen_chepkirui_03c48a2/building-an-automated-data-pipeline-o77</guid>
      <description>&lt;h1&gt;
  
  
  Building an Automated Data Pipeline: From GA4 to Amazon Redshift
&lt;/h1&gt;

&lt;p&gt;In my current role as a Data Engineer, I realized that data is only as good as its availability. Moving data from Google Analytics 4 (GA4) into a format a business can actually use for strategy is a common challenge.&lt;/p&gt;

&lt;p&gt;Here is how I solved this using an AWS-native architecture.&lt;/p&gt;

&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The goal was to create a "Single Source of Truth." I designed a pipeline that moves data from the edge into a centralized warehouse.&lt;/p&gt;

&lt;h3&gt;Step 1: Extraction with Python&lt;/h3&gt;

&lt;p&gt;I use Python scripts to call the GA4 Data API, which lets us pull only the dimensions and metrics relevant to our business KPIs.&lt;/p&gt;
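&lt;p&gt;As a rough sketch of what that extraction step can look like, using the official &lt;code&gt;google-analytics-data&lt;/code&gt; client. The property ID, dimensions, and metrics below are illustrative examples, not the exact ones from my pipeline:&lt;/p&gt;

```python
# Sketch: pull a GA4 report and flatten it into plain records.
# Requires `pip install google-analytics-data` plus credentials via
# GOOGLE_APPLICATION_CREDENTIALS; the fields used here are examples.

def fetch_report(property_id, start_date="7daysAgo", end_date="yesterday"):
    """Run a small report against the GA4 Data API (needs credentials)."""
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange, Dimension, Metric, RunReportRequest,
    )
    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[Dimension(name="date"), Dimension(name="country")],
        metrics=[Metric(name="activeUsers"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
    )
    return BetaAnalyticsDataClient().run_report(request)

def rows_to_records(report):
    """Flatten an API response into dicts, ready to dump as JSON/CSV."""
    dims = [d.name for d in report.dimension_headers]
    mets = [m.name for m in report.metric_headers]
    records = []
    for row in report.rows:
        record = dict(zip(dims, (v.value for v in row.dimension_values)))
        record.update(zip(mets, (v.value for v in row.metric_values)))
        records.append(record)
    return records
```

&lt;p&gt;Keeping the API call and the flattening logic separate makes the transform easy to unit-test against stubbed responses, with no credentials required.&lt;/p&gt;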

&lt;h3&gt;Step 2: The Landing Zone (Amazon S3)&lt;/h3&gt;

&lt;p&gt;Raw data shouldn't go straight into a database. I load the raw JSON/CSV files into &lt;strong&gt;Amazon S3&lt;/strong&gt; first. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why S3?&lt;/strong&gt; It acts as a durable, low-cost "Data Lake." If something goes wrong in the later stages, we always have our raw data safely stored in S3.&lt;/li&gt;
&lt;/ul&gt;
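&lt;p&gt;A minimal sketch of that landing step with &lt;code&gt;boto3&lt;/code&gt;. The bucket name and key layout here are hypothetical; the point of date-partitioned keys is that daily extracts never overwrite each other:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def landing_key(source, run_date):
    """Build a date-partitioned key, e.g. raw/ga4/2026/01/21/export.json."""
    return f"raw/{source}/{run_date:%Y/%m/%d}/export.json"

def upload_raw(records, bucket, source, run_date=None):
    """Write the raw extract to the S3 landing zone (needs AWS credentials)."""
    import boto3  # deferred so the pure helper stays testable offline
    run_date = run_date or datetime.now(timezone.utc).date()
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=landing_key(source, run_date),
        Body=json.dumps(records).encode("utf-8"),
    )
```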

&lt;h3&gt;Step 3: The Warehouse (Amazon Redshift)&lt;/h3&gt;

&lt;p&gt;From S3, I use the &lt;code&gt;COPY&lt;/code&gt; command to bulk-load data into &lt;strong&gt;Amazon Redshift&lt;/strong&gt; in parallel.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimization:&lt;/strong&gt; I focus on optimizing the ETL process for correctness, which has helped us reach 98% data accuracy while reducing load errors by 35%.&lt;/li&gt;
&lt;/ul&gt;
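&lt;p&gt;The load step can be sketched like this. The table, bucket, prefix, and IAM role below are placeholders, and in practice the statement would be executed from Python with a driver such as &lt;code&gt;redshift_connector&lt;/code&gt;:&lt;/p&gt;

```python
def copy_statement(table, bucket, prefix, iam_role):
    """Render a Redshift COPY that bulk-loads JSON files from S3.

    COPY reads all objects under the prefix in parallel, which is why
    it is preferred over row-by-row INSERTs for warehouse ingestion.
    """
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS JSON 'auto' "
        "TIMEFORMAT 'auto';"
    )
```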

&lt;h2&gt;Business Impact&lt;/h2&gt;

&lt;p&gt;By leveraging AWS, we transformed our reporting process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Manual reporting time was cut by 50%.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data availability increased by 40%.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executives now access real-time insights&lt;/strong&gt; through Apache Superset dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Moving from physical networking into cloud data engineering has taught me that &lt;strong&gt;automation is the key to scalability.&lt;/strong&gt; If you are just starting with AWS, mastering S3 and Redshift is a fantastic way to understand how the cloud handles massive amounts of information.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>cloud</category>
    </item>
    <item>
      <title>From Splicing Fibers to Scaling Clouds: My Journey to the AWS Community</title>
      <dc:creator>maureen chepkirui</dc:creator>
      <pubDate>Wed, 21 Jan 2026 15:06:39 +0000</pubDate>
      <link>https://forem.com/maureen_chepkirui_03c48a2/from-splicing-fibers-to-scaling-clouds-my-journey-to-the-aws-community-799</link>
      <guid>https://forem.com/maureen_chepkirui_03c48a2/from-splicing-fibers-to-scaling-clouds-my-journey-to-the-aws-community-799</guid>
      <description>&lt;h1&gt;
  
  
  From Fiber Splicing to Data Pipelines: Why I’m Taking My "Layer 1" Skills to the AWS Cloud
&lt;/h1&gt;

&lt;p&gt;For many developers, "The Cloud" is an abstract concept, a place where servers exist in a digital vacuum. But my journey started somewhere very different. It started in the trenches of the &lt;strong&gt;Physical Layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before I was a Data Engineer, I was working with the "plumbing" of the internet: &lt;strong&gt;Fiber Optics.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Power of the Physical Layer&lt;/h2&gt;

&lt;p&gt;I spent my early career mastering OTDR (Optical Time-Domain Reflectometer) testing, power meter diagnostics, and the delicate art of cable splicing. I’ve held the physical strands of glass that carry the world’s data in my hands. &lt;/p&gt;

&lt;p&gt;In the world of Fiber, you learn a hard truth: &lt;strong&gt;If the physical connection isn't perfect, the most sophisticated software in the world won't matter.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Pivot: From the "Highway" to the "Traffic"&lt;/h2&gt;

&lt;p&gt;While I loved building the "highways" (the fiber networks), I became fascinated by the "traffic" (the data) flowing through them. This curiosity led me to &lt;strong&gt;Data Engineering&lt;/strong&gt;, where I now work on the higher layers of the stack.&lt;/p&gt;

&lt;p&gt;Today, instead of splicing cables, I am building end-to-end data pipelines. My current workflow involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extraction:&lt;/strong&gt; Pulling data from sources like GA4 using Python.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; Managing raw data in &lt;strong&gt;Amazon S3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing:&lt;/strong&gt; Ingesting and optimizing data into &lt;strong&gt;Amazon Redshift&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing these AWS-driven pipelines, I’ve been able to improve data availability by &lt;strong&gt;40%&lt;/strong&gt; and reduce processing time by &lt;strong&gt;30%&lt;/strong&gt;. &lt;/p&gt;

&lt;h2&gt;Why I’m Joining the AWS Community&lt;/h2&gt;

&lt;p&gt;I am applying to be an &lt;strong&gt;AWS Community Builder&lt;/strong&gt; because I believe the best engineers are those who understand the full stack, from the light pulses in a fiber cable to the SQL queries in a data warehouse.&lt;/p&gt;

&lt;p&gt;As a builder in Kenya, I want to show that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hardware skills are a superpower:&lt;/strong&gt; My background in network diagnostics helps me understand cloud latency and infrastructure in a way that pure software developers might miss.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Learning in Public is key:&lt;/strong&gt; I want to document how I use AWS tools to solve real-world data problems, helping other hardware engineers bridge the gap into the cloud.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cloud is just someone else's computer, but that computer is still connected by fiber. I’m excited to keep building on both!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Connect with me on &lt;a href="https://linkedin.com/in/maureen-chepkirui-5977ba262" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudcomputing</category>
      <category>dataengineering</category>
      <category>fiberoptics</category>
    </item>
  </channel>
</rss>
