<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: maureen chepkirui</title>
    <description>The latest articles on Forem by maureen chepkirui (@maureen_chepkirui_03c48a2).</description>
    <link>https://forem.com/maureen_chepkirui_03c48a2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3723983%2F7300bc2c-0e57-4227-ae80-d023f33cf053.png</url>
      <title>Forem: maureen chepkirui</title>
      <link>https://forem.com/maureen_chepkirui_03c48a2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/maureen_chepkirui_03c48a2"/>
    <language>en</language>
    <item>
      <title>Building an Automated Data Pipeline</title>
      <dc:creator>maureen chepkirui</dc:creator>
      <pubDate>Wed, 21 Jan 2026 15:19:46 +0000</pubDate>
      <link>https://forem.com/maureen_chepkirui_03c48a2/building-an-automated-data-pipeline-o77</link>
      <guid>https://forem.com/maureen_chepkirui_03c48a2/building-an-automated-data-pipeline-o77</guid>
      <description>&lt;h1&gt;
  
  
  Building an Automated Data Pipeline: From GA4 to Amazon Redshift
&lt;/h1&gt;

&lt;p&gt;In my current role as a Data Engineer, I realized that data is only as good as its availability. Moving data from Google Analytics 4 (GA4) into a format a business can actually use for strategy is a common challenge.&lt;/p&gt;

&lt;p&gt;Here is how I solved this using an AWS-native architecture.&lt;/p&gt;

&lt;h2&gt;The Architecture&lt;/h2&gt;

&lt;p&gt;The goal was to create a "Single Source of Truth." I designed a pipeline that moves data from the edge into a centralized warehouse.&lt;/p&gt;

&lt;h3&gt;Step 1: Extraction with Python&lt;/h3&gt;

&lt;p&gt;I use Python scripts to call the GA4 Data API, which lets us pull only the dimensions and metrics relevant to our business KPIs.&lt;/p&gt;
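&lt;p&gt;As a rough sketch of what that extraction step can look like, using the official &lt;code&gt;google-analytics-data&lt;/code&gt; client. The property ID, dimensions, and metrics below are illustrative examples, not the exact ones from my pipeline:&lt;/p&gt;

```python
# Sketch: pull a GA4 report and flatten it into plain records.
# Requires `pip install google-analytics-data` plus credentials via
# GOOGLE_APPLICATION_CREDENTIALS; the fields used here are examples.

def fetch_report(property_id, start_date="7daysAgo", end_date="yesterday"):
    """Run a small report against the GA4 Data API (needs credentials)."""
    from google.analytics.data_v1beta import BetaAnalyticsDataClient
    from google.analytics.data_v1beta.types import (
        DateRange, Dimension, Metric, RunReportRequest,
    )
    request = RunReportRequest(
        property=f"properties/{property_id}",
        dimensions=[Dimension(name="date"), Dimension(name="country")],
        metrics=[Metric(name="activeUsers"), Metric(name="sessions")],
        date_ranges=[DateRange(start_date=start_date, end_date=end_date)],
    )
    return BetaAnalyticsDataClient().run_report(request)

def rows_to_records(report):
    """Flatten an API response into dicts, ready to dump as JSON/CSV."""
    dims = [d.name for d in report.dimension_headers]
    mets = [m.name for m in report.metric_headers]
    records = []
    for row in report.rows:
        record = dict(zip(dims, (v.value for v in row.dimension_values)))
        record.update(zip(mets, (v.value for v in row.metric_values)))
        records.append(record)
    return records
```

&lt;p&gt;Keeping the API call and the flattening logic separate makes the transform easy to unit-test against stubbed responses, with no credentials required.&lt;/p&gt;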

&lt;h3&gt;Step 2: The Landing Zone (Amazon S3)&lt;/h3&gt;

&lt;p&gt;Raw data shouldn't go straight into a database. I load the raw JSON/CSV files into &lt;strong&gt;Amazon S3&lt;/strong&gt; first. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why S3?&lt;/strong&gt; It acts as a durable, low-cost "Data Lake." If something goes wrong in the later stages, we always have our raw data safely stored in S3.&lt;/li&gt;
&lt;/ul&gt;
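&lt;p&gt;A minimal sketch of that landing step with &lt;code&gt;boto3&lt;/code&gt;. The bucket name and key layout here are hypothetical; the point of date-partitioned keys is that daily extracts never overwrite each other:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

def landing_key(source, run_date):
    """Build a date-partitioned key, e.g. raw/ga4/2026/01/21/export.json."""
    return f"raw/{source}/{run_date:%Y/%m/%d}/export.json"

def upload_raw(records, bucket, source, run_date=None):
    """Write the raw extract to the S3 landing zone (needs AWS credentials)."""
    import boto3  # deferred so the pure helper stays testable offline
    run_date = run_date or datetime.now(timezone.utc).date()
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=landing_key(source, run_date),
        Body=json.dumps(records).encode("utf-8"),
    )
```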

&lt;h3&gt;Step 3: The Warehouse (Amazon Redshift)&lt;/h3&gt;

&lt;p&gt;From S3, I use the &lt;code&gt;COPY&lt;/code&gt; command to bulk-load data into &lt;strong&gt;Amazon Redshift&lt;/strong&gt; in parallel.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Optimization:&lt;/strong&gt; I focus on optimizing the ETL process for correctness, which has helped us reach 98% data accuracy while reducing load errors by 35%.&lt;/li&gt;
&lt;/ul&gt;
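&lt;p&gt;The load step can be sketched like this. The table, bucket, prefix, and IAM role below are placeholders, and in practice the statement would be executed from Python with a driver such as &lt;code&gt;redshift_connector&lt;/code&gt;:&lt;/p&gt;

```python
def copy_statement(table, bucket, prefix, iam_role):
    """Render a Redshift COPY that bulk-loads JSON files from S3.

    COPY reads all objects under the prefix in parallel, which is why
    it is preferred over row-by-row INSERTs for warehouse ingestion.
    """
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{prefix}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS JSON 'auto' "
        "TIMEFORMAT 'auto';"
    )
```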

&lt;h2&gt;Business Impact&lt;/h2&gt;

&lt;p&gt;By leveraging AWS, we transformed our reporting process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Manual reporting time was cut by 50%.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data availability increased by 40%.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Executives now access real-time insights&lt;/strong&gt; through Apache Superset dashboards.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Moving from physical networking into cloud data engineering has taught me that &lt;strong&gt;automation is the key to scalability.&lt;/strong&gt; If you are just starting with AWS, mastering S3 and Redshift is a fantastic way to understand how the cloud handles massive amounts of information.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataengineering</category>
      <category>python</category>
      <category>cloud</category>
    </item>
    <item>
      <title>From Splicing Fibers to Scaling Clouds: My Journey to the AWS Community</title>
      <dc:creator>maureen chepkirui</dc:creator>
      <pubDate>Wed, 21 Jan 2026 15:06:39 +0000</pubDate>
      <link>https://forem.com/maureen_chepkirui_03c48a2/from-splicing-fibers-to-scaling-clouds-my-journey-to-the-aws-community-799</link>
      <guid>https://forem.com/maureen_chepkirui_03c48a2/from-splicing-fibers-to-scaling-clouds-my-journey-to-the-aws-community-799</guid>
      <description>&lt;h1&gt;
  
  
  From Fiber Splicing to Data Pipelines: Why I’m Taking My "Layer 1" Skills to the AWS Cloud
&lt;/h1&gt;

&lt;p&gt;For many developers, "The Cloud" is an abstract concept, a place where servers exist in a digital vacuum. But my journey started somewhere very different. It started in the trenches of the &lt;strong&gt;Physical Layer.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before I was a Data Engineer, I was working with the "plumbing" of the internet: &lt;strong&gt;Fiber Optics.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Power of the Physical Layer&lt;/h2&gt;

&lt;p&gt;I spent my early career mastering OTDR (Optical Time-Domain Reflectometer) testing, power meter diagnostics, and the delicate art of cable splicing. I’ve held the physical strands of glass that carry the world’s data in my hands. &lt;/p&gt;

&lt;p&gt;In the world of Fiber, you learn a hard truth: &lt;strong&gt;If the physical connection isn't perfect, the most sophisticated software in the world won't matter.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;The Pivot: From the "Highway" to the "Traffic"&lt;/h2&gt;

&lt;p&gt;While I loved building the "highways" (the fiber networks), I became fascinated by the "traffic" (the data) flowing through them. This curiosity led me to &lt;strong&gt;Data Engineering&lt;/strong&gt;, where I now work on the higher layers of the stack.&lt;/p&gt;

&lt;p&gt;Today, instead of splicing cables, I am building end-to-end data pipelines. My current workflow involves:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extraction:&lt;/strong&gt; Pulling data from sources like GA4 using Python.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage:&lt;/strong&gt; Managing raw data in &lt;strong&gt;Amazon S3&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing:&lt;/strong&gt; Ingesting and optimizing data into &lt;strong&gt;Amazon Redshift&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By implementing these AWS-driven pipelines, I’ve been able to improve data availability by &lt;strong&gt;40%&lt;/strong&gt; and reduce processing time by &lt;strong&gt;30%&lt;/strong&gt;. &lt;/p&gt;

&lt;h2&gt;Why I’m Joining the AWS Community&lt;/h2&gt;

&lt;p&gt;I am applying to be an &lt;strong&gt;AWS Community Builder&lt;/strong&gt; because I believe the best engineers are those who understand the full stack, from the light pulses in a fiber cable to the SQL queries in a data warehouse.&lt;/p&gt;

&lt;p&gt;As a builder in Kenya, I want to show that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Hardware skills are a superpower:&lt;/strong&gt; My background in network diagnostics helps me understand cloud latency and infrastructure in a way that pure software developers might miss.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Learning in Public is key:&lt;/strong&gt; I want to document how I use AWS tools to solve real-world data problems, helping other hardware engineers bridge the gap into the cloud.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The cloud is just someone else's computer, but that computer is still connected by fiber. I’m excited to keep building on both!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Connect with me on &lt;a href="https://linkedin.com/in/maureen-chepkirui-5977ba262" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudcomputing</category>
      <category>dataengineering</category>
      <category>fiberoptics</category>
    </item>
  </channel>
</rss>
