<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Rotich Kelly</title>
    <description>The latest articles on Forem by Rotich Kelly (@kelly_10).</description>
    <link>https://forem.com/kelly_10</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3001130%2F04c11377-5215-47a0-9f93-5d01db4e0002.jpg</url>
      <title>Forem: Rotich Kelly</title>
      <link>https://forem.com/kelly_10</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/kelly_10"/>
    <language>en</language>
    <item>
      <title>From Binance to Grafana: Building a Real-Time Crypto Dashboard with CDC &amp; Cassandra</title>
      <dc:creator>Rotich Kelly</dc:creator>
      <pubDate>Thu, 03 Jul 2025 13:10:11 +0000</pubDate>
      <link>https://forem.com/kelly_10/from-binance-to-grafana-building-a-real-time-crypto-dashboard-with-cdc-cassandra-4c26</link>
      <guid>https://forem.com/kelly_10/from-binance-to-grafana-building-a-real-time-crypto-dashboard-with-cdc-cassandra-4c26</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;br&gt;
In this project, I set out to solve a common challenge in crypto analytics: how do you reliably stream, store, and visualize fast-moving market data in real time?&lt;/p&gt;

&lt;p&gt;To tackle this, I built a production-grade real-time data pipeline that extracts data from the Binance API, loads it into a relational database like PostgreSQL, replicates updates into Cassandra using Change Data Capture (CDC) via Debezium, and finally visualizes key metrics using Grafana.&lt;/p&gt;

&lt;p&gt;The goal? Create a seamless, scalable pipeline capable of tracking market trends, spotting top-performing tokens, and powering real-time dashboards with zero manual refreshes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Project Matters&lt;/strong&gt;&lt;br&gt;
Cryptocurrency markets operate 24/7 with volatile price action and massive data velocity. Traditional batch pipelines or manual scripts simply can’t keep up. I wanted to build something that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Continuously ingest price, volume, and trade data from Binance&lt;/li&gt;
&lt;li&gt;Store data in a queryable format for structured analytics&lt;/li&gt;
&lt;li&gt;Replicate changes across systems without breaking consistency&lt;/li&gt;
&lt;li&gt;Enable live, real-time dashboards to support fast decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This documentation walks through each phase, from API extraction to CDC to visualization, and shares lessons learned along the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Data Extraction&lt;/strong&gt;&lt;br&gt;
Here are the five endpoints I used for this project, each chosen to cover a critical slice of the crypto market landscape:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Latest Prices – /api/v3/ticker/price&lt;/strong&gt;&lt;br&gt;
This endpoint returns the latest market price for every trading pair. It’s lightweight and perfect for fast polling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;24h Ticker Stats – /api/v3/ticker/24hr&lt;/strong&gt;&lt;br&gt;
Provides daily stats like price change %, volume, and high/low prices, essential for tracking top gainers and market trends.&lt;br&gt;
Key fields:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;priceChangePercent&lt;/li&gt;
&lt;li&gt;highPrice&lt;/li&gt;
&lt;li&gt;volume&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Order Book – /api/v3/depth&lt;/strong&gt;&lt;br&gt;
Returns bid/ask levels for a given trading pair, offering insights into market depth and liquidity. Useful for advanced analytics like order flow tracking.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Recent Trades – /api/v3/trades&lt;/strong&gt;&lt;br&gt;
Gives a rolling list of individual trade executions, including price, quantity, and timestamps — key for volatility and momentum indicators.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Klines (Candlesticks) – /api/v3/klines&lt;/strong&gt;&lt;br&gt;
Returns historical OHLCV candlestick data for any symbol over a chosen interval (e.g., 1m, 5m, 1h). This powers most visual time-series analysis in Grafana.&lt;/p&gt;
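&lt;p&gt;Each kline arrives as a plain array, so a small parser keeps the column order straight. A minimal Python sketch (the array order follows Binance's API docs; the dict keys used for storage are my own naming, and the real extraction scripts live in the repo):&lt;/p&gt;

```python
# Sketch of one klines pull. Each kline is an array ordered
# [open_time, open, high, low, close, volume, ...] per Binance's docs.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def parse_kline(k):
    """Shape one raw kline array into a labeled OHLCV dict."""
    return {
        "open_time": k[0],
        "open": float(k[1]),
        "high": float(k[2]),
        "low": float(k[3]),
        "close": float(k[4]),
        "volume": float(k[5]),
    }

def fetch_klines(symbol, interval="1m", limit=100):
    """Pull candlesticks for a symbol over a chosen interval."""
    qs = urlencode({"symbol": symbol, "interval": interval, "limit": limit})
    with urlopen(f"https://api.binance.com/api/v3/klines?{qs}", timeout=10) as r:
        return [parse_kline(k) for k in json.load(r)]
```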

&lt;p&gt;&lt;strong&gt;Binance Extraction Script Workflow&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send API request to a specific Binance endpoint (e.g., /ticker/price).&lt;/li&gt;
&lt;li&gt;Parse JSON response and extract key fields like symbol, price, volume, timestamp.&lt;/li&gt;
&lt;li&gt;Format and clean data (add timestamps, remove duplicates).&lt;/li&gt;
&lt;li&gt;Insert into PostgreSQL staging tables with proper indexing.&lt;/li&gt;
&lt;li&gt;Log activity and handle API failures with retries.&lt;/li&gt;
&lt;/ol&gt;
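&lt;p&gt;The workflow above can be sketched in Python. This is illustrative only: the row shape and the requests-style session are assumptions, and the actual scripts (with the PostgreSQL insert and retry logic) live in the repo:&lt;/p&gt;

```python
# Illustrative sketch of steps 1-3: request, parse key fields, clean.
from datetime import datetime, timezone

BASE_URL = "https://api.binance.com"

def to_price_row(ticker):
    """Shape one /api/v3/ticker/price object into a (symbol, price, ts) row."""
    return (
        ticker["symbol"],
        float(ticker["price"]),
        datetime.now(timezone.utc).isoformat(),  # ingestion timestamp
    )

def fetch_latest_prices(session):
    """Poll the endpoint with a requests-style session; return deduped rows."""
    resp = session.get(f"{BASE_URL}/api/v3/ticker/price", timeout=10)
    resp.raise_for_status()
    seen, rows = set(), []
    for t in resp.json():
        if t["symbol"] not in seen:  # drop duplicate symbols
            seen.add(t["symbol"])
            rows.append(to_price_row(t))
    return rows
```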

&lt;p&gt;&lt;strong&gt;Clone the Project Repository&lt;/strong&gt;&lt;br&gt;
We start by cloning the CDC project files including table schemas, Debezium config, and consumer scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/yourusername/crypto-cdc-pipeline.git
cd crypto-cdc-pipeline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Install Java, Kafka, and Zookeeper&lt;/strong&gt;&lt;br&gt;
Kafka and Debezium require Java, so we install OpenJDK:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update
sudo apt install -y default-jdk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, download and extract Kafka:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://downloads.apache.org/kafka/3.6.0/kafka_2.13-3.6.0.tgz
tar -xzf kafka_2.13-3.6.0.tgz
mv kafka_2.13-3.6.0 kafka &amp;amp;&amp;amp; cd kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Zookeeper and Kafka in two separate terminal windows or tmux panes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Terminal 1
bin/zookeeper-server-start.sh config/zookeeper.properties

# Terminal 2
bin/kafka-server-start.sh config/server.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Install &amp;amp; Configure Debezium Kafka Connect&lt;/strong&gt;&lt;br&gt;
Debezium captures Postgres changes via Kafka Connect. We install it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ~/kafka
mkdir plugins &amp;amp;&amp;amp; cd plugins
wget https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.5.1.Final/debezium-connector-postgres-2.5.1.Final-plugin.tar.gz
tar -xvf debezium-connector-postgres-2.5.1.Final-plugin.tar.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Update Kafka Connect config to load this plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nano ../config/connect-distributed.properties

# Add or edit:
plugin.path=/home/youruser/kafka/plugins
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start Kafka Connect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd ~/kafka
bin/connect-distributed.sh config/connect-distributed.properties
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create Required Internal Kafka Topics&lt;/strong&gt;&lt;br&gt;
These are used by Kafka Connect to manage connector state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-configs --partitions 1 --replication-factor 1 --config cleanup.policy=compact

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-offsets --partitions 1 --replication-factor 1 --config cleanup.policy=compact

bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic connect-status --partitions 1 --replication-factor 1 --config cleanup.policy=compact
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Prepare PostgreSQL for CDC&lt;/strong&gt;&lt;br&gt;
In DBeaver, log in to your database and create a publication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE PUBLICATION my_publication FOR TABLE binance.binance_latest_prices, binance.binance_24h_stats, binance.binance_klines, binance.binance_order_book, binance.binance_recent_trades;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure your user has replication rights.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Register the Debezium Connector&lt;/strong&gt;&lt;br&gt;
Create the config file postgres-connector.json:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "name": "postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "&amp;lt;aiven-host&amp;gt;",
    "database.port": "25060",
    "database.user": "&amp;lt;username&amp;gt;",
    "database.password": "&amp;lt;password&amp;gt;",
    "database.dbname": "defaultdb",
    "topic.prefix": "crypto_pg",
    "plugin.name": "pgoutput",
    "publication.name": "my_publication",
    "slot.name": "cdc_slot",
    "table.include.list": "binance.binance_latest_prices,binance.binance_24h_stats,binance.binance_klines,binance.binance_order_book,binance.binance_recent_trades",
    "database.sslmode": "require"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register the connector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  --data @postgres-connector.json
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl http://localhost:8083/connectors/postgres-connector/status | jq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Install Cassandra&lt;/strong&gt;&lt;br&gt;
Note: Cassandra’s cqlsh tooling doesn’t support Python 3.12+, so use Python 3.10 or 3.11.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check your messages in the topic&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic &amp;lt;your-topic-name&amp;gt; \
  --from-beginning
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install Cassandra:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "deb https://debian.cassandra.apache.org 41x main" | sudo tee /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
sudo apt update
sudo apt install cassandra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start the Cassandra service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo systemctl start cassandra
sudo systemctl status cassandra
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Create Cassandra Keyspace &amp;amp; Tables&lt;/strong&gt;&lt;br&gt;
Enter the CQL shell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cqlsh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a keyspace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE KEYSPACE crypto_data WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE crypto_data;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create each table one by one (the Cassandra schema is in the repo):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create Python Consumers for CDC&lt;/strong&gt;&lt;br&gt;
Each script listens to a Kafka topic and writes to a Cassandra table.&lt;br&gt;
Run each script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python consumer_latest_prices.py
python consumer_24h_stats.py
python consumer_klines.py
python consumer_order_book.py
python consumer_recent_trades.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
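&lt;p&gt;Each consumer follows the same shape: deserialize the Debezium envelope, take the &lt;code&gt;after&lt;/code&gt; row image, and write it to Cassandra. A minimal sketch (the topic, keyspace, and table names here are illustrative; the real scripts are in the repo):&lt;/p&gt;

```python
# Minimal shape of one CDC consumer: read Debezium change events from a
# Kafka topic and upsert the new row state into Cassandra.
import json

def extract_after_image(event_bytes):
    """Debezium wraps each change in an envelope; we want the new row state."""
    envelope = json.loads(event_bytes)
    payload = envelope.get("payload", envelope)
    return payload.get("after")  # None for delete events

def run_consumer():  # needs Kafka and Cassandra running
    from kafka import KafkaConsumer          # kafka-python
    from cassandra.cluster import Cluster    # cassandra-driver

    session = Cluster(["127.0.0.1"]).connect("crypto_data")
    consumer = KafkaConsumer(
        "crypto_pg.binance.binance_latest_prices",  # illustrative topic name
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
    )
    for msg in consumer:
        row = extract_after_image(msg.value)
        if row:
            session.execute(
                "INSERT INTO latest_prices (symbol, price) VALUES (%s, %s)",
                (row["symbol"], float(row["price"])),
            )
```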



&lt;p&gt;&lt;strong&gt;Connect Cassandra to Grafana&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;grafana-cli plugins install hadesarchitect-cassandra-datasource
systemctl restart grafana-server
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Add datasource:&lt;/strong&gt; In Grafana UI, go to Configuration → Data Sources and add “Apache Cassandra” (community).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Configure:&lt;/strong&gt; Set Contact point to hostname:9042 (e.g. localhost:9042 if on the same VM), then fill in Keyspace, User, and Password.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Test &amp;amp; save:&lt;/strong&gt; Click Save &amp;amp; Test. Once “Database Connection OK” appears, you can create dashboards querying Cassandra.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Grafana Dashboard&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e5zfpo1ou0odjqdwzur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1e5zfpo1ou0odjqdwzur.png" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636cea3231fmcpi5cnx0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636cea3231fmcpi5cnx0.png" alt="Image description" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Check out the repo:&lt;br&gt;
&lt;a href="https://github.com/KellyKiprop/Binance-CDC-Project" rel="noopener noreferrer"&gt;https://github.com/KellyKiprop/Binance-CDC-Project&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>“Who Will Win Formula 1 in 2025? I Asked the Data.”</title>
      <dc:creator>Rotich Kelly</dc:creator>
      <pubDate>Mon, 16 Jun 2025 14:02:44 +0000</pubDate>
      <link>https://forem.com/kelly_10/who-wins-in-formula-1-2025-i-asked-the-data-3jnl</link>
      <guid>https://forem.com/kelly_10/who-wins-in-formula-1-2025-i-asked-the-data-3jnl</guid>
      <description>&lt;p&gt;&lt;em&gt;(Spoiler: It’s Not Just About Speed, It’s About Python and Prediction Power.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What do you get when you cross a Formula 1 obsession with a machine learning brain? A Saturday spent not watching races, but building models to predict the next F1 world champion. And yes, I let the data do the talking, because who needs crystal balls when you’ve got RandomForestRegressor and Linear Regression models?&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;From Track to Tech: The Data Science Pitstop&lt;/strong&gt;&lt;br&gt;
Every great F1 race begins in the garage, and so did mine, with historical data on drivers, races, and standings. I engineered features like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Age &amp;amp; Experience (because wisdom can beat raw speed)&lt;/li&gt;
&lt;li&gt;Wins per Season (obviously)&lt;/li&gt;
&lt;li&gt;Team Affiliation, color-coded for storytelling flair&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It wasn’t just about crunching numbers. I wanted to tell a story, one where the data reveals the drama.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;The Models: Linear vs Random Forest&lt;/strong&gt;&lt;br&gt;
I started simple: a Linear Regression model. It was decent, but something was missing; it just couldn’t capture the twists and turns of an F1 season.&lt;br&gt;
Then I unleashed the big gun: the Random Forest Regressor.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Model&lt;/th&gt;&lt;th&gt;RMSE&lt;/th&gt;&lt;th&gt;R² Score&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Linear Regression&lt;/td&gt;&lt;td&gt;~1499&lt;/td&gt;&lt;td&gt;0.28&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Random Forest&lt;/td&gt;&lt;td&gt;~1284&lt;/td&gt;&lt;td&gt;0.47&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The improvement? Like going from a 6-second pit stop to a clean 2.1.&lt;/p&gt;
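&lt;p&gt;For anyone who wants to reproduce the comparison, here is its general shape on synthetic data (the features are stand-ins for the real engineered ones; scikit-learn is assumed, and the actual training code lives in the repo):&lt;/p&gt;

```python
# Sketch of the model comparison: fit Linear Regression and a Random
# Forest on the same features and compare RMSE / R^2. The target mixes a
# linear and a squared term, which a linear model cannot fully capture.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))  # stand-ins for age, experience, wins_per_season
y = 50 * X[:, 2] + 10 * X[:, 1] ** 2 + rng.normal(scale=5, size=200)

for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X, y)
    pred = model.predict(X)
    rmse = mean_squared_error(y, pred) ** 0.5
    print(type(model).__name__, round(rmse, 1), round(r2_score(y, pred), 2))
```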




&lt;p&gt;&lt;strong&gt;Predicting the 2025 Grid: Who Tops the Podium?&lt;/strong&gt;&lt;br&gt;
After training and testing, here’s what my model predicted for 2025:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Max Verstappen – The king stays king.&lt;/li&gt;
&lt;li&gt;Lando Norris – The new kid has talent.&lt;/li&gt;
&lt;li&gt;Charles Leclerc – The Ferrari fire is still burning.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And here's the fun part: visualized with team colors, so you don’t just see the points, you feel the rivalry.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c47czi5k964ic3cwpnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1c47czi5k964ic3cwpnl.png" alt="Formula1 team colors" width="800" height="513"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Data is the New Commentator&lt;/strong&gt;&lt;br&gt;
I didn’t just want to model the data, I wanted to speak its language. With visualizations like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Driver performance over years&lt;/li&gt;
&lt;li&gt;Age vs points (yes, experience matters)&lt;/li&gt;
&lt;li&gt;Team-colored bar charts of predicted standings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…it became less about predictions and more about painting the future of Formula 1.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What’s Next?&lt;/strong&gt;&lt;br&gt;
This is just Lap 1. I’m exploring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time dashboards&lt;/li&gt;
&lt;li&gt;Weather &amp;amp; track condition variables&lt;/li&gt;
&lt;li&gt;Constructor vs Driver impact modeling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still Building!!&lt;br&gt;
Check it out -&amp;gt; &lt;a href="https://github.com/KellyKiprop/Formula1_2025_Prediction" rel="noopener noreferrer"&gt;https://github.com/KellyKiprop/Formula1_2025_Prediction&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>From Podcasts to Pipelines: Building a YouTube Analytics Engine Inspired by Mic Cheque</title>
      <dc:creator>Rotich Kelly</dc:creator>
      <pubDate>Sat, 07 Jun 2025 20:01:55 +0000</pubDate>
      <link>https://forem.com/kelly_10/from-podcasts-to-pipelines-building-a-youtube-analytics-engine-inspired-by-mic-cheque-2a79</link>
      <guid>https://forem.com/kelly_10/from-podcasts-to-pipelines-building-a-youtube-analytics-engine-inspired-by-mic-cheque-2a79</guid>
      <description>&lt;p&gt;&lt;em&gt;"Now when you watch it you'll understand!"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ever been so hooked on a podcast that you ended up building a full-blown data pipeline because of it? No? Just me? Cool. Let me tell you the story anyway.&lt;/p&gt;

&lt;p&gt;It all started with the Mic Cheque Podcast—a brilliant blend of humor, deep takes, and real talk that kept popping up on my YouTube feed. As a data engineering enthusiast and a fan of the pod, I had one question buzzing in my head:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What makes some podcast episodes go viral while others stay in the shadows?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Idea&lt;/strong&gt;&lt;br&gt;
What if I could track, analyze, and visualize the performance of the podcast episodes using actual YouTube data?&lt;br&gt;
Boom—project idea locked. I decided to build a fully automated YouTube Data Pipeline with the goal of creating a live analytics dashboard to answer burning questions like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Which episodes are going viral?&lt;/li&gt;
&lt;li&gt;What days do high-performing episodes drop?&lt;/li&gt;
&lt;li&gt;Is there a pattern between guest appearances and views?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Stack&lt;/strong&gt;&lt;br&gt;
To make it real, I pulled out the big guns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Python + Airflow: For automating the entire pipeline from extract to load&lt;/li&gt;
&lt;li&gt;YouTube API: For fetching episode metadata and stats&lt;/li&gt;
&lt;li&gt;PostgreSQL (Aiven): As the data warehouse&lt;/li&gt;
&lt;li&gt;Apache Spark: For heavy lifting (a.k.a. transforming the raw data)&lt;/li&gt;
&lt;li&gt;Grafana: For visualizing performance trends that even a podcast guest would appreciate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Flow&lt;/strong&gt;&lt;br&gt;
Here’s what the pipeline does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract data from YouTube using the YouTube Data API&lt;/li&gt;
&lt;li&gt;Transform it using Spark (cleaning, enriching with time-based insights, classifying performance)&lt;/li&gt;
&lt;li&gt;Load it into a PostgreSQL instance hosted on Aiven&lt;/li&gt;
&lt;li&gt;Visualize the trends using Grafana—complete with charts showing view counts, likes, comments, publishing patterns, and a "performance class" metric&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;And the best part? All this runs automatically thanks to Airflow.&lt;/p&gt;
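&lt;p&gt;The "performance class" metric reduces to a simple bucketing rule in the transform step. A sketch as a plain function, with made-up thresholds (the real cutoffs live in the pipeline's Spark code):&lt;/p&gt;

```python
# Illustrative version of the performance-class idea: bucket an episode
# by view count. Thresholds here are invented for the example.
def performance_class(views):
    if views >= 100_000:
        return "viral"
    if views >= 20_000:
        return "strong"
    if views >= 5_000:
        return "steady"
    return "quiet"
```

In the pipeline this runs inside the Spark transform, so every loaded row carries its class and Grafana can pie-chart the distribution directly.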

&lt;p&gt;&lt;strong&gt;The Result&lt;/strong&gt;&lt;br&gt;
After plugging it into Grafana, the dashboard popped! Pie charts for performance class, time series of views by month, and even weekday publishing trends. It's like giving a brain to your favorite podcast channel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Learned&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Airflow can be your best friend or your worst enemy (don’t fight it—configure it properly!)&lt;/li&gt;
&lt;li&gt;PostgreSQL on Aiven is smooth, but Grafana’s port configs can mess you up if you're not careful&lt;/li&gt;
&lt;li&gt;Data pipelines are lit when they bring your passions and skills together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Wrap-Up&lt;/strong&gt;&lt;br&gt;
This started as a fun side project but turned into a powerful learning experience. Whether you're a Mic Cheque stan, a data nerd, or both—I'd highly recommend building a pipeline around something you genuinely love.&lt;/p&gt;

&lt;p&gt;Your data has a story. You just need to build the mic for it.&lt;/p&gt;

&lt;p&gt;Want to see the code? Check out the GitHub Repo &lt;br&gt;
&lt;a href="https://github.com/KellyKiprop/Youtube-Data-Pipeline" rel="noopener noreferrer"&gt;https://github.com/KellyKiprop/Youtube-Data-Pipeline&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewyz284mbo6kz66rrnno.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fewyz284mbo6kz66rrnno.png" alt="Image description" width="800" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojqinwzlj021xtun03kh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fojqinwzlj021xtun03kh.png" alt="Image description" width="800" height="265"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>How I Automated Crypto Price Tracking with Apache Airflow &amp; CoinGecko</title>
      <dc:creator>Rotich Kelly</dc:creator>
      <pubDate>Sat, 07 Jun 2025 13:20:01 +0000</pubDate>
      <link>https://forem.com/kelly_10/how-i-automated-crypto-price-tracking-with-apache-airflow-coingecko-28ab</link>
      <guid>https://forem.com/kelly_10/how-i-automated-crypto-price-tracking-with-apache-airflow-coingecko-28ab</guid>
      <description>&lt;p&gt;&lt;em&gt;"Because waking up at 2am to check Bitcoin prices isn’t scalable."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Idea Behind the Project&lt;/strong&gt;&lt;br&gt;
Crypto markets never sleep, and neither should your data pipeline. I built a fully automated ETL pipeline using Apache Airflow that extracts hourly snapshots of crypto data from CoinGecko, stores them in a PostgreSQL database (on Aiven), and sets the stage for real-time analysis, dashboards, and trading models.&lt;/p&gt;

&lt;p&gt;Whether you're tracking Bitcoin moonshots or studying volume dips in altcoins, this pipeline's got your back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What It Actually Does&lt;/strong&gt;&lt;br&gt;
Every hour, my pipeline:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Pulls real-time data from the CoinGecko API for 15 top cryptocurrencies (BTC, ETH, SOL, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Captures price, market cap, and trading volume&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Stores the data in a time-series friendly PostgreSQL table: crypto.crypto_prices&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Retries automatically on failure and logs each step in Airflow’s UI &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No manual refresh. No scripts to rerun. Just clean, structured data—hour after hour.&lt;/p&gt;
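&lt;p&gt;The hourly extract boils down to one API call plus row shaping. A sketch (the coin list, column order, and session object are illustrative; CoinGecko's /coins/markets endpoint and its field names are real):&lt;/p&gt;

```python
# Sketch of the hourly extract step: pull prices for a few coins from
# CoinGecko's /coins/markets endpoint and shape rows for the
# crypto.crypto_prices table. Column order here is illustrative.
from datetime import datetime, timezone

COINS = "bitcoin,ethereum,solana"  # the real DAG tracks 15 ids

def to_rows(payload):
    """Shape the API response into (name, symbol, price, mcap, volume, ts)."""
    ts = datetime.now(timezone.utc)
    return [
        (c["name"], c["symbol"].upper(), c["current_price"],
         c["market_cap"], c["total_volume"], ts)
        for c in payload
    ]

def extract(session):  # wired into the DAG as a task; needs network
    resp = session.get(
        "https://api.coingecko.com/api/v3/coins/markets",
        params={"vs_currency": "usd", "ids": COINS},
        timeout=10,
    )
    resp.raise_for_status()
    return to_rows(resp.json())
```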

&lt;p&gt;&lt;strong&gt;Stack Breakdown&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Tool&lt;/th&gt;&lt;th&gt;Role&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Apache Airflow&lt;/td&gt;&lt;td&gt;DAG orchestration + scheduling&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Python 3.12&lt;/td&gt;&lt;td&gt;Core logic and scripting&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;CoinGecko API&lt;/td&gt;&lt;td&gt;Crypto market data&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;PostgreSQL (Aiven)&lt;/td&gt;&lt;td&gt;Cloud-hosted database storage&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;psycopg2&lt;/td&gt;&lt;td&gt;PostgreSQL database connector&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;python-dotenv&lt;/td&gt;&lt;td&gt;Secure secret management via .env&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;How I Built It&lt;/strong&gt;&lt;br&gt;
This project lives in my GitHub repo, and it’s fully reproducible:&lt;/p&gt;

&lt;p&gt;Clone the repo&lt;br&gt;
git clone &lt;a href="https://github.com/KellyKiprop/Crypto-price-pipeline.git" rel="noopener noreferrer"&gt;https://github.com/KellyKiprop/Crypto-price-pipeline.git&lt;/a&gt;&lt;br&gt;
cd Crypto-price-pipeline&lt;br&gt;
Create a virtual environment &amp;amp; install dependencies&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;br&gt;
python -m venv venv&lt;br&gt;
source venv/bin/activate  # Or venv\Scripts\activate on Windows&lt;br&gt;
pip install -r requirements.txt&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Set up environment variables&lt;br&gt;
Create a .env file in the root folder with your PostgreSQL config:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;br&gt;
DB_NAME=defaultdb&lt;br&gt;
DB_HOST=your-db-host.aivencloud.com&lt;br&gt;
DB_USER=avnadmin&lt;br&gt;
DB_PASSWORD=yourpassword&lt;br&gt;
DB_PORT=17440&lt;br&gt;
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Initialize and start Airflow&lt;br&gt;
&lt;code&gt;&lt;br&gt;
airflow db init&lt;br&gt;
airflow webserver --port 8080&lt;br&gt;
airflow scheduler&lt;br&gt;
&lt;/code&gt;&lt;br&gt;
Trigger the DAG from the UI&lt;br&gt;
Visit localhost:8080, toggle coin_price_etl_dag, and trigger a run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Testing the Results&lt;/strong&gt;&lt;br&gt;
Check the crypto.crypto_prices table in your database — you should see hourly updates like:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;name&lt;/th&gt;&lt;th&gt;symbol&lt;/th&gt;&lt;th&gt;price&lt;/th&gt;&lt;th&gt;market_cap&lt;/th&gt;&lt;th&gt;total_volume&lt;/th&gt;&lt;th&gt;timestamp&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Bitcoin&lt;/td&gt;&lt;td&gt;BTC&lt;/td&gt;&lt;td&gt;71,231&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;td&gt;2025-06-07T12:00:00Z&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;You can now plug this into:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grafana dashboards&lt;/li&gt;
&lt;li&gt;ML models for price prediction&lt;/li&gt;
&lt;li&gt;Long-term trend analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It’s a hands-on intro to Airflow and ETL pipelines&lt;/li&gt;
&lt;li&gt;Crypto data is noisy and dynamic — perfect for learning real-time data workflows&lt;/li&gt;
&lt;li&gt;You’ll walk away with a reusable pattern for any public API + DB project&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;About the Author&lt;br&gt;
Kelly Kiprop&lt;br&gt;
&lt;a href="mailto:kipropkelly4@gmail.com"&gt;kipropkelly4@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This project uses Airflow to extract hourly crypto data from CoinGecko and store it in PostgreSQL. It’s cloud-ready, beginner-friendly, and built for real-time insight.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>data</category>
      <category>cryptocurrency</category>
    </item>
    <item>
      <title>Starting my Data Analyst Journey at LuxDev bootcamp</title>
      <dc:creator>Rotich Kelly</dc:creator>
      <pubDate>Sun, 13 Apr 2025 13:46:38 +0000</pubDate>
      <link>https://forem.com/kelly_10/starting-my-data-analyst-journey-1100</link>
      <guid>https://forem.com/kelly_10/starting-my-data-analyst-journey-1100</guid>
      <description>&lt;p&gt;I’m super excited to share that I’ve officially kicked off my journey into the world of Data Analytics .Only two weeks ago, I enrolled in the Lux Dev Bootcamp, and it has been a enriching experience so far its been an enriching ride. I'm currently deep into by week 2, with a focus on solidifying the foundations of Excel and SQL the pillars of any strong data analysis workflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why data analysis?&lt;/strong&gt;&lt;br&gt;
Data surrounds us, and I’ve always wondered how raw data can tell such strong stories. Whether it’s uncovering insights, making more informed decisions, or solving real problems, data is the key to them all. That’s what drew me to it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I'm Learning Right Now&lt;/strong&gt;&lt;br&gt;
Excel: From pivot tables to formulas, I'm learning to manipulate and analyze data like a pro.&lt;br&gt;
SQL: I’m building up my querying skills—getting comfortable with SELECT statements, filtering data, and joining tables to extract meaningful insights.&lt;br&gt;
Power BI: I’ve started exploring how to bring data to life through interactive dashboards and visual storytelling—it's a game changer!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Putting Skills Into Practice&lt;/strong&gt;&lt;br&gt;
Right now, I’m working on a sales data project, a fast food shop selling pizza to be specific,  which is helping me apply what I’ve learned in a real-world context. I’m analyzing trends, identifying key metrics, and working my way toward building my very first dashboard!&lt;/p&gt;

&lt;p&gt;It’s exhilarating, and there’s nothing quite like seeing raw data come to life visually. I can’t wait to post the finished product soon!&lt;/p&gt;

&lt;p&gt;What I’m finding enjoyable about the bootcamp is the hands-on learning, not just theory: we’re doing practical exercises and real case studies. We’re all in the same boat, and there’s so much encouragement and collaboration. Each week builds on the last, and I can already feel my confidence growing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My Goals&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Mastering Excel, SQL, and data visualization tools&lt;/li&gt;
&lt;li&gt;Completing more real-world projects and case studies&lt;/li&gt;
&lt;li&gt;Building a strong data portfolio&lt;/li&gt;
&lt;li&gt;Landing my first job as a junior data analyst&lt;/li&gt;
&lt;li&gt;Staying curious, continuing to learn, and enjoying the ride with this amazing tech community&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's make data fun!&lt;/p&gt;

&lt;p&gt;#DataAnalytics #SQL #Excel #PowerBI #DataDashboard #SalesProject #LuxDev #BootcampJourney #CareerSwitch #DevCommunity&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
