Forem

# dataengineering

Posts

🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

Comments
2 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning

Comments
6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB

5
Comments
5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)

Comments
3 min read
Build a Local Lead Gen Machine: Scraping Google Maps with n8n (Reliably)

Comments
3 min read
CHW Monthly Activity Aggregation: Turning Visit Logs into Insight

Comments
5 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

Comments
2 min read
RAG Isn’t a Modeling Problem. It’s a Data Engineering Problem.

1
Comments
6 min read
Apache Data Lakehouse Weekly: December 30, 2025 – January 5, 2026

1
Comments
4 min read
Marmot: Data catalog without the complex infrastructure

1
Comments
3 min read
TDD for dbt: unit testing the way it should be

2
Comments
12 min read
Building a Medical-Grade Knowledge Graph: Mapping Drug Interactions with Neo4j and LlamaIndex 🩺💻

Comments 1
3 min read
Schema, COPY, MERGE, and Immutability — A First-Principles Guide for Data Engineers

Comments
5 min read
HackerRank 'The Pads' MySQL

Comments
3 min read
🔥 Day 5: Introduction to DataFrames - The Most Important Spark API

Comments
2 min read