Forem

# dataengineering

Posts

Lightweight big data processing technology

5
Comments
9 min read
SQL: Doing GROUP BY in CsvPath

Comments
5 min read
🔥 Day 3: RDDs - The Foundation of Spark

Comments
2 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

Comments
2 min read
The 16GB RAM Hell (And Why You Don’t Need a Cluster to Escape It)

Comments
20 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning

Comments
6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB

5
Comments
5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)

Comments
3 min read
CHW Monthly Activity Aggregation: Turning Visit Logs into Insight

Comments
5 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

Comments
2 min read
HackerRank 'The Pads' MySQL

Comments
3 min read
🔥 Day 5: Introduction to DataFrames - The Most Important Spark API

Comments
2 min read
Generating Table Schema from AWS Glue Table

2
Comments
1 min read
Building a Production-Ready Data Pipeline on AWS: A Hands-On Guide for Data Engineers

1
Comments
3 min read
Comparing Great Expectations and CsvPath Framework

Comments
8 min read
Financial Transaction Data Reconciler PayPal

Comments
5 min read
Introducing dremioframe - A Pythonic DataFrame Interface for Dremio

Comments
9 min read
Stifel Modern Data Platform

Comments
4 min read
Core Microsoft Fabric Concepts

1
Comments
3 min read
Implementing a CDC pipeline with Debezium

Comments
8 min read
LogInSight: A Lightweight CloudWatch Log Analytics Tool for Faster Debugging and Real-Time Insights

2
Comments
3 min read
Building Streaming Iceberg Tables for Real-Time Logistics Analytics

Comments
4 min read
Building a Scalable Community Health Worker Analytics Platform: My Journey with dbt and Snowflake

Comments
4 min read
Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling

Comments
2 min read
A Stranger In a New Town: CsvPath metadata fields

Comments
6 min read