Forem

# dataengineering

Posts

PySpark to Pandas/scikit-learn: A Practical Migration Guide for Data Engineers Learning ML

Comments
7 min read
ETL vs ELT: Which One Should You Use and Why?

1
Comments
6 min read
Entity Resolution at Scale: Matching Products Across Amazon, Reddit, and RTINGS

Comments
4 min read
Apache Data Lakehouse Weekly: April 3–9, 2026

Comments
7 min read
AWS Lake Formation: Why Your Data Lake Permissions Are Probably a Mess (And How to Fix That)

Comments
3 min read
ETL VS ELT: WHICH ONE SHOULD YOU USE AND WHY?

Comments
5 min read
Airflow vs Prefect vs Dagster: Picking the Right Orchestrator in 2026

Comments
6 min read
Advanced SQL Techniques for Data Analytics Every Data Analyst Should Know

Comments
6 min read
Your Customer Table Has Duplicates You Can't See With SQL: How I Built a Cross-Platform Identity Resolution Layer for a Dark Kitchen Data Platform

3
Comments
8 min read
How to Bypass the Pandas "Object Tax": Building an 8x Faster CSV Engine in C

Comments
2 min read
PostgreSQL Foreign Data Wrappers: Cross-Database Queries Explained

Comments
4 min read
How Google Maps Predicts Traffic in Real Time: Live Data and ETA Explained

Comments
3 min read
How Gudu SQL Omni Works: Accurate Offline Data Lineage Analysis in VS Code

Comments
3 min read
ETL vs ELT: Which One Should You Use and Why?

3
Comments
4 min read
ETL vs ELT in Data Engineering: Key Differences and Use Cases Explained

Comments
3 min read