Forem

# dataengineering

Posts

Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile
3 min read
Day 12: UDF vs Pandas UDF
2 min read
Day 2: Data Engineering vs Data Science vs Data Analytics
2 min read
Day 11: Choosing the Right File Format in Spark
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond
6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations
2 min read
Top Open-Source Data Engineering Tools- Unravelling the Best in 2026
10 min read
map
1 min read
Data Engineering in 30 Days - Day 2
2 min read
Why Frontend Teams Should Care About Data Modeling for Real-Time Dashboards
2 min read
Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration
4 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)
6 min read
How to Guarantee True Ordering in Complex Kafka Replays: Solving the Determinism Nightmare
4 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers
2 min read
AWSChallenge - Week 2
4 min read
Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs
2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres
3 min read
DP-600 Fabric Analytics Engineer – Structured Study Notes
11 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”
2 min read
Function Calling and Tool Use: Turning LLMs into Action-Taking Agents
18 min read
dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg
8 min read
Lightweight big data processing technology
9 min read
SQL: Doing GROUP BY in CsvPath
5 min read
🔥 Day 3: RDDs - The Foundation of Spark
2 min read