Forem

# dataengineering

Posts

đź‘‹ Sign in for the ability to sort posts by relevant, latest, or top.
Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration

Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration

Comments
4 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)
Cover image for Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)

Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)

Comments
6 min read
2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow
Cover image for 2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow

Comments
6 min read
Context Engineering (Part 1): The Architecture of Recall

Context Engineering (Part 1): The Architecture of Recall

Comments 1
3 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers
Cover image for Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers

Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers

Comments
2 min read
AWSChallenge - Week 2
Cover image for AWSChallenge - Week 2

AWSChallenge - Week 2

Comments
4 min read
Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs
Cover image for Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs

Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs

Comments
2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth

Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth

Comments
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres
Cover image for Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres

Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres

Comments
3 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”
Cover image for The Boring Debug Checklist That Fixes Most “RAG Failures”

The Boring Debug Checklist That Fixes Most “RAG Failures”

Comments
2 min read
Function Calling and Tool Use: Turning LLMs into Action-Taking Agents

Function Calling and Tool Use: Turning LLMs into Action-Taking Agents

Comments
18 min read
dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg
Cover image for dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg

dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg

Comments
8 min read
AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

3
Comments
7 min read
SQL: Doing GROUP BY in CsvPath
Cover image for SQL: Doing GROUP BY in CsvPath

SQL: Doing GROUP BY in CsvPath

Comments
5 min read
🔥 Day 3: RDDs - The Foundation of Spark
Cover image for 🔥 Day 3: RDDs - The Foundation of Spark

🔥 Day 3: RDDs - The Foundation of Spark

Comments
2 min read
đź‘‹ Sign in for the ability to sort posts by relevant, latest, or top.