Forem

# dataengineering

Posts

Tutorial: Intro to Apache Iceberg with Apache Polaris and Apache Spark

2
Comments 4
20 min read
Comprehensive Guide: kwargs vs XCom in Python & Airflow

Comments
4 min read
Precise Data Extraction: Pattern-Based Partitioning for Structured Extraction

1
Comments
3 min read
Apache Gravitino 1.0.0 — From Metadata Management to Contextual Engineering

1
Comments
7 min read
Chinese DBA's Story: Hu Zhonghao - The Journey of Becoming a DBA for Domestic Distributed Databases

Comments 1
7 min read
Apache Kafka in Data engineering

6
Comments 1
1 min read
🧭System Design Roadmap for Data Engineers

5
Comments
3 min read
Orchestrating and Observing Data Pipelines with Airflow, PostgreSQL, and Polar

2
Comments
3 min read
💥 Polars vs. Pandas: Why Your Next ETL Pipeline Should Run on Rust (Part 1/5)

1
Comments
2 min read
Building a Production-Ready Data Lake: PostgreSQL to S3 with AWS DMS, Glue, and Athena using CDK

Comments
8 min read
(Ⅱ) A Complete Guide to Core Data Warehouse Design Standards: From Layers, Types to Lifecycle

Comments
6 min read
Building Distributed Systems with Ray—Just Like Running a Restaurant

1
Comments
7 min read
Synthetic Data for RAG: Safe Generation, Deduplication, and Drift-Aware Curation in 2025

Comments
10 min read
The State of Apache Iceberg v4 - October 2025 Edition

1
Comments
6 min read
Data Automation: A Deep Dive

1
Comments
5 min read