Forem

# dataengineering

Posts

2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI
9 min read
Evolution of Processing: SPL One-Click Acceleration for Log-to-Metric Conversion
6 min read
My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure
6 min read
The Data Engineer’s Codex: From First Principles to the Modern Lakehouse
10 min read
Containerization for Data Engineering: A Practical Guide with Docker and Docker Compose
2 min read
Join OSA CON 2025: Two Days of Open‑Source Analytics and AI (Nov. 4–5)
3 min read
AWS Glue for ETL
5 min read
What to use for data preparation in report, query or analysis business?
10 min read
Optimizing Data Processing on AWS with Data Compaction
7 min read
Real-Time Earthquake CDC Pipeline
5 min read
The Offline Data Engineer: Building Resilient API Pipelines that Work on an Airplane
5 min read
Understanding Kafka Architecture, Schema Registry, ksqlDB, PostgreSQL, Couchbase, and Microservices
3 min read
The "Shift-Left" Imperative: Implementing Data Contracts in CI/CD Pipeline
4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025
4 min read
An Exploration of the Commercial Iceberg Catalog Ecosystem
14 min read
🧠 ClickHouse LEFT JOINs: Why join_use_nulls Matters
2 min read
Getting Started Building a Data Platform
3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables
10 min read
Real-time Data Analytics at Scale: Integrating Apache Flink and Apache Doris with Flink Doris Connector and Flink CDC
10 min read
Optimizing Kafka Performance: Best Practices for High Throughput and Low Latency
7 min read
Fixing Type Hints for Callable Objects with Custom Signatures in Dagster
3 min read
Learning Apache Spark the Easy Way
1 min read
Building a Test Data Platform After Watching Teams Secretly Use Production for Years
3 min read
Chinese DBA's Story: Sui Haifeng - Grasp the two most important five-year periods of your career
5 min read
Kafka
10 min read