Forem

# dataengineering

Posts

The Offline Data Engineer: Building Resilient API Pipelines that Work on an Airplane

Comments
5 min read
Introduction to the Confluent REST Proxy

2
Comments
4 min read
Why We Need Schema Registry in Kafka

2
Comments
17 min read
Azure Synapse Analytics

Comments
5 min read
Debugging Windows Race Conditions in Dagster

Comments
3 min read
6 Different Data Formats Commonly Used in Data Analytics

Comments
3 min read
Part 1: Snowflake's Autonomous Future

Comments
8 min read
Scaling Customer Analytics: Designing ML Pipelines for Millions of Users

Comments
7 min read
Apache Dev Mail Digest: Iceberg & Polaris (Nov 12–17, 2025)

Comments
4 min read
Why Parquet Is Everywhere - And What Makes It Actually Fast?

2
Comments
3 min read
How to Get Filtered Amazon Reviews into a Pandas DataFrame in Under 50 Lines of Python

Comments
3 min read
Why Your Enterprise Data Platform Is No Longer Just for Analytics

Comments 1
11 min read
Why Your Snowflake Bill is High and How to Fix It with a Hybrid Approach

1
Comments
14 min read
Star vs. Snowflake Schema

Comments
4 min read
A real-world example of CsvPath schemas

Comments
5 min read
Data Engineer: The Architect of the “Data Flow” in the Digital Age

Comments
2 min read
Building a Modern Data Platform to Track Kenya’s Food Prices — A Data Engineering Case Study

Comments
5 min read
Temperature, Tokens, and Context Windows: The Three Pillars of LLM Control

1
Comments
13 min read
Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine

Comments
4 min read
Building Intelligent, Metadata-Driven Pipelines with Azure Data Factory

3
Comments 1
6 min read
From Pandas to Upstream Control: The Evolution PyData Needs Next

Comments
6 min read
Building Reliable Legal AI: Never Missing a Supreme Court Case

2
Comments
26 min read
Statistics Day 2: Correlation Isn’t Causation — Here’s Why It Matters!

5
Comments
4 min read
Kafka consumer lag—Measure and reduce

Comments
5 min read
Understanding Kafka Consumer Lag: Causes, Risks, and How to Fix It

Comments
3 min read