Forem

# dataengineering

Posts

Automating EL pipeline using Azure Functions(Python)
4 min read
How to Get Filtered Amazon Reviews into a Pandas DataFrame in Under 50 Lines of Python
3 min read
Comparing CsvPath and SodaCL
4 min read
Why Your Snowflake Bill is High and How to Fix It with a Hybrid Approach
14 min read
Star vs. Snowflake Schema
4 min read
Data Engineer: The Architect of the "Data Flow" in the Digital Age
2 min read
Building a Modern Data Platform to Track Kenya’s Food Prices — A Data Engineering Case Study
5 min read
AWS Glue ETL Jobs: Transform Your Data at Scale
4 min read
Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine
4 min read
From Pandas to Upstream Control: The Evolution PyData Needs Next
6 min read
Building Reliable Legal AI: Never Missing a Supreme Court Case
26 min read
Statistics Day 2: Correlation Isn’t Causation — Here’s Why It Matters!
4 min read
Kafka consumer lag—Measure and reduce
5 min read
Understanding Kafka Consumer Lag: Causes, Risks, and How to Fix It
3 min read
Building a Real-Time Crypto Data Pipeline with Debezium CDC
5 min read
Understanding Kafka Lag, Why It Happens and How To Fix It.
4 min read
The State of Apache Iceberg, Polaris, and Arrow: November 5-11
5 min read
Understanding Kafka Lag: Why It Happens and How to Fix It
4 min read
Right Approach to JSON Log Analysis: A Hands-on Guide to Efficient Practices with Alibaba Cloud SLS
7 min read
Building a Real-Time Data Lake on AWS: S3, Glue, and Athena in Production
5 min read
Understanding reasons behind Kafka lag and how to minimize it.
3 min read
Reducing Consumer Lag in Apache Kafka
3 min read
5 Data Pipeline Mistakes That Cost Me Weeks of Debugging
6 min read
🚀 Day 1: Introduction to Apache Spark
2 min read
Building a Data Platform on AWS: Essential Design Considerations for Power BI
5 min read