Forem

# dataengineering

Posts

Simulating An Event-Driven Python Shopping App with Kafka on AWS For Real-Time Processing.

4
Comments 1
8 min read
Real-Time Cryptocurrency Data Pipeline

Comments
5 min read
Real-Time Crypto Data Pipeline

Comments
3 min read
Decommissioning the Dinosaur: A 4-Phase Playbook for Migrating Your Legacy Data Warehouse to Databricks

Comments
4 min read
Crypto Real-Time Data Pipeline

Comments
4 min read
Cryptocurrency Data Pipeline Project

Comments
4 min read
Set up an open-source AI analyst for PostgreSQL in 2 minutes

1
Comments
5 min read
The Semantic Gap in Data Quality: Why Your Monitoring is Lying to You

1
Comments 1
7 min read
Building an Automated Data Pipeline: Injuries vs Performance in the Premier League

Comments
6 min read
2025-2026 Guide to Learning about Apache Iceberg, Data Lakehouse & Agentic AI

Comments
9 min read
Evolution of Processing: SPL One-Click Acceleration for Log-to-Metric Conversion

Comments
6 min read
Apache Doris 4.0: One Engine for Analytics, Full-Text Search, and Vector Search

4
Comments
7 min read
Hands-On ACID in PostgreSQL : Part-1

Comments
3 min read
My First Data Engineering Project: Building a Real-Time IoT Pipeline on Azure

Comments
6 min read
Apache Doris 4.0: One Engine for Analytics, Full-Text Search, and Vector Search

Comments
22 min read
Data Engineering 102: Understanding Transactions, ACID, and Isolation in PostgreSQL

2
Comments
5 min read
Join OSA CON 2025: Two Days of Open‑Source Analytics and AI (Nov. 4–5)

Comments
3 min read
Benchmarking Multimodal AI Workloads: Daft vs Spark vs Ray Data

Comments
1 min read
AWS Glue for ETL

Comments
5 min read
🐝 Why Hive Exists - And Why Its Complexity Is Actually Necessary

2
Comments
3 min read
The "Shift-Left" Imperative: Implementing Data Contracts in CI/CD Pipeline

Comments
4 min read
Building a 75,000-Product Image Feature Dataset for the Amazon ML Challenge 2025

1
Comments
4 min read
An Exploration of the Commercial Iceberg Catalog Ecosystem

Comments
14 min read
Getting Started Building a Data Platform

Comments
3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables

Comments
10 min read