Forem

# dataengineering

Posts

Apache Doris 4.0: One Engine for Analytics, Full-Text Search, and Vector Search
22 min read
From 8 Minutes to 40 Seconds: Solving Data Pipeline Deployment Bottlenecks with Git Sparse Checkout
5 min read
Create a Microsoft Fabric Lakehouse
6 min read
Core Concepts of Kafka
8 min read
From Kafka to Clean Tables: Building a Confluent Snowflake Pipeline with Streams & Tasks
9 min read
Apache Kafka: ZooKeeper vs. KRaft — A Complete Comparison of Approaches
6 min read
Introduction to Apache Airflow
4 min read
Building a Production-Ready Data Lake: PostgreSQL to S3 with AWS DMS, Glue, and Athena using CDK
8 min read
Real-Time Cryptocurrency Data Pipeline
12 min read
Synthetic Data for RAG: Safe Generation, Deduplication, and Drift-Aware Curation in 2025
10 min read
Personal Picks: Data Product News (October 1, 2025)
7 min read
SQL: is there a better way to code this?
1 min read
Building Real-Time Data Pipelines from PostgreSQL Using Flink CDC
5 min read
How to Convert Excel to CSV in Python using Spire.XLS for Python
4 min read
Building a Sales Database in PostgreSQL — Schema, Data & JOIN Examples
6 min read
Git Integration in Microsoft Fabric
3 min read
Beyond the Browser: Crafting a Robust Web Scraping Pipeline for Dynamic Sports Data
3 min read
Get Started with the Fastest SQL Query Engine, Presto C++ (Prestissimo): A Beginner-Friendly Setup Guide with Docker
5 min read
10 Best Platforms to Learn Data Analytics in 2026
4 min read
Apache Zookeeper: The Coordinator of Distributed Systems
8 min read
Data Ingestion Types Explained: Finding the Right Model for Your Data Pipeline
3 min read
Debezium: Capturing Data Changes in Real Time
3 min read
Change Data Capture (CDC): Capturing Changes in Real Time
4 min read
Data Streams: Real-Time Information Processing
3 min read
Designing Data-Intensive Applications — Chapter 1: Reliable, Scalable, and Maintainable Applications
4 min read