Forem

# dataengineering

Posts

Building an Automated YouTube Analytics Dashboard with Airflow, PySpark, MinIO, PostgreSQL & Grafana

7
Comments
5 min read
Composable Analytics with Agents: Leveraging Virtual Datasets and the Semantic Layer

1
Comments
3 min read
When to Choose Scala Over Python for Apache Spark: A Performance-Driven Analysis

1
Comments
4 min read
⚽ The Data XI: Building a Modern Football Data Platform — Chapter 1: Taming the Data Beast

2
Comments 1
3 min read
📊 Understanding 6 Common Data Formats in Data Analytics

Comments
4 min read
Self-Adapting Data Pipelines: The Intelligent Future of Data Engineering

5
Comments
17 min read
Introduction to Apache Kafka for Beginners

1
Comments
5 min read
Apache Kafka — Deep Dive: Core Concepts, Data-Engineering Applications, and Real-World Production Practices

1
Comments
4 min read
Apache Iceberg Dev List Digest August 25-29

Comments
5 min read
How I Built a MongoDB Archiving System for Crawled Data

1
Comments 2
7 min read
Complete Guide: Dockerizing Spark, Kafka, and Jupyter for YouTube Pipeline

Comments
9 min read
Dockerized Spark and Kafka: YouTube Data Pipeline Implementation

Comments
7 min read
RIP Amazon Data Firehose Change Data Capture

7
Comments 3
4 min read
Event-Driven Architectures on AWS: Beyond Lambda

4
Comments
2 min read
🔄 ETL vs ELT: What’s the Difference and Why It Matters?

Comments
2 min read
YouTube Data Processing Pipeline

1
Comments
4 min read
CDC in AWS: Content Data Capture from AWS RDS MySQL into AWS MSK Kafka topic using Debezium

1
Comments
5 min read
LLPY-03: Intelligent Extraction and Processing of Legal Data

Comments
21 min read
🏗️ The Role of a Data Engineer: Beyond Pipelines

Comments
2 min read
Beyond Flat Tables: Model Hierarchical Data in Supabase with Recursive Queries

2
Comments
7 min read
🌍 The Journey of Data: From Raw Logs to Insights

Comments
2 min read
🎯 The Challenge: Processing TBs of S3 Data Without Breaking the Bank

Comments
5 min read
Why Apache Iceberg is needed?

1
Comments
6 min read
From Chaos to Orchestration: How DataOps Is Transforming Data into Value

Comments
1 min read
International SaaS Nightmare: Timezone Edge Cases (And How to Solve Them Once and For All)

Comments
2 min read