Forem

# dataengineering

Posts

Building A Universal Data Agent in 15 Minutes with LlamaIndex and Apache Gravitino (incubating)

1
Comments
9 min read
Introduction to REST Catalogs for Apache Iceberg

1
Comments
8 min read
Capture Slowly Changing Attributes in SQL - SCD Type 2

Comments
8 min read
ACID, Isolation Levels, and MVCC: Architecture and Execution in Relational Databases

2
Comments
10 min read
15 Core Data Engineering Concepts Every Developer Should Know

Comments
6 min read
AI-Powered Data Engineering Pipelines: Smarter, Faster, Scalable

Comments
2 min read
Testing with Monkey Patching

Comments
4 min read
Automated Google News Search

3
Comments
1 min read
Aggregation Strategies for Scalable Data Insights: A Technical Perspective

1
Comments
5 min read
🚀Git + Databricks: Why Both Are Essential for Modern Data Engineering

Comments
2 min read
🚀 Synthetic Data: The Next Frontier for Data Engineers

Comments
2 min read
Pytest: How to Test Python Modules with Top-Level Configuration

Comments
5 min read
Databend Monthly Report: July 2025

Comments
3 min read
How We Use OpenAI and Gemini Batch APIs to Qualify Thousands of Sales Leads

Comments
7 min read
Scaling Databases with ClickHouse Sharding (Hands-On Simulation)

2
Comments
2 min read
Building AI-Powered Data Pipelines: Where Data Engineering Meets Machine Learning

Comments
2 min read
Mastering MLflow: Managing the Full ML Lifecycle

2
Comments
9 min read
wget vs. curl: when to use which?

Comments
2 min read
Where We Encounter Delimited Data and How We Handle It

1
Comments
6 min read
Why Apache Airflow is the Cornerstone of Modern Data Engineering

Comments
5 min read
Zero-Downtime Database Migration: The Complete Engineering Guide

11
Comments 4
53 min read
🚀 How PySpark Helps Handle Terabytes of Data Easily

Comments
2 min read
Apache Arrow dev list digest (Aug 25–29 2025)

Comments
4 min read
Anomaly Detection in Financial Transactions: Algorithms and Applications

2
Comments
10 min read
Apache Kafka Deep Dive: Core Concepts, Data Engineering Applications, and Real-World Production Practices

Comments 1
6 min read