Forem

# dataengineering

Posts

When an AI Suggests DataFrame.append: Missing Pandas Deprecations in Generated Code

Comments 1
3 min read
The Great Table Format Debate: A Deep Dive into Apache Iceberg, Delta Lake, and Apache Hudi

1
Comments
18 min read
Amazon Kinesis vs Amazon MSK: The Complete Guide for Stream Processing on AWS

Comments
29 min read
Day 8: Accelerating Spark Joins - Broadcast, Shuffle Optimization & Skew Handling

Comments
2 min read
Mastering Serverless Data Pipelines: AWS Step Functions Best Practices for 2026

Comments
5 min read
A Stranger In a New Town: CsvPath metadata fields

Comments
6 min read
Interesting links - November 2025

Comments
19 min read
💀 RIP Copy-Paste: Google NotebookLM Just Killed Manual Data Entry

Comments
3 min read
Unified Data Fabric: Serverless Spark on ROSA Integrating with AWS Glue Catalog

8
Comments 1
39 min read
dupl

Comments
1 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 18–24, 2025)

Comments
5 min read
How to Sync Data from an Oracle Table to Elasticsearch using Kafka Connect

1
Comments 1
5 min read
From Raw to Refined: Data Pipeline Architecture at Scale

Comments
12 min read
Agent Cost Optimization: A Data Engineer's Guide

Comments
13 min read
INTRODUCTION TO DBT(Data Build Tool)

1
Comments
2 min read