Forem

# dataengineering

Posts

Context Engineering (Part 1): The Architecture of Recall

Comments 1
3 min read
Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers

Comments
2 min read
AWSChallenge - Week 2

Comments
4 min read
Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs

Comments
2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth

Comments
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres

Comments
3 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”

Comments
2 min read
AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

3
Comments
7 min read
SQL: Doing GROUP BY in CsvPath

Comments
5 min read
🔥 Day 3: RDDs - The Foundation of Spark

Comments
2 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

Comments
2 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning

Comments
6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB

5
Comments
5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)

Comments
3 min read
Build a Local Lead Gen Machine: Scraping Google Maps with n8n (Reliably)

Comments
3 min read