Forem

# dataengineering

Posts

Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth
2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres
3 min read
Function Calling and Tool Use: Turning LLMs into Action-Taking Agents
18 min read
How to Build Reliable Data Pipelines for Analytics
1 min read
Lightweight big data processing technology
9 min read
SQL: Doing GROUP BY in CsvPath
5 min read
🔥 Day 3: RDDs - The Foundation of Spark
2 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified
2 min read
The 16GB RAM Hell (And Why You Don’t Need a Cluster to Escape It)
20 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning
6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB
5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)
3 min read
Data Cataloguing in AWS
5 min read
CHW Monthly Activity Aggregation: Turning Visit Logs into Insight
5 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally
2 min read
HackerRank 'The Pads' MySQL
3 min read
Building a Production-Ready Data Pipeline on AWS: A Hands-On Guide for Data Engineers
3 min read
Medallion Architecture On AWS
4 min read
Comparing Great Expectations and CsvPath Framework
8 min read
Financial Transaction Data Reconciler PayPal
5 min read
Introducing dremioframe - A Pythonic DataFrame Interface for Dremio
9 min read
Stifel Modern Data Platform
4 min read
Core Microsoft Fabric Concepts
3 min read
Implementing a CDC pipeline with Debezium
8 min read
LogInSight: A Lightweight CloudWatch Log Analytics Tool for Faster Debugging and Real-Time Insights
3 min read