Forem

# dataengineering

Posts

ClickHouse Has a Free Column-Oriented Database — Query Billions of Rows in Milliseconds


Comments
2 min read
How Linux is Used in Real-World Data Engineering

Comments
3 min read
100 Spark Interview Questions for Data Engineer

1
Comments
11 min read
Flowfile v0.8.0 — Your Flows Can Run Themselves Now


Comments
4 min read
Apache Data Lakehouse Weekly: March 20–27, 2026


Comments
7 min read
Frosty: 150+ AI Open Source Sub-Agents to Automate Snowflake

Comments
2 min read
When Synthetic Data Lies: A Hidden Correlation Problem I Didn’t Expect

3
Comments
3 min read
Building & Monitoring Data Backends: Tools, Architecture, and Observability


Comments
4 min read
Issues of Multi-GB Spreadsheets in Data Lakes


Comments
4 min read
Asset-Based Data Orchestration: Lessons from Building a Multi-State Social Data Platform

1
Comments
6 min read
The Backyard Quarry, Part 2: Designing a Schema for Physical Objects


2
Comments
5 min read
The Vinted Arbitrage War: Building a Scraper That Doesn't Get IP-Banned

Comments
9 min read
Is AWS Glue Data Catalog Sufficient as a Data Catalog? Organizing Its Design, Limitations, and Complementary Strategies


6
Comments
10 min read
I built pq - the jq of Parquet. Here's why data engineers need a better CLI


1
Comments
1 min read
How to Build a Scalable Serverless Social Media Ingestion & Analytics Pipeline on AWS

1
Comments
4 min read