Forem

# bigdata

Posts

Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks
8 min read
From Raw to Refined: Data Pipeline Architecture at Scale
12 min read
Building Real-Time Lakehouse with S3 Tables, AWS Glue, and Apache Doris
3 min read
Starting My Dev.to Journey: Learning, Building & Sharing
1 min read
10x Query Performance Improvement: The Design and Implementation of the New Unique Key
30 min read
How Does Apache SeaTunnel Convert CDC Streams to Append-Only Mode?
4 min read
6 Essential Data Formats in Cloud Analytics: A Complete Guide with Examples
5 min read
Why Parquet Is Everywhere - And What Makes It Actually Fast?
3 min read
Final Project Report 2| Apache SeaTunnel Adds Metalake Support
4 min read
Final Project Report 1: Schema Evolution Support on Apache SeaTunnel Flink Engine
4 min read
Enabling Continuous Deployment with Amazon Elastic Container Service and Infrastructure as Code
6 min read
From DataWareHouses to BigData Systems: What and Why - Questions that nobody asks, but you should!
6 min read
Migration Case: From Azkaban to DolphinScheduler
4 min read
1 billion JSON records, 1-second query response: Apache Doris vs. ClickHouse, Elasticsearch, and PostgreSQL
7 min read
The data lakehouse evolution
11 min read
How to build real-time user-facing analytics with Kafka + Flink + Doris
9 min read
Apache DolphinScheduler 3.3.2 Released! Major Updates in Performance and Stability
3 min read
📊🔍 OpenSearch Dashboards: Optimizing Massive Data Queries (Big Data) with Asynchronous Search
2 min read
🐝 Why Hive Exists - And Why Its Complexity Is Actually Necessary
3 min read
Building a Universal Lakehouse Catalog: Beyond Iceberg Tables
10 min read
(1) Emerging Data Lakehouse Handbook (2025): Concepts and Design of Data Warehouse Layering
5 min read
Tutorial: Intro to Apache Iceberg with Apache Polaris and Apache Spark
20 min read
Getting to Know Apache Spark the Easy Way
1 min read
Fueling the Future: How Big Data and AI are Unlocking Green Hydrogen's Potential
6 min read
Code Green: How Big Data and AI are Engineering a Sustainable Planet
8 min read