Forem

# bigdata

Posts

The two versions of Parquet

2
Comments
5 min read
How to Load Datasets Efficiently in Pandas: A Complete Guide

8
Comments 2
4 min read
Vector search using Alibaba Cloud inference API and semantic text

Comments
10 min read
Reliability in Data-Intensive Applications

3
Comments 1
3 min read
Using Apache Parquet to Optimize Data Handling in a Real-Time Ad Exchange Platform

2
Comments
3 min read
Mastering SQL for Data Engineering: Advanced Queries, Optimization, and Data Modeling Best Practices

Comments
4 min read
MapReduce Simplified: Understand Distributed Processing with the Same Logic as SQL

1
Comments
4 min read
How to Calculate the Return on Investment for Data Analytics

1
Comments
5 min read
5 Game-Changing Habits to Master Your Data Science Journey

6
Comments
4 min read
Object Storage as Primary Storage: The MinIO Story

2
Comments
7 min read
Rethinking distributed systems: Composability, scalability

Comments
5 min read
Run PySpark Local Python Windows Notebook

1
Comments
3 min read
Compression algorithms in Parquet Java

3
Comments 2
7 min read
Top 10 Web Scraping Tools in 2025 (Free & Paid Options)

4
Comments 4
5 min read
When to use Apache Xtable or Delta Lake Uniform for Data Lakehouse Interoperability

Comments
5 min read
Goodbye Kafka: Build a Low-Cost User Analysis System

Comments
5 min read
The Columnar Approach: A Deep Dive into Efficient Data Storage for Analytics 🚀

1
Comments
4 min read
Introduction to Hadoop:)

6
Comments
10 min read
Big Data Trends That Will Impact Your Business In 2025

5
Comments
6 min read
The Heart of DolphinScheduler: In-Depth Analysis of the Quartz Scheduling Framework

8
Comments
3 min read
SQL Filtering and Sorting with Real-life Examples

1
Comments
4 min read
Query 1B Rows in PostgreSQL >25x Faster with Squirrels!

1
Comments 8
5 min read
Big Data

Comments
1 min read
Introduction to Data lakes: The future of big data storage

5
Comments
2 min read
Building an Application with Change Data Capture (CDC) Using Debezium, Kafka, and NiFi

1
Comments
3 min read