Forem

# dataengineering

Posts

Software OR Hardware Raid: What's Better In 2024?

4
Comments
7 min read
Aggregation in GROUP BY vs. Window Functions Using OVER()

4
Comments
3 min read
Azure Synapse Analytics Security: Access Control

3
Comments
7 min read
Importing Data from a CSV File into Postgres: A Basic Skill for Data Engineers

Comments
1 min read
Databases Deconstructed: The Value of Data Lakehouses and Table Formats

4
Comments
8 min read
Understanding RAID Levels: A Comprehensive Guide to RAID 0, 1, 5, 6, 10, and Beyond

8
Comments
9 min read
BigQuery Schema Generation Made Easier with PyPI’s bigquery-schema-generator

5
Comments 2
2 min read
Embrace simple tech stacks and code generation in DevOps and data engineering

2
Comments
6 min read
Apache Doris for log and time series data analysis in NetEase, why not Elasticsearch and InfluxDB?

1
Comments
9 min read
MapReduce Vs Tez

6
Comments
2 min read
Azure Synapse Analytics Security: Data Protection

3
Comments
6 min read
Leveraging PySpark.Pandas for Efficient Data Pipelines

Comments
3 min read
Why Apache Doris is the Best Open Source Alternative to Rockset

3
Comments
3 min read
Apache Spark-Structured Streaming :: Cab Aggregator Use-case

1
Comments
4 min read
Introduction to Apache Hadoop & MapReduce

5
Comments
3 min read
Analytics don't want duplicated data, so get it exactly-once with Flink/Kafka

Comments
3 min read
Metadata for win — Apache Parquet

Comments
5 min read
Remove unwanted partition data in Azure Synapse (SQL DW)

1
Comments
6 min read
Replacing Saas ETL with Python dlt: A painless experience for Yummy.eu

2
Comments
3 min read
Simplifying SDMX Data Integration with Python

2
Comments
3 min read
Unlocking the Power of Large Language Models (LLMs): Your Ultimate Guide

6
Comments
1 min read
Clustering vs Partitioning your Apache Iceberg Tables

7
Comments
10 min read
From Messy Data to Super Mario Pipeline: My First Adventure in Data Engineering

1
Comments
12 min read
The Data Professions

1
Comments
3 min read
Database generated events: LiveSync’s database connector vs CDC

4
Comments
5 min read