Forem

# dataengineering

Posts

When Small Parquet Files Become a Big Problem (and How I Ended Up Writing a Compactor in PyArrow)

17
Comments 2
5 min read
Peer Review 2: Data Warehousing, Transformation, and Reproducibility in tfl-data-visualization (Part 2)

Comments
3 min read
Peer Review 2: TfL Station Footfall Data Analysis Pipeline (Part 1)

Comments
3 min read
🚀 Setting Up Presto : A Step by Step Installation Guide to Run SQL Queries.

Comments
3 min read
Big Data Processing - Case Study 4 (Hadoop)

1
Comments
1 min read
[Snowflake's New Feature]Snowflake Programmatic Access Tokens: Easy Authentication for BI Tools like Tableau & Power BI

2
Comments
4 min read
Apache Iceberg: A Comprehensive Guide

1
Comments
4 min read
Building and Deploying My First Python ETL Package to PyPI

1
Comments 1
3 min read
Peer Review 1: Poland's Real Estate Market Dashboards and Insights with Streamlit (Part 2)

Comments
3 min read
Peer Review 1: Analyzing Poland's Real Estate Market (Part 1)

Comments
3 min read
InsightFlow Part 9: Workflow Orchestration with Kestra

Comments
4 min read
InsightFlow Part 8: Setting Up AWS Athena for Data Analysis in InsightFlow

Comments
3 min read
InsightFlow Part 7: Data Quality Implementation & Best Practices for InsightFlow

Comments
3 min read
InsightFlow Part 6: Implementing ETL Processes with AWS Glue for InsightFlow

Comments
3 min read
InsightFlow Part 5: Designing the Data Model & Schema with dbt for InsightFlow

Comments
3 min read
InsightFlow Part 4: Data Exploration & Understanding the Datasets

Comments
3 min read
InsightFlow Part 3: Building the Data Ingestion Layer with AWS Batch

1
Comments
4 min read
InsightFlow Part 2: Setting Up the Cloud Infrastructure with Terraform

Comments
3 min read
Weather Monitoring System Using IoT

1
Comments
2 min read
The Complete Guide to Setting Up Postgresql on Windows 11 and WSL2

14
Comments 4
6 min read
Big Data Processing - Case Study 4 (Spark)

Comments
1 min read
Big Data Processing - Case Study 3 (Databricks)

Comments 2
1 min read
How I Automated Crypto Price Tracking with Apache Airflow & CoinGecko

4
Comments 3
2 min read
Personal Picks: Data Product News (March 19, 2025)

Comments
5 min read
Big Data Processing - Case Study 3 (Spark)

Comments
1 min read