# dataengineering

Posts

The Columnar Approach: A Deep Dive into Efficient Data Storage for Analytics 🚀

Comments
4 min read
Optimizing Data Pipelines for Fiix Dating App

2
Comments
3 min read
What kind of Data Team should I join?

Comments
1 min read
Tech Interviews: The Hustle Behind Tech Interview Prep

Comments
1 min read
Why Feature Scaling Should Be Done After Splitting Your Dataset into Training and Test Sets

Comments
3 min read
Exploring OSM changesets via DuckDB

Comments
9 min read
I built a data pipeline tool in Go

Comments
5 min read
Data Warehousing Architectures

Comments
5 min read
Cultivating a Data-Centric Culture at Work

Comments
2 min read
Can AI finally generate best practice code? I think so.

2
Comments
6 min read
How to Migrate Massive Data in Record Time—Without a Single Minute of Downtime 🕑

Comments
4 min read
Why Data Quality Dimensions Are the Secret Ingredient for Data-Driven Success

1
Comments
3 min read
Easily Integrate Databend Test Environment with Testcontainers

4
Comments
4 min read
Choosing the right, real-time, Postgres CDC platform

Comments
8 min read
Seaborn Cheat Sheet

Comments
2 min read
Should I add Data Science or Analytics to my skills?

Comments
1 min read
Innowise is open for internships for Data Engineers and Data Analytics

Comments
1 min read
Query 1B Rows in PostgreSQL >25x Faster with Squirrels!

Comments 8
5 min read
10 Future Apache Iceberg Developments to Look forward to in 2025

Comments
13 min read
Creating Stripe Test Data in Python

2
Comments
4 min read
📊 AI Dashboard Builder: Create Insightful Dashboards just Dropping your Data

Comments
2 min read
Setting up memory for Flink - Configuration

Comments
3 min read
Are AWS Certifications Worth It in 2025?

7
Comments
2 min read
Mastering Dynamic Allocation in Apache Spark: A Practical Guide with Real-World Insights

Comments
3 min read
Talend vs. Apache Kafka: Which Data Tool Drives Better Business Insights?

Comments
6 min read
LightningChart Python 1.0

Comments
1 min read
Introduction to Data lakes: The future of big data storage

5
Comments
2 min read
Exploring the 360Learning API: From the Agility of Power Query to the Robustness of the Modern Data Stack

6
Comments
12 min read
Data Pipeline Filters 101: Choosing Between Static and Dynamic Approaches

Comments
1 min read
The Apache Iceberg™ Small File Problem

5
Comments
3 min read
Ensuring Data Quality: Best Practices and Automation

Comments
6 min read
Data Science Simplified: Tips for Aspiring Data Scientists in 2025

1
Comments
4 min read
2025 Guide to Architecting an Iceberg Lakehouse

9
Comments
14 min read
Dremio, Apache Iceberg and their role in AI-Ready Data

Comments
7 min read
Data Engineer as a Real-Time Algo Trader – Turning Pipelines into Profit (or at Least Trying)!

1
Comments
13 min read
Leveraging Python's Pattern Matching and Comprehensions for Data Analytics

Comments
12 min read
One Off to One Data Platform: Design with Intent [Part 2]

2
Comments
5 min read
Case Study: Creating an ETL Data Pipeline using AWS Services - Real-World Problem

Comments
2 min read
Understanding Star Schema vs. Snowflake Schema

Comments
1 min read
ChatGPT Launches Pro: What's it Mean for Data Professionals?

2
Comments
4 min read
Introduction to Apache Kafka

3
Comments 1
3 min read
Mastering Workflow Automation with Apache Airflow for Data Engineering

Comments
6 min read
Mastering Twitter Data Collection: A Comprehensive Guide to Efficient Scraping Solutions

Comments
3 min read
Optimizing Large-Scale Data Processing in Python: A Guide to Parallelizing CSV Operations

1
Comments
3 min read
Jupyter Notebooks in Docker

4
Comments 1
3 min read
🚀 Beyond Data Ingestion: Advanced Strategies for Optimizing API Data Pipelines

4
Comments 1
3 min read
SQL "SELECT INTO" vs "INSERT INTO SELECT" statements.

Comments
1 min read
ACID Properties in Databases: What Happens Without Them?

5
Comments
6 min read
🕵️ OSINT: link company acronyms to Standard Occupation Classification w. Open Source LLMs

1
Comments 8
6 min read
Data Architecture Best Practices

1
Comments
6 min read
My Journey into Data AI and Machine Learning

Comments
1 min read
🚀 Unlock the Power of ORC File Format 📊

5
Comments
1 min read
The Ultimate Data Engineering Roadmap: From Beginner to Pro

6
Comments 1
8 min read
Designing robust and scalable relational databases: A series of best practices.

10
Comments 5
17 min read
From Data to Decisions: How Machine Learning Works in 2025

3
Comments
3 min read
Why Data Security is Broken and How to Fix it?

1
Comments
5 min read
From ETL and ELT to Reverse ETL

Comments
4 min read
OLAP (Online Analytical Processing)

5
Comments
3 min read
The Future of Agentic Systems Podcast 1:42:26

6
Comments 1
1 min read
What is Data Engineering?

Comments
1 min read