#dataengineering

Refactoring a Mature Airflow Project: A Practical Guide to Scaling from Solo Development to Team Collaboration
Ajit Kumar, Dec 8 '25
#airflow #python #datascience #dataengineering
4 min read

Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Nov 24-Dec 8, 2025)
Alex Merced, Dec 8 '25
#dataengineering #opensource #architecture #community
6 min read

2025 Year in Review: Apache Iceberg, Polaris, Parquet, and Arrow
Alex Merced, Dec 29 '25
#architecture #bigdata #opensource #dataengineering
6 min read

Context Engineering (Part 1): The Architecture of Recall
Imran Siddique, Jan 12
#dataengineering #rags #informationretrieval #vectorsearch
1 comment, 3 min read

Day 9: Spark SQL Deep Dive - Temp Views, Query Execution & Optimization Tips for Data Engineers
Sandeep, Dec 9 '25
#python #dataengineering #spark #bigdata
2 min read

AWSChallenge - Week 2
Andres, Dec 5 '25
#ai #dataengineering #aws #programming
4 min read

Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs
Sandeep, Dec 9 '25
#python #dataengineering #spark #bigdata
2 min read

Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth
Steven Hur, Dec 4 '25
#career #dataengineering #devjournal
2 min read

Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres
Linghua Jin, Dec 4 '25
#python #llm #dataengineering #etl
3 min read

The Boring Debug Checklist That Fixes Most “RAG Failures”
Anindya Obi, Dec 5 '25
#rag #dataengineering #llm #architecture
2 min read

Function Calling and Tool Use: Turning LLMs into Action-Taking Agents
Vinicius Fagundes, Dec 4 '25
#ai #llm #mcp #dataengineering
18 min read

dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg
Alex Merced, Dec 5 '25
#python #tooling #database #dataengineering
8 min read

AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL
Aki for AWS Community Builders, Jan 8
#aws #dataengineering #etl
3 reactions, 7 min read

SQL: Doing GROUP BY in CsvPath
David Kershaw, Dec 4 '25
#sql #dataengineering #csv #csvpath
5 min read

🔥 Day 3: RDDs - The Foundation of Spark
Sandeep, Dec 3 '25
#distributedsystems #interview #dataengineering #tutorial
2 min read