Forem

Xin Xu

404 bio not found

Project: Building "Mini-C4" — A Production-Grade LLM Pre-training Pipeline 🏗️

3 min read
Recaptioning: Upgrading Your Image-Text Data for Better Model Alignment 🚀

3 min read
Image-Text Pairs: The Fuel for Multi-modal Large Language Models 🖼️✍️

3 min read
Tokenization & Serialization: The Unsung Heroes of LLM Development 🤖

3 min read
Why 80% of Data Engineering is Cleaning (and How to Do It Right)

3 min read
High-Performance Data Processing: A Practical Guide from the Data Engineering Book

3 min read
From Kimball to Lakehouse: The Evolution of Data Storage (with Python Demo)

3 min read
How to Build Scalable Data Pipelines: Lessons from the Data Engineering Book

3 min read
The Modern Data Stack: A Guide from the Open-Source Data Engineering Book

3 min read
Data Engineering for LLMs: A Comprehensive Open-Source Guide 🚀
2 min read