Denzel Kanyeki

Building and Deploying My First Python ETL Package to PyPI

Introduction

Creating your own Python package is one of the most satisfying steps you can take as a developer or engineer. For me, it started with a goal: simplify ETL (Extract, Transform, Load) operations using the tools I already relied on for pipeline development, namely pandas, requests, and SQLAlchemy. In this post, I'll walk you through how I created eazyetl, a lightweight and modular ETL package for data projects, and published it to both TestPyPI and the official PyPI.

Like many data analysts and engineers, I often repeat similar steps or write many lines of code for reading from APIs, cleaning data, and loading it into databases or files. Rather than rewriting the same code, I built a simple, reusable library with three components:

  • Extract: Get data from CSV, JSON, APIs, or databases

  • Transform: Clean, rename, convert, and format your data

  • Load: Export to CSV, JSON, Excel, or PostgreSQL

All of it is wrapped in a Pythonic, static-method class structure, so each operation is called directly on the class without creating an instance.
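To make the static-method layout concrete, here is a minimal sketch of the idea. It is not eazyetl's actual code: the class names (`MiniExtract`, `MiniTransform`, `MiniLoad`) are hypothetical, and it works on plain lists of dictionaries with the standard library instead of pandas DataFrames so that it runs anywhere.

```python
import csv
import io
import json


class MiniExtract:
    """Hypothetical extractor: static methods, so no instance is needed."""

    @staticmethod
    def read_csv(text):
        # Parse CSV text into a list of row dictionaries keyed by the header.
        return list(csv.DictReader(io.StringIO(text)))


class MiniTransform:
    """Hypothetical transformer operating on lists of row dictionaries."""

    @staticmethod
    def drop_na(rows, columns):
        # Keep only rows where every listed column has a non-empty value.
        return [r for r in rows if all(r.get(c) not in (None, "") for c in columns)]

    @staticmethod
    def rename(rows, columns):
        # Rename keys using an {old: new} mapping; other keys pass through.
        return [{columns.get(k, k): v for k, v in r.items()} for r in rows]


class MiniLoad:
    """Hypothetical loader that serializes rows."""

    @staticmethod
    def to_json(rows):
        # Serialize the rows to a JSON string.
        return json.dumps(rows)


raw = "name,price\nWidget,9.99\n,4.50\nGadget,\n"
rows = MiniExtract.read_csv(raw)
rows = MiniTransform.drop_na(rows, columns=["name", "price"])
rows = MiniTransform.rename(rows, columns={"price": "unit_price"})
result = MiniLoad.to_json(rows)
print(result)
```

Because every method is static, each stage is just a namespaced function: the output of one call is passed straight into the next, which is the same calling pattern the usage examples in this post follow.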

Sample Usage.

Installation

pip install eazyetl

Extracting data from various sources.

from eazyetl import Extract, Transform, Load

df = Extract.read_csv('data/data.csv')
df_api = Extract.read_api('https://www.fantasypremierleague.com/api')
df_db = Extract.read_db(database='employees', user='postgres', password='postgressuperuser', host='localhost', port='5432')

Transforming and cleaning data

df = Transform.drop_na(df, columns=["name", "price"])
df = Transform.to_datetime(df, "release_date")
df = Transform.rename(df, columns={"old_name": "new_name"})

Loading data into various destinations.

Load.load_csv(df, "cleaned_data.csv", overwrite=True)
Load.load_to_excel(df, 'weather_data.xlsx', overwrite=False)
Load.load_to_db(df, name="salaries", url="postgresql://user:pass@localhost:5432/mydb")

Testing and uploading package to PyPI.

a. Building package and Testing

I created a pyproject.toml file containing the project metadata and build settings for my package, then installed build, setuptools, and wheel, which are needed to build and test the package.

pip install build setuptools wheel
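For reference, a pyproject.toml for a setuptools-based package like this one could look roughly as follows. The name, version, and the three dependencies come from this post; the description wording, the `requires-python` bound, and the setuptools version pin are assumptions, and the actual eazyetl file may differ.

```toml
[build-system]
# setuptools 61+ can read project metadata from the [project] table
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "eazyetl"
version = "0.1.5"
description = "A lightweight and modular ETL package for data projects"
requires-python = ">=3.8"  # assumption, not taken from the real package
dependencies = [
    "pandas",
    "requests",
    "SQLAlchemy",
]
```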

After installing the required packages, I built my package for local testing.

python -m build

This process created dist/ and eazyetl.egg-info folders. The dist/ folder contains an eazyetl-0.1.5-py3-none-any.whl file, which we will use for local package testing. To install the package locally on your machine, run:

pip install "/Users/user/projects/eazyetl-0.1.5-py3-none-any.whl"
# pip install <filepath to .whl file>
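The wheel filename itself carries useful information: it follows the pattern name-version-python tag-ABI tag-platform tag, with an optional build tag after the version. The sketch below splits the filename from this post into those parts; the helper name `parse_wheel_filename` is hypothetical and only handles well-formed names.

```python
def parse_wheel_filename(filename):
    # Wheel names look like {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    # The last three parts are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


info = parse_wheel_filename("eazyetl-0.1.5-py3-none-any.whl")
print(info)
```

Here `py3`, `none`, and `any` indicate a pure-Python wheel: it targets Python 3, depends on no particular ABI, and installs on any platform.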

Note: Make sure you install packages inside your project's dedicated virtual environment to avoid dependency conflicts.
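If you are unsure whether a virtual environment is active, one quick check is to compare `sys.prefix` with `sys.base_prefix`. This is a small sketch that applies to environments created with the standard `venv` module (and recent versions of virtualenv); the function name is just illustrative.

```python
import sys


def in_virtual_environment():
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if in_virtual_environment():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment detected; consider activating one before installing.")
```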

The package is now installed in another project and ready for use!
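One way to confirm that the install worked is to ask Python for the installed version through the standard `importlib.metadata` module. The helper below is an illustrative sketch; it returns `None` instead of raising when the package is not found.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    # Look up the version recorded in the installed package's metadata.
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("eazyetl")
if found is None:
    print("eazyetl is not installed in this environment")
else:
    print(f"eazyetl {found} is installed")
```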

[Figures: sample usage of the installed package, showing extracting and loading]

b. Uploading to PyPI.

Install twine, which we will use to upload the package to PyPI.

pip install twine

Then create an account on PyPI, enable two-factor authentication, and create an API token. Name the token after your package, then copy it, since you will need it for the upload.

In your terminal, in the same root directory as dist/ and package_name.egg-info, run:

twine upload dist/* --verbose

--verbose prints detailed log messages to the terminal, which makes upload errors easier to debug.

You will be prompted for your API token as the password. Once the upload completes, the package is available on PyPI and anyone can install it with pip install.
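Since I published to TestPyPI as well as PyPI, it can help to see how the two uploads differ. The sketch below only builds the twine command as a list of arguments; the function name and defaults are hypothetical, while `--repository testpypi` and `--verbose` are standard twine options.

```python
import glob
import os


def twine_upload_command(dist_dir="dist", repository=None, verbose=True):
    # Expand dist/* ourselves, since no shell does it when a list is passed to subprocess.
    files = sorted(glob.glob(os.path.join(dist_dir, "*")))
    command = ["twine", "upload"]
    if repository:
        # "testpypi" is a repository name twine recognizes out of the box.
        command += ["--repository", repository]
    if verbose:
        command.append("--verbose")
    return command + files


# Print the command for TestPyPI first, then for the real index.
print(twine_upload_command(repository="testpypi"))
print(twine_upload_command())
# To actually upload, pass the list to subprocess.run(..., check=True).
```

Uploading to TestPyPI first lets you confirm that the metadata and files look right before the release reaches the real index. TestPyPI is a separate site, so it needs its own account and API token.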

Conclusion.

Building my first Python package, eazyetl, helped me understand the fundamentals of OOP in Python, such as encapsulation and inheritance, as well as package development and deployment to PyPI.

If you would like to check out the code on GitHub, you can find it here. Suggestions, contributions, and collaborations are highly appreciated.

Check out the package on PyPI here.

Top comments (1)

Emmanuel Kiriinya

This is nice