DEV Community

Cover image for How to Dockerize a Python Script for Data Processing from a CSV File: A Step-by-Step Guide.
Oluchukwu Edeh
Oluchukwu Edeh

Posted on

How to Dockerize a Python Script for Data Processing from a CSV File: A Step-by-Step Guide.

Dockerizing a Python script allows you to create a consistent and isolated environment for your code, regardless of where it's run.
In this guide, we will cover the basics of Dockerizing a Python script, from creating a simple Python script that processes data from a CSV file using the pandas' library to building and running it within a Docker container.

Containers are essential in modern software development, especially in DevOps, where we aim to streamline deployment and make applications portable and scalable. Docker is the go-to tool for containerization.

Prerequisites:

  • Basic understanding of Docker and Python.
  • Docker installed on your machine (Install Docker).
  • A sample Python script to work with.

1. Create your working environment.

mkdir docker-project-2
cd docker-project-2


2. Setting Up a Simple Python Script.

nano process_data.py

Image description


3. Create a CVS file.

nano data.CVS

Image description


4. Create a Dockerfile.

The Dockerfile is a text document that contains all the commands needed to assemble the Docker image. This file tells Docker how to set up the environment in which our script will run.

Create a file named Dockerfile in the same directory process_data.py
 
nano dockerfile

Image description

Here's a breakdown of each line:

  • FROM: Specifies the base image to use for the container. We're using Python 3.9 (the slim version is a smaller, minimal image).
  • WORKDIR: Sets the working directory in the container to /app.
  • RUN: Installs Python dependencies
  • COPY: Copys the project files into the container
  • CMD: Defines the command that runs when the container startsā€Š-ā€Šin this case, it will run. process_data.py

5. Create a requirements.txt file.

The requirements.txt file specifies the Python dependencies required by the script:

nano requirements.txt

Image description

Explanation:

The above ensures the pandas' library is installed when building the Docker image.


6. Building the Docker Image.

The Dockerfile is ready, let's build our Docker image. In your terminal, navigate to the directory containing the Dockerfile and process_data.py file, and run:

docker build-t python-script

Explanation:

  • -t python-script tags the image with a name (python-script).
  • The . specifies the current directory as the build context.

After the build process is completed, you should see a success message indicating that the image was built.


7. Running the Docker Container.

Once the image is built, we can create a container to run our script. Use the following command:

docker run -v $(pwd)/data:/app/data python-script

Explanation:

  • docker run: Runs a Docker container.
  • -v $(pwd)/data:/app/data: Mounts the data directory from your host machine to /app/data in the container.
  • python-script: The name of the Docker image to run.

You should see the output:

Image description

data. cvs contents printed.


8. Managing Docker Images and Containers.

  • To view the list of images, use:

docker images

  • To see currently running containers:

docker ps

  • To view all containers (including stopped ones):

docker ps -a

  • To remove a container:

docker rm (container's ID)

  • To remove the image and free up space, use:

docker rmi python-script


Conclusion:

You learned to create a Dockerfile, a txt file, and a CVS file, build and run a Docker image. With these skills, you're well on your way to containerizing applications in any environment.

Dockerizing applications is a powerful skill that boosts the scalability and portability of your projectsā€Š-ā€Švaluable traits in any DevOps role.

Heroku

Tired of jumping between terminals, dashboards, and code?

Check out this demo showcasing how tools like Cursor can connect to Heroku through the MCP, letting you trigger actions like deployments, scaling, or provisioning—all without leaving your editor.

Learn More

Top comments (0)

Tiger Data image

🐯 šŸš€ Timescale is now TigerData: Building the Modern PostgreSQL for the Analytical and Agentic Era

We’ve quietly evolved from a time-series database into the modern PostgreSQL for today’s and tomorrow’s computing, built for performance, scale, and the agentic future.

So we’re changing our name: from Timescale to TigerData. Not to change who we are, but to reflect who we’ve become. TigerData is bold, fast, and built to power the next era of software.

Read more

šŸ‘‹ Kindness is contagious

Explore this insightful write-up embraced by the inclusive DEV Community. Tech enthusiasts of all skill levels can contribute insights and expand our shared knowledge.

Spreading a simple "thank you" uplifts creators—let them know your thoughts in the discussion below!

At DEV, collaborative learning fuels growth and forges stronger connections. If this piece resonated with you, a brief note of thanks goes a long way.

Okay