Reproducibility is one of the most frustrating problems in machine learning. A model works one day and fails the next. You rerun the same notebook, using the same data and code, yet still get different results.
This issue slows down experiments, blocks releases, and makes it hard to explain decisions to leadership or regulators. Once you lose track of what changed, you lose the ability to debug, revert, or even trust your pipeline.
In this guide, we'll walk through a practical approach to solving this problem by versioning your models and automating rollbacks. We'll show how this improves traceability, speeds up deployment, and helps teams move faster without losing control.
Why Machine Learning Projects Have a Reproducibility Problem
Reproducibility may seem simple—just run the same machine learning code and get the same result every time. But in reality, it's far more complex.
Machine learning projects are easy to break and hard to trace because every ML project depends on three volatile parts: code, data, and environment. If any of these changes, your results become inconsistent. Let's look at each one.
1. Code
Your model lives in the codebase, but the code changes constantly. One minor tweak to a hyperparameter or a forgotten random seed can affect the output. Deep learning and reinforcement learning algorithms introduce even more unpredictability.
Unless you manually lock everything down, stochastic gradient descent, Monte Carlo sampling, and similar techniques produce different results across runs. Even then, it's easy to miss something. If you're not tracking your code version and experiment setup, you won't know what produced the model in production.
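If you want to pin down that randomness in your own training scripts, a good first step is to seed every source of it explicitly. Here is a minimal sketch for a typical Python stack (the seed value and the exact set of libraries are just examples; adapt it to whatever frameworks you use):
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    # Seed Python's built-in RNG and NumPy (which scikit-learn relies on)
    random.seed(seed)
    np.random.seed(seed)
    # If your project uses PyTorch or TensorFlow, seed those frameworks too,
    # e.g. torch.manual_seed(seed) or tf.random.set_seed(seed)

set_seed(42)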
2. Data
Unlike traditional software, ML systems depend deeply on the data they are trained on. But that data changes constantly: it grows, gets cleaned, and sometimes gets mislabeled. Unless you version your datasets alongside your preprocessing code, you can't tell what data produced yesterday's results.
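One lightweight way to notice when training data has silently changed is to record a checksum of every dataset file alongside your experiment metadata. A minimal sketch (the file paths match the MovieLens files used later in this tutorial, and the manifest filename is just an example):
import hashlib
import json

def file_checksum(path: str) -> str:
    # Hash the file in chunks so large datasets don't have to fit in memory
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record exactly which data a training run used
files = ["./datasets/ratings.csv", "./datasets/movies.csv"]
manifest = {path: file_checksum(path) for path in files}
with open("data_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)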
3. Environment
When it comes to the environment, it boils down to your packages, dependencies, CUDA configuration, hardware, and even subtle updates in the ML libraries you use. Frameworks like TensorFlow and PyTorch ship new releases, APIs change, and performance optimizations introduce inconsistencies.
Differences in hardware (like GPU architectures) can subtly alter your results. To know which environment produced which output, every change to the environment must be logged; otherwise, you are introducing variability you can't trace.
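A simple way to make the environment traceable is to snapshot the interpreter, installed packages, and platform details for every run and store the snapshot with your experiment results. A rough sketch (what you capture will depend on your stack; the output filename is just an example):
import json
import platform
import sys
from importlib import metadata

snapshot = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {dist.metadata["Name"]: dist.version for dist in metadata.distributions()},
}

with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)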
How to Achieve Reproducibility in Your Machine Learning Project
To make your ML projects reproducible, you need a system that can track, package, and roll back everything your model depends on. That includes your training code, datasets, configuration files, and environment setup.
We'll use KitOps to define and package the entire project, and Jozu Hub to store and manage versioned model artifacts. This setup lets us trace every model version, compare iterations, and pull older versions when something breaks.
Here's how the process works:
1. Define Your Project Structure
You start by creating a Kitfile. This is a simple YAML manifest that describes your project: which model to include, what scripts it runs, what data it was trained on, and any configuration details that matter. It's the blueprint for your ModelKit.
2. Package Your Model and Its Context
Once the Kitfile is ready, you use KitOps to package everything into a ModelKit (a versioned, self-contained bundle that includes all your critical artifacts). This makes the project portable, testable, and easy to share across your team or CI pipeline.
3. Push to a Versioned Model Registry
You push the ModelKit to Jozu Hub, where it's stored as an immutable version. Each push is tagged and tracked. You can inspect what's inside, compare it to previous versions, and promote it to staging or production as needed.
4. Roll Back When Needed
If something goes wrong in a later version, you can pull an earlier ModelKit from Jozu and unpack it locally. Since everything is tracked (code, data, config), you return to a working state without patching things manually.
This gives you a repeatable way to move through experiments, monitor progress, and recover when a change breaks your pipeline. No guesswork, no rebuilds from scratch, no digging through Notion pages or Slack threads to remember what worked.
In the next section, we'll show what this looks like by building a simple movie recommendation model. You'll train the first version, create a second one with changes, and then roll back, all using KitOps and Jozu as part of a reproducible ML workflow.
Use Case: Building, Versioning, and Rolling Back a Recommendation Model
Why a recommendation system use case? Recommendation models are updated frequently, especially at subscription businesses like Netflix. They must be retrained continually so that predictions stay accurate, the experience stays personalized, and users keep coming back.
Real-time A/B testing and safe rollbacks play a key role in providing reliable, seamless user experiences. You can learn how Netflix handles its recommendation systems using these reproducibility measures.
Tools You'll Use
- Jozu ML to version, deploy, and manage machine learning models
- KitOps to simplify MLOps workflows for rapid and efficient deployment
- VS Code to write your code and build your model
Prerequisites
To follow along with this tutorial, you will need:
- KitOps: Learn how to install KitOps
- Jozu ML: Create an account on this SaaS registry platform
- Basic knowledge of Python, pandas, and scikit-learn
- MovieLens datasets: a public repository of movie datasets collected and managed by GroupLens research at the University of Minnesota
About the Data
Our dataset consists of four CSV files. However, we will use just two:
- movies.csv has metadata, such as the title and genres, for each movie
- ratings.csv has the ratings each user gave to individual movies
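If you want to confirm what these files look like before training, load them with pandas and inspect the first few rows. In the standard MovieLens layout, movies.csv has movieId, title, and genres columns, while ratings.csv has userId, movieId, rating, and timestamp:
import pandas as pd

movies = pd.read_csv('./datasets/movies.csv')
ratings = pd.read_csv('./datasets/ratings.csv')

print(movies.head())   # movieId, title, genres
print(ratings.head())  # userId, movieId, rating, timestamp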
Tutorial Overview
- Create a simple recommendation model using Python
- Version and deploy the model using JozuML and KitOps
- Train a new model (to simulate a model update in the real world), and version it
- Roll back to the first model
Step 1: Building a Recommendation Model
- Download the MovieLens ml-latest-small dataset from the MovieLens datasets website.
- Save it in the datasets folder within your project folder.
- Create a Python file, then build and save your ML model as user_similarity_model.pkl.
Note: Not sure what model to use? Our comprehensive article explains how to pick the right model for your project.
import numpy as np
import pandas as pd
import joblib
from sklearn.metrics.pairwise import cosine_similarity
import os

# Load datasets
ratings = pd.read_csv('./datasets/ratings.csv')
movies = pd.read_csv('./datasets/movies.csv')

# Prepare user-item matrix
user_movie_matrix = ratings.pivot_table(
    index='userId',
    columns='movieId',
    values='rating'
).fillna(0)

# Compute user similarity matrix
user_similarity = cosine_similarity(user_movie_matrix)
similarity_df = pd.DataFrame(
    user_similarity,
    index=user_movie_matrix.index,
    columns=user_movie_matrix.index
)

# Save the model
model_dir = './saved_model'
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, 'user_similarity_model.pkl')
joblib.dump(similarity_df, model_path)
print(f"Model saved to {model_path}")

# Load the saved model
similarity_df = joblib.load(model_path)

# Function to recommend movies
def recommend_movies(user_id, top_n=5):
    if user_id not in user_movie_matrix.index:
        print(f"User {user_id} not found.")
        return []

    similar_users = similarity_df[user_id].sort_values(ascending=False)[1:6]
    weighted_ratings = user_movie_matrix.loc[similar_users.index].T.dot(similar_users) / similar_users.sum()

    user_rated = user_movie_matrix.loc[user_id]
    recommendations = weighted_ratings[user_rated == 0].sort_values(ascending=False).head(top_n)

    return movies[movies['movieId'].isin(recommendations.index)][['title']]

if __name__ == "__main__":
    user_id = 5
    print(f"\nTop movie recommendations for user {user_id}:\n")
    print(recommend_movies(user_id=user_id, top_n=5))
- Run your Python file in your terminal with the command below. The final saved model will be in the saved_model directory of your project folder:
python movie_recommender.py
Step 2: Package and Deploy the Model with KitOps and ModelKit
First, install KitOps.
Verify the KitOps version:
kit version
- Log in to your Jozu Hub registry with your username and password:
kit login jozu.ml
- Create a Kitfile within your project directory and paste the information below:
manifestVersion: "1.0"
package:
  name: movie-recommend
  version: 0.0.1
  authors: [Benny Ifeanyi]
model:
  name: movie-recommendation-model-v1
  path: ./saved_model/user_similarity_model.pkl
  description: Movie recommendation model using user-based collaborative filtering (cosine similarity)
code:
  - path: ./movie_recommender.py
    description: Movie recommendation script
datasets:
  - name: ratings-data
    path: ./datasets/ratings.csv
    description: Ratings dataset
  - name: movies-data
    path: ./datasets/movies.csv
    description: Movies metadata
- Package your artifacts into a ModelKit:
kit pack . -t jozu.ml/bennykillua/movie-recommend:v1.0.0
- Verify your ModelKit by running the command below to check if your kit was successfully created:
kit list
- Finally, push the ModelKit to your Jozu Hub repository and tag it v1.0.0. Tagging each ModelKit is important because it makes versions easy to track. Here is a comprehensive article on strategies for tagging ModelKits.
kit push jozu.ml/bennykillua/movie-recommend:v1.0.0
- Head over to Jozu Hub to see your model:
Great! Now that you have packaged and pushed your model to Jozu Hub, you can test its versioning and rollback capabilities to see how it supports reproducibility.
Step 3: Model Versioning and Rollbacks with Jozu
To see how Jozu's model versioning and rollback capabilities work, let's make a change to your Python file, push this new version to the Jozu Hub, and then pull the older version to verify that the rollback worked.
- Rewrite your Python file with the code below. The first version used a user-based collaborative filtering model, which recommends movies based on how similar a user's ratings are to other users' ratings (cosine similarity). This second version uses a matrix factorization model, Singular Value Decomposition (SVD), which recommends new movies by finding latent patterns in the ratings that capture user preferences and movie characteristics.
import numpy as np
import pandas as pd
import joblib
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import mean_squared_error
import os

# Load datasets
ratings = pd.read_csv('./datasets/ratings.csv')
movies = pd.read_csv('./datasets/movies.csv')

# Prepare user-item matrix
user_movie_matrix = ratings.pivot_table(
    index='userId',
    columns='movieId',
    values='rating'
).fillna(0)

# Apply Singular Value Decomposition (SVD)
svd = TruncatedSVD(n_components=50, random_state=42)
svd_matrix = svd.fit_transform(user_movie_matrix)

# Reconstruct the original matrix (approximated)
reconstructed_matrix = np.dot(svd_matrix, svd.components_)

# Save the model
model_dir = './saved_model'
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, 'user_similarity_model.pkl')
joblib.dump(svd, model_path)
print(f"Model saved to {model_path}")

# Load the saved model
svd = joblib.load(model_path)

# Function to recommend movies
def recommend_movies(user_id, top_n=5):
    if user_id not in user_movie_matrix.index:
        print(f"User {user_id} not found.")
        return []

    # Predict ratings for the user
    user_index = user_movie_matrix.index.get_loc(user_id)
    predicted_ratings = reconstructed_matrix[user_index, :]

    # Filter out movies the user has already rated
    user_rated = user_movie_matrix.loc[user_id]
    predicted_ratings[user_rated > 0] = 0

    # Get the top N recommendations
    top_movies_indices = predicted_ratings.argsort()[-top_n:][::-1]
    top_movie_ids = user_movie_matrix.columns[top_movies_indices]

    return movies[movies['movieId'].isin(top_movie_ids)][['title']]

if __name__ == "__main__":
    user_id = 5
    print(f"\nTop movie recommendations for user {user_id}:\n")
    print(recommend_movies(user_id=user_id, top_n=5))
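Before pushing this second version, it can be useful to attach a quick quality number so that v1 and v2 are easier to compare later. Here is an optional snippet you could append to the script above; it reuses the mean_squared_error import and the user_movie_matrix and reconstructed_matrix variables already defined, and reports how closely the SVD reconstruction matches the ratings users actually gave (an illustrative metric, not part of the original script):
# Reconstruction error on the observed (nonzero) ratings only
observed = user_movie_matrix.values > 0
rmse = np.sqrt(mean_squared_error(
    user_movie_matrix.values[observed],
    reconstructed_matrix[observed]
))
print(f"Reconstruction RMSE on observed ratings: {rmse:.4f}")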
- Run your Python file in your terminal:
python movie_recommender.py
- Update your Kitfile:
manifestVersion: "1.0"
package:
  name: movie-recommend
  version: 0.0.2
  authors: ["Benny Ifeanyi"]
model:
  name: movie-recommendation-model-v2
  path: ./saved_model/user_similarity_model.pkl
  description: SVD-based movie recommendation model
datasets:
  - description: Ratings dataset
    name: ratings-data
    path: ./datasets/ratings.csv
  - description: Movies metadata
    name: movies-data
    path: ./datasets/movies.csv
code:
  - description: SVD model training and recommendation scripts
    path: ./movie_recommender.py
- Package your artifacts into a ModelKit and push the updated version to Jozu Hub. Let's call it v2.0.0:
kit pack . -t jozu.ml/bennykillua/movie-recommend:v2.0.0
kit push jozu.ml/bennykillua/movie-recommend:v2.0.0
- Head over to Jozu Hub to see your model:
- Verify the new push using the command below to list all the tags:
kit list jozu.ml/bennykillua/movie-recommend
Rolling Back to Version 1.0.0
- Pull the previous version:
kit pull jozu.ml/bennykillua/movie-recommend:v1.0.0
- Unpack the version 1.0.0 files by extracting your artifacts with the unpack command. This unpacks the pulled ModelKit into ./movie-recommend-v1, a new directory within your project folder:
kit unpack jozu.ml/bennykillua/movie-recommend:v1.0.0 -d ./movie-recommend-v1
- Head over to VS Code to take a look at your files.
- Run tree /f (a Windows command) in the VS Code terminal to see your project's directory structure:
The output of the directory command will show:
- The root files: the Kitfile, movie_recommender.py (main Python script), and the datasets folder containing movies.csv, ratings.csv, and tags.csv
- The saved_model directory holds the trained recommendation model file, user_similarity_model.pkl
- The movie-recommend-v1 folder contains the pulled versioned project (v1.0.0), with its own Kitfile, movie_recommender.py, a subset of the original datasets, and its saved_model directory
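If you want to confirm that the rollback actually restored the v1 artifacts, you can load the unpacked model and inspect it directly. A quick sketch, assuming the unpacked layout mirrors the paths declared in the v1.0.0 Kitfile:
import joblib
import pandas as pd

# Load the rolled-back v1 artifacts from the unpacked directory
similarity_df = joblib.load('./movie-recommend-v1/saved_model/user_similarity_model.pkl')
ratings = pd.read_csv('./movie-recommend-v1/datasets/ratings.csv')

# The v1 model is a user-by-user cosine similarity matrix, so a square matrix with
# one row per user is a quick sanity check that we pulled the right version
print(type(similarity_df), similarity_df.shape)
print("Users in ratings:", ratings['userId'].nunique())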
Conclusion
With KitOps, you can push and track as you build, pull, and roll back to previous versions of your project, all from your local environment. By embracing these practices, you can keep track of your experiments while ensuring they are reproducible.
To get started, create a Jozu Hub account to push your project. You can also contact our engineering team if you encounter any issues. Remember, reproducibility isn't just a good habit; it's how you build resilient, trustworthy ML systems.