Reproducibility is one of the most frustrating problems in machine learning. A model works one day and fails the next. You rerun the same notebook, using the same data and code, yet still get different results.
This issue slows down experiments, blocks releases, and makes it hard to explain decisions to leadership or regulators. Once you lose track of what changed, you lose the ability to debug, revert, or even trust your pipeline.
In this guide, we'll walk through a practical approach to solving this problem by versioning your models and automating rollbacks. We'll show how this improves traceability, speeds up deployment, and helps teams move faster without losing control.
Why Machine Learning Projects Have a Reproducibility Problem
Reproducibility may seem simple—just run the same machine learning code and get the same result every time. But in reality, it's far more complex.
Machine learning projects are easy to break and hard to trace because every ML project depends on three volatile parts: code, data, and environment. If any of these changes, your results become inconsistent. Let's look at each one.
1. Code
Your model lives in the codebase, but the code changes constantly. One minor tweak to a hyperparameter or a forgotten random seed can affect the output. Deep learning and reinforcement learning algorithms introduce even more unpredictability.
Unless you manually lock everything down, stochastic gradient descent, Monte Carlo sampling, and similar techniques produce different results across runs. Even then, it's easy to miss something. If you're not tracking your code version and experiment setup, you won't know what produced the model in production.
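If you want to pin down that randomness in your own training scripts, a good first step is to seed every source of it explicitly. Here is a minimal sketch for a typical Python stack (the seed value and the exact set of libraries are just examples; adapt it to whatever frameworks you use):
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    # Seed Python's built-in RNG and NumPy (which scikit-learn relies on)
    random.seed(seed)
    np.random.seed(seed)
    # If your project uses PyTorch or TensorFlow, seed those frameworks too,
    # e.g. torch.manual_seed(seed) or tf.random.set_seed(seed)

set_seed(42)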
2. Data
Unlike traditional software, ML systems depend deeply on the data they are trained on. But that data changes constantly: it grows, gets cleaned, and sometimes gets mislabeled. Unless you version your datasets alongside your preprocessing code, you can't tell what data produced yesterday's results.
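One lightweight way to notice when training data has silently changed is to record a checksum of every dataset file alongside your experiment metadata. A minimal sketch (the file paths match the MovieLens files used later in this tutorial, and the manifest filename is just an example):
import hashlib
import json

def file_checksum(path: str) -> str:
    # Hash the file in chunks so large datasets don't have to fit in memory
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Record exactly which data a training run used
files = ["./datasets/ratings.csv", "./datasets/movies.csv"]
manifest = {path: file_checksum(path) for path in files}
with open("data_manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)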
3. Environment
When it comes to the environment, it boils down to your packages, dependencies, CUDA configuration, hardware, and even subtle updates in the ML libraries you use. Frameworks like TensorFlow and PyTorch ship new releases, APIs change, and performance optimizations introduce inconsistencies.
Differences in hardware (like GPU architectures) can subtly alter your results. To know which environment produced which output, every change to the environment must be logged; otherwise, you are introducing variability you can't trace.
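A simple way to make the environment traceable is to snapshot the interpreter, installed packages, and platform details for every run and store the snapshot with your experiment results. A rough sketch (what you capture will depend on your stack; the output filename is just an example):
import json
import platform
import sys
from importlib import metadata

snapshot = {
    "python": sys.version,
    "platform": platform.platform(),
    "packages": {dist.metadata["Name"]: dist.version for dist in metadata.distributions()},
}

with open("environment_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)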
How to Achieve Reproducibility in Your Machine Learning Project
To make your ML projects reproducible, you need a system that can track, package, and roll back everything your model depends on. That includes your training code, datasets, configuration files, and environment setup.
We'll use KitOps to define and package the entire project, and Jozu Hub to store and manage versioned model artifacts. This setup lets us trace every model version, compare iterations, and pull older versions when something breaks.
Here's how the process works:
1. Define Your Project Structure
You start by creating a Kitfile. This is a simple YAML manifest that describes your project: which model to include, what scripts it runs, what data it was trained on, and any configuration details that matter. It's the blueprint for your ModelKit.
2. Package Your Model and Its Context
Once the Kitfile is ready, you use KitOps to package everything into a ModelKit (a versioned, self-contained bundle that includes all your critical artifacts). This makes the project portable, testable, and easy to share across your team or CI pipeline.
3. Push to a Versioned Model Registry
You push the ModelKit to Jozu Hub, where it's stored as an immutable version. Each push is tagged and tracked. You can inspect what's inside, compare it to previous versions, and promote it to staging or production as needed.
4. Roll Back When Needed
If something goes wrong in a later version, you can pull an earlier ModelKit from Jozu and unpack it locally. Since everything is tracked (code, data, config), you return to a working state without patching things manually.
This gives you a repeatable way to move through experiments, monitor progress, and recover when a change breaks your pipeline. No guesswork, no rebuilds from scratch, no digging through Notion pages or Slack threads to remember what worked.
In the next section, we'll show what this looks like by building a simple movie recommendation model. You'll train the first version, create a second one with changes, and then roll back, all using KitOps and Jozu as part of a reproducible ML workflow.
Use Case: Building, Versioning, and Rolling Back a Recommendation Model
Why a recommendation system use case? Recommendation models are updated frequently, especially at subscription businesses like Netflix. They must be retrained continually so that predictions stay accurate, the experience stays personalized, and users keep coming back.
Real-time A/B testing and safe rollbacks play a key role in providing reliable, seamless user experiences. You can learn how Netflix handles its recommendation systems using these reproducibility measures.
Tools You'll Use
- Jozu ML to version, deploy, and manage machine learning models
- KitOps to simplify MLOps workflows for rapid and efficient deployment
- VS Code to write your code and build your model
Prerequisites
To follow along with this tutorial, you will need:
- KitOps: Learn how to install KitOps
- Jozu ML: Create an account on this SaaS registry platform
- Basic knowledge of Python, pandas, and scikit-learn
- MovieLens datasets: a public repository of movie datasets collected and managed by GroupLens research at the University of Minnesota
About the Data
Our dataset consists of four CSV files. However, we will use just two:
- movies.csv has metadata, such as the title and genres, for each movie
- ratings.csv has the ratings each user gave to individual movies
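If you want to confirm what these files look like before training, load them with pandas and inspect the first few rows. In the standard MovieLens layout, movies.csv has movieId, title, and genres columns, while ratings.csv has userId, movieId, rating, and timestamp:
import pandas as pd

movies = pd.read_csv('./datasets/movies.csv')
ratings = pd.read_csv('./datasets/ratings.csv')

print(movies.head())   # movieId, title, genres
print(ratings.head())  # userId, movieId, rating, timestamp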
Tutorial Overview
- Create a simple recommendation model using Python
- Version and deploy the model using JozuML and KitOps
- Train a new model (to simulate a model update in the real world), and version it
- Roll back to the first model
Step 1: Building a Recommendation Model
- Download the MovieLens ml-latest-small dataset from the MovieLens datasets website.
- Save it in the datasets folder within your project folder.
- Create a Python file, then build and save your ML model as user_similarity_model.pkl.
Note: Not sure what model to use? Our comprehensive article explains how to pick the right model for your project.
import numpy as np
import pandas as pd
import joblib
from sklearn.metrics.pairwise import cosine_similarity
import os

# Load datasets
ratings = pd.read_csv('./datasets/ratings.csv')
movies = pd.read_csv('./datasets/movies.csv')

# Prepare user-item matrix
user_movie_matrix = ratings.pivot_table(
    index='userId',
    columns='movieId',
    values='rating'
).fillna(0)

# Compute user similarity matrix
user_similarity = cosine_similarity(user_movie_matrix)
similarity_df = pd.DataFrame(
    user_similarity,
    index=user_movie_matrix.index,
    columns=user_movie_matrix.index
)

# Save the model
model_dir = './saved_model'
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, 'user_similarity_model.pkl')
joblib.dump(similarity_df, model_path)
print(f"Model saved to {model_path}")

# Load the saved model
similarity_df = joblib.load(model_path)

# Function to recommend movies
def recommend_movies(user_id, top_n=5):
    if user_id not in user_movie_matrix.index:
        print(f"User {user_id} not found.")
        return []

    similar_users = similarity_df[user_id].sort_values(ascending=False)[1:6]
    weighted_ratings = user_movie_matrix.loc[similar_users.index].T.dot(similar_users) / similar_users.sum()

    user_rated = user_movie_matrix.loc[user_id]
    recommendations = weighted_ratings[user_rated == 0].sort_values(ascending=False).head(top_n)

    return movies[movies['movieId'].isin(recommendations.index)][['title']]

if __name__ == "__main__":
    user_id = 5
    print(f"\nTop movie recommendations for user {user_id}:\n")
    print(recommend_movies(user_id=user_id, top_n=5))
- Run your Python file in your terminal with the command below. The final saved model will be in the saved_model directory of your project folder:
python movie_recommender.py
Step 2: Package and Deploy the Model with KitOps and ModelKit
First, install KitOps.
Verify the KitOps version:
kit version
- Log in to your Jozu Hub registry with your username and password:
kit login jozu.ml
- Create a Kitfile within your project directory and paste the information below:
manifestVersion: "1.0"
package:
  name: movie-recommend
  version: 0.0.1
  authors: [Benny Ifeanyi]
model:
  name: movie-recommendation-model-v1
  path: ./saved_model/user_similarity_model.pkl
  description: Movie recommendation model using user-based collaborative filtering (cosine similarity)
code:
  - path: ./movie_recommender.py
    description: Movie recommendation script
datasets:
  - name: ratings-data
    path: ./datasets/ratings.csv
    description: Ratings dataset
  - name: movies-data
    path: ./datasets/movies.csv
    description: Movies metadata
- Package your artifacts into a ModelKit:
kit pack . -t jozu.ml/bennykillua/movie-recommend:v1.0.0
- Verify your ModelKit by running the command below to check if your kit was successfully created:
kit list
- Finally, push the ModelKit to your Jozu Hub repository and tag it v1.0.0. Tagging each ModelKit is important because it makes versions easy to track. Here is a comprehensive article on strategies for tagging ModelKits.
kit push jozu.ml/bennykillua/movie-recommend:v1.0.0
- Head over to Jozu Hub to see your model:
Great! Now that you have packaged and pushed your model to Jozu Hub, you can test its versioning and rollback capabilities to see how it supports reproducibility.
Step 3: Model Versioning and Rollbacks with Jozu
To see how Jozu's model versioning and rollback capabilities work, let's make a change to your Python file, push this new version to the Jozu Hub, and then pull the older version to verify that the rollback worked.
- Rewrite your Python file with the code below. The first version used a user-based collaborative filtering model, which recommends movies based on how similar a user's ratings are to other users' ratings (cosine similarity). This second version uses a matrix factorization model, Singular Value Decomposition (SVD), which recommends new movies by finding latent patterns in the ratings that capture user preferences and movie characteristics.
import numpy as np
import pandas as pd
import joblib
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import mean_squared_error
import os

# Load datasets
ratings = pd.read_csv('./datasets/ratings.csv')
movies = pd.read_csv('./datasets/movies.csv')

# Prepare user-item matrix
user_movie_matrix = ratings.pivot_table(
    index='userId',
    columns='movieId',
    values='rating'
).fillna(0)

# Apply Singular Value Decomposition (SVD)
svd = TruncatedSVD(n_components=50, random_state=42)
svd_matrix = svd.fit_transform(user_movie_matrix)

# Reconstruct the original matrix (approximated)
reconstructed_matrix = np.dot(svd_matrix, svd.components_)

# Save the model
model_dir = './saved_model'
os.makedirs(model_dir, exist_ok=True)
model_path = os.path.join(model_dir, 'user_similarity_model.pkl')
joblib.dump(svd, model_path)
print(f"Model saved to {model_path}")

# Load the saved model
svd = joblib.load(model_path)

# Function to recommend movies
def recommend_movies(user_id, top_n=5):
    if user_id not in user_movie_matrix.index:
        print(f"User {user_id} not found.")
        return []

    # Predict ratings for the user
    user_index = user_movie_matrix.index.get_loc(user_id)
    predicted_ratings = reconstructed_matrix[user_index, :]

    # Filter out movies the user has already rated
    user_rated = user_movie_matrix.loc[user_id]
    predicted_ratings[user_rated > 0] = 0

    # Get the top N recommendations
    top_movies_indices = predicted_ratings.argsort()[-top_n:][::-1]
    top_movie_ids = user_movie_matrix.columns[top_movies_indices]

    return movies[movies['movieId'].isin(top_movie_ids)][['title']]

if __name__ == "__main__":
    user_id = 5
    print(f"\nTop movie recommendations for user {user_id}:\n")
    print(recommend_movies(user_id=user_id, top_n=5))
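Before pushing this second version, it can be useful to attach a quick quality number so that v1 and v2 are easier to compare later. Here is an optional snippet you could append to the script above; it reuses the mean_squared_error import and the user_movie_matrix and reconstructed_matrix variables already defined, and reports how closely the SVD reconstruction matches the ratings users actually gave (an illustrative metric, not part of the original script):
# Reconstruction error on the observed (nonzero) ratings only
observed = user_movie_matrix.values > 0
rmse = np.sqrt(mean_squared_error(
    user_movie_matrix.values[observed],
    reconstructed_matrix[observed]
))
print(f"Reconstruction RMSE on observed ratings: {rmse:.4f}")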
- Run your Python file in your terminal:
python movie_recommender.py
- Update your Kitfile:
manifestVersion: "1.0"
package:
  name: movie-recommend
  version: 0.0.2
  authors: ["Benny Ifeanyi"]
model:
  name: movie-recommendation-model-v2
  path: ./saved_model/user_similarity_model.pkl
  description: SVD-based movie recommendation model
datasets:
  - description: Ratings dataset
    name: ratings-data
    path: ./datasets/ratings.csv
  - description: Movies metadata
    name: movies-data
    path: ./datasets/movies.csv
code:
  - description: SVD model training and recommendation scripts
    path: ./movie_recommender.py
- Package your artifacts into a ModelKit and push the updated version to Jozu Hub. Let's call it v2.0.0:
kit pack . -t jozu.ml/bennykillua/movie-recommend:v2.0.0
kit push jozu.ml/bennykillua/movie-recommend:v2.0.0
- Head over to Jozu Hub to see your model:
- Verify the new push using the command below to list all the tags:
kit list jozu.ml/bennykillua/movie-recommend
Rolling Back to Version 1.0.0
- Pull the previous version:
kit pull jozu.ml/bennykillua/movie-recommend:v1.0.0
- Unpack the version 1.0.0 files by extracting your artifacts with the unpack command. This unpacks the pulled ModelKit into ./movie-recommend-v1, a new directory within your project folder:
kit unpack jozu.ml/bennykillua/movie-recommend:v1.0.0 -d ./movie-recommend-v1
- Head over to VS Code to take a look at your files.
- Run tree /f (a Windows command) in the VS Code terminal to see your project's directory structure:
The output of the directory command will show:
- The root files: the Kitfile, movie_recommender.py (main Python script), and the datasets folder containing movies.csv, ratings.csv, and tags.csv
- The saved_model directory holds the trained recommendation model file, user_similarity_model.pkl
- The movie-recommend-v1 folder contains the pulled versioned project (v1.0.0), with its own Kitfile, movie_recommender.py, a subset of the original datasets, and its saved_model directory
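If you want to confirm that the rollback actually restored the v1 artifacts, you can load the unpacked model and inspect it directly. A quick sketch, assuming the unpacked layout mirrors the paths declared in the v1.0.0 Kitfile:
import joblib
import pandas as pd

# Load the rolled-back v1 artifacts from the unpacked directory
similarity_df = joblib.load('./movie-recommend-v1/saved_model/user_similarity_model.pkl')
ratings = pd.read_csv('./movie-recommend-v1/datasets/ratings.csv')

# The v1 model is a user-by-user cosine similarity matrix, so a square matrix with
# one row per user is a quick sanity check that we pulled the right version
print(type(similarity_df), similarity_df.shape)
print("Users in ratings:", ratings['userId'].nunique())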
Conclusion
With KitOps, you can push and track as you build, pull, and roll back to previous versions of your project, all from your local environment. By embracing these practices, you can keep track of your experiments while ensuring they are reproducible.
To get started, create a Jozu Hub account to push your project. You can also contact our engineering team if you encounter any issues. Remember, reproducibility isn't just a good habit; it's how you build resilient, trustworthy ML systems.