<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Nicolas Jaccard</title>
    <description>The latest articles on Forem by Nicolas Jaccard (@nicjac).</description>
    <link>https://forem.com/nicjac</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F774288%2F4227a0da-d66d-4273-9bc2-4d2af5e35bf0.png</url>
      <title>Forem: Nicolas Jaccard</title>
      <link>https://forem.com/nicjac</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/nicjac"/>
    <language>en</language>
    <item>
      <title>How to Build a Machine Learning Demo in 2022</title>
      <dc:creator>Nicolas Jaccard</dc:creator>
      <pubDate>Sun, 16 Jan 2022 19:35:48 +0000</pubDate>
      <link>https://forem.com/nicjac/how-to-build-a-machine-learning-demo-in-2022-143c</link>
      <guid>https://forem.com/nicjac/how-to-build-a-machine-learning-demo-in-2022-143c</guid>
      <description>&lt;p&gt;Learn &lt;strong&gt;why&lt;/strong&gt; you should build demos for your Machine Learning models in 2022, and &lt;strong&gt;how&lt;/strong&gt; to do it in a way that fits your needs, skills, and audience.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--myxH5r__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x6i4oqwnqcr5g2rebrso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--myxH5r__--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/x6i4oqwnqcr5g2rebrso.png" alt="Header Image" width="880" height="670"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✨✨✨&lt;br&gt;
If you like this content, please consider a follow and a ❤️ as I am trying to grow my audience on dev.to! Note that this post and more are available on my personal website (&lt;a href="https://nicjac.dev"&gt;nicjac.dev&lt;/a&gt;). Thanks! 🤗&lt;br&gt;
✨✨✨&lt;/p&gt;
&lt;h2&gt;
  
  
  Why Demos Are Now Essential
&lt;/h2&gt;

&lt;p&gt;Interactive demos of machine learning models are becoming increasingly popular. After all, just as a picture is worth a thousand words, nothing beats letting others interact directly with your model to generate interest. If you want to keep up with recent trends or are looking for inspiration for your own demos, I highly recommend following the &lt;a href="https://twitter.com/ak92501"&gt;@ak92501 Twitter account&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One example among many is a demo for a model that transforms a portrait photograph into an illustration in the style of the Netflix show Arcane. The ease of use of the demo and the ability to quickly test the model with different input photographs were part of the reason this project became so popular so quickly.&lt;/p&gt;


  &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hCWmijgr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9jmpu5jdd8s43jf65isq.png" width="880" height="512"&gt;Interactive demo for ArcaneGAN (using [Gradio](#gradio)). Screenshot by author, demo located at https://huggingface.co/spaces/akhaliq/ArcaneGAN"
  


&lt;p&gt;There are many reasons why you might want to build an interactive demo:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Get a model tested by colleagues during development&lt;/li&gt;
&lt;li&gt;Prototype to seek investment in a new idea&lt;/li&gt;
&lt;li&gt;Disseminate research, potentially as a companion to a research paper&lt;/li&gt;
&lt;li&gt;Build a portfolio of work&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whatever your reason is, this article will provide some pointers and recommendations to make the most of this opportunity.&lt;/p&gt;
&lt;h2&gt;
  
  
  How To Build An Interactive Demo in 2022
&lt;/h2&gt;

&lt;p&gt;There are a variety of ways to build an interactive demo for your Machine Learning model in 2022. Which one you pick will depend on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your target audience&lt;/li&gt;
&lt;li&gt;Your software engineering skills&lt;/li&gt;
&lt;li&gt;Your monetary budget&lt;/li&gt;
&lt;li&gt;Your time budget&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article will cover three types of approaches: public notebook sharing, full-stack, and app libraries.&lt;/p&gt;
&lt;h3&gt;
  
  
  TLDR - Which Approach Should I Use?
&lt;/h3&gt;

&lt;p&gt;The rest of the article will cover in detail what you need to know about these three approaches. If you want a quick answer, the following table should help!&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt; &lt;/th&gt;
&lt;th&gt;Google Colab&lt;/th&gt;
&lt;th&gt;Full-Stack&lt;/th&gt;
&lt;th&gt;App Libraries&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Flexibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Moderate to High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Target Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Technical&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;td&gt;Any&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interactivity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low by Default&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Non-ML dev work required&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Significant&lt;/td&gt;
&lt;td&gt;Minimal&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Deployment / Sharing&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;td&gt;Difficult&lt;/td&gt;
&lt;td&gt;Easy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Costs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;td&gt;Low to High&lt;/td&gt;
&lt;td&gt;Free&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time commitment&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Small&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Small to Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Productization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Case by Case&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Rule of thumb:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sharing with a technical audience where code visibility is useful and interactivity not critical? Use Google Colab (or equivalent)&lt;/li&gt;
&lt;li&gt;High likelihood that the demo will become a fully fledged product? The full-stack approach might save you time in the long run&lt;/li&gt;
&lt;li&gt;If none of the above, go with App Libraries and Hugging Face Spaces hosting
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Public Notebook Sharing (Google Colab)
&lt;/h3&gt;

&lt;p&gt;Jupyter notebooks (and IPython before them) played a big part in cementing Python as the leading programming language for machine learning. While not without their faults, notebooks enable interactive data exploration and iterative development, and they quickly became an essential tool for machine learning enthusiasts. However, setting up a Jupyter environment can be challenging and potentially costly if hosting is required for remote access.&lt;/p&gt;

&lt;p&gt;&lt;a href="//colab.research.google.com"&gt;Google Colab&lt;/a&gt; was a significant disruptor in this area -- it democratized machine learning by offering a fully managed notebook experience without any setup or maintenance required, and provided free access to otherwise costly compute resources.&lt;/p&gt;

&lt;p&gt;Colab made it trivial to share a notebook and have others interact with it, even if it required GPU acceleration. For example, the &lt;a href="https://colab.research.google.com/github/fastai/fastai/blob/master/nbs/quick_start.ipynb"&gt;fastai documentation&lt;/a&gt; or even the recent &lt;a href="https://colab.research.google.com/github/fastai/fastbook/blob/master/01_intro.ipynb"&gt;fastai book&lt;/a&gt; are made available as Colab notebooks, allowing one to run and modify code as they go through the material.&lt;/p&gt;


  &lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yZOyQJNO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/livdhfwwua4tgryryzjp.png" width="880" height="973"&gt;Screenshot of the quick start notebook for fastai in Google Colab. Screenshot by author, notebook located at https://colab.research.google.com/github/fastai/fastai/blob/master/nbs/quick_start.ipynb
  


&lt;p&gt;Most notebooks can be made to work on Colab without much effort, and widgets can be used to let users input their own data. If the notebook is versioned in a public GitHub repository, all it takes to share it is a link with the following format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://colab.research.google.com/github/$ACCOUNT/$REPO/blob/$BRANCH/$PATH_TO_NOTEBOOK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;$ACCOUNT&lt;/code&gt; is the GitHub account, &lt;code&gt;$REPO&lt;/code&gt; the repository, &lt;code&gt;$BRANCH&lt;/code&gt; the branch, and &lt;code&gt;$PATH_TO_NOTEBOOK&lt;/code&gt; the full path to the &lt;code&gt;ipynb&lt;/code&gt; file.&lt;/p&gt;
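&lt;p&gt;As a quick illustration, here is a small (hypothetical) helper that assembles such a link from its parts -- the fastai quick start notebook mentioned above is used as an example:&lt;/p&gt;

```python
# Hypothetical helper assembling a Colab sharing link for a notebook
# hosted in a public GitHub repository.
def colab_link(account: str, repo: str, branch: str, path: str) -> str:
    return (
        "https://colab.research.google.com/github/"
        f"{account}/{repo}/blob/{branch}/{path}"
    )

# The fastai quick start notebook from the example above
print(colab_link("fastai", "fastai", "master", "nbs/quick_start.ipynb"))
# → https://colab.research.google.com/github/fastai/fastai/blob/master/nbs/quick_start.ipynb
```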

&lt;p&gt;Colab notebooks are not the best option if you are interested in sharing your work with the general public. However, they are very powerful tools to easily convey ideas to technical peers. So much so that pretty much all new developments in machine learning tend to come with a companion Colab notebook as standard. As an example, I explored a brand new approach to &lt;a href="https://arxiv.org/abs/2111.06377"&gt;patch-based self-supervision&lt;/a&gt; using the &lt;a href="https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb"&gt;Colab provided&lt;/a&gt; with my own input data (in this case, photographs of my cat Jasmine). It allowed me to get a much better understanding of this new research, at no cost to me.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xzuK-xKL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ccfa471j5xjz59x5xozw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xzuK-xKL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ccfa471j5xjz59x5xozw.png" width="880" height="351"&gt;&lt;/a&gt;&lt;br&gt;Screenshot of a Google Colab for a new self-supervision method featuring my cat Jasmine, notebook located at &lt;a href="https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb"&gt;https://colab.research.google.com/github/facebookresearch/mae/blob/main/demo/mae_visualize.ipynb&lt;/a&gt;
  &lt;/p&gt;

&lt;p&gt;Overall, interactivity is pretty limited. A possible way around this is the use of a library such as &lt;a href="https://gradio.app/"&gt;Gradio&lt;/a&gt;, which allows the creation of basic user interfaces directly in a notebook. Gradio will be covered in detail in the app libraries section.&lt;/p&gt;

&lt;p&gt;For completeness' sake, I should mention that Amazon Web Services announced &lt;a href="https://aws.amazon.com/sagemaker/studio-lab/"&gt;SageMaker Studio Lab&lt;/a&gt;, which is similar to Google Colab with some advantages such as persistent storage. I haven't had a chance to explore it yet, but it could in theory be used in a similar way to Google Colab. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pros of Colab notebooks&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Free access to GPU compute&lt;/li&gt;
&lt;li&gt;Tight integration with Git to facilitate sharing&lt;/li&gt;
&lt;li&gt;Fully managed, no setup or maintenance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Cons of Colab notebooks&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited to technical audiences, not suitable for a lay audience&lt;/li&gt;
&lt;li&gt;Limited interactivity by default, can be improved with libraries such as Gradio&lt;/li&gt;
&lt;li&gt;GPU availability can be hit or miss&lt;/li&gt;
&lt;li&gt;Some external storage required (e.g. to store model artefacts)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Full-stack
&lt;/h3&gt;

&lt;p&gt;This approach to creating an interactive demo is the most demanding, but potentially the one that will pay off most in the long run. It is called full-stack because it involves two components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;back end&lt;/strong&gt; responsible for loading and serving the model as a REST API&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;front end&lt;/strong&gt; to provide UI elements to interact with the back-end&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The obvious downside is that it requires being comfortable working on both of those components, or at least a willingness to learn. &lt;strong&gt;However, this approach is the most flexible, and can be used as a stepping stone to deploy a fully fledged production environment without starting from scratch&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Before diving into the back and front end components below, let's have a look at the pros and cons of the full-stack approach.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pros of the full-stack approach&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As flexible as needed&lt;/li&gt;
&lt;li&gt;Can include additional features such as authentication&lt;/li&gt;
&lt;li&gt;Can be used as a basis for production deployment without starting from scratch&lt;/li&gt;
&lt;li&gt;Can be optimized for performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Cons of the full-stack approach&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Knowledge in back and front end development required&lt;/li&gt;
&lt;li&gt;Time consuming development and deployment&lt;/li&gt;
&lt;li&gt;Requires infrastructure for deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Back End
&lt;/h4&gt;

&lt;p&gt;An exhaustive discussion of the different technology stacks for back end development is beyond the scope of this article. However, given that most engineers working on machine learning applications are at least familiar with Python, we will focus on Python-based back end solutions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--hWveyw11--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mrcpfd0f2z1oiveiduj2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hWveyw11--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mrcpfd0f2z1oiveiduj2.png" width="880" height="313"&gt;&lt;/a&gt;&lt;br&gt;Different tools for back end development: general-purpose web frameworks, serving libraries, and framework-specific serving libraries. Image by the author.
  &lt;/p&gt;

&lt;p&gt;The goal of the back end is to act as a wrapper for a model so that it can be queried via &lt;code&gt;HTTP&lt;/code&gt; requests from the front end, something referred to as &lt;em&gt;model serving&lt;/em&gt;. In order to do so, one would typically use a web framework. For a long time, &lt;a href="https://flask.palletsprojects.com/"&gt;Flask&lt;/a&gt; was the standard for Python-based web frameworks, and it is indeed still very popular. However, &lt;a href="https://fastapi.tiangolo.com/"&gt;FastAPI&lt;/a&gt; is quickly becoming the new favorite, thanks to impressive performance and native support for asynchronous operations. &lt;a href="https://towardsdatascience.com/how-you-can-quickly-deploy-your-ml-models-with-fastapi-9428085a87bf"&gt;This article&lt;/a&gt; is a good starting point to understand how a simple model can be deployed using FastAPI, while this &lt;a href="https://medium.com/@mingc.me/deploying-pytorch-model-to-production-with-fastapi-in-cuda-supported-docker-c161cca68bb8"&gt;tutorial&lt;/a&gt; provides a complete overview of all the steps required to serve a PyTorch model with GPU support.&lt;/p&gt;

&lt;p&gt;Using a general-purpose framework such as FastAPI involves writing a lot of boilerplate code just to get your API endpoint up and running. If deploying a model for a demo is the only thing you are interested in and you do not mind losing some flexibility, you might want to use a specialized serving framework instead. One example is &lt;a href="https://github.com/bentoml/BentoML"&gt;BentoML&lt;/a&gt;, which will allow you to get an optimized serving endpoint for your model up and running much faster and with less overhead than a generic web framework. Framework-specific serving solutions such as &lt;a href="https://www.tensorflow.org/tfx/guide/serving"&gt;Tensorflow Serving&lt;/a&gt; and &lt;a href="https://pytorch.org/serve/"&gt;TorchServe&lt;/a&gt; typically offer optimized performance but can only be used to serve models trained using Tensorflow or PyTorch, respectively.&lt;/p&gt;

&lt;h4&gt;
  
  
  Front End
&lt;/h4&gt;

&lt;p&gt;The front end is responsible for providing a user interface to interact with the back end serving the model. In most cases, it will be a means to input data (such as text for natural language processing, or images for computer vision) and to display model outputs. Having this user interface live in a web browser makes your demo accessible without additional dependencies.&lt;/p&gt;

&lt;p&gt;Front end development is where you will likely have to leave Python behind. While libraries such as &lt;a href="https://skulpt.org/"&gt;Skulpt&lt;/a&gt; and &lt;a href="https://brython.info/"&gt;Brython&lt;/a&gt; enable the use of Python in the browser, I highly recommend using JavaScript, as its very large community means that tutorials are numerous and it will be much easier to seek help if needed. The two most popular libraries for building user interfaces in JavaScript are &lt;a href="https://reactjs.org/"&gt;React&lt;/a&gt; (&lt;a href="https://hackernoon.com/frontend-dev-how-to-build-a-predictive-machine-learning-site-with-react-and-python-part-3"&gt;tutorial for ML demo&lt;/a&gt;) and &lt;a href="https://vuejs.org/"&gt;Vue.js&lt;/a&gt; (&lt;a href="https://royleekiat.com/2020/11/05/how-to-build-a-vuejs-frontend-for-your-machine-learning-prediction-input-and-output/"&gt;tutorial for ML demo&lt;/a&gt;). Using a general-purpose framework will give you the flexibility needed to tailor the UI to your exact requirements.&lt;/p&gt;

&lt;h4&gt;
  
  
  Deployment
&lt;/h4&gt;

&lt;p&gt;Once your back and front end components are ready, they must be deployed somewhere publicly accessible. Again, flexibility is the name of the game here. Services like &lt;a href="https://www.heroku.com/"&gt;Heroku&lt;/a&gt; offer a managed (and free, depending on usage) experience for deployment of applications. Public cloud providers such as Amazon Web Services, Azure, or Google Cloud could be an option and a small demo would likely fit nicely within their free tier offerings. &lt;/p&gt;

&lt;p&gt;Whatever path you decide to take, I recommend you consider containerization of your demo using Docker. This way, the same exact container image is used for local testing during development and for deployment on your hosting provider, helping avoid bad surprises due to changes in environment.&lt;/p&gt;
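&lt;p&gt;As a sketch (file names, the base image, and the serving command are illustrative, assuming a Python back end served with uvicorn), a Dockerfile for such a demo might look like:&lt;/p&gt;

```dockerfile
# Illustrative Dockerfile for a Python-based demo back end
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the app (assumes an app object defined in main.py)
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

&lt;p&gt;The same image can then be run locally with &lt;code&gt;docker run&lt;/code&gt; and pushed unchanged to your hosting provider.&lt;/p&gt;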

&lt;h3&gt;
  
  
  App Libraries
&lt;/h3&gt;

&lt;p&gt;So what if you want something &lt;em&gt;almost&lt;/em&gt; as flexible as what is possible with the full-stack approach, but without the development requirements? Well, you are in luck because the past few years have seen the emergence of Python libraries that allow the creation of impressively interactive demos with only a few lines of code. In this article, we are going to focus on two of the most promising libraries: &lt;a href="https://gradio.app/"&gt;Gradio&lt;/a&gt; and &lt;a href="https://streamlit.io/"&gt;Streamlit&lt;/a&gt;. There are notable differences between the two that will be explored below, but the high level idea is the same: eliminate most of the painful back and front end work outlined in the full-stack section, albeit at the cost of some flexibility.&lt;/p&gt;

&lt;h4&gt;
  
  
  Gradio
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://gradio.app/"&gt;Gradio&lt;/a&gt; was already mentioned in the Google Colab section as it can be used to add interactive elements to notebooks. As shown in the &lt;a href="https://gradio.app/getting_started/"&gt;library's getting started page&lt;/a&gt;, building an interface only takes a few lines of Python code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;gradio&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"Hello "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"!!"&lt;/span&gt;

&lt;span class="n"&gt;iface&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Interface&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;greet&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;iface&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you are working in a notebook, the UI will show right there and then. If running from a script, a browser window will open pointing to &lt;code&gt;http://localhost:7860&lt;/code&gt;. This works because Gradio runs an API server in the background for you, taking care of much of the work discussed in the full-stack section. It used to leverage Flask to create the local API server, but it recently &lt;a href="https://twitter.com/abidlabs/status/1479895680219029516"&gt;switched to FastAPI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the code snippet above, the &lt;code&gt;greet&lt;/code&gt; function would be replaced with the inference code for your model. Note that both &lt;code&gt;inputs&lt;/code&gt; and &lt;code&gt;outputs&lt;/code&gt; are set to text, so the UI will automatically default to the widgets needed for text-based tasks. Similar widgets exist for most common use cases, including computer vision and audio processing. In addition, Gradio offers very handy features such as the ability to take a screenshot, or for users to flag unexpected outputs (e.g. when processing fails).&lt;/p&gt;

&lt;p&gt;If you want to share your user interface with the world, using the &lt;code&gt;share=True&lt;/code&gt; argument in the &lt;code&gt;launch&lt;/code&gt; method will provide you with a &lt;code&gt;gradio.app&lt;/code&gt; URL that points to your demo. Note that this only forwards requests to your machine, so it will only work as long as your script or notebook is running, and the link automatically expires after 72 hours. See the section about hosting below for a way around those limitations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--k8CtaKms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/57ogoki5n3cdtimyhqzo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--k8CtaKms--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/57ogoki5n3cdtimyhqzo.png" width="880" height="192"&gt;&lt;/a&gt;&lt;br&gt;Default Gradio UI for text inputs and outputs produced by a few lines of code. Screenshot by the author.
  &lt;/p&gt;

&lt;p&gt;Gradio is laser-focused on building user interfaces for machine learning models, and this focus means that it will take care of almost everything for you and will work out of the box with very little configuration required. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pros of Gradio&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick and easy setup out of the box&lt;/li&gt;
&lt;li&gt;Runs directly in notebooks&lt;/li&gt;
&lt;li&gt;Absolutely no knowledge of web development required&lt;/li&gt;
&lt;li&gt;Apps are easily shared&lt;/li&gt;
&lt;li&gt;Good selection of built-in UI elements&lt;/li&gt;
&lt;li&gt;Features such as screenshots or output flagging are very handy for demos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Cons of Gradio&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Limited control over the UI layout&lt;/li&gt;
&lt;li&gt;Not suitable for complex apps (e.g. state management, caching)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Streamlit
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://streamlit.io/"&gt;Streamlit&lt;/a&gt; is a library to build and share data apps. Their &lt;a href="https://streamlit.io/gallery"&gt;curated gallery&lt;/a&gt; showcases examples of data visualization apps, dashboards, interactive tutorials, and, of course, machine learning demos. &lt;/p&gt;

&lt;p&gt;Streamlit can be used to build complex applications, which comes at the cost of a higher barrier to entry compared to Gradio. For example, it cannot be run directly in a notebook -- a command line tool is used to start the application from a script. A live reload approach is adopted, whereby changes made to the code are automatically reflected in the application running in the browser, allowing for quick iterations.&lt;/p&gt;

&lt;p&gt;Streamlit comes with advanced features such as caching, which prevents long-running tasks (for example, downloading and preparing a model for inference) from being run multiple times unnecessarily, and the ability to build stateful applications (where information is preserved for the duration of a user session). These features enable use cases beyond simple machine learning demos. On the UI side of things, the library has a large number of built-in widgets and can be further extended through support for &lt;a href="https://streamlit.io/components"&gt;third-party components&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--L5TbC31X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qrkub22bl9dqgo26hohx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--L5TbC31X--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/qrkub22bl9dqgo26hohx.png" width="880" height="755"&gt;&lt;/a&gt;&lt;br&gt;A streamlit app converting portrait photographs to comic book characters. Screenshot by author, demo located at &lt;a href="https://share.streamlit.io/nathannguyen-dev/comic_me_v1/main.py"&gt;https://share.streamlit.io/nathannguyen-dev/comic_me_v1/main.py&lt;/a&gt;
  &lt;/p&gt;

&lt;p&gt;Streamlit offers a managed service for app sharing called &lt;a href="https://docs.streamlit.io/streamlit-cloud/get-started/deploy-an-app"&gt;Streamlit Cloud&lt;/a&gt;. &lt;a href="https://blog.streamlit.io/deploy-a-private-app-for-free/"&gt;As of this writing&lt;/a&gt;, one private (requiring authentication) and unlimited public apps can be deployed using Streamlit Cloud with the free plan. Alternatively, Streamlit apps are easily containerized and &lt;a href="https://towardsdatascience.com/create-an-awesome-streamlit-app-deploy-it-with-docker-a3d202a636e8"&gt;deployed using Docker&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;If you are interested in deploying both Gradio and Streamlit apps, Hugging Face Spaces might be the way to go.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Pros of Streamlit&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Quick setup&lt;/li&gt;
&lt;li&gt;Advanced features such as caching and state management allow for complex apps to be built&lt;/li&gt;
&lt;li&gt;Large selection of built-in UI widgets&lt;/li&gt;
&lt;li&gt;Highly customizable UI layout&lt;/li&gt;
&lt;li&gt;Extensible through support for custom third-party components&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Cons of Streamlit&lt;/em&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sharing applications is not as trivial as it is with Gradio&lt;/li&gt;
&lt;li&gt;Complex apps require &lt;em&gt;some&lt;/em&gt; understanding of advanced web development concepts&lt;/li&gt;
&lt;li&gt;Not compatible with notebooks&lt;/li&gt;
&lt;li&gt;Lacking some basic built-in features for ML demos (e.g. flagging of unexpected inputs/outputs)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  The Hugging Face Spaces Revolution
&lt;/h4&gt;

&lt;p&gt;Deploying apps developed with either Gradio or Streamlit got a whole lot easier when &lt;a href="https://huggingface.co/"&gt;Hugging Face&lt;/a&gt; added &lt;a href="https://huggingface.co/spaces"&gt;Spaces&lt;/a&gt; to their ecosystem of ML products and tools. Spaces are similar to GitHub Pages -- code is committed to a repository, and the app is automatically built and served. When creating a Space, you pick between Streamlit, Gradio, and Static (which more or less replicates GitHub Pages' static website hosting capabilities). The Space is then automatically set up to accommodate your library of choice. Useful features such as versioning and the ability for users to like a Space make it a great experience for deploying a public machine learning demo.&lt;/p&gt;

&lt;p&gt;Similarly to how Google Colab democratized access to compute resources required for state-of-the-art machine learning models, Hugging Face Spaces allow anyone to host a demo for the world to check out. This means that the entire machine learning workflow, from model training to deployment of an interactive demo, can now be carried out for free and almost entirely in Python. &lt;/p&gt;


</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>deeplearning</category>
      <category>ai</category>
    </item>
    <item>
      <title>Good practices for neural network training: identify, save, and document best models</title>
      <dc:creator>Nicolas Jaccard</dc:creator>
      <pubDate>Tue, 04 Jan 2022 10:59:14 +0000</pubDate>
      <link>https://forem.com/nicjac/good-practices-for-neural-network-training-identify-save-and-document-best-models-47dp</link>
      <guid>https://forem.com/nicjac/good-practices-for-neural-network-training-identify-save-and-document-best-models-47dp</guid>
      <description>&lt;p&gt;&lt;strong&gt;This article was originally posted on my &lt;a href="https://nicjac.dev/posts/identify-best-model/" rel="noopener noreferrer"&gt;personal blog&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Preamble
&lt;/h2&gt;

&lt;p&gt;In this post, we are going to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduce the concept of &lt;em&gt;best model&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Discuss how the &lt;em&gt;best model&lt;/em&gt; can be identified, saved, and documented during training&lt;/li&gt;
&lt;li&gt;Explore how to leverage fastai and Weights &amp;amp; Biases to do all of this (almost) automatically and effortlessly&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is a &lt;em&gt;best model&lt;/em&gt; and why should I care?
&lt;/h2&gt;

&lt;p&gt;Model training can be seen as the generation of successive versions of a model  --  after each batch, the model weights are adjusted and, as a result, a new version of the model is created. Each new version performs differently (as evaluated against a validation set).&lt;/p&gt;

&lt;p&gt;If everything goes well, training and validation loss will decrease with the number of training epochs. However, the best performing version of a model (here abbreviated as &lt;em&gt;best model&lt;/em&gt;) is rarely the one obtained at the end of the training process.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdr6v216aqboky9xqc1u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwdr6v216aqboky9xqc1u.png"&gt;&lt;/a&gt;&lt;br&gt;A somewhat typical example of overfitting training curves.
  &lt;/p&gt;

&lt;p&gt;Take a typical overfitting case  --  at first, both training and validation losses decrease as training progresses. At some point, the validation loss might start increasing, even though the training loss continues to decrease; from this point on, subsequent model versions produced during the training process are overfitting the training data. These model versions are less likely to generalize well to unseen data. In this case, the &lt;em&gt;best model&lt;/em&gt; would be the one obtained at the point where the validation loss started to diverge.&lt;/p&gt;

&lt;p&gt;Overfitting is a convenient example, but similar observations would also apply to other dynamics of model training, such as the presence of local maxima or minima.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying and saving the &lt;em&gt;best model&lt;/em&gt; during training
&lt;/h2&gt;

&lt;p&gt;The naive approach to the problem would be to save our model after every epoch during training and to retrospectively select the optimal version based on the training curves. This approach has two notable drawbacks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Storage space&lt;/strong&gt;: large models tend to occupy a significant amount of storage when saved, with file sizes ranging from hundreds of megabytes to several gigabytes. Multiply this by the number of epochs, and you end up with a sizeable amount of storage dedicated to saving every version of a model. This can quickly become problematic, especially when training models remotely. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Computational impact&lt;/strong&gt;: saving a model after every epoch will impact overall training time  --  serialization / export of some models can be computationally intensive and slow.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8w8hnz1wp6t3gaeay5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fz8w8hnz1wp6t3gaeay5h.png"&gt;&lt;/a&gt;&lt;br&gt;Naive approach where all model versions are persisted versus the best model only approach, where only the most recent best performing model is persisted.
  &lt;/p&gt;

&lt;p&gt;As an alternative to this brute force approach where all model versions are saved during training, one can be more selective. We know from the above that our &lt;em&gt;best model&lt;/em&gt; is likely one associated with a low validation loss. We can thus formulate a criterion for the selection of our &lt;em&gt;best model&lt;/em&gt;: it must have a lower validation loss than the previous candidate. As pseudo-code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if current validation loss lower than candidate validation loss:
    save model to disk overwriting previous candidate
    set candidate validation loss to current validation loss
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
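&lt;p&gt;The pseudo-code above can be turned into a tiny, self-contained Python sketch. It only simulates training (the losses and "weights" strings are toy stand-ins, and &lt;code&gt;save_best_only&lt;/code&gt; is a hypothetical helper, not a fastai API), but it demonstrates the two properties we care about: a save happens only on improvement, and a single file ever exists on disk:&lt;/p&gt;

```python
import os
import tempfile

def save_best_only(val_losses, model_states, save_dir):
    """Persist a model state only when validation loss improves,
    always overwriting the same file so a single artifact exists."""
    best_loss = float("inf")
    path = os.path.join(save_dir, "best_model.txt")
    saves = 0
    for loss, state in zip(val_losses, model_states):
        if loss < best_loss:            # improvement over previous candidate
            best_loss = loss
            with open(path, "w") as f:  # overwrite previous candidate
                f.write(state)
            saves += 1
    return best_loss, saves, path

# Toy run: loss improves for three epochs, then diverges (overfitting).
with tempfile.TemporaryDirectory() as d:
    losses = [0.9, 0.7, 0.5, 0.6, 0.8]
    states = [f"weights@epoch{i}" for i in range(len(losses))]
    best, saves, path = save_best_only(losses, states, d)
    # best == 0.5, saves == 3, saved file holds "weights@epoch2"
```

&lt;p&gt;On the toy losses, only three saves occur, and the file left on disk corresponds to epoch 2 -- the epoch where validation loss bottomed out, not the last epoch of training.&lt;/p&gt;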



&lt;p&gt;The key advantages of this approach are that a) a new model is exported only when the validation loss improves over the previous best candidate, and b) only one model version is persisted to storage at any given time. We have thus addressed both drawbacks of the naive approach. &lt;/p&gt;

&lt;p&gt;Maybe more importantly, only saving the &lt;em&gt;best model&lt;/em&gt; also encourages good practices by requiring the performance evaluation methodology to be decided before training starts, and it removes the temptation to retroactively evaluate other versions of the model on a separate test dataset.&lt;/p&gt;

&lt;h2&gt;
  
  
  A note on validation loss, alternative metrics, and model documentation
&lt;/h2&gt;

&lt;p&gt;Up to this point, we used validation loss as our target metric to identify the &lt;em&gt;best model&lt;/em&gt; during training. Why validation loss, you might ask? The fact that it is almost always computed during training made it a convenient example to illustrate the concepts discussed in this article. &lt;/p&gt;

&lt;p&gt;However, validation loss might not be that relevant to your particular use case or domain, and any other metric can be used instead. For classification tasks, accuracy could be a good choice. Similarly, you can choose the target metric to ensure that the &lt;em&gt;best model&lt;/em&gt; is also one that will generalize well to unseen data, for example by using the &lt;a href="https://scikit-learn.org/stable/modules/generated/sklearn.metrics.matthews_corrcoef.html" rel="noopener noreferrer"&gt;Matthews Correlation Coefficient&lt;/a&gt; when dealing with severely imbalanced datasets.&lt;/p&gt;
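&lt;p&gt;To see why a metric like MCC can be a better target than accuracy on imbalanced data, here is a pure-Python sketch of the binary case (a stand-in for &lt;code&gt;sklearn.metrics.matthews_corrcoef&lt;/code&gt;, written from the standard confusion-matrix formula):&lt;/p&gt;

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Binary MCC from the standard confusion-matrix formula;
    a pure-Python stand-in for sklearn.metrics.matthews_corrcoef."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# 99:1 imbalance: always predicting the majority class scores
# 0.99 accuracy, yet MCC is 0.0 -- the metric exposes the useless model.
y_true = [1] + [0] * 99
y_pred = [0] * 100
# matthews_corrcoef(y_true, y_pred) == 0.0
```

&lt;p&gt;A model selected on accuracy alone could easily be this majority-class predictor; selecting on MCC rules it out.&lt;/p&gt;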

&lt;p&gt;Whatever target metric you decide to use, it is also important to document other aspects of this particular version of the model. Typically, this would include all performance metrics tracked during training. Persisting this information together with the actual model artifact can be very useful later on, for example to rank models obtained from a hyperparameter search or to carry out integration testing when deploying in production (more on this in a future article!).&lt;/p&gt;

&lt;h2&gt;
  
  
  Effortlessly save the &lt;em&gt;best model&lt;/em&gt; during training with fastai
&lt;/h2&gt;

&lt;p&gt;Implementing best-model saving requires altering the training loop to monitor the target metric and trigger model saving when an improvement is detected. Many modern frameworks ship with this capability built in. Here, we will be focusing on a fastai implementation, but similar capabilities are likely available in your library of choice, and you can still follow along to get an idea of how this works in practice.&lt;/p&gt;

&lt;p&gt;If you are not familiar with fastai, its &lt;a href="https://github.com/fastai/fastai" rel="noopener noreferrer"&gt;official description is&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;fastai simplifies training fast and accurate neural nets using modern best practices&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The fastai training loop can be modified and extended using &lt;a href="https://docs.fast.ai/callback.core.html" rel="noopener noreferrer"&gt;callback methods&lt;/a&gt; that are called at specific stages of training, for example after an epoch is completed, or at the end of training. Conveniently, the &lt;a href="https://docs.fast.ai/callback.tracker.html#SaveModelCallback" rel="noopener noreferrer"&gt;SaveModelCallback&lt;/a&gt; happens to do (almost) exactly what we need.&lt;/p&gt;

&lt;p&gt;Using the callback couldn't be easier:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;learner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_one_cycle&lt;/span&gt;&lt;span class="p"&gt;(...&lt;/span&gt; &lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;cbs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...,&lt;/span&gt; &lt;span class="nc"&gt;SaveModelCallback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;monitor&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;valid_loss&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where &lt;code&gt;learner&lt;/code&gt; is a standard fastai &lt;a href="https://docs.fast.ai/learner.html#Learner" rel="noopener noreferrer"&gt;Learner object&lt;/a&gt;. By default, the callback will track the validation loss to determine when to save a new &lt;em&gt;best model&lt;/em&gt;; use the &lt;code&gt;monitor&lt;/code&gt; argument to set it to any other metric tracked by your &lt;code&gt;learner&lt;/code&gt; object. After each epoch during training, the current value of the target metric is compared to the previous best value -- if it is an improvement, the model is persisted in the &lt;code&gt;models&lt;/code&gt; directory, overwriting the previous best candidate, if present. &lt;/p&gt;

&lt;p&gt;Behind the scenes, the callback tries to figure out whether an improvement means a smaller value (if the target metric name contains &lt;code&gt;loss&lt;/code&gt; or &lt;code&gt;error&lt;/code&gt;) or a larger value (everything else). This behavior can be overridden using the &lt;code&gt;comp&lt;/code&gt; argument. The model is persisted using fastai's &lt;a href="https://docs.fast.ai/learner.html#Learner.save" rel="noopener noreferrer"&gt;&lt;code&gt;save_model&lt;/code&gt;&lt;/a&gt; function, a wrapper around PyTorch's native &lt;a href="https://pytorch.org/docs/stable/generated/torch.save.html" rel="noopener noreferrer"&gt;&lt;code&gt;torch.save&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
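&lt;p&gt;The name-based heuristic is easy to picture with a small sketch (my own minimal reconstruction of the documented behavior, not fastai's actual source, which uses NumPy comparison functions):&lt;/p&gt;

```python
import operator

def default_comp(monitor):
    """Pick the improvement comparison the way SaveModelCallback is
    documented to: metrics named like losses or errors improve downward,
    everything else improves upward. (Sketch only; fastai itself uses
    np.less / np.greater, overridable via the `comp` argument.)"""
    if "loss" in monitor or "error" in monitor:
        return operator.lt   # smaller value is an improvement
    return operator.gt       # larger value is an improvement

# 'valid_loss': a drop from 0.5 to 0.4 counts as an improvement
assert default_comp("valid_loss")(0.4, 0.5)
# 'accuracy': a rise from 0.90 to 0.91 counts as an improvement
assert default_comp("accuracy")(0.91, 0.90)
```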

&lt;p&gt;The reason why the built-in callback is not &lt;em&gt;exactly&lt;/em&gt; what we need is that it will only log the target metric used to identify the &lt;em&gt;best model&lt;/em&gt;, and nothing else. It will not log other metrics (for example accuracy, if the &lt;em&gt;best model&lt;/em&gt; is determined based on validation loss). This might be fine, but given that our &lt;em&gt;best model&lt;/em&gt; might end up being used as part of a product somewhere, it would be a good idea to characterize it as much as possible. I put together a custom version of the &lt;code&gt;SaveModelCallback&lt;/code&gt; that will log all metrics tracked by fastai during training. The code can be found &lt;a href="https://gist.github.com/nicjac/b363d2454ea253570a54e5e178e7666a" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This custom version of the callback can be used as a drop-in replacement. All it really does is keep track, internally, of a dictionary of metrics (&lt;code&gt;last_saved_metadata&lt;/code&gt;) associated with the &lt;em&gt;best model&lt;/em&gt;. How to make use of this? All is to be revealed in the next section!&lt;/p&gt;

&lt;h2&gt;
  
  
  Automatically document the &lt;em&gt;best model&lt;/em&gt; with Weights &amp;amp; Biases
&lt;/h2&gt;

&lt;p&gt;Saving the &lt;em&gt;best model&lt;/em&gt; locally is a good start, but it can quickly become unwieldy if you work remotely or carry out a large number of experiments. So how do you keep track of the models created, and of their associated metrics? This is where &lt;a href="https://wandb.ai/site" rel="noopener noreferrer"&gt;Weights &amp;amp; Biases&lt;/a&gt; comes in. W&amp;amp;B is one of those tools that make you wonder how you ever managed to function without them. While officially described as "The developer-first MLOps platform", I prefer to refer to it as the Swiss Army knife of MLOps. &lt;/p&gt;

&lt;p&gt;W&amp;amp;B is very useful to track and compare experiments. However, for the purpose of this article, we are mainly interested in its almost universal versioning capabilities. In the W&amp;amp;B ecosystem, &lt;a href="https://wandb.ai/site/artifacts" rel="noopener noreferrer"&gt;artifacts&lt;/a&gt; are components that can be versioned, possibly together with their lineage. Models can be versioned as artifacts.&lt;/p&gt;

&lt;p&gt;Conveniently, fastai has a built-in callback to integrate with W&amp;amp;B, aptly named &lt;a href="https://docs.fast.ai/callback.wandb.html#WandbCallback" rel="noopener noreferrer"&gt;&lt;code&gt;WandbCallback&lt;/code&gt;&lt;/a&gt;. To use it, one needs to initialize a W&amp;amp;B run and add the callback to the learner object like so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Import W&amp;amp;B package
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;wandb&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize W&amp;amp;B run (can potentially set project name, run name, etc...)
&lt;/span&gt;&lt;span class="n"&gt;wandb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;init&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Add Callback to learner to track training metrics and log best models
&lt;/span&gt;&lt;span class="n"&gt;learn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;learner&lt;/span&gt;&lt;span class="p"&gt;(...,&lt;/span&gt; &lt;span class="n"&gt;cbs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;WandbCallback&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The main purpose of the callback is to log useful telemetry regarding the training process to your W&amp;amp;B account, including environment information and metrics. The magic happens when it is used in combination with the &lt;code&gt;SaveModelCallback&lt;/code&gt; -- at the end of the training process, the best performing model will be automatically logged as an artifact of the W&amp;amp;B run.&lt;/p&gt;

&lt;p&gt;There is one major issue with the default &lt;code&gt;WandbCallback&lt;/code&gt;: the metadata associated with the model is recorded at the end of the run and not at the epoch when the &lt;em&gt;best model&lt;/em&gt; was saved. In other words, the metadata &lt;strong&gt;does not&lt;/strong&gt; correspond to the saved model at all, and can be misleading (for example when the tracked metric diverged towards the end of training due to overfitting).&lt;/p&gt;

&lt;p&gt;This is where the custom &lt;code&gt;SaveModelCallback&lt;/code&gt; that was discussed in the previous section comes in. It will save all the information needed to associate the model with its &lt;em&gt;actual&lt;/em&gt; metadata. To take advantage of this, it is also necessary to use a custom version of &lt;code&gt;WandbCallback&lt;/code&gt;, which can be found &lt;a href="https://gist.github.com/nicjac/9efb56cccd57f9c84910f02ccabf6fac" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The changes made in the custom callback are highlighted here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;after_fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;log_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_saved_path&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;WandbCallback could not retrieve a model to upload&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;log_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_saved_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_saved_metadata&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;metadata_key&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_saved_metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;wandb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;best_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;metadata_key&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;save_model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last_saved_metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;metadata_key&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="sb"&gt;``&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As a result, the following will automatically happen:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model logged to the W&amp;amp;B run is associated with metadata containing the correct metric values&lt;/li&gt;
&lt;li&gt;The values for all metrics for the &lt;em&gt;best model&lt;/em&gt; are added to the run summary, with the prefix &lt;code&gt;best_&lt;/code&gt;. This allows runs to be sorted and compared based on the performance of their respective &lt;em&gt;best model&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhye8o2pt2okbelh44ctm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhye8o2pt2okbelh44ctm.png"&gt;&lt;/a&gt;&lt;br&gt;Best model logged in Weights and Biases. (Left) Model metadata including key metrics; (Right) Models in a W&amp;amp;B project sorted by the &lt;code&gt;best_matthews_corrcoef&lt;/code&gt; metadata associated with their respective &lt;em&gt;best models&lt;/em&gt;
  &lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;So, what have we learned in this article?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only saving the &lt;em&gt;best model&lt;/em&gt; during training is efficient and encourages good practices&lt;/li&gt;
&lt;li&gt;The metadata, including key metrics associated with the &lt;em&gt;best model&lt;/em&gt;, is almost as important as the model artifact itself&lt;/li&gt;
&lt;li&gt;Using fastai and Weights &amp;amp; Biases, saving and documenting the &lt;em&gt;best model&lt;/em&gt; can be done automatically for you. Two custom callback functions were described to make this process even better (&lt;a href="https://gist.github.com/nicjac/b363d2454ea253570a54e5e178e7666a" rel="noopener noreferrer"&gt;SaveModelCallback&lt;/a&gt; and &lt;a href="https://gist.github.com/nicjac/9efb56cccd57f9c84910f02ccabf6fac" rel="noopener noreferrer"&gt;WandbCallback&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>machinelearning</category>
      <category>datascience</category>
      <category>ai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
