<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: DhruvThu</title>
    <description>The latest articles on Forem by DhruvThu (@dhruvthu).</description>
    <link>https://forem.com/dhruvthu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1034259%2F556cfd83-b54d-4f44-a586-e18fad3bbbec.png</url>
      <title>Forem: DhruvThu</title>
      <link>https://forem.com/dhruvthu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dhruvthu"/>
    <language>en</language>
    <item>
      <title>Implementing Stable Diffusion in Rust</title>
      <dc:creator>DhruvThu</dc:creator>
      <pubDate>Sun, 26 Feb 2023 18:11:18 +0000</pubDate>
      <link>https://forem.com/dhruvthu/implementing-stable-diffusion-in-rust-5ed3</link>
      <guid>https://forem.com/dhruvthu/implementing-stable-diffusion-in-rust-5ed3</guid>
      <description>&lt;p&gt;Recently, we have seen the boom of many machine learning algorithms such as stable diffusion which can be used to create digital artworks or NFTs. One of the primary issues with stable diffusion is the possibility of &lt;a href="https://huggingface.co/stabilityai/stable-diffusion-2/discussions/49" rel="noopener noreferrer"&gt;memory leaks&lt;/a&gt; in Python, which can result in excessive use of GPU and CPU resources. To prevent this type of scenario, we at &lt;a href="//Qolaba.io"&gt;Qolaba&lt;/a&gt; chose to explore with Stable Diffusion developed in Rust in order to benefit from Rust’s memory management. This article will walk you through the process of building stable diffusion in Rust.&lt;/p&gt;

&lt;h2&gt;Table of contents:&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Rust and what are its advantages?&lt;/li&gt;
&lt;li&gt;What is Stable Diffusion?&lt;/li&gt;
&lt;li&gt;Implementation of Rust-based Stable Diffusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What is Rust and what are its advantages?&lt;/h2&gt;

&lt;p&gt;Rust is a comparatively recent programming language that quickly gained popularity thanks to its capacity to build reliable, memory-efficient, high-performance programs. This statically typed language has a syntax comparable to C++, but it has no runtime or garbage collector. Rust thus offers solutions to a number of long-standing C++ problems, such as concurrency bugs and memory-management errors.&lt;/p&gt;

&lt;h2&gt;What is Stable Diffusion?&lt;/h2&gt;

&lt;p&gt;Stable Diffusion is a machine learning model that uses a diffusion process to generate images from text. It works both text-to-image and image-to-image. The Stable Diffusion model generates the majority of the contemporary AI art found online: with just a text prompt and an open-source application, anyone can produce striking images. We prefer to run the model on a GPU because it is very slow on a CPU. The new version of Stable Diffusion offers a number of features, described on the Stability AI blog at &lt;a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="noopener noreferrer"&gt;Stable Diffusion 2.0 Release — Stability AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrczytl4m776ab8klrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrczytl4m776ab8klrb.png" alt="Stable Diffusion Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Implementation of Rust-based Stable Diffusion&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;To run Rust-based Stable Diffusion, we first clone the diffusers-rs implementation from GitHub.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/LaurentMazare/diffusers-rs.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Next, we need to install Rust. On Ubuntu we can use the commands below; on Windows, we can follow the procedure outlined at this &lt;a href="https://doc.rust-lang.org/book/ch01-01-installation.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Following that, we need to obtain the model’s weights from the Hugging Face model hub. The hub is built on Git LFS, which lets it store massive files (more than 10 GB) such as ML model checkpoints, so we can work with it exactly as we would a Git repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The commands below install Git LFS on Ubuntu; the steps in this &lt;a href="https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage" rel="noopener noreferrer"&gt;article&lt;/a&gt; cover other platforms. This requires Git to be installed already.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once Git LFS is configured, we clone the weights repo from the Hugging Face model hub with the command below.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://huggingface.co/lmz/rust-stable-diffusion-v2-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;After obtaining the weights, we move the weights folder into the diffusers-rs repo that we cloned earlier.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd rust-stable-diffusion-v2-1/
mv weights data
mv data/ ../diffusers-rs/
cd ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;To run Rust-based Stable Diffusion on a GPU, we must first set up Libtorch. The download command can be obtained from the official PyTorch installation &lt;a href="https://pytorch.org/get-started/locally/" rel="noopener noreferrer"&gt;page&lt;/a&gt; and is also shown below. After downloading Libtorch, we unzip the archive.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.1%2Bcu117.zip
unzip libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once the preceding step is done, we add the libtorch folder to the environment variables so the Rust program can find it. We can either export the variables directly in the current shell or add them to the bashrc file so they are set every time a terminal starts.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi .bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export LIBTORCH=/path/to/libtorch
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;After modifying the bashrc file, we restart the terminal (or source the file) so the exports above take effect.&lt;/li&gt;
&lt;li&gt;Following that, we run the Rust application with the cargo command shown below.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd diffusers-rs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cargo run --example stable-diffusion --features clap -- --prompt "the cutest chibi sloth you'll ever see dancing wearing a 3 piece suit holding a bunch of flowers pixar style"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30r4ae1ppwwz13bvowxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30r4ae1ppwwz13bvowxp.png" alt="Output of Rust SD for v2.1 (Prompt : the cutest chibi sloth you’ll ever see dancing wearing a 3 piece suit holding a bunch of flowers pixar style)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At present, generating one image with Rust-based Stable Diffusion takes longer than with the Python-based version. With more efficient code, the total response time of Rust-based Stable Diffusion will likely come down in the near future.&lt;/p&gt;
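
&lt;p&gt;One way to put numbers behind such a comparison is to wall-clock each command end to end. A minimal Python sketch follows; the command lines in the comments are placeholders for your local setup, not part of the article’s code.&lt;/p&gt;

```python
import subprocess, sys, time

def time_command(cmd):
    # wall-clock one end-to-end run of an image-generation command
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# hypothetical comparison (both command lines are placeholders):
# rust_s = time_command(["cargo", "run", "--release", "--example", "stable-diffusion"])
# python_s = time_command([sys.executable, "generate.py"])
```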

&lt;h2&gt;Thank you 🙏&lt;/h2&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://kinsta.com/blog/rust-vs-python/#:~:text=Rust%20programs%20are%20more%20efficient,fewer%20distinct%20features%20than%20Rust." rel="noopener noreferrer"&gt;Rust vs Python: Which One Is Best for Your Project?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jalammar.github.io/illustrated-stable-diffusion/" rel="noopener noreferrer"&gt;The Illustrated Stable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/LaurentMazare/diffusers-rs" rel="noopener noreferrer"&gt;GitHub — LaurentMazare/diffusers-rs: An implementation of the diffusers api in Rust&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>rust</category>
      <category>stablediffusion</category>
    </item>
    <item>
      <title>Stable Diffusion Inference using FastAPI and load testing using Locust</title>
      <dc:creator>DhruvThu</dc:creator>
      <pubDate>Sun, 26 Feb 2023 16:57:27 +0000</pubDate>
      <link>https://forem.com/dhruvthu/stable-diffusion-inference-using-fastapi-and-load-testing-using-locust-41pc</link>
      <guid>https://forem.com/dhruvthu/stable-diffusion-inference-using-fastapi-and-load-testing-using-locust-41pc</guid>
      <description>&lt;p&gt;Digital art or NFT has become incredibly valuable as the metaverse growing. To take advantage of this opportunity, We at &lt;a href="https://www.qolaba.io/" rel="noopener noreferrer"&gt;Qolaba&lt;/a&gt; have chosen to investigate the methods of developing API endpoints and load testing for various numbers of concurrent users. In this article, We will go through this experiment and discover the numerous conclusions.&lt;/p&gt;

&lt;h2&gt;Table of contents:&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Stable Diffusion?&lt;/li&gt;
&lt;li&gt;Inference of Stable Diffusion using FastAPI&lt;/li&gt;
&lt;li&gt;Load testing using Locust&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What is Stable Diffusion?&lt;/h2&gt;

&lt;p&gt;Stable Diffusion is a machine learning model that uses a diffusion process to generate images from text. It works both text-to-image and image-to-image. The Stable Diffusion model generates the majority of the contemporary AI art found online: with just a text prompt and an open-source application, anyone can produce striking images. The new version of Stable Diffusion offers a number of features, described on the Stability AI blog at &lt;a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="noopener noreferrer"&gt;Stable Diffusion 2.0 Release — Stability AI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Inference of Stable Diffusion using FastAPI&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I used a workstation with the specifications listed below to carry out the experiment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2ybk35cyzhtbrm9amrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2ybk35cyzhtbrm9amrw.png" alt="Workstation Stats" width="542" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before we can begin inference, the necessary Python packages for Stable Diffusion have to be installed; we can follow the instructions at this &lt;a href="https://huggingface.co/stabilityai/stable-diffusion-2" rel="noopener noreferrer"&gt;link&lt;/a&gt;. Make sure PyTorch is installed with a build matching your CUDA version before beginning.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install diffusers transformers accelerate scipy safetensors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once the required packages are installed, we can proceed with inference of Stable Diffusion using FastAPI.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastapi import FastAPI
from typing import List, Optional, Union
import io, uvicorn, gc
from fastapi.responses import StreamingResponse
import torch
import time
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from concurrent.futures import ThreadPoolExecutor

app = FastAPI()
app.POOL: ThreadPoolExecutor = None

@app.on_event("startup")
def startup_event():
    app.POOL = ThreadPoolExecutor(max_workers=1)
@app.on_event("shutdown")
def shutdown_event():
    app.POOL.shutdown(wait=False)

model_id = "stabilityai/stable-diffusion-2-1"
pipe_nsd = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe_nsd.scheduler = DPMSolverMultistepScheduler.from_config(pipe_nsd.scheduler.config)
pipe_nsd = pipe_nsd.to("cuda")

@app.post("/getimage_nsd")
def get_image_nsd(
    #prompt: Union[str, List[str]],
    prompt: Optional[str] = "dog",
    height: Optional[int] = 512,
    width: Optional[int] = 512,
    num_inference_steps: Optional[int] = 50,
    guidance_scale: Optional[float] = 7.5,
    negative_prompt: Optional[str] = None,):

    image = app.POOL.submit(pipe_nsd,prompt,height,width,num_inference_steps,guidance_scale,negative_prompt).result().images
    gc.collect()
    torch.cuda.empty_cache()
    filtered_image = io.BytesIO()
    image[0].save(filtered_image, "JPEG")
    filtered_image.seek(0)
    return StreamingResponse(filtered_image, media_type="image/jpeg")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=9000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
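

&lt;p&gt;The &lt;code&gt;max_workers=1&lt;/code&gt; thread pool in the code above is what keeps several generations from hitting the GPU at once: requests queue up and run one at a time. A small self-contained sketch of that effect, with &lt;code&gt;time.sleep&lt;/code&gt; standing in for the diffusion loop and &lt;code&gt;fake_generate&lt;/code&gt; as a hypothetical placeholder for the pipeline call:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
import threading, time

# a single-worker pool serializes access to a shared resource (the GPU):
# submitted jobs queue up instead of running the model in parallel
pool = ThreadPoolExecutor(max_workers=1)
active = 0
max_active = 0
lock = threading.Lock()

def fake_generate(prompt):
    # stand-in for pipe_nsd(...); tracks how many jobs overlap
    global active, max_active
    with lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(0.05)  # stand-in for the diffusion loop
    with lock:
        active -= 1
    return f"image for {prompt}"

futures = [pool.submit(fake_generate, p) for p in ["dog", "cat", "bird"]]
results = [f.result() for f in futures]
pool.shutdown(wait=True)
# max_active stays at 1: the three requests were processed one at a time
```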



&lt;ul&gt;
&lt;li&gt;Running the code above starts a local server exposing the Stable Diffusion API endpoint.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python &amp;lt;filename&amp;gt;.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
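

&lt;p&gt;Once the server is running, the endpoint can also be exercised from Python. The sketch below assembles the query string for &lt;code&gt;/getimage_nsd&lt;/code&gt;; the helper name &lt;code&gt;build_image_url&lt;/code&gt; is my own, and the actual request lines are commented out since they need the live server.&lt;/p&gt;

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:9000"  # the local server started above

def build_image_url(prompt, height=512, width=512, steps=50, guidance=7.5):
    # assemble the query string for /getimage_nsd, URL-encoding the
    # prompt so spaces and symbols survive
    params = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }
    return BASE + "/getimage_nsd?" + urlencode(params)

url = build_image_url("a dog wearing a hat")
# to actually fetch the image (requires the running server):
# import requests
# resp = requests.post(url)
# open("out.jpg", "wb").write(resp.content)
```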



&lt;ul&gt;
&lt;li&gt;Once the API endpoint is up, we can try it from FastAPI’s interactive API docs. For that, we go to &lt;a href="http://127.0.0.1:9000/docs" rel="noopener noreferrer"&gt;http://127.0.0.1:9000/docs&lt;/a&gt;, specify the input parameters, and click Execute to generate an image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn23orzjlwhesbrugz5ka.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn23orzjlwhesbrugz5ka.jpg" alt="FastAPI Interactive API docs" width="800" height="1768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Load testing using Locust&lt;/h2&gt;

&lt;p&gt;We can use Locust, a Python-based framework, to carry out the load testing. Locust lets us script scenarios that simulate a large number of users, pinpointing the weak spots in an application’s load handling and performance. We can run the command below to install Locust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install locust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the load test, we execute the code below with the Locust package. As part of the load test, each request randomly picks an image size of 512x512, 768x768, or 1024x1024 to mimic a real-world scenario.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from locust import HttpUser, task
import random
import urllib

class HelloWorldUser(HttpUser):
    host="http://127.0.0.1:9000"
    @task(1)
    def hello_world(self):
        h_list=[512,768,1024]
        height=random.sample(h_list, 1)
        url="/getimage_nsd?prompt=dog&amp;amp;height="+str(height[0])+"&amp;amp;width="+str(height[0])+"&amp;amp;num_inference_steps=50&amp;amp;guidance_scale=7.5&amp;amp;negative_prompt=%20"
        b=urllib.parse.quote_plus(url)
        self.client.post(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locust -f &amp;lt;filename&amp;gt;.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the code is running, we can open the Locust web UI in a browser. In the web UI we set the concurrent user count, spawn rate, and host as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6puxiy9kdotztf5orav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6puxiy9kdotztf5orav.png" alt="WebUI of Locust" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the present article, we tested the load with 5 and 10 concurrent users; the maximum response times were 146 and 307 seconds, respectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpny28ylw061s41pn57fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpny28ylw061s41pn57fm.png" alt="10 concurrent users" width="800" height="616"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sjhy2nd0tuv497vk74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sjhy2nd0tuv497vk74.png" alt="5 concurrent users" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The highest response time in load testing was 307 seconds with 10 concurrent users and 146 seconds with 5 concurrent users, which is far too slow. To solve this problem, we could use Docker and a Kubernetes load balancer to create multiple endpoints on different GPUs, dividing the overall load and speeding up response time. In addition, we could experiment with FastAPI batch requests or multiple workers so that numerous requests are processed concurrently. Nevertheless, in my opinion, the second idea won’t make a significant difference: when several Stable Diffusion operations run simultaneously on a single GPU, the number of iterations per second for each process decreases, so individual requests take longer and overall response time lengthens anyway.&lt;/p&gt;
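
&lt;p&gt;The back-of-the-envelope arithmetic behind these numbers can be sketched in Python. The ~30 s per-image figure below is an assumption for illustration, not a measured value.&lt;/p&gt;

```python
import math

def worst_case_response(service_time_s, concurrent_users, workers=1):
    # with a single worker, requests are served one after another, so the
    # last user in the queue waits for every request ahead of theirs
    # plus their own generation time
    waves = math.ceil(concurrent_users / workers)
    return waves * service_time_s

# assuming roughly 30 s per image, one worker serving 10 queued users
# gives a worst case of about 300 s, in the same ballpark as the 307 s
# observed in the load test; 5 users gives about 150 s vs. 146 s observed
```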

&lt;p&gt;While running inference, I also discovered a further problem: overall GPU memory requirements are very high because we tested three distinct image sizes, 512x512, 768x768 and 1024x1024, and GPU memory is allocated for each configuration. To overcome this, we can create the Stable Diffusion pipeline just before image generation and delete it right afterwards. Although total response time may be longer with this technique, the cost will be lower, since we can run inference on a GPU with less VRAM.&lt;/p&gt;
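
&lt;p&gt;A minimal sketch of this create-then-delete pattern, written as a generic context manager. The names &lt;code&gt;ephemeral_pipeline&lt;/code&gt; and &lt;code&gt;factory&lt;/code&gt; are my own, and the &lt;code&gt;torch.cuda.empty_cache()&lt;/code&gt; call is left as a comment since it needs torch and a GPU.&lt;/p&gt;

```python
import gc
from contextlib import contextmanager

@contextmanager
def ephemeral_pipeline(factory):
    # build the pipeline only for the duration of one request,
    # then drop the reference and reclaim memory
    pipe = factory()
    try:
        yield pipe
    finally:
        del pipe
        gc.collect()
        # torch.cuda.empty_cache() would follow here when torch is in use

# usage sketch, with StableDiffusionPipeline.from_pretrained as the factory:
# with ephemeral_pipeline(make_pipe) as pipe:
#     images = pipe(prompt).images
```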

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsjj2oke2s80an2k5u3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsjj2oke2s80an2k5u3.png" alt="GPU Consumption during Load Testing" width="794" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Thank you 🙏&lt;/h2&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://jalammar.github.io/illustrated-stable-diffusion/" rel="noopener noreferrer"&gt;The Illustrated Stable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtube.com/playlist?list=PLJ9A48W0kpRKMCzJARCObgJs3SinOewp5" rel="noopener noreferrer"&gt;Locust Play Series&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>microservices</category>
      <category>database</category>
      <category>architecture</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
