<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: DhruvThu</title>
    <description>The latest articles on Forem by DhruvThu (@dhruvthu).</description>
    <link>https://forem.com/dhruvthu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1034259%2F556cfd83-b54d-4f44-a586-e18fad3bbbec.png</url>
      <title>Forem: DhruvThu</title>
      <link>https://forem.com/dhruvthu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/dhruvthu"/>
    <language>en</language>
    <item>
      <title>Implementing Stable Diffusion in Rust</title>
      <dc:creator>DhruvThu</dc:creator>
      <pubDate>Sun, 26 Feb 2023 18:11:18 +0000</pubDate>
      <link>https://forem.com/dhruvthu/implementing-stable-diffusion-in-rust-5ed3</link>
      <guid>https://forem.com/dhruvthu/implementing-stable-diffusion-in-rust-5ed3</guid>
      <description>&lt;p&gt;Recently, we have seen the boom of many machine learning algorithms such as stable diffusion which can be used to create digital artworks or NFTs. One of the primary issues with stable diffusion is the possibility of &lt;a href="https://huggingface.co/stabilityai/stable-diffusion-2/discussions/49" rel="noopener noreferrer"&gt;memory leaks&lt;/a&gt; in Python, which can result in excessive use of GPU and CPU resources. To prevent this type of scenario, we at &lt;a href="//Qolaba.io"&gt;Qolaba&lt;/a&gt; chose to explore with Stable Diffusion developed in Rust in order to benefit from Rust’s memory management. This article will walk you through the process of building stable diffusion in Rust.&lt;/p&gt;

&lt;h2&gt;Table of contents:&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Rust and what are its advantages?&lt;/li&gt;
&lt;li&gt;What is Stable Diffusion?&lt;/li&gt;
&lt;li&gt;Implementation of Rust-based Stable Diffusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What is Rust and what are its advantages?&lt;/h2&gt;

&lt;p&gt;Rust is a comparatively recent programming language that quickly gained popularity thanks to its capacity to build reliable, memory-efficient, high-performance programs. This statically typed language has a syntax comparable to C++, but it has no runtime or garbage collector. Rust thus offers solutions to a number of long-standing C++ problems, such as concurrency bugs and memory-management errors.&lt;/p&gt;

&lt;h2&gt;What is Stable Diffusion?&lt;/h2&gt;

&lt;p&gt;Stable Diffusion is a machine learning model that uses a diffusion process to generate images from text. It works both text-to-image and image-to-image. The Stable Diffusion model generates the majority of the contemporary AI art found online: with just a text prompt and an open-source application, anyone can produce striking images. We prefer to run the model on a GPU because it is very slow on a CPU. The new version of Stable Diffusion offers a number of features, described on the Stability AI blog at &lt;a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="noopener noreferrer"&gt;Stable Diffusion 2.0 Release — Stability AI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrczytl4m776ab8klrb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vrczytl4m776ab8klrb.png" alt="Stable Diffusion Architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Implementation of Rust-based Stable Diffusion&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;To run Rust-based Stable Diffusion, we first clone the diffusers-rs implementation from GitHub.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/LaurentMazare/diffusers-rs.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Next, we need to install Rust. On Ubuntu we can use the commands below; on Windows, we can follow the procedure outlined at this &lt;a href="https://doc.rust-lang.org/book/ch01-01-installation.html" rel="noopener noreferrer"&gt;link&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Following that, we need to obtain the model’s weights from the Hugging Face model hub. The hub is built on Git LFS, which lets it store massive files (more than 10 GB) such as ML model checkpoints, so we can work with it exactly as we would a Git repository.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The commands below install Git LFS on Ubuntu; the steps in this &lt;a href="https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage" rel="noopener noreferrer"&gt;article&lt;/a&gt; cover other platforms. This requires Git to be installed already.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once Git LFS is configured, we clone the weights repo from the Hugging Face model hub with the command below.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://huggingface.co/lmz/rust-stable-diffusion-v2-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;After obtaining the weights, we move the weights folder into the diffusers-rs repo that we cloned earlier.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd rust-stable-diffusion-v2-1/
mv weights data
mv data/ ../diffusers-rs/
cd ..
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;To run Rust-based Stable Diffusion on a GPU, we must first set up Libtorch. The download command can be obtained from the official PyTorch installation &lt;a href="https://pytorch.org/get-started/locally/" rel="noopener noreferrer"&gt;page&lt;/a&gt; and is also shown below. After downloading Libtorch, we unzip the archive.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;wget https://download.pytorch.org/libtorch/cu117/libtorch-cxx11-abi-shared-with-deps-1.13.1%2Bcu117.zip
unzip libtorch-cxx11-abi-shared-with-deps-1.13.1+cu117.zip

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once the preceding step is done, we add the libtorch folder to the environment variables so the Rust program can find it. We can either export the variables directly in the current shell or add them to the bashrc file so they are set every time a terminal starts.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vi .bashrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export LIBTORCH=/path/to/libtorch
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;After modifying the bashrc file, we restart the terminal (or source the file) so the exports above take effect.&lt;/li&gt;
&lt;li&gt;Following that, we run the Rust application with the cargo command shown below.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd diffusers-rs/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cargo run --example stable-diffusion --features clap -- --prompt "the cutest chibi sloth you'll ever see dancing wearing a 3 piece suit holding a bunch of flowers pixar style"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30r4ae1ppwwz13bvowxp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30r4ae1ppwwz13bvowxp.png" alt="Output of Rust SD for v2.1 (Prompt : the cutest chibi sloth you’ll ever see dancing wearing a 3 piece suit holding a bunch of flowers pixar style)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At present, generating one image with Rust-based Stable Diffusion takes longer than with the Python-based version. With more efficient code, the total response time of Rust-based Stable Diffusion will likely come down in the near future.&lt;/p&gt;
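
&lt;p&gt;One way to put numbers behind such a comparison is to wall-clock each command end to end. A minimal Python sketch follows; the command lines in the comments are placeholders for your local setup, not part of the article’s code.&lt;/p&gt;

```python
import subprocess, sys, time

def time_command(cmd):
    # wall-clock one end-to-end run of an image-generation command
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# hypothetical comparison (both command lines are placeholders):
# rust_s = time_command(["cargo", "run", "--release", "--example", "stable-diffusion"])
# python_s = time_command([sys.executable, "generate.py"])
```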

&lt;h2&gt;Thank you 🙏&lt;/h2&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://kinsta.com/blog/rust-vs-python/#:~:text=Rust%20programs%20are%20more%20efficient,fewer%20distinct%20features%20than%20Rust." rel="noopener noreferrer"&gt;Rust vs Python: Which One Is Best for Your Project?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://jalammar.github.io/illustrated-stable-diffusion/" rel="noopener noreferrer"&gt;The Illustrated Stable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/LaurentMazare/diffusers-rs" rel="noopener noreferrer"&gt;GitHub — LaurentMazare/diffusers-rs: An implementation of the diffusers api in Rust&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>rust</category>
      <category>stablediffusion</category>
    </item>
    <item>
      <title>Stable Diffusion Inference using FastAPI and load testing using Locust</title>
      <dc:creator>DhruvThu</dc:creator>
      <pubDate>Sun, 26 Feb 2023 16:57:27 +0000</pubDate>
      <link>https://forem.com/dhruvthu/stable-diffusion-inference-using-fastapi-and-load-testing-using-locust-41pc</link>
      <guid>https://forem.com/dhruvthu/stable-diffusion-inference-using-fastapi-and-load-testing-using-locust-41pc</guid>
      <description>&lt;p&gt;Digital art or NFT has become incredibly valuable as the metaverse growing. To take advantage of this opportunity, We at &lt;a href="https://www.qolaba.io/" rel="noopener noreferrer"&gt;Qolaba&lt;/a&gt; have chosen to investigate the methods of developing API endpoints and load testing for various numbers of concurrent users. In this article, We will go through this experiment and discover the numerous conclusions.&lt;/p&gt;

&lt;h2&gt;Table of contents:&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What is Stable Diffusion?&lt;/li&gt;
&lt;li&gt;Inference of Stable Diffusion using FastAPI&lt;/li&gt;
&lt;li&gt;Load testing using Locust&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;What is Stable Diffusion?&lt;/h2&gt;

&lt;p&gt;Stable Diffusion is a machine learning model that uses a diffusion process to generate images from text. It works both text-to-image and image-to-image. The Stable Diffusion model generates the majority of the contemporary AI art found online: with just a text prompt and an open-source application, anyone can produce striking images. The new version of Stable Diffusion offers a number of features, described on the Stability AI blog at &lt;a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="noopener noreferrer"&gt;Stable Diffusion 2.0 Release — Stability AI&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Inference of Stable Diffusion using FastAPI&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;I used a workstation with the specifications listed below to carry out the experiment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2ybk35cyzhtbrm9amrw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi2ybk35cyzhtbrm9amrw.png" alt="Workstation Stats" width="542" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before we can begin inference, the necessary Python packages for Stable Diffusion have to be installed; we can follow the instructions at this &lt;a href="https://huggingface.co/stabilityai/stable-diffusion-2" rel="noopener noreferrer"&gt;link&lt;/a&gt;. Make sure PyTorch is installed with a build matching your CUDA version before beginning.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install diffusers transformers accelerate scipy safetensors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Once the required packages are installed, we can proceed with inference of Stable Diffusion using FastAPI.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastapi import FastAPI
from typing import List, Optional, Union
import io, uvicorn, gc
from fastapi.responses import StreamingResponse
import torch
import time
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from concurrent.futures import ThreadPoolExecutor

app = FastAPI()
app.POOL: ThreadPoolExecutor = None

@app.on_event("startup")
def startup_event():
    app.POOL = ThreadPoolExecutor(max_workers=1)
@app.on_event("shutdown")
def shutdown_event():
    app.POOL.shutdown(wait=False)

model_id = "stabilityai/stable-diffusion-2-1"
pipe_nsd = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe_nsd.scheduler = DPMSolverMultistepScheduler.from_config(pipe_nsd.scheduler.config)
pipe_nsd = pipe_nsd.to("cuda")

@app.post("/getimage_nsd")
def get_image_nsd(
    #prompt: Union[str, List[str]],
    prompt: Optional[str] = "dog",
    height: Optional[int] = 512,
    width: Optional[int] = 512,
    num_inference_steps: Optional[int] = 50,
    guidance_scale: Optional[float] = 7.5,
    negative_prompt: Optional[str] = None,):

    image = app.POOL.submit(pipe_nsd,prompt,height,width,num_inference_steps,guidance_scale,negative_prompt).result().images
    gc.collect()
    torch.cuda.empty_cache()
    filtered_image = io.BytesIO()
    image[0].save(filtered_image, "JPEG")
    filtered_image.seek(0)
    return StreamingResponse(filtered_image, media_type="image/jpeg")

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=9000)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
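

&lt;p&gt;The &lt;code&gt;max_workers=1&lt;/code&gt; thread pool in the code above is what keeps several generations from hitting the GPU at once: requests queue up and run one at a time. A small self-contained sketch of that effect, with &lt;code&gt;time.sleep&lt;/code&gt; standing in for the diffusion loop and &lt;code&gt;fake_generate&lt;/code&gt; as a hypothetical placeholder for the pipeline call:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
import threading, time

# a single-worker pool serializes access to a shared resource (the GPU):
# submitted jobs queue up instead of running the model in parallel
pool = ThreadPoolExecutor(max_workers=1)
active = 0
max_active = 0
lock = threading.Lock()

def fake_generate(prompt):
    # stand-in for pipe_nsd(...); tracks how many jobs overlap
    global active, max_active
    with lock:
        active += 1
        max_active = max(max_active, active)
    time.sleep(0.05)  # stand-in for the diffusion loop
    with lock:
        active -= 1
    return f"image for {prompt}"

futures = [pool.submit(fake_generate, p) for p in ["dog", "cat", "bird"]]
results = [f.result() for f in futures]
pool.shutdown(wait=True)
# max_active stays at 1: the three requests were processed one at a time
```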



&lt;ul&gt;
&lt;li&gt;Running the code above starts a local server exposing the Stable Diffusion API endpoint.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python &amp;lt;filename&amp;gt;.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
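

&lt;p&gt;Once the server is running, the endpoint can also be exercised from Python. The sketch below assembles the query string for &lt;code&gt;/getimage_nsd&lt;/code&gt;; the helper name &lt;code&gt;build_image_url&lt;/code&gt; is my own, and the actual request lines are commented out since they need the live server.&lt;/p&gt;

```python
from urllib.parse import urlencode

BASE = "http://127.0.0.1:9000"  # the local server started above

def build_image_url(prompt, height=512, width=512, steps=50, guidance=7.5):
    # assemble the query string for /getimage_nsd, URL-encoding the
    # prompt so spaces and symbols survive
    params = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "guidance_scale": guidance,
    }
    return BASE + "/getimage_nsd?" + urlencode(params)

url = build_image_url("a dog wearing a hat")
# to actually fetch the image (requires the running server):
# import requests
# resp = requests.post(url)
# open("out.jpg", "wb").write(resp.content)
```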



&lt;ul&gt;
&lt;li&gt;Once the API endpoint is up, we can try it from FastAPI’s interactive API docs. For that, we go to &lt;a href="http://127.0.0.1:9000/docs" rel="noopener noreferrer"&gt;http://127.0.0.1:9000/docs&lt;/a&gt;, specify the input parameters, and click Execute to generate an image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn23orzjlwhesbrugz5ka.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn23orzjlwhesbrugz5ka.jpg" alt="FastAPI Interactive API docs" width="800" height="1768"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Load testing using Locust&lt;/h2&gt;

&lt;p&gt;We can use Locust, a Python-based framework, to carry out the load testing. Locust lets us script scenarios that simulate a large number of users, pinpointing the weak spots in an application’s load handling and performance. We can run the command below to install Locust.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip3 install locust
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the load test, we execute the code below with the Locust package. As part of the load test, each request randomly picks an image size of 512x512, 768x768, or 1024x1024 to mimic a real-world scenario.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from locust import HttpUser, task
import random
import urllib

class HelloWorldUser(HttpUser):
    host="http://127.0.0.1:9000"
    @task(1)
    def hello_world(self):
        h_list=[512,768,1024]
        height=random.sample(h_list, 1)
        url="/getimage_nsd?prompt=dog&amp;amp;height="+str(height[0])+"&amp;amp;width="+str(height[0])+"&amp;amp;num_inference_steps=50&amp;amp;guidance_scale=7.5&amp;amp;negative_prompt=%20"
        b=urllib.parse.quote_plus(url)
        self.client.post(url)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locust -f &amp;lt;filename&amp;gt;.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the code is running, we can open the Locust web UI in a browser. In the web UI we set the concurrent user count, spawn rate, and host as needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6puxiy9kdotztf5orav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6puxiy9kdotztf5orav.png" alt="WebUI of Locust" width="800" height="377"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the present article, we tested the load with 5 and 10 concurrent users; the maximum response times were 146 and 307 seconds, respectively.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpny28ylw061s41pn57fm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpny28ylw061s41pn57fm.png" alt="10 concurrent users" width="800" height="616"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sjhy2nd0tuv497vk74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm4sjhy2nd0tuv497vk74.png" alt="5 concurrent users" width="800" height="616"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;The highest response time in load testing was 307 seconds with 10 concurrent users and 146 seconds with 5 concurrent users, which is far too slow. To solve this problem, we could use Docker and a Kubernetes load balancer to create multiple endpoints on different GPUs, dividing the overall load and speeding up response time. In addition, we could experiment with FastAPI batch requests or multiple workers so that numerous requests are processed concurrently. Nevertheless, in my opinion, the second idea won’t make a significant difference: when several Stable Diffusion operations run simultaneously on a single GPU, the number of iterations per second for each process decreases, so individual requests take longer and overall response time lengthens anyway.&lt;/p&gt;
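
&lt;p&gt;The back-of-the-envelope arithmetic behind these numbers can be sketched in Python. The ~30 s per-image figure below is an assumption for illustration, not a measured value.&lt;/p&gt;

```python
import math

def worst_case_response(service_time_s, concurrent_users, workers=1):
    # with a single worker, requests are served one after another, so the
    # last user in the queue waits for every request ahead of theirs
    # plus their own generation time
    waves = math.ceil(concurrent_users / workers)
    return waves * service_time_s

# assuming roughly 30 s per image, one worker serving 10 queued users
# gives a worst case of about 300 s, in the same ballpark as the 307 s
# observed in the load test; 5 users gives about 150 s vs. 146 s observed
```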

&lt;p&gt;While running inference, I also discovered a further problem: overall GPU memory requirements are very high because we tested three distinct image sizes, 512x512, 768x768 and 1024x1024, and GPU memory is allocated for each configuration. To overcome this, we can create the Stable Diffusion pipeline just before image generation and delete it right afterwards. Although total response time may be longer with this technique, the cost will be lower, since we can run inference on a GPU with less VRAM.&lt;/p&gt;
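
&lt;p&gt;A minimal sketch of this create-then-delete pattern, written as a generic context manager. The names &lt;code&gt;ephemeral_pipeline&lt;/code&gt; and &lt;code&gt;factory&lt;/code&gt; are my own, and the &lt;code&gt;torch.cuda.empty_cache()&lt;/code&gt; call is left as a comment since it needs torch and a GPU.&lt;/p&gt;

```python
import gc
from contextlib import contextmanager

@contextmanager
def ephemeral_pipeline(factory):
    # build the pipeline only for the duration of one request,
    # then drop the reference and reclaim memory
    pipe = factory()
    try:
        yield pipe
    finally:
        del pipe
        gc.collect()
        # torch.cuda.empty_cache() would follow here when torch is in use

# usage sketch, with StableDiffusionPipeline.from_pretrained as the factory:
# with ephemeral_pipeline(make_pipe) as pipe:
#     images = pipe(prompt).images
```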

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsjj2oke2s80an2k5u3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fndsjj2oke2s80an2k5u3.png" alt="GPU Consumption during Load Testing" width="794" height="433"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;Thank you 🙏&lt;/h2&gt;

&lt;h2&gt;References&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://jalammar.github.io/illustrated-stable-diffusion/" rel="noopener noreferrer"&gt;The Illustrated Stable Diffusion&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtube.com/playlist?list=PLJ9A48W0kpRKMCzJARCObgJs3SinOewp5" rel="noopener noreferrer"&gt;Locust Play Series&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>microservices</category>
      <category>database</category>
      <category>architecture</category>
      <category>softwaredevelopment</category>
    </item>
  </channel>
</rss>
