<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: Faiq Ahsan</title>
    <description>The latest articles on Forem by Faiq Ahsan (@faiqahsan).</description>
    <link>https://forem.com/faiqahsan</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F714133%2Ff2f59523-8524-4106-8a73-f4fd095b1cfe.png</url>
      <title>Forem: Faiq Ahsan</title>
      <link>https://forem.com/faiqahsan</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/faiqahsan"/>
    <language>en</language>
    <item>
      <title>Build your own AI ChatBot on your machine</title>
      <dc:creator>Faiq Ahsan</dc:creator>
      <pubDate>Fri, 03 May 2024 11:23:17 +0000</pubDate>
      <link>https://forem.com/faiqahsan/build-your-own-ai-chatbot-on-your-machine-32f3</link>
      <guid>https://forem.com/faiqahsan/build-your-own-ai-chatbot-on-your-machine-32f3</guid>
      <description>&lt;p&gt;By now everyone knows and love ChatGPT and GenAI has taken the world by storm but do you know now you can build and run your own custom AI chatbot on your machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FFkmgse8OMKn9C%2Fgiphy.gif%3Fcid%3Decf05e47w5wxug59dqaayf213tzdub14n6eojxs4cz1n5sg0%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FFkmgse8OMKn9C%2Fgiphy.gif%3Fcid%3Decf05e47w5wxug59dqaayf213tzdub14n6eojxs4cz1n5sg0%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" alt="really"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;YES! Let's take a look at the ingredients for this recipe.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.freecodecamp.org/news/learn-python-basics/" rel="noopener noreferrer"&gt;Python&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;If you are someone looking to dig deep into AI / ML, then you need to learn Python, the go-to programming language in this space. If you already know it, you are all set here; otherwise I would suggest going through a Python crash course or whatever suits you best. Also make sure that you have python3 installed on your system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/ollama/ollama?tab=readme-ov-file#ollama" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Ollama is an awesome open source package which provides a handy and easy way to run large language models locally. We will use it to download and run the 8B version of Llama3.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://www.gradio.app/guides/quickstart" rel="noopener noreferrer"&gt;Gradio&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Gradio is the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it.&lt;/p&gt;

&lt;p&gt;Ok, so now let's start!!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Installing Ollama
&lt;/h3&gt;

&lt;p&gt;Download and install the Ollama &lt;a href="https://github.com/ollama/ollama?tab=readme-ov-file#macos:~:text=language%20models%20locally.-,macOS,-Download" rel="noopener noreferrer"&gt;package&lt;/a&gt; on your machine. Once installed, run the command below to pull the Llama3 8B version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama pull llama3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By default this downloads the 8B version. If you want to run another version, like 70B, simply append it after the name, e.g. llama3:70b. Check out the complete list &lt;a href="https://github.com/ollama/ollama?tab=readme-ov-file#macos:~:text=ollama%20run%20llama3-,Model%20library,-Ollama%20supports%20a" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Creating a custom model from Llama3
&lt;/h3&gt;

&lt;p&gt;Open up a code editor, create a file named &lt;code&gt;Modelfile&lt;/code&gt;, and paste the content below into it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM llama3

## Set the Temperature

PARAMETER temperature 1

PARAMETER top_p 0.5

PARAMETER top_k 10

PARAMETER mirostat_tau 4.0

## Set the system prompt

SYSTEM """
You are a personal AI assistant named as Ultron created by Tony Stark. Answer and help around all the questions being asked.
"""

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Parameters
&lt;/h4&gt;

&lt;p&gt;Parameters dictate how your model generates its responses. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;temperature:&lt;/strong&gt; The temperature of the model. Increasing the temperature will make the model answer more creatively. (Default: 0.8)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;top_p:&lt;/strong&gt; Works together with top_k. A higher value (e.g. 0.95) will lead to more diverse text, while a lower value (e.g. 0.5) will generate more focused and conservative text. (Default: 0.9)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;top_k:&lt;/strong&gt; Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative. (Default: 40)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;mirostat_tau:&lt;/strong&gt; Controls the balance between coherence and diversity of the output. A lower value will result in more focused and coherent text. (Default: 5.0)&lt;/p&gt;

&lt;p&gt;Check out all the available parameters and their purpose &lt;a href="https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values:~:text=PARAMETER%20%3Cparameter%3E%20%3Cparametervalue%3E-,Valid%20Parameters%20and%20Values,-Parameter" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
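&lt;p&gt;To build some intuition for what these knobs do, here is a toy sampler in plain Python. This is an illustrative sketch only (Ollama's real implementation differs): temperature rescales the scores before the softmax, while top_k and top_p trim the candidate pool before sampling.&lt;/p&gt;

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Toy sampler showing how temperature, top_k, and top_p shape output.
    `logits` maps candidate tokens to raw scores (illustrative values)."""
    # Temperature scales the scores: lower -> sharper, more deterministic.
    scaled = {t: s / temperature for t, s in logits.items()}
    # Softmax into probabilities (subtract the max for numerical stability).
    m = max(scaled.values())
    exps = {t: math.exp(s - m) for t, s in scaled.items()}
    total = sum(exps.values())
    probs = {t: e / total for t, e in exps.items()}
    # top_k keeps only the k most likely tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    # top_p keeps the smallest set whose cumulative probability reaches p.
    if top_p is not None:
        kept, cum = [], 0.0
        for t, p in ranked:
            kept.append((t, p))
            cum += p
            if cum >= top_p:
                break
        ranked = kept
    # Renormalize and sample from the surviving candidates.
    total = sum(p for _, p in ranked)
    r, cum = random.random() * total, 0.0
    for t, p in ranked:
        cum += p
        if cum >= r:
            return t
    return ranked[-1][0]

logits = {"heart": 2.0, "gut": 1.2, "parents": 0.8, "homework": -1.0}
print(sample_next_token(logits, temperature=1.0, top_p=0.5, top_k=10))
```

&lt;p&gt;With top_p at 0.5 only the single most likely token survives the cutoff here, so the sampler always returns "heart"; raise top_p or the temperature and the less likely candidates start to appear.&lt;/p&gt;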

&lt;h4&gt;
  
  
  System prompt
&lt;/h4&gt;

&lt;p&gt;Here you can play around and give any name and personality to your chatbot.&lt;/p&gt;

&lt;p&gt;Now let's create the custom model from the Modelfile by running the commands below. Provide a name of your choice, e.g. ultron.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ollama create ultron -f ./Modelfile
ollama run ultron
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see ultron running and ready to accept an input prompt. Ollama also exposes a REST API for running and managing models, so when you run your model it's also available for use at the endpoint below.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http://localhost:11434/api/generate&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;We will be using this API to integrate with our Gradio chatbot UI.&lt;/p&gt;
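&lt;p&gt;Before wiring up the UI, you can sanity-check the endpoint with a few lines of Python. This sketch assumes the ultron model from Step 2 has been created and the Ollama server is running locally.&lt;/p&gt;

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model, prompt):
    # stream=False asks for one JSON object instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt, model="ultron"):
    # Assumes the Ollama server is running with the `ultron` model created.
    r = requests.post(OLLAMA_URL, json=build_payload(model, prompt), timeout=120)
    r.raise_for_status()
    return r.json()["response"]

if __name__ == "__main__":
    print(ask("Introduce yourself in one sentence."))
```

&lt;p&gt;If this prints a reply in Ultron's voice, the model and the API are both working and we can move on to the interface.&lt;/p&gt;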

&lt;h3&gt;
  
  
  Step3: Create the UI for chatbot
&lt;/h3&gt;

&lt;p&gt;Initialize a python virtual environment by running the below commands.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python3 -m venv env
source env/bin/activate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now install the required packages&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pip install requests gradio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now create a Python file &lt;code&gt;app.py&lt;/code&gt; and paste the code below into it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import requests
import json
import gradio as gr

model_api = "http://localhost:11434/api/generate"

headers = {"Content-Type": "application/json"}

history = []


def generate_response(prompt):
    history.append(prompt)
    final_prompt = "\n".join(history)  # append history
    data = {
        "model": "ultron",
        "prompt": final_prompt,
        "stream": False,
    }
    response = requests.post(model_api, headers=headers, data=json.dumps(data))
    if response.status_code == 200:  # successful
        response = response.text
        data = json.loads(response)
        actual_response = data["response"]
        return actual_response
    else:
        print("error:", response.text)


interface = gr.Interface(
    title="Ultron: Your personal assistant",
    fn=generate_response,
    inputs=gr.Textbox(lines=4, placeholder="How can I help you today?"),
    outputs="text",
)
interface.launch(share=True)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's launch the app: run your Python file with &lt;code&gt;python3 app.py&lt;/code&gt; and your chatbot will be live on the endpoint below, or one similar to it. Please note that response time may vary according to your system's computing power.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;http://127.0.0.1:7860/&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FxjUlxQHkMBqSvLeSTy%2Fgiphy.gif%3Fcid%3Decf05e475wrst0zekqlny7114oguylcg67q92g5vzvrk20oe%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FxjUlxQHkMBqSvLeSTy%2Fgiphy.gif%3Fcid%3Decf05e475wrst0zekqlny7114oguylcg67q92g5vzvrk20oe%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" alt="amazing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There you have it! Your own chatbot running locally on your machine; you can even turn off the internet and it will still work. Please share in the comments what other cool apps you are building with AI models.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>chatgpt</category>
      <category>machinelearning</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Beginner's guide to code with Generative AI and LLM</title>
      <dc:creator>Faiq Ahsan</dc:creator>
      <pubDate>Thu, 02 May 2024 14:16:47 +0000</pubDate>
      <link>https://forem.com/faiqahsan/beginners-guide-to-generative-ai-and-llm-56ci</link>
      <guid>https://forem.com/faiqahsan/beginners-guide-to-generative-ai-and-llm-56ci</guid>
      <description>&lt;p&gt;Ever since the launch of ChatGPT, every tech company is investing and integrating AI into their products and no surprise that everyone is jumping on the AI bandwagon but majority of the people who are using ChatGPT are not machine learning or AI experts so if you are just an average software engineer like me who is eager to learn more about this hype then by the end of this post you can also become that cool kid in your friend's group.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2F62PP2yEIAZF6g%2Fgiphy.gif%3Fcid%3D790b7611193gaf5a2lmnieixg6ao6bqucg05evbj3hqn56ft%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2F62PP2yEIAZF6g%2Fgiphy.gif%3Fcid%3D790b7611193gaf5a2lmnieixg6ao6bqucg05evbj3hqn56ft%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" alt="Cool"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What are LLMs?
&lt;/h2&gt;

&lt;p&gt;LLM stands for "Large Language Model." These are advanced artificial intelligence models designed to understand and generate human-like text. Large Language Models are trained on vast amounts of text data, allowing them to learn the patterns, syntax, semantics, and context of language. They can generate coherent and contextually relevant responses to input text or prompts.&lt;/p&gt;

&lt;p&gt;ChatGPT is also built upon LLMs (GPT-3.5, GPT-4, etc.) by OpenAI, but the GPT models are not open source: to integrate them into your applications you have to use paid OpenAI API keys, and you can't further fine-tune them. It's not just OpenAI building LLMs, though; other companies like Facebook and Google are also building their own models in this AI arms race.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do LLMs operate?
&lt;/h2&gt;

&lt;p&gt;The foundational task underpinning the training of most cutting-edge LLMs revolves around word prediction, predicting the probability distribution of the next word given a sequence.&lt;/p&gt;

&lt;p&gt;For instance, when presented with the sequence "Listen to your ____," potential next words might include: heart, gut, body, parents, grandma, and so forth. This is typically represented as a probability distribution.&lt;/p&gt;
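&lt;p&gt;A vastly simplified stand-in for this idea is a bigram model built from word counts: it conditions on just the previous word, where a real LLM conditions on a long context with a neural network. The toy corpus below is purely illustrative.&lt;/p&gt;

```python
from collections import Counter

# A tiny corpus standing in for the web-scale text an LLM trains on.
corpus = (
    "listen to your heart . listen to your gut . "
    "listen to your parents . listen to your heart ."
).split()

# Count which words follow the context word "your" (a bigram model:
# real LLMs condition on far longer contexts with neural networks).
context = "your"
following = Counter(
    nxt for prev, nxt in zip(corpus, corpus[1:]) if prev == context
)
total = sum(following.values())
distribution = {word: count / total for word, count in following.items()}
print(distribution)  # {'heart': 0.5, 'gut': 0.25, 'parents': 0.25}
```

&lt;p&gt;The result is exactly the kind of probability distribution over next words described above, just learned from four sentences instead of trillions of tokens.&lt;/p&gt;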

&lt;p&gt;Some prominent open source LLMs are:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://llama.meta.com/llama3/" rel="noopener noreferrer"&gt;Llama&lt;/a&gt; by Facebook&lt;br&gt;
Gemma by Google&lt;br&gt;
Mistral by Mistral AI&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  A few key points about LLMs
&lt;/h2&gt;

&lt;p&gt;When coming across LLMs you will often see something like Llama3 70B, Llama3 8B, etc. The digits at the end are actually the number of parameters the model has, so in this case it's 70 billion and 8 billion, yes BILLIONS!! That's why they are so capable at natural language processing.&lt;/p&gt;
&lt;h3&gt;
  
  
  Model size
&lt;/h3&gt;

&lt;p&gt;The model size is the number of parameters in the LLM. The more parameters a model has, the more complex it is and the more data it can process. For example, Llama3 70B is about 40GB while Llama3 8B is about 4.5GB. Larger models are also more computationally expensive to train and deploy, and you would need high-performing GPUs in order to run them.&lt;/p&gt;
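&lt;p&gt;A back-of-the-envelope calculation shows where those file sizes come from. The bits-per-parameter figures below are rough assumptions (quantization schemes vary), but the arithmetic explains why an 8B model that would need ~16 GB at full 16-bit precision ships as a ~4.5 GB download.&lt;/p&gt;

```python
def model_size_gb(num_params, bits_per_param):
    # bytes = params * bits / 8; "GB" here means 10**9 bytes (rough estimate).
    return num_params * bits_per_param / 8 / 1e9

# At 16-bit precision, the weights alone of an 8B model need ~16 GB...
print(round(model_size_gb(8e9, 16), 1))    # 16.0
# ...which is why distributed models are quantized: at roughly 4.5 bits
# per parameter, 8B shrinks to about the 4.5 GB download mentioned above.
print(round(model_size_gb(8e9, 4.5), 1))   # 4.5
print(round(model_size_gb(70e9, 4.5), 1))  # 39.4
```

&lt;p&gt;The same arithmetic also explains the RAM guidance for running these models: the weights have to fit in memory before inference can even start.&lt;/p&gt;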
&lt;h3&gt;
  
  
  Training data
&lt;/h3&gt;

&lt;p&gt;The training data is the dataset that the LLM is trained on. The quality and quantity of the training data have a significant impact on the performance of the model.&lt;/p&gt;
&lt;h3&gt;
  
  
  Hyperparameters
&lt;/h3&gt;

&lt;p&gt;Hyperparameters are settings that control how the LLM is trained. These settings can be fine-tuned to improve the performance of the model according to your specific needs.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;HuggingFace&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Hugging Face hosts an AI community and is a very cool platform where you can find tons of open source models across different categories. On Hugging Face you will find:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models:&lt;/strong&gt; Open source LLMs&lt;br&gt;
&lt;strong&gt;Datasets:&lt;/strong&gt; Publicly available datasets to custom train your LLM&lt;br&gt;
&lt;strong&gt;Spaces:&lt;/strong&gt; AI apps developed by the community and deployed on Hugging Face&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;a href="https://huggingface.co/docs/transformers/index" rel="noopener noreferrer"&gt;Transformers Library&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;The Transformers library developed by Hugging Face is a powerful and versatile open-source library. It provides APIs and tools to easily download and train pretrained models. This is a very extensive package and offers a ton of functionality out of the box, so definitely check it out.&lt;/p&gt;

&lt;p&gt;To give you a glimpse of it, below is a Python code snippet which uses the &lt;strong&gt;"microsoft/codereviewer"&lt;/strong&gt; model from Hugging Face to review a JavaScript code file and print its suggestions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_id = "microsoft/codereviewer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)


def review_javascript_code(file_path):

    # Reuse the already-loaded model and tokenizer instead of
    # downloading the model a second time inside the pipeline.
    pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)

    with open(file_path, "r") as file:
        code = file.read()

    result = pipe(code, max_length=512, num_return_sequences=1)[0]["generated_text"]
    print(result)


javascript_file_path = "test.js"
review_javascript_code(javascript_file_path)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FjsxpFvPXEU2ByvfBkM%2Fgiphy.gif%3Fcid%3Decf05e47c9f3a14bgij3lylynsu2ynzcxrpnrrygrqglicre%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fmedia.giphy.com%2Fmedia%2FjsxpFvPXEU2ByvfBkM%2Fgiphy.gif%3Fcid%3Decf05e47c9f3a14bgij3lylynsu2ynzcxrpnrrygrqglicre%26ep%3Dv1_gifs_search%26rid%3Dgiphy.gif%26ct%3Dg" alt="wao"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's Next?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the next articles we will see how to configure and run an LLM locally on our machine and how to fine-tune one for our own specific tasks. Please share what other cool tools you have come across in this space.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://dev.to/faiqahsan/build-your-own-ai-chatbot-on-your-machine-32f3"&gt;Build and run your own AI chatbot&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>chatgpt</category>
      <category>huggingface</category>
    </item>
  </channel>
</rss>
